Search this blog

26 March, 2009

Garbage collection, again

Recently I've discovered this forum, Molly Rocket. A friend of mine told me to have a look, people were trashing one of my articles! Kinda cool I thought, but unfortunately it turned out that it was mostly about misunderstandings due to my bad writing.
Well, that's not the point, the point is that in that forum, there are plenty of smart people, helping less experienced ones. I stumble in a post about garbage collection, and write the usual stuff about it and its merits, compared to explicit allocations and reference counting. And... of course, I get told that those are just the usual arguments, not enough to persuade the big guys.
Ok so, let's write again... about garbage collection!

So let's picture the usual scenario. It's C++ and so, you start writing code to manage memory (at this point, you're dealing with memory, so hopefully you already have a coding standard, maybe some tools to enforce it, and you know that by default, C++ is wrong).

You probably start writing your custom allocators, to help debugging, usual stuff. You wrap the default allocator with some padding before and after to detect stomps, tracing functionality to detect leaks and fragmentation, handling alignment and so on.

Explicit allocation do no compose, they don't really work with OO, so you implement reference counting, maybe deriving your classes from a reference counted base, adding smart pointer classes (and hoping that you don't have to interface it with another similar system in a third part library).

Ok you're set, you start writing your game. And fragmentation is a problem! Ok, no fear, everyone faced that, you know what to do. You start to make separate memory pools, luckily you already knew about that, and tagged all your allocations with a category: rendering, animation, ai, physics, particles. It was so useful to enforce memory budgets during the project!
So now it's only a matter of redirecting some of this categories to different pools, possibly different allocation strategies.
Off we go... and it works! That's how everyone is solving the problem...

But! It's painful!
And this is the best scenario were you have total control and you did all the right choices, and you don't have to link external libraries that use other strategies.
You have to size all your pools for the worst case scenario. And then streaming comes in the equation. Streaming rocks right?
You need to have more and more fine control over allocations, splitting heaps, creating class pools.

You realize that what it really counts is objects lifetime. The most useful thing is to classify allocations as per frame (usually a linear allocator that automatically frees everything at the end of the frame, double buffered if the memory has to be consumed by the GPU...), short lived (i.e. temporary objecs), medium lived (i.e. level resources) and permanent.

You realize that if you go on and on, and split allocations so every class has its own pool, and you size the pools for the worst case you're wasting a lot of memory, and in the end you don't need to manage allocations anymore. You can simply use a circular pool, and overwrite old instances with new ones, if the pool is correctly sized, living instances won't get overwritten ever!

Something wrong. And what's with the idea of object lifetime anyway? Is there a better solution? A more correct, generic answer? Something that should be used as a better default? Well well...

8 comments:

MrCoder said...

here ye!

peirz said...

Pff, painful. It's a bit of work, but it can be taken from project to project, there's nothing magical or complicated about any of this, and most importantly, once you're done there's no hidden performance cost, no uncertainty when GC kicks in, or hard to debug behavior with destructors kicking in half a year after the last reference to the object got removed -- ie. long after something last happened in the logical vicinity of the bug :)

.... and ofcourse one could make a similar list for GC :) On the usage side, there's non-deterministic destruction, having to mess around with finaly{} blocks, adding IDispose because you need deterministric destruction -- and then having to add flags to make sure the destructor doesn't run twice etc. On the implementation side, there's updating live references and support for moving C++ objects in memory, detecting unref'ed objects despite circular references, etc.

Also, object pooling, and strategies to do so, are orthogonal from actual memory allocation & recollection :p

DEADC0DE said...

Peirz: no it's not a bit of work, it's a considerable amount of work. And it doesn't scale well too, the more subsystems you have. That's why in the end most games try to do zero allocations after loading after all.

Just as a sidenote, if we converted those designs to GC right now, we would have no problems, even with current GC implementations. You call the collector after loading, then do no memory allocations, thus no problems with destructors or pauses (anyway, most modern GC do NOT pause, and in that respect are way, way way better than mallocs, that have to do work immediately, while a GC thread can throttle itself to run only when other systems do not need all the CPU!)

And the pain is AFTER you already have all the framework and tools, and good designed ones too...

This is the catch when comparing implementations! You're comparing something we have years of experience and we refined the best possible techniques for it, with something we're not using at all yet.

I was trying to explain why the GC IDEA is not only good, but most likely the only sane one. I didn't go into the details of GC technology, implementation and their problems, on purpose!

I could go into the details and show that all those problems are easily solvable but of course there are things to consider!

People think GC = automatic memory management = I don't have to care. That is wrong! You have to care, otherwise you run into all kinds of problems.

Memory management is hard no matter what. But GC is a superior idea, because it is the design that naturally comes from the flaws of explicit allocations.

DEADC0DE said...

Peirz: also, about object pools, yes you're right that they are orthogonal techniques... That's why 1 - GC is a better default, then if you need more control, you can make pools and 2 - I still feel that the need of such minute control over memory is way more needed in an explicit allocation world, and even with lots of work and experience, you can never utilize memory optimally, because you'll end up wasting a lot of it (but I admit, this is harder to prove, different heaps and class pools for sure waste memory as they need to be sized for the worst case, but GC needs some slack space to operate too)

Max Burke said...

"Destruction" doesn't really exist for garbage collected languages because, as peirz pointed out, objects are reaped non-deterministically. Even though it is a pain when coming from the world of C++, disposing objects when they (and their resources) have reached the end of their useful life is the correct pattern as opposed to using finalizers.

The only garbage collectors that don't pause are concurrent collectors but they generally have other drawbacks, for example heap compaction in a concurrent environment is incredibly hard to get right, and the read barriers necessary for these collectors impose a somewhat heavy execution penalty.

If you push a GC hard enough you can get your system to pause while it runs especially in constrained execution environments like XNA for the Xbox 360, but generally if you notice your system hiccuping because of the garbage collector then your application is doing something wrong. In a way modern garbage collectors are poorly named because they don't collect garbage, they collect live objects.

Minimizing, or preventing, allocations is definitely a key, but there are a couple other things to keep in mind with garbage collection. GC performance isn't tied just to heap size, or the number of allocations performed (as allocations are cheap), but rather to heap complexity, or the number of live references in play. The more references you have in the heap the slower it will be to collect. An array of C# POD structures compacts in the amount of time that it takes to memcpy the block into its new home. Conversely a tree of reference types requires individually copying every reference over to its new location.

Languages that use garbage collected runtimes will also tend to create garbage as they go. For example, calling System.Console.WriteLine with an integer causes the integer to be boxed (allocated on the heap). That said, with a generational garbage collector, like .NET on the PC, these allocations are all performed in the nursery and have effectively zero cost. With XNA/360 though they do have a bit of a penalty because its collector isn't generational.

On a side note I think Cheney (half space) collectors are really neat because, even though they only use one half of the heap at any given time, they improve spatial locality when objects are collected as all contained references get moved adjacent to an object.

I hope this doesn't come across as being in the "anti-GC" camp because I really do think they're fantastic, as are the languages that use them, but because I develop GCs/managed environments their drawbacks are something in which I'm somewhat well versed.

Microsoft has some great articles/blog posts that I think are required reading when considering programming in a managed environment for performance:

http://blogs.msdn.com/shawnhar/archive/2007/07/02/twin-paths-to-garbage-collector-nirvana.aspx

http://blogs.msdn.com/netcfteam/archive/2006/12/22/managed-code-performance-on-xbox-360-for-xna-part-2-gc-and-tools.aspx

http://blogs.msdn.com/ricom/archive/2006/08/22/713396.aspx

DEADC0DE said...

Max: What you wrote is all true and it's evident that you have experience with GC, but it's not that relevant in my option. I know about the problems but what I'm trying to say, is that there is no better option, simply explicit allocation does not work, or it does not work any better at least, so it's not worth to use it.

It doesn't on one hand, because it's anti OOP (and anti-functional programming to for what that matters), and on the other hand, because in practice it fails.

It fails because right now we already, right now, try our best not to allocate in runtime (after loading), or at worst to use fixed pools in that case.

So if that's the world we like to live in, would it not be better to make it GC? Anyway, with no allocations / fixed pools GC behaves just like explicit allocators, as neither is doing any work.

But at least you save yourself all the efforts of trying to balance memory in various heaps with different allocations strategies during loading! Isn't it at least a little bit of a relief, other than a more sane default?

From there to using GC for more advanced stuff, i.e. to enable us to actually allocate during runtime, do smarter streaming etc... Well, in that case, you'll need to take care, and all the problems you've mentioned apply.

Even if some of them are shared or worse right now, i.e. the heap complexity... doesn't that come into play when destroying reference counted objects too? and with more cache missess as you traverse a RC hierarchy updating counters... And about boxing and temporaries... we already avoid those things in C++, we take care, why we should not do so in C# or similar languages?

I said that GC is not about freeing the programming from the burden of thinking about memory. You have still to think about it a lot. But again, it's more sane.

Max Burke said...

I hope it didn't come across that I was disagreeing with your post because I don't, I agree with both it and your comments 100%. My comment was mostly directed to peirz, and in general toward those who think that GC environments suck because of pauses yet think they can throw anything they want in the heap, whenever they want.

GC's and managed environments really shine because they remove the ability for the average programmer to shoot themselves in the foot with regard to pointers not pointing to things they should. They also, as you mention, remove the onerous burden of administering separate memory pools, ref counting, etc.

As time goes on and as more people use managed languages for video games I'm sure we'll start to see a set of best practices arise in order to keep heap traffic to a minimum.

peirz said...

Just wanted to say thanks for the further comments, I was offline for a while but finally I've read this... food for thought!