Search this blog

27 May, 2008

Collecting garbage

Ours is one of the fastest-advancing fields in computer science. Still, there are some misconceptions that are hard to break. One of them is about garbage collection. Most game programmers hate it. Yet most game programmers work in frameworks that do at least reference counting (a very basic form of GC; most implementations are neither thread safe nor able to handle cyclic references correctly), and many times, when two reference-counted systems have to be bridged, we end up designing our own bad, slow implementations of garbage collection.
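To make the cyclic-reference problem concrete, here is a toy sketch (a hypothetical `Obj` type with counts managed by hand, just to show the idea) of plain reference counting failing on a cycle:

```python
# A toy simulation of manual reference counting, the kind many game
# frameworks implement by hand. Each object in a cycle keeps the other's
# count above zero even when nothing else refers to either of them.
class Obj:
    def __init__(self):
        self.count = 1          # one reference held by the creator
        self.ref = None         # a strong reference to another Obj

    def add_ref(self):
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0 and self.ref is not None:
            self.ref.release()  # dropping us drops what we point to

a, b = Obj(), Obj()
a.ref = b; b.add_ref()          # a -> b
b.ref = a; a.add_ref()          # b -> a: a reference cycle
a.release(); b.release()        # the creator drops both of its references
# Both counts are still 1: the cycle keeps itself alive and is never freed.
```

A tracing collector has no such problem: it finds what is reachable from the roots, and an isolated cycle simply isn't.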

The best examples of garbage collectors are the ones built into the Java and C# virtual machines: those systems manage to be faster than explicit allocators on server workloads. They are well tested, and they perform excellently. Why do most people still think that garbage collection is slow?

Well, I think there are various reasons behind it. First of all, of course, early GC implementations, conservative and stop-the-world, were not as refined as current ones. Then there's probably a misconception that really relates to desktop Java programs. Java used to suck for such tasks, mostly due to slow graphics and slow IO, which were never related to the GC itself. Last, but not least (at all), a GC on a desktop system has to live with other programs, and with an OS, that do not understand GC concepts. That usually means the GC will allocate a huge heap to manage by itself, thus eating up (or seeming to eat) a lot of memory even for the most trivial hello-world program.

Moreover, in typical game systems we don't rely on heap allocators for performance-critical code anyway. In games, we usually want close to no allocations/deallocations, we want complete control over the memory budget of each subsystem, and most of the time we just craft a bunch of ad-hoc class pools wherever memory management is a performance problem. Class pools can be built in a GC system as well, so whether we use GC or manual allocation shouldn't really matter there at all.
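The kind of pool I mean can be sketched in a few lines (hypothetical `Particle` type and budget, purely illustrative): all memory is grabbed up front, so steady-state gameplay code performs no allocations at all, and the subsystem's budget is explicit.

```python
# A toy class pool: the whole budget is allocated once, at init time, and
# objects are recycled through a free list instead of being heap-allocated.
class Particle:
    __slots__ = ("x", "y", "alive")
    def __init__(self):
        self.x = self.y = 0.0
        self.alive = False

class Pool:
    def __init__(self, capacity):
        # the entire memory budget for this subsystem, paid up front
        self.items = [Particle() for _ in range(capacity)]
        self.free = list(self.items)

    def acquire(self):
        if not self.free:
            return None          # budget exhausted: fail loudly, don't allocate
        p = self.free.pop()
        p.alive = True
        return p

    def release(self, p):
        p.alive = False
        self.free.append(p)

pool = Pool(capacity=64)
p = pool.acquire()
pool.release(p)
```

Nothing in this pattern cares whether the language underneath is garbage collected or not, which is the point.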

But first of all, why should we strive for a GC world? Why should we like it, where are the advantages?

Many people think that a key advantage is that GC ensures you don't leak memory. What a small feature! In any decent C++ codebase you will have your own debug memory allocator that makes tracking and squashing leak bugs a trivial task. Also, while you can't leak memory, you can still leak other resources, and GC makes RAII more difficult to implement, as you usually don't know when your destructors are going to be called!
GC does not help with other memory-related bugs either, like memory stomping; preventing that is a language feature (disallowing out-of-bounds access and pointer arithmetic). It's true that such a feature is usually found in languages that use GC (because it makes the design of the collector much easier; otherwise, only conservative collectors could be engineered), but it's still not strictly a GC feature. GC does protect you from double deletes and from accessing deleted memory, but those bugs are not too hard to detect with a debugging allocator either.

The real benefit is not that. It's about composability. Simply put, manual allocation does not compose (in a way somewhat similar to explicit locking). If you allocate a class and then pass it to another subsystem, either you completely transfer its ownership (and even then, that should be clearly documented; it's not the standard policy) and forget about it (don't store a pointer to it), or you have to manage the subsystem as well, making sure it will not use that class after you destroy it.
That's why we prefer to use reference counting. Garbage collection is only a (usually) more powerful, and faster, implementation of the same idea: decoupling memory management from the rest of the code, to make composition easier.

There is also another benefit: knowing where all the pointers are (a non-conservative GC) also lets you move objects around in memory (a moving GC). That's why GCs usually have much lower fragmentation than the best non-fragmenting heap allocators, and that's another big win, as fragmentation is usually a nasty problem. It also lets the system improve cache and page friendliness (this is done automatically by most GCs, but no one prevents you from implementing a system where you can explicitly hint at a given memory ordering in a moving GC), and it's one of the key features that lets some GC systems be faster than manual heap allocators. Heap allocations are not as simple as they might seem...

Update: an interesting read here and here
Update: I know there are people who believe that manual allocation is still better because you can code a reference-counting system, or even a (usually incredibly bad) GC, on top of it. So you get more choice, and more power. That's a typical C++ mantra, sung without any real-world experience. The problem is that since manual allocation is bad, everyone will need at least reference counting, so everyone will code their own system; because they have the power to choose, they will choose, in different ways. Different and incompatible. Different and incompatible ways that will eventually meet and produce tons of slow, bad, ugly code in order to let them talk to each other. That's it, typical C++ code.

26 May, 2008

Artist-coders iteration

  1. Artists gather references
  2. Coders gather knowledge
  3. Artists make prototypes - visual targets in DCC applications
  4. Coders find a procedural way to create the same effect
  5. Artists iterate with the existing technology
  6. If existing tech is not enough, coders improve it, and we return to point 5
All of this while always trying to communicate and work together. Most coders think that artists should be limited or constrained so they can't end up using too many resources or doing too crazy things.
That's completely wrong; the real key is to always explain how things work and what the limits are, and to build tools that can easily and interactively profile the cost of a scene while it's being worked on.
Artists have the great gift of finding new and incredible ways to tweak parameters to make our technology reach limits that we did not even foresee. They will usually aim only at a given visual target, but they are more than able to optimize for performance as well, if we make it easy for them to get feedback about it.

On the other hand, many times artists won't trust coders either. They will always ask for more control, just in case they need to tweak something. Usually they will be happier achieving an effect using textures and vertices, stuff they can manipulate, instead of algorithms and procedures. They have to learn that some effects are cheaper, way better, and easier to do procedurally, instead of crafting them by hand.

Cross-contamination is fundamental. Artists who know how to write scripts and shaders will find that many tasks are incredibly easier to do with code. Coders who know 3D and 2D applications will find that many algorithms are easy to prototype visually in those applications. We really do need coder-artists and artist-coders. Procedural art is the only way we can author the huge amounts of content required for next-gen games, and it's also our best bet for rendering all that with decent performance (as ALU is cheaper than data access).

25 May, 2008

Devil is in the details

I'm not even halfway through "The Exorcism of Emily Rose", so I still have time to write. From some comments I've noticed that I wrote too much, too fast, in my post about the rendering equation, so this is a (hopefully better, even if I doubt it, as I'm watching those scary movies) rewrite of its second part.

One of the things I'm confident of is that whatever models we provide the artists with, they are capable of tweaking them in unpredictable ways to make them fit the idea they want to express. That's great, but it has two problems. The first is that such tweaking could end up making suboptimal use of our precious, scarce computing resources. The second, somewhat related to the first, is that bad models could be too complicated to fit, i.e. to find parameters for that achieve the desired look.

So what does the good rendering engineer have to do to avoid those problems? To me the main thing is to always work together with the artists (a good idea is to look at what they're trying to do, ask them to make prototypes with their DCC tools, and then see how we can express their art in a more procedural way), know what they need, know the physics, and base our models both on the artists' needs and on good maths. Good maths does not mean correct physics, we are far from that in realtime rendering, but reasonable physics: models that are based on solid ideas.

Now, a simple example of how things can go wrong; it's related to a small problem I had at work.

A nice trick, known since the software-rendering days, is to simulate specular reflections on a curved object by assuming that each point on that object sees the same surrounding scene. It's the basic environment-mapping technique that has been around for years: we project an image of the surrounding environment onto an infinite sphere or cube around the model, save it into a texture, and index that texture with the per-pixel reflection vector (the camera-to-point vector reflected about the point normal). We can do it in realtime as well, and we usually do, e.g. in racing games.
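The per-pixel lookup direction is just the standard reflection formula; here's a tiny sketch (plain tuples for vectors, a hypothetical helper name):

```python
# Compute the envmap lookup direction: reflect the camera-to-point
# (incident) vector about the surface normal, r = i - 2*dot(i, n)*n.
# The normal is assumed to be normalized.
def reflect(incident, normal):
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

# A view ray hitting a surface that faces straight up bounces symmetrically;
# the result is what you'd use to sample the environment cubemap.
r = reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
```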

That's interesting because we're shading the object by considering the light reflected from other objects towards it, so it's a rare example of simulating indirect global illumination (indirect specular, which is quite easy compared to indirect diffuse or, worse, glossy).

But what is that texture map encoding? Well, it's a spherical function, something indexed by a direction vector. It's an approximation of the spherical function that encodes the incoming radiance, the incoming light that scene objects are transmitting towards the point we want to shade.
Now, for a purely specular mirror, notice that its BRDF is a Dirac impulse: a spherical function that is non-zero only along the reflection vector, and zero everywhere else. That BRDF can be encoded in a two-dimensional function (in general, BRDFs are four-dimensional, i.e. they need two direction vectors, the incoming and the outgoing one).

What happens if we convolve that BRDF with our approximated incoming light function, as the rendering equation does in order to compute the total incoming energy that is going to be scattered towards the view direction? Well, we get a function that is zero everywhere and takes the envmap texture value only along the reflection direction. That's exactly the same as taking only that one sample from the envmap in the first place. So our envmapping algorithm is a reasonable approximation for the purely specular part of our material. Easy!
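Writing that last step down (with loose notation: $L_i$ is the incoming light encoded in the envmap, the mirror BRDF is a Dirac impulse $\delta$ centered on the reflection vector $r$, and the cosine and normalization factors are folded into the BRDF):

```latex
L_o(r) \;=\; \int_{\Omega} L_i(\omega)\,\delta(\omega - r)\,d\omega \;=\; L_i(r)
```

The convolution with a Dirac just picks out one sample, which is exactly the single envmap fetch.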

Now, another thing that was easily discovered in those mystical early days is that if you replace your spherical reflection image with an image that encodes a Phong lobe, you get a cheap way of doing Phong shading (cheap when memory accesses were not so expensive compared to ALU instructions).
Why does that work? It works because what we're encoding in the envmap, for each direction, is the convolution of the BRDF with the lighting function. In that case we are considering the light function to be a Dirac impulse (a single point light), and convolving it with a Phong lobe. Convolving something with a Dirac again leaves the function unchanged, so we're storing the Phong lobe itself in our texture map; and as the Phong specular reflection model can be reduced to a two-dimensional function, that precomputation works.
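This is the dual of the mirror case above: now the light, not the BRDF, is the Dirac impulse. With a single point light from direction $l$ and a Phong lobe $f$, the value stored in the envmap for lookup direction $\omega$ is (same loose notation as before):

```latex
E(\omega) \;=\; \int_{\Omega} \delta(\omega' - l)\,f(\omega' \cdot \omega)\,d\omega' \;=\; f(l \cdot \omega)
```

so the texture ends up being an image of the Phong lobe itself, centered on $l$.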

But we can be smarter, and not use Dirac impulses at all. We can take an arbitrary light configuration, convolve it with our specular model, index the result with the reflection vector, and voilà: we have (an approximation of) the specular part of our shading. If we do the same, this time convolving the light function with a cosine lobe (Lambert model), and index that with the normal vector, we get the diffuse part as well.
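As a sketch of that idea (a toy one-directional "environment", crude latitude/longitude sampling of the sphere, hypothetical names, nothing production-worthy), prefiltering with a cosine lobe looks like this:

```python
import math

# Toy prefiltering: for each lookup direction n, store the environment
# radiance convolved with a normalized clamped-cosine (Lambert) lobe,
# i.e. a tiny diffuse irradiance map, computed by brute-force integration.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_sphere(steps):
    # crude midpoint sampling in (theta, phi), weighted by the solid angle
    # element sin(theta) * dtheta * dphi
    for i in range(steps):
        theta = math.pi * (i + 0.5) / steps
        for j in range(2 * steps):
            phi = 2.0 * math.pi * (j + 0.5) / (2 * steps)
            d = (math.sin(theta) * math.cos(phi),
                 math.cos(theta),
                 math.sin(theta) * math.sin(phi))
            w = math.sin(theta) * (math.pi / steps) * (math.pi / steps)
            yield d, w

def environment(d):
    # toy radiance: a smooth bright patch around straight up (+y)
    return max(0.0, d[1]) ** 8

def prefilter(n, steps=64):
    # convolve the environment with the clamped-cosine lobe around n
    total = 0.0
    for d, w in sample_sphere(steps):
        total += environment(d) * max(0.0, dot(n, d)) * w
    return total / math.pi

# A normal facing the bright patch gathers irradiance; one facing away doesn't.
up = prefilter((0.0, 1.0, 0.0))
down = prefilter((0.0, -1.0, 0.0))
```

In a real renderer you'd do this per texel of the destination cubemap, offline or at load time, but the maths is the same.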

This is a clever trick that we use a lot nowadays; in some ways it's the same thing we do with spherical harmonics too (SH are another way of storing spherical functions; they're really interesting, but that's the subject for another post). You can use a cubemap indexed with the surface normal for the diffuse term and another indexed with the reflection vector for the glossy one. But care has to be taken when computing those cubemaps. They have to be the light function convolved with the term of the local lighting model we're considering, as we just said!

What is usually done instead is for the artists to use a Gaussian blur in Photoshop or, if the cubemaps are generated in realtime, for the renderer to use a separable Gaussian filter (as Gaussians are the only circularly symmetric filters that are separable).
But a Gaussian is neither a cosine lobe nor a Phong one! And I seriously doubt that artists are going to find a Gaussian that is a good approximation of those, either. Even if they did, filtering the cubemap faces is not the same as applying a convolution to the spherical function the cubemap represents: the equivalent convolution gets distorted towards the corners of the cubemap, as a cubemap does not have the same topology as a sphere; its texels do not cover equal solid angles when projected onto the sphere, so we have to account for that when applying our filter!
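How uneven is that texel coverage? A small sketch (a toy 16x16 face, using the standard projected-area formula for a unit cube face) makes the distortion concrete:

```python
# Solid angle subtended by one texel of a cubemap face: texels near the
# corners cover a much smaller solid angle than texels at the face center,
# so an unweighted 2D filter over the face over-counts the corners.
def texel_solid_angle(u, v, size):
    # (u, v) are texel indices on one face of a size x size cubemap;
    # map to [-1, 1] face coordinates at the texel center (face at z = 1).
    x = 2.0 * (u + 0.5) / size - 1.0
    y = 2.0 * (v + 0.5) / size - 1.0
    area = (2.0 / size) ** 2                    # texel area on the cube face
    # projecting onto the unit sphere scales the area by 1/(x^2+y^2+1)^(3/2)
    return area / (x * x + y * y + 1.0) ** 1.5

size = 16
center = texel_solid_angle(size // 2, size // 2, size)
corner = texel_solid_angle(0, 0, size)
ratio = center / corner   # several times larger at this resolution
```

A correct spherical convolution has to weight each texel by this solid angle (and reach across face boundaries), which is exactly what a naive per-face Gaussian blur doesn't do.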
Moreover, we can't have a pair of cubemaps for each different specular exponent, so we have to choose one convolution size and hack some kind of exponential AFTER the cubemap access!
That led to complaints about the inability to find the right blur sizes to achieve the correct look. That wasn't a failure of the cubemaps, but of how we were using them. That incorrect model could not be made to fit the look we wanted. In theory, we had a good approximation of diffuse and Phong shading under arbitrary light sources; in practice, the implementation details made our approximation different from what we thought it was, and in the end, bad.

Update: HDRShop is capable of doing the right convolutions on cubemaps, as I described there!

Best tech for next-gen games

I've got a new projector here at home, so I'm trying to watch some horror movies. I have "The Exorcism of Emily Rose" and "The Blair Witch Project 2", both of which I've already seen. Problem is that I'm really scared by horror movies (that's why I watch them), so I always used to watch them with a friend of mine, when I was in Naples. Shame is that I can't call her now, so I'll try to distract myself from the movie by writing a couple of posts.

What's the best tech, the most useful piece of technology we can code for a next-gen game? Well, some will say things like realtime shadows, ambient occlusion, HDR, DOF, motion blur, rigid-body physics, etc... Very interesting and exciting stuff.

Stuff that requires time to be done properly. And who knows what other interesting stuff we will invent in the following years... Does all that really matter?

I don't think so; no, it does not matter too much. I've seen incredible games being made with relatively low-tech engines, one example being GT5. The truth is that we're not limited by technology, we're limited by time (and money). Next-gen projects take so much effort that we're not really making proper use even of well-known algorithms and shaders.
Designers and artists need time, and iterations, in order to fully use what we give them. Giving them more time, faster iterations: that is what will make the difference. And the same applies to coders as well.

The key to the success of a next-gen game is iteration time. What's the key to lowering iteration times? Data-driven design. Databases. Databases are the best tech for next-generation games. Tools matter. More than anything else.

Some effort has been made: it's not rare to have some kind of scripting, probably an in-game console, the ability to tweak some parameters, probably to save them too. You will be really lucky if you are working with an engine that lets you hot-swap shaders. Even luckier if you can hot-swap meshes and textures after processing them with your asset pipeline.
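Even the most basic version of this is valuable. Here's a minimal sketch (a hypothetical JSON parameter file and names, nothing resembling a real engine) of live parameter tweaking: the game polls a file once per frame and reloads it when its modification time changes, so values can be edited while the game runs.

```python
import json
import os
import tempfile

# Minimal data-driven tweaking: reload a parameter file whenever it changes
# on disk, so artists/designers can edit values while the game is running.
class TweakFile:
    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.values = {}
        self.poll()

    def poll(self):
        """Reload the file if it changed; call this once per frame."""
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:
            self.mtime = mtime
            with open(self.path) as f:
                self.values = json.load(f)

# Usage: create a parameter file, load it, then edit it and poll again.
path = os.path.join(tempfile.mkdtemp(), "car_tuning.json")
with open(path, "w") as f:
    json.dump({"grip": 0.8}, f)
tweaks = TweakFile(path)

with open(path, "w") as f:
    json.dump({"grip": 0.95}, f)
os.utime(path, (0, 1))  # force a distinct mtime just for this demonstration
tweaks.poll()           # picks up the new value without restarting anything
```

A real system would add networked pushes to the console, schema/reflection support, and version-control integration, but the loop is the same: data changes, the running game notices.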

But even when you have all those commodities, what I have yet to see is an integrated system: one decoupling code from data, dividing data into its most basic components and the relations between them (i.e. meshes, shaders, textures, and material parameters as separate, but related, entities), and integrating with version control, builds, DCC applications, networked updates (hot-swapping), reflection of code structures into data, and code generation of data into code structures.

I would like to be able to launch the game, drop in a track, make a model of a car in a DCC application, "build" it, load it into the game, attach the car to the AI, tweak its material parameters, and save the resulting configuration into the version control system. That would really be a technological breakthrough.

P.S.
It would be a dream come true if I could even hot-swap code, not only tweak scripts, but actually have a dynamic linking system... On some platforms this is not allowed in a shipping game, but for development it is doable...

P.P.S. Data-driven does NOT mean overengineered. You should not reduce everything to a single system; small, specialized systems are fundamental, and procedural models and ad-hoc algorithms are often better than gluing together pre-existing generic pieces. The data-driven infrastructure should also make it easier to write a new subsystem, and to integrate it with the tools. Generalizations should be made only if they save work, and only if and when they are needed, not a priori.

19 May, 2008

Reflection saved the day

An easy question: can FX Composer 2 render to a given mipmap level (surface) of a rendertarget texture, instead of using only the first one?

Ask the question on the NVidia developer forums. Wait. Hope. No reply... Well, never fear! Let's just download .NET Reflector and try to read the code ourselves. After a little bit of wandering around (wow, they have quite a few assemblies!), we find FXComposer.Scene.Render.Direct3D.dll; that's a really nice starting point. Class FXRenderDevice_D3D, we're getting closer; a SetRenderTarget method, cool, let's disassemble it, and we find:

using (Surface surf = target.GetSurfaceLevel(0))

Unfortunately that makes me pretty sure that the answer is no, nope, you can't, no matter whether you use SAS or ColladaFX; better luck next time.

Reflection is great. You can use it to gather all kinds of information about your code, or to dynamically create code as well (via Reflection.Emit or CodeDOM; the latter requires the C# compiler). Cool stuff.
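The same kind of spelunking works in any language with introspection. Here's a toy analogue (Python's reflection facilities standing in for .NET Reflector, with a hypothetical `RenderDevice` class standing in for FXRenderDevice_D3D): use reflection to check what code actually does, instead of waiting for a forum reply.

```python
import inspect

# A hypothetical stand-in for the class we disassembled: note the
# hardcoded surface level 0, the bug (well, limitation) we were hunting.
class RenderDevice:
    def set_render_target(self, target):
        surface = target.get_surface_level(0)   # always surface level 0!
        return surface

# We can reflect on the method's signature...
params = list(inspect.signature(RenderDevice.set_render_target).parameters)
# ...and on its compiled code object, where the hardcoded 0 shows up among
# the constants: the moral equivalent of spotting GetSurfaceLevel(0) in the
# disassembled IL.
consts = RenderDevice.set_render_target.__code__.co_consts
```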

P.S. Shame on you, NVidia! I really do badly need to be able to render to a mipmap level, and everyone working with shadowmaps and/or post-effects (follow that link!) probably has the same need as well! I'm really disappointed by the new FX Composer; you added a lot of functionality, but for coders it's as good as or worse than the old one. Also, I don't believe that most studios need all those scene/animation capabilities, because usually, when you have to author a material shader, you test it directly in your DCC application anyway...

P.P.S. The next step, for the hacker-inclined, would be (if the assemblies are not signed) to use a debugger to find where the SAS scripting calls SetRenderTarget, modify it to parse a surface number, and pass that into the texture that will be set. Everything should be doable within .NET Reflector via the Reflexil and Deblector addins!