
17 December, 2009

Lighting Compendium - part 1

Lighting is, still, one of the most challenging issues in realtime rendering. There is a lot of research around it, from how to represent lights, to material models, to global illumination effects.

Even shadows can't be considered solved for any but the simplest kind of light source (directional or sunlight, where using Cascaded Shadow Maps seems to be the de facto standard nowadays).

It looks like we have a plethora of techniques, and choosing the best one can be daunting. But if you look a little closer, you'll realize that all those different lighting systems are really just permutations of a few basic choices. And by understanding those, you can come up with novel ideas as well. Let's see.

Nowadays you'll hear a lot of discussion around "deferred" versus forward rendering, with the former starting to be the dominant choice, most probably because the open-world action-adventure-FPS genre is so dominant.

The common wisdom is that if you need a lot of lights, deferred is the solution. While there is some truth in that statement, a lot of people accept it blindly, without much thinking... and that is obviously bad.

Can't forward rendering handle an arbitrary number of lights? It can't handle an arbitrary number of analytic lights, true, but there are other ways to abstract and merge lights that are not in screen space. What about spherical harmonics, irradiance voxels, or lighting cubemaps?

Another example is the light-prepass deferred technique. It's said to require less bandwidth than the standard deferred geometry-buffer approach, and to allow more material variation. Is that true? Try computing the total bandwidth of this method's three passes compared to the standard one's two. And try to reason about how many material models you can really express with the information light-prepass stores...
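To make the exercise concrete, here is a back-of-envelope sketch in Python. The buffer layouts, light count and resolution are all made-up assumptions for illustration, not any real engine's numbers; plug in your own and see which way the comparison goes.

```python
# Back-of-envelope bandwidth estimate, in bytes per frame.
# All buffer layouts below are illustrative assumptions.

W, H = 1280, 720
pixels = W * H
lights = 8

# Classic deferred: one fat G-buffer write, then one read per light pass.
gbuffer_bytes = 4 * 4          # e.g. four RGBA8 render targets
deferred = pixels * (gbuffer_bytes             # geometry pass writes
                     + lights * gbuffer_bytes  # each light reads the G-buffer
                     + lights * 4)             # and blends into an RGBA8 target

# Light-prepass: thin normal/depth write, light accumulation, then a
# second geometry pass that reads the accumulated light buffer.
thin_gbuffer = 8               # e.g. packed normal + depth
lightbuf = 8                   # e.g. two RGBA8 (diffuse / specular terms)
prepass = pixels * (thin_gbuffer
                    + lights * (thin_gbuffer + lightbuf)  # light accumulation
                    + lightbuf + 4)                       # material pass

print(f"deferred:      {deferred / 2**20:.1f} MiB/frame")
print(f"light-prepass: {prepass / 2**20:.1f} MiB/frame")
```

The point is not which number wins with these particular constants, but that the answer flips depending on light count and buffer formats, so the "less bandwidth" claim has to be checked case by case.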

It's all about tradeoffs, really. And to understand those, you have first to understand your choices.

Choice 1: Where/When to compute lighting.

Object-space. The standard forward rendering scenario. Lighting and the material's BRDF are computed (integrated) in a single pass, the normal shading one. This of course allows a lot of flexibility, as you get all the information you could possibly want to perform local lighting computations.
It can lead to some pretty complicated shaders and shader permutations as you keep adding lights and materials to the system, and it's often criticized for that.
As I already said, that criticism is largely wrong, as nothing forces you to use analytic lights, which require ad-hoc shader code for each of them. That is not a fault of forward rendering, but of a given lighting representation.
It's also wrong to see it as the most flexible system. It knows everything about local lighting, but it does not know anything about global lighting. Do you need subsurface scattering? A common approach is to "blur" the diffuse lighting, scattering it across the object's surface. This is impossible for a forward renderer; it does not have that information. You have to start thinking about multiple passes... that is, deferring some of your computation, isn't it?
Another pretty big flaw, one that can seriously affect some games, is that it depends on the geometric complexity of your models. If you have too many, too small triangles, you can incur serious overdraw and partial-quad overheads. Those will hurt you pretty badly, and you might want to consider offloading some or all of your lighting computations to other passes for performance reasons. On the other hand, you get some sort of multiresolution ability for free, because you can easily split your lighting between the vertex and pixel shaders.

Screen-space. Deferred, light-prepass, inferred lighting and so on. All based on the premise of storing some information about your scene in a screen-space buffer, and using that baked information to perform some or all of your lighting computations. It is a very interesting solution, and once you fully understand it, it might lead to some pretty nice and novel implementations.
As filling the screen-space buffers is usually fast, with the only bottleneck being the blending ("raster operations") bandwidth, it can speed up your shading quite a bit if your triangles are small enough to cause bad quad efficiency (recap: current GPUs rasterize triangles into 2x2-pixel sample blocks; in quads along the edges only some samples fall inside the triangle, but all of them get shaded, while only the ones inside contribute to the image).
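That recap can be illustrated with a toy computation; the pixel sets below are contrived examples, not real rasterizer output.

```python
# Toy quad-efficiency estimate: GPUs shade pixels in 2x2 quads, so if a
# triangle covers any pixel of a quad, all 4 samples of that quad get
# shaded. Thin slivers therefore waste a large fraction of the work.

def quad_efficiency(covered):
    """covered: set of (x, y) pixels inside the triangle."""
    quads = {(x // 2, y // 2) for x, y in covered}
    shaded = 4 * len(quads)          # every touched quad shades 4 samples
    return len(covered) / shaded

# A 1-pixel-wide diagonal sliver: 8 covered pixels touch 4 quads.
sliver = {(i, i) for i in range(8)}
print(quad_efficiency(sliver))   # 8 covered / 16 shaded = 0.5

# A solid 8x8 block: every touched quad is fully covered.
block = {(x, y) for x in range(8) for y in range(8)}
print(quad_efficiency(block))    # 64 / 64 = 1.0
```

The smaller the triangles get relative to the 2x2 grid, the further efficiency drops, which is exactly when moving lighting work to a screen-space pass starts to pay off.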
The crucial thing is to understand what to store in those buffers, how to store it, and which parts of your lighting to compute out of them.
Deferred rendering chooses to store material parameters and compute local lighting out of them. For example, if your materials are Phong-Lambert, what does your BRDF need? The normal vector, the Phong exponent, the diffuse albedo and Fresnel colour, the view vector and the light vector.
All but the last are "material" properties; the light vector depends on the lighting (surprisingly). So we store the material properties in screen space, in the "geometry buffer", and then run a series of passes, one per light, that provide the last bit of information and compute the shading.
Light-prepass? Well, you might imagine, even without knowing much about it, that it chooses to store lighting information and execute passes that "inject" the material information and compute the final shading. The tricky bit, which made this technique not so obvious, is that you can't store things like the light vector, as in that case you would need a structure capable of storing, in general, a large and variable number of vectors. Instead, light-prepass exploits the fact that some bits of light-dependent information are simply added together in the rendering equation, light by light, so the more lights you have the more you keep adding, without needing to store extra information. For Phong-Lambert, those would be the diffuse (normal dot light) and specular (e.g. normal dot half-vector, raised to the Phong exponent) terms.
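A minimal sketch of that additive trick, for one illustrative Phong-Lambert pixel; all vectors, constants and buffer layouts here are made up to show the structure, not taken from any specific implementation.

```python
# Sketch of the light-prepass accumulation idea, per pixel.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    l = sum(x * x for x in v) ** 0.5
    return tuple(x / l for x in v)

# Pass 1 stored this per pixel (the thin geometry buffer):
normal = normalize((0.0, 1.0, 0.2))
specular_power = 32.0

# Pass 2: each light only ADDS into the light buffer; no per-light
# storage is ever needed, which is the trick the technique relies on.
light_buffer = [0.0, 0.0]            # accumulated diffuse, specular terms
view = normalize((0.0, 0.5, 1.0))
for light_dir in [normalize((0.3, 1.0, 0.1)), normalize((-0.5, 0.8, 0.4))]:
    n_dot_l = max(0.0, dot(normal, light_dir))
    half = normalize(tuple(l + v for l, v in zip(light_dir, view)))
    light_buffer[0] += n_dot_l
    light_buffer[1] += n_dot_l * max(0.0, dot(normal, half)) ** specular_power

# Pass 3: the material pass combines the accumulated lighting with the
# albedo and specular colour, which never had to live in the buffer.
albedo, spec_colour = 0.8, 0.04
final = albedo * light_buffer[0] + spec_colour * light_buffer[1]
print(final)
```

Note what got lost: the per-light colours and vectors are gone by pass 3, which is exactly why the range of material models you can express this way is narrower than it first appears.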
Is this the only possible way to bake lighting in screen space without needing an arbitrary number of components? Surely not. Another way could be using spherical harmonics per pixel, for example... Not a smart choice, in my opinion, but if you think about deferred in this way, you can start thinking about other decompositions. Deferring diffuse shading, the one where lighting defines shapes, and computing specular in object space? Be my guest. The possibilities are endless...
But where deferring lighting into multiple passes really shows its power over forward rendering is when you need to access non-local information. I've already made the example of subsurface scattering, and on this blog I've also talked (badly, as it's obvious and not worth a paper) about image-space gathering, which is another application of the idea. Screen-space ambient occlusion? Screen-space diffuse occlusion/global illumination? Same idea. Go ahead, make your own!

Other spaces. Why should we restrict ourselves to screen-space baking of information? Other spaces can prove more useful, especially when you need to access global information. Do you need to access neighbors on a surface? Do you want your shading complexity to be independent of camera movement? Bake the information in texture space. Virtual texture mapping (also known as clipmaps or megatextures) plus lighting in texture space equals surface caching...
Light space is another choice, and shadow mapping is only one possible application. Bake lighting and you get the so-called reflective shadow maps.
What about world space? You could bake the lighting passing through a given number of locations and shade your objects by interpolating that information appropriately. Spherical harmonic probes, cubemaps, dual-paraboloid maps and irradiance volumes are some of the names...
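As a sketch of the probe idea, here is a minimal order-1 spherical harmonics projection and evaluation. The sample directions and radiances are made up, and a real probe would integrate many samples against a cosine lobe; this only shows the mechanics of storing lighting as a handful of coefficients.

```python
# Minimal world-space probe sketch: project incoming radiance into
# order-1 spherical harmonics (4 coefficients), then evaluate in a
# direction at shading time. The constants are the standard SH basis.

import math

def sh4(d):
    x, y, z = d
    return (0.282095,            # Y_0^0
            0.488603 * y,        # Y_1^-1
            0.488603 * z,        # Y_1^0
            0.488603 * x)        # Y_1^1

def project(samples):
    """samples: list of (unit direction, radiance) pairs."""
    coeffs = [0.0] * 4
    for d, radiance in samples:
        for i, basis in enumerate(sh4(d)):
            coeffs[i] += radiance * basis
    w = 4.0 * math.pi / len(samples)   # Monte-Carlo solid-angle weight
    return [c * w for c in coeffs]

def evaluate(coeffs, d):
    return sum(c * b for c, b in zip(coeffs, sh4(d)))

# A single unit-radiance light from straight up (+z):
probe = project([((0.0, 0.0, 1.0), 1.0)])
up = evaluate(probe, (0.0, 0.0, 1.0))      # ~4.0: brightest towards the light
down = evaluate(probe, (0.0, 0.0, -1.0))   # ~-2.0: the linear lobe goes negative
print(up, down)
```

The negative value on the far side is the usual low-order SH ringing; it is one reason probes need care (clamping, windowing, or higher orders) in practice.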

Note about sampling. Each space has different advantages; think about how you can leverage them. Some spaces, for example, have components that remain constant in them while they would vary in others: normal maps are constant in texture space, but need to be baked every frame in screen space. Some spaces enable baking at a lower frequency than others, and some are more suitable for temporal coherency (i.e. in screen space you can leverage camera reprojection, while in other spaces you could avoid updating everything every frame). Hi-Z culling and multiresolution techniques can be the key to achieving your performance targets.

Ok, that's enough for now.

In the next post I'll talk about the second choice, that is, how to represent your lighting components (analytic versus table-based, frequency versus spatial domain, etc.) and how to take all those decisions, with some guidelines to untangle this mess of possibilities...
Meanwhile, if you want to see a game that actually mixed many different spaces and techniques to achieve its lighting, I'd suggest you read about Halo 3...

19 November, 2009

Coding tactics versus design strategies

Today, while coming back home from work, I had a discussion with a colleague about one of our most important game tools: our animation system.

Said system is very big and has many features; it's probably one of our greatest efforts, and I doubt there is anything more advanced out there. Now it's even becoming a sort of rapid game prototyping tool, and it supports a few scripting languages, plus an editor written in another language.

Making all those components communicate properly takes quite a bit of code, so we needed to create yet another component that somewhat facilitates connecting the others. While we were discussing the merits of techniques such as code generation versus code parsing to link together different languages, it became clear that what was really needed was some sort of reflection, and that having said reflection would also remove the need for other parts of code (i.e. serialization).

So I went back home and started thinking about why we didn't have that. Well, surely the problem had to be historical. Right now, looking at our design, the "right" solution is obvious, but I knew that system started really, really small and evolved over the years. I actually realized that we didn't have a standard reflection system in general...

That's rather odd, as in many companies, when you start creating your code infrastructure, reflection ends up being one of the core components, and everyone uses it; it's more like a language extension, one of the many things you have to code to make C++ look less broken. We didn't have anything like that. We really don't have a core, we don't have an infrastructure at all!


Lack of strategy. We do have a lot of code. A lot. Many different tools, you wouldn't believe how many, and I think that no one can really even view them all. We keep all those modules and systems in different repositories, really in different organizational structures with different policies and owners... It's huge, and it can look messy.

To overcome the lack of a real infrastructure, some studios have their own standards, maybe a common subset of this huge amount of code that has been approved and tested as the base of all the products that studio makes. Other studios do not do that, and others again do it partially.


Are we stupid? It looks crazy. I started thinking about how we could do better. Maybe, instead of choosing a subset of technologies to make our core and gluing them together with some bridge code and some test code, we could make our own copies of what we needed, and actually modify the code to live together more nicely. Build our infrastructure by copying and pasting code, modifying it, and not caring about diverging from the original modules.

But then what? It would mean that everything we modify would live in its own world: we couldn't take updates made by others, and we couldn't take other modules that depend on one of the pieces we modified. And every game, to leverage this new core, would basically have to be rewritten! Even cleaning up the namespaces is impossible! No, it's not a way that could be practical, even if we had the resources to create a team working on that task for a couple of years.


What went wrong? Nothing, really. As bad as it might look, we know that it's the product of years of decisions, all of which (or most of them), I'm sure, were sane and made by the best experts in our fields. We are smart! But... in the end, it doesn't look like it! I mean, if you start looking at the code, it's obvious that there was no strategy; the different pieces of code were not made to live together.


Is it possible to do better? Not really, no. We know that in software development, up-front design is a joke. You can't gather requirements once and for all; or better, requirements are something you have to continuously monitor. They change, even during the lifetime of a single product. How could we design technology to be shared... it's impossible!

Your only hope is to do the best you can in one product, and then in another, and start to observe. Maybe there is some functionality that is common across them, that can be ripped out and abstracted into something shareable. Instead of trying to solve a general problem, solve a specific one, and abstract when needed. Gather. That's sane; it's the only way to work.


But then you get to a point where something started in a project, got ripped out because it was a good idea to do so, and evolves on its own across projects. Then another studio on the opposite side of the world sees that component, thinks it's cool and integrates it. Integrates it together with its own stuff, which followed a similar path. Those two technologies were not made to work together, so for sure they won't be orthogonal, they won't play nice. There will be some bloat. And the more you write code, promote it to a shareable module, and integrate other modules, the more bloat you get. It's unavoidable, but it's the only thing you can do.


So what? We're looking at a typical problem: strong tactics, good local decisions, that do not add up over time to a strong strategy. It's like a weak computer chess player (or go player; chess is too easy nowadays). What's the way out of this? Well... do as strong computer chess programs do! They evaluate tactics over time. They go very deep, and if they find that the results are crap, they prune that tree; they trash some of their tactical decisions and take others. Of course, computer chess can go forward in time and then back, wasting only CPU time.

We can't go back, but we can still change our pieces on the chessboard. We can still see that a part of the picture is going wrong and delete it... at least if we took the only important design decision out there: making your code optional. That's the only thing you have to do: you have to be sure to work in an environment where decisions can be changed, where code can be destroyed and replaced. Two paths, two different technologies, intersect after ten years. Good. They intersect too much? They become bloated? You have to be able to start a new one that leverages the experience, the exploration done. But that is possible only if everything else does not deeply depend on those two.


Tactics are good. Tactics are your only option, in general. If you're small, have little code and a few programmers, then you might live in the illusion that you can have a strategy. You can't; it's just that at that size, strong tactics look like a strategy. It's like playing chess on a smaller board: the same computer player that seemed weak becomes stronger (even clearer, again, with go)*. And of course that's not bad.

Some design is not bad, drawing the overall idea of where you could be going... like implementing some smarter heuristics for chess. It's useful, but you shouldn't live with the idea that it's going to be the solution. It can improve your situation by a small factor, but overall you will still need to brute-force, to iterate, to let things evolve. Eventually, over the years, relying on smart design decisions is not what is going to make the difference. They will turn bad. You have to rely on the idea that tactics can become strategy. And to do that, you have to be prepared to replace them, without feeling guilty. You've explored the space, you've gathered information (Metropolis sampling is smart).

---

* Note: that's also why a lot of people, smart people, do not believe me when I say that things like iteration, fast iteration, refactoring, dependency elimination, and languages and infrastructures that support those concepts are better than a-priori design, UML and such. They have experience of worlds (or times) that are too small. I really used to think the same way, and even now it's very hard for me to just go and prototype, to ignore the (useless and unachievable) beauty of a perfect design drawn on a small piece of paper. We go to a company, or get involved in a project, or have experience of a piece of code. We see that there is a lot of crap. And that we could easily have done better! Bad decisions everywhere; those people must be stupid (well... sometimes they are; I mean, some bad decisions were just bad, of course). Then we make our new system, trash the old one, and live happily. If the system is small enough, and the period of time we worked on it is short enough, we will actually feel we won... We didn't; we maybe took the right next move, a smart tactical decision. I hope it didn't take too long to make... because anyway, that's far from winning the match! But it's enough to make us care way too much about how to take that decision, how to make that next move, and not see that the real match doesn't care much about it, that we're not even fighting the big problem. It's really hard to understand all that. I've been lucky in my career, as I got the opportunity to see the problems at many different scales.

14 September, 2009

Fix for FXComposer 2.5 clear bug

The new NVidia FXComposer is still mostly made of dog poo. Sorry, but it's an application that adds zero useful features and a ton of bugs. Well, not really: shader debugging would be useful to me, but I tried to debug my posteffect, made with SAS, and it failed miserably, so...

Unfortunately, FXComposer 1.8 is getting really old nowadays and sometimes crashes on newer cards... so I'm forced to juggle between the two to find the one that has fewer bugs...

One incredibly annoying thing is that 2.5 on XP does not clear the screen, at least if you're using SAS. It doesn't work on either my MacBook 17'' or my Dell PC at work (nothing weird, two of the most popular products in their categories, both with NVidia GPUs), so I had to find a workaround. Ironically, they seem to have so many bugs in SAS right now that, for the first time since the beginning of FXComposer, they released a little documentation about it... So now you kind of know how to use it, but you can't, because it's bugged...

If you're having the same problem, here's my fix. I hope that NVidia's engineers will soon make this post obsolete by showing some love for their product and fixing this, as it's such a huge bug.

#define USEFAKECLEAR

[...]

struct FSQuadVS_InOut // fullscreen quad
{
    float4 Pos : POSITION;
    float2 UV : TEXCOORD0;
};

FSQuadVS_InOut FSQuadVS(FSQuadVS_InOut In)
{
    return In; // pass-through: the quad vertices are already in clip space
}

#ifdef USEFAKECLEAR
struct FakeClear_Out
{
    float4 c0 : COLOR0;
    float4 c1 : COLOR1;
    float4 c2 : COLOR2;
    float4 c3 : COLOR3;

    float d : DEPTH;
};

FakeClear_Out FakeClearPS(FSQuadVS_InOut In)
{
    // Write zero to every colour target and far depth (1.0) everywhere,
    // emulating the clear that FXComposer 2.5 fails to perform.
    FakeClear_Out Out = (FakeClear_Out)0;
    Out.d = 1.f;

    return Out;
}
#endif

[...]

#ifdef USEFAKECLEAR
pass FakeClear
<
    string Script =
        "RenderColorTarget0=ColorBuffer1;"
        "Draw=Buffer;";
>
{
    // Depth test always passes and writes, so the fullscreen quad
    // overwrites the whole depth buffer as well.
    ZEnable = true;
    ZWriteEnable = true;
    ZFunc = Always;

    VertexShader = compile vs_3_0 FSQuadVS();
    PixelShader = compile ps_3_0 FakeClearPS();
}
#endif

09 September, 2009

Calling for a brainstorm

Problem: depth of field. I want bokeh, and I want correct blurring of both the front and back planes.

Now, I have some ideas on that, I think at least one good one. But I'd like to see, in the comments (by the way, if you like this blog, read the comments; usually I write more there than in the main posts), your ideas for possible approaches.

Some inspiration: rthdribl (correct bokeh, but bad front blur, slow), Lost Planet DX10 (correct everything, but slow), dofpro (Photoshop, non-realtime).

Words (and publications): gathering, scattering, summed area tables, mipmaps, separable filters, push-pull

I have developed something out of my idea. It currently has four "bugs", but only one that I don't yet know how to solve... The following image is really bad, but I'm happy with what it does... I'll tell you more if you tell me your idea :)


Update: the following is another proof of concept, just to show that achieving correct depth blur is way harder than achieving good bokeh. In fact, you can have decent-looking bokeh even just using separable filters: the first image on the left takes only twice the cost of a normal separable Gaussian, and looks almost as good as the pentagonal bokeh from the non-realtime Photoshop lens blur (while the Gaussian filter, at the same size, looks awful).
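To illustrate the gather family of approaches in one dimension, here is a deliberately naive sketch: each pixel averages its neighbours over a radius given by its circle of confusion. All values are made up, and it ignores exactly the depth-edge bleeding problems discussed above, which is where the real difficulty lives.

```python
# 1D gather depth-of-field sketch: per-pixel blur radius from the
# circle of confusion (CoC). Naive on purpose: no depth-edge handling.

def gather_blur(colour, coc):
    out = []
    for i in range(len(colour)):
        r = coc[i]                                   # blur radius, in pixels
        lo, hi = max(0, i - r), min(len(colour) - 1, i + r)
        taps = colour[lo:hi + 1]
        out.append(sum(taps) / len(taps))            # plain box average
    return out

colour = [0.0] * 4 + [1.0] + [0.0] * 4               # a single bright point
in_focus = gather_blur(colour, [0] * 9)              # CoC 0: unchanged
blurred = gather_blur(colour, [2] * 9)               # CoC 2: energy spreads
print(in_focus)
print(blurred)
```

Even in this toy, the interesting failures are visible once the CoC varies per pixel: a gather kernel reads neighbours regardless of whether they should scatter onto this pixel, which is the core gather-versus-scatter tension the keyword list above points at.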


Second update: I've managed to fix many bugs, and now my prototype has a lot fewer artifacts... I don't know if you can see it, but the DOF correctly bleeds out, even when the object overlaps an in-focus one. Look at the torus: the far side clearly shows the out-bleeding (and, as you can see, there's still an artifact along the edges...). Now you should be able to see that the very same happens for the near part (less evident, as it doesn't have problems, I dunno why yet), and that it doesn't shrink where the torus overlaps the sphere.


08 September, 2009

You should already know...

...that singletons are a bad, baaad, bad, idea: The Clean Code Talks - "Global State and Singletons"

It's interesting how people think that globals are bad, but singletons are not. They are the same thing; but still, internet noise can really destroy your brain.

You're told that globals are evil, and you're told that design patterns are good. There is so much noise about both that they just become facts, with no need to reason about them.

Singletons are a design pattern (probably the only one you really know or have seen actually used), so they're good.
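The talk's point fits in a few lines; the Logger class and function names below are made up for illustration.

```python
# A singleton is a global in disguise: both create hidden, shared,
# order-dependent state that callers depend on without saying so.

class Logger:
    _instance = None

    def __init__(self):
        self.lines = []

    @classmethod
    def instance(cls):                 # the singleton access point
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

def log_with_singleton(msg):
    # The dependency is invisible from the signature...
    Logger.instance().lines.append(msg)

def log_with_injection(logger, msg):
    # ...while here it is explicit, and trivially replaceable in a test.
    logger.lines.append(msg)

log_with_singleton("hidden state")
test_logger = Logger()                 # a fresh, isolated instance
log_with_injection(test_logger, "explicit state")
print(Logger.instance().lines, test_logger.lines)
```

Nothing about `log_with_singleton`'s signature tells you it touches shared state, which is exactly the property that makes globals evil in the first place.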

Now try to remove those brainwashed ideas from your programmers' minds... They won't believe you so easily; everyone loves design patterns (in my opinion, they're mostly crap), but you can use the internet to your advantage. The video I've posted is on YouTube, and made by Google. That should be a very powerful weapon. Enjoy.