23 February, 2008

Next-gen and realism.

First of all, "real" in next-gen games is not usually real as in the real world. It's more real as in film versions of reality. And that's obvious: we want to express our view, not the bare truth. It's also very fortunate, because we can get creative.

Physically based or not? Is Blinn still right? What looks good is good, in computer graphics?
Of course he is. Graphics is about look. But we should care about physics not because we are unable to achieve the correct look with fakes (and there are PS2 games that prove that), but because fakes are usually hard.

Cheats will always be done. We don't have enough power not to cheat, even considering the simple models of light that are used in today's offline renderers. But now doing "the right thing" is a great tool that helps a lot. Ease of use was the main reason behind Global Illumination in the first place, and now it's the same for the next generation of unbiased renderers. It's just more convenient to work with the most accurate light model that you can simulate, because it will usually look right without much tuning.

But also remember that you should empower artists; don't take it to extremes. Artists are very good at tweaking stuff, so always keep usability in mind. A fully automated GI solution can be a nightmare for artists if they can't bend the lighting model to their needs, when they need to.

So start with physics, then add hack-ability on that.

In the end, Blinn's motto nowadays also has a different meaning to me. It tells me to care about perception and psychology (uncanny valley? Crysis had to address that problem, for example; we are there), as we're getting close to the limit where those components are really important.

22 February, 2008

You CAN'T afford doing things in the wrong way.

Game development is hard. Design is, by its nature, continuously subject to change. Competition is very high. So are the required investments. You work on the bleeding edge of technology. You need to have skills in many different fields. Many times missing deadlines is not an option, or it's very expensive, and you have to deliver a high-quality product as well in order to be able to compete.

It's really hard enough. You don't want to add a bad development process on top of that. You have to do things in the best possible way to manage to get everything working. And sometimes the temptation to avoid good practices is high, due to the pressure you have, especially towards the end of a project. Other times your company/game/product simply does not start in the right way. And then everything will be messy and a waste of time/money.

Change is unavoidable. You have to be prepared for it. You really want to identify problems as soon as possible.
  • Plan every major feature. Gather requirements. Gather references. Plan possible implementations. Try to identify risks. Make fuzzy time estimates. Refine those estimates until you're confident enough about them.
  • If you don't have enough information to be confident about your feature plan, stop and GATHER IT. Make early prototypes. Don't blindly try to do something and hope that everything turns out right. It WON'T.
  • AGAIN: Don't do things blindly. Gather information. Make prototypes. This is not related to programming only. For example, you want to strive for a given lighting in your game. You're not sure about some quantities like the number of light sources, light types, shadow casting directions, shadow casting objects, shadow receiving objects, etc... Then do a prototype with some test renderings directly in your 3D authoring application. Prototype possible solutions, see if it's reasonable to expect that your assumptions are right.
  • Investigate how many areas a feature will impact.
  • Allocate your technical budgets for each major feature early (i.e. CPU time, GPU time, memory etc).
  • Don't enter production if the planning phase is not done. You will have enough changes to deal with anyway. Don't start the real product if you've not investigated and planned all the major new features. Prototype everything that has a high risk.
  • Be incremental. Work in small iterations. THINK. Develop. Test. Refine/refactor.
  • Always test. Test as soon as possible. Run batch tests. Make a working game as soon as possible. Do early QA tests on every newly implemented game feature. Gather information.
  • The first priority is to have something working. You can't break the workflow. Spend extra time to make sure that your feature is working. Don't rush.
And, more day-to-day coding related:
  • Strive for code quality. Do reviews. Use linting tools. Adhere to good coding standards. Refactor.
  • Design for change. Design for flexibility. Try to have as few interdependencies as possible. Try to always be able to trash a given piece of code and rewrite it. It will happen. It SHOULD happen, as code WILL rot; no piece of code is immortal. This is especially true in games, with fast-moving technology and always-changing requirements/scenarios.
  • Keep the documentation updated. Run an internal wiki. It has to be EASY to write the documentation. Documentation tasks should be allocated for any new major technological component. Comment your code. Comment your HACKS; you will do them at the end of the project, but at least you should comment them so you can remove them later.
  • Before starting a new project on an old code base, clean the code base. KILL any dead code. Kill last minute hacks. Search for old "todo" items.
  • Try to do the right thing. Don't cheat too much. Cheats are hard to maintain.
  • Think about your tools. Tools are hard. Tools are the key to productivity and are the main means of interaction between the coding and graphics departments. If you can't afford to do many in-house tools, find a way to leverage existing technologies and be minimal. Don't build huge tools that you can't afford to refine and maintain. A bad tool WILL kill a good feature.
If there are some items on that list that you were NOT doing during game development, then probably you're doing the wrong thing (or I wrote something stupid). If you do the wrong thing you will end up losing money. If you lose too much money, your project will fail.

Faking nextgen

Another old article from the web, the making of Shadow of the Colossus (alt. link). Very interesting tricks, another example of how old-gen knowledge is always useful (and also think about Wii and PSP...)

21 February, 2008

I missed that...

...the first time I read about Capcom's MT engine, so it's not exactly new, but as I was reading MT documents around the web again for my previous article, I found the following nice trick (here):
MSAA trickery. Another thing the Capcom presentation talks about is their use of MSAA trickery, for increased speed. On the consoles, where you have lower-level access to frame buffer formats, etc. you can do wacky stuff like pretending a 1280×720 (720p) no-AA screen is a 640×360 (ordered grid) 4xMSAA screen. This works, because these would have the same pixel dimensions. By drawing e.g. particles to the 640×360 4xMSAA screen instead of the 720p screen you would reduce your fragment shader computation to 1/4 (as the fragment shader is only executed once per pixel for multisampling), while still rendering to the same pixels as you would have if drawing into the 720p buffer. This is a way of trading fidelity for speed (or vice versa) and it is a very nifty trick.

20 February, 2008

Concurrency part 2

When you have sorted out the concurrency problems for CPU threads, established a way to safely generate data on the CPU for the GPU, and set up your object pipelines, you will end up hitting the problem of multithreaded draw calls.

At the moment, such a thing is not possible on any mainstream platform; you have to issue all the draw calls from a single thread that owns the rendering device. The usual solution to this problem is to bake command buffers (display lists in OpenGL terminology) on the non-rendering threads and then pass them to the rendering one, which draws them.

The problem with this approach is that you can't sort/optimize/whatever primitives between threads; all the draw calls are baked and just copied into the main ring buffer. Of course you can always organize your rendering objects in lists that are bound to a given renderbuffer/pass and process those lists in parallel; doing so you're pretty sure that the only optimizations you can do are the ones you can perform on a single list. A problem arises when you have a few big lists and all the other ones are much smaller: in that case, processing one list per thread does not give you optimal load balancing. So other solutions may be employed, depending on the context.

A very common one is to have some higher-level rendering data, usually meshes with materials and all the context needed to draw that data. Those primitives/contexts are added by the various threads to a render queue that is then sorted (by rendertarget, passes, materials, etc...) and generates state changes and draw calls. The state API is hidden and used only by the rendering thread. Using a lockfree stack helps.
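To make that concrete, here's a minimal sketch of such a render queue (all names are hypothetical, and the synchronization is a plain mutex for brevity; a lockfree stack would work just as well): worker threads push small submission records with a packed sort key, and the render thread, the only one touching the device, sorts and submits them.

// Minimal render queue sketch: workers push items, the render thread sorts and submits.
// All names are hypothetical; a plain mutex is used for brevity.
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <vector>

struct RenderItem {
    uint64_t sortKey;       // packed: rendertarget | pass | material | depth
    const void* mesh;       // whatever your engine uses to identify a draw
    const void* material;
};

// pack the fields so that a plain integer sort gives the submission order we want
inline uint64_t MakeSortKey(uint32_t target, uint32_t pass, uint32_t material, uint32_t depth) {
    return (uint64_t(target & 0xFF) << 56) | (uint64_t(pass & 0xFF) << 48) |
           (uint64_t(material & 0xFFFF) << 32) | uint64_t(depth);
}

static bool CompareByKey(const RenderItem& a, const RenderItem& b) { return a.sortKey < b.sortKey; }

class RenderQueue {
public:
    void Push(const RenderItem& item) {            // called from any thread
        std::lock_guard<std::mutex> lock(mMutex);
        mItems.push_back(item);
    }
    void Flush() {                                 // called from the render thread only
        std::vector<RenderItem> items;
        { std::lock_guard<std::mutex> lock(mMutex); items.swap(mItems); }
        std::sort(items.begin(), items.end(), CompareByKey);
        for (size_t i = 0; i < items.size(); ++i) {
            // set states / issue the draw call through the (hidden) device API here
        }
    }
private:
    std::mutex mMutex;
    std::vector<RenderItem> mItems;
};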

Another interesting solution is the one employed by Capcom's MT engine (Lost Planet). It's like a cross-platform command buffer API, where the commands have hints on their ordering (rendertarget, pass, etc.) and are issued in parallel by multiple threads, then sorted in each thread, gathered and merge-sorted together in the rendering thread, and then converted into actual draw calls. This is somewhat of a hybrid approach between a high-level submission API and a native command buffer API, where you can still do every kind of inter-thread optimization, but in a very fast way, without hiding the state API and doing only a simple translation of the commands to the native API ones in the main thread.

Read more about this solution here. Notes and links to the MT engine here.

Visual Studio 2005 Intellisense

From the Visual C++ team blog, read here and here.
Short version: Intellisense is slow on VS2005. If you want to disable it, use the provided macros from the VC blog (which also give some extra control over it) or, better, delete FEACP.dll (front end auto complete parser), remove the .ncb files from your project and create some directories with the same names as the deleted .ncb files (so VS will fail to write them on exit). Then buy & install Visual Assist X.
That's it.

19 February, 2008

Locoroco Equation

Locoroco is a GREAT game. In an attempt to do something similar, a friend of mine and I made a 2D softbody simulation. Nothing too sophisticated: particles, Verlet integration, springs and dampers, very simple collision detection, and an internal pressure force used instead of the more common internal "structural" springs (but no non-folding constraints, so it can go bad, especially as the volume computation used for the pressure force assumes no self-intersections). Surely way more than what's going on in the original Locoroco game. And way less stable as well...

The next step was placing some markers on the softbody surface for eyes, mouth, etc. Again, Locoroco is probably doing something simple, but as this is mostly a toy experiment, I wanted to find an interesting solution. What I wanted was to express the position of the anchor as a function of every point on the softbody (the particles). This turned out to be a problem.

Basically what I wanted is, given N 2D points (particles), to express another given one (the anchor point) as a linear combination of those. To do so you have to solve a very underdetermined system (only two equations and N unknowns, the weights of the linear combination). Now, probably the mistake I made was thinking in terms of numerical analysis, which is not the field I'm most expert in, but after a few internet searches it seemed to me that most solvers of such systems are interested in finding the sparsest solutions, so I would have had a lot of zeroes in the weight vector, basically unlinking the anchor from many particles, which is not what I wanted. Also, I was uncomfortable with the idea of writing a QR factorization algorithm. It may seem odd, but if you do some "random" maths you're more free to do it incorrectly, while if you have to implement an algorithm that is well known in the NA field, then to me you should really be an expert or rely on well-tested libraries instead... but I didn't want to link my pet project against LAPACK or similar libraries; it seemed really overkill to me...

After much thinking, I ended up with a simple solution. For each particle in the softbody shell, I selected the particle that was most orthogonal to it with respect to the centroid, expressed the anchor point in the basis formed by those three points (the centroid and the two particles), and accumulated those weights in the array representing the solution (the weights for each particle). Then I divided the weights by the number of particles. In other words, I constructed N very sparse solutions, and then averaged them to obtain a single, complete one.
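In code, the idea looks more or less like this (a toy sketch written for this post, not the actual code of the experiment; Vec2 and all the names are made up):

// Toy sketch of the "average N sparse solutions" idea described above.
// Returns one weight per particle such that anchor ~= sum(w[i] * p[i]) and sum(w[i]) = 1.
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
static Vec2 Sub(Vec2 a, Vec2 b) { Vec2 r = { a.x - b.x, a.y - b.y }; return r; }
static float Dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

std::vector<float> AnchorWeights(const std::vector<Vec2>& p, Vec2 anchor) {
    const int n = (int)p.size();
    Vec2 c = { 0, 0 };
    for (int i = 0; i < n; ++i) { c.x += p[i].x / n; c.y += p[i].y / n; }

    std::vector<float> w(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        // pick the particle most orthogonal to p[i], with respect to the centroid
        Vec2 di = Sub(p[i], c);
        int best = 0; float bestAbsCos = 1e9f;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            Vec2 dj = Sub(p[j], c);
            float absCos = std::fabs(Dot(di, dj)) / (std::sqrt(Dot(di, di) * Dot(dj, dj)) + 1e-12f);
            if (absCos < bestAbsCos) { bestAbsCos = absCos; best = j; }
        }
        // express anchor - c = a * (p[i] - c) + b * (p[best] - c), a 2x2 solve
        Vec2 dj = Sub(p[best], c), da = Sub(anchor, c);
        float det = di.x * dj.y - di.y * dj.x;
        float a = (da.x * dj.y - da.y * dj.x) / det;
        float b = (di.x * da.y - di.y * da.x) / det;
        // one sparse solution: a on p[i], b on p[best], the centroid term spread over everyone
        w[i] += a; w[best] += b;
        for (int k = 0; k < n; ++k) w[k] += (1.0f - a - b) / n;
    }
    for (int k = 0; k < n; ++k) w[k] /= n;   // average the N sparse solutions
    return w;
}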

It worked. But I don't know how ignorant this solution is...

P.S. Have you ever come up with a solution to a problem by looking at an obviously wrong solution someone else came up with? I was thinking about that; it didn't happen this time, but many times I can find interesting solutions while trying to explain why someone else's proposed solution is SO basically wrong. Naive attempts at solving a hard problem can miss some key facts, but they usually bring a different view, not completely unrelated to your problem, just wrong, but with an underlying idea. Sometimes you're able to pick up that naive idea and translate it into a real solution, and it's very surprising to me when it happens.



Update: the 'right' solution is actually obvious when you look at the problem from the right perspective. If, instead of considering it as a linear system or geometrically, you look at it as a constrained minimization, it's quite simple. The function to minimize is the weight variance, and the constraints are linear equations, one for each geometrical dimension. Cast in this way it's a quadratic programming problem, and I think it actually has a simple analytic solution.
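For the record, this is what I believe the analytic solution looks like (my own derivation, so take it with a grain of salt; I'm also adding the affine constraint that the weights sum to one, which makes minimizing the variance equivalent to minimizing the squared norm): the minimum-norm solution of the underdetermined system A w = b, where A is the 3xN matrix whose rows are the x coordinates, the y coordinates and ones, and b = (anchor.x, anchor.y, 1), is w = A^T (A A^T)^-1 b, which only needs a 3x3 solve.

// Minimum-variance anchor weights via Lagrange multipliers: w = A^T * inverse(A * A^T) * b.
// A is 3xN (rows: x coordinates, y coordinates, ones), b = (anchor.x, anchor.y, 1).
// Sketch only, no care for numerical robustness (nearly collinear particles will misbehave).
#include <vector>

std::vector<float> MinVarianceWeights(const std::vector<float>& px, const std::vector<float>& py,
                                      float ax, float ay) {
    const int n = (int)px.size();
    // M = A * A^T, a symmetric 3x3 matrix
    double M[3][3] = { { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 } };
    for (int i = 0; i < n; ++i) {
        double r[3] = { px[i], py[i], 1.0 };
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b) M[a][b] += r[a] * r[b];
    }
    // solve M * lambda = rhs with Cramer's rule (fine for a 3x3 system)
    double rhs[3] = { ax, ay, 1.0 };
    double det = M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
               - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
               + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]);
    double lambda[3];
    for (int c = 0; c < 3; ++c) {
        double T[3][3];
        for (int row = 0; row < 3; ++row)
            for (int col = 0; col < 3; ++col) T[row][col] = (col == c) ? rhs[row] : M[row][col];
        lambda[c] = (T[0][0] * (T[1][1] * T[2][2] - T[1][2] * T[2][1])
                   - T[0][1] * (T[1][0] * T[2][2] - T[1][2] * T[2][0])
                   + T[0][2] * (T[1][0] * T[2][1] - T[1][1] * T[2][0])) / det;
    }
    // w = A^T * lambda
    std::vector<float> w(n);
    for (int i = 0; i < n; ++i) w[i] = (float)(px[i] * lambda[0] + py[i] * lambda[1] + lambda[2]);
    return w;
}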

18 February, 2008

Hidden overdraw

Overdraw is such an obvious and huge performance problem that it's not a problem at all anymore.
We have found every possible way to overcome it. We do a z-prepass; we have occlusion queries and hierarchical z-buffers (with "early-z" fail, discarding occluded fragments before shading them, causing problems for the shaders that want to modify the fragment depth) built into every modern 3D card; ATI has also made a tool that optimizes the vertex ordering of a mesh to reduce overdraw (the polygons are ordered front to back from every possible viewing direction, so it's something like from outside to inside) while also optimizing the post-transform cache (as should always be done).
And we know that alpha blending is going to cause overdraw problems, so we draw our particle systems into a small framebuffer and then composite them with the full-resolution frame and z-buffer.
What we can still be missing is the "hidden" overdraw, the overdraw caused by fragments that eventually get rejected, but only too late (after pixel shading). This is most of the time caused by alpha testing, which is not early-culled like z-testing (and couldn't be).

The bottom line? Tessellate the polygons in those tree leaves and background props to fit the alpha-tested texture more tightly! Waste a few more vertices to save a lot of pixels (yes, if you were wondering, true 3D geometry can be cheaper than alpha-tested polygons in many cases)...

Old skills...

Some of the old-days tricks seem to still be useful after all. It's not the first time that someone thinks about doing occlusion culling with a software rasterizer on a very small buffer with simplified geometry, but it seemed that this reuse of an old technique was going to die with hardware occlusion queries (which are very useful for other purposes too). But you know what? The PS3 SPUs (and the inability to find a use for them ;D ) are making this technique (and other SPU-based pixel processing) relevant again. I wonder when we will see a C-buffer again...
Also, read this fine post from Marco Salvi about why not every post effect maps nicely to the GPU. No data sharing between pixels means that a lot of tricks cannot be performed by the GPU. Box filtering is a really nice example: it can be done in constant time for any number of samples per pixel, on the CPU.
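The constant-time trick is just a running sum (or a summed-area table for the 2D case): every time the window slides by one pixel you add the sample that enters and subtract the one that leaves, so the cost per pixel is the same whatever the radius. A quick CPU sketch of the 1D version (apply it on rows and then on columns for a separable 2D box blur):

// 1D box filter in O(1) per sample regardless of the radius, using a running sum.
#include <algorithm>
#include <vector>

void BoxFilter1D(const std::vector<float>& in, std::vector<float>& out, int radius) {
    const int n = (int)in.size();
    out.resize(n);
    const float norm = 1.0f / (2 * radius + 1);
    // initial window sum, clamping indices at the borders
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) sum += in[std::min(std::max(i, 0), n - 1)];
    for (int i = 0; i < n; ++i) {
        out[i] = sum * norm;
        // slide the window: add the sample entering on the right, remove the one leaving on the left
        sum += in[std::min(i + radius + 1, n - 1)] - in[std::max(i - radius, 0)];
    }
}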

16 February, 2008

Art vs Code: In the beginning...

...there was the cracking scene. Cracker groups were busy removing copy protection from videogames, and there was competition among the best groups, with scores given for the fastest releases and the fanciest features (e.g. game "trainers"). To gain visibility, those groups started to add small "intros" to the games they cracked: small works of code, art and music.

Soon intros became a matter of competition themselves, and a new "scene" was born: the demoscene. And it still lives on, crafting fine graphics demos on almost every device that has a way to display graphics, and in various sizes (256 bytes, 4k, 64k and unlimited).

Nowadays, generative art is a very active field, with many tools crafted specifically for the purpose, university courses, and many excellent artists all over the world.

sanchtv. pitaru. kinesis. shiffman. waltz. generator.x. evolutionzone. flight404. toxi.

Why hasn't the videogame industry learned anything from that yet? Is there too much specialization in our industry? Why do coders and artists generally not talk to each other well? There are examples of how the opposite is very possible. But in games, artists craft pixels, coders craft code, and communication between the two is a matter of tools. Even in the areas of lower risk, for example the frontend menus, there's no exploration of other possibilities. I think that we are seriously hampered by that.
P.S. for the lazy guys, some YouTube vids of demoscene products, it's way better to download the real thing tho: Lifeforce IntelDemo^Fairlight Namatomorpha Debris TrackOne etc...

07 February, 2008

Kill your darlings

The idea of building software by gathering requirements, doing a monolithic design, and then implementing is dead. And this is no news. Especially in games, where the requirements are constantly changing, due to the very nature of game design. Agile methods are not the solution to every problem, but they can't be ignored.

But I have to note that most of the time we computer scientists are too versed in the idea of building the optimal design from the ground up. We want to achieve the best possible solution, the cleanest, most elegant, most performant one, by seeking the perfect design, and we can't start implementing things until we've reached this level of knowledge and enlightenment about our problem.

The thing is that our design, whatever it was, is going to fail. And we will need to change it. And we don't want to, because it was beautiful, we are emotionally linked to it, so we start patching, we start trying to reduce every problem so that it fits in our original vision. And this is the beginning of the end...

We have to learn more from artists, from painters (or sculptors). We have to learn to sketch, then to paint in the major volumes, tweak, trash, redo, paint with large brushes always keeping the whole picture present, always having a "working" coarse-grained version of our vision. Then refine. Maybe change our mind about some parts. Refine again. Start nailing in the details. It's very useful.

In the old days of computer painting, pixel art, this was not done. Many artists started from a corner and composed the high-detail version of the picture directly, pixel by pixel. Now we have too many pixels and possible colours, and no one works like that anymore. We have to learn to do the same.
It's the complexity of the problems, and the fact that you can't really foresee the final product from the very start, that is the reason to work in an iterative fashion. In fact, that's the same difference between drawing and painting, and you can easily see those different ways of working in different artists, e.g. Schiele and Klimt versus Caravaggio and Picasso.

Prototype. Iterate. Refactor. Don't be afraid of deleting parts of your code.

Overengineering is not better than underengineering. It is as bad, plus more wasted time.

The value of failure

Failure is underestimated in computer science papers. I would love to see a paper that's entirely about how not to do a thing, instead of reading useless articles about a slight variation of a well-known method that in some rare cases (which the author tries to disguise as general ones) outperforms other implementations. I love postmortems' "what went wrong" reports; most of our experience is built from our failures.

But sadly researchers most of the time have to produce papers, and so we end up with tons of uninteresting stuff to read and filter. I would not trust anything that does not tell me clearly where it can't be applied. Because if an article does not point out the shortcomings of a given method, most of the time the authors are trying to hide something, or they simply didn't have the time to make more tests.
And of course, most of the time those hidden problems are going to hit you late in your attempt to implement the given method, mostly because algorithmic problems are harder to diagnose than implementation bugs.

Do not trust your papers.

- sidenote: Always seek negative critique of your work. If someone likes what you are doing, it's not helpful. Even the most destructive critique, even if it's not motivated and does not suggest a way to improve, should at least make you think for a moment about what you're doing; then you can discard it in the way you prefer, but it was helpful. A destructive critique that is well motivated is one of the best possible gifts someone can give you. It shows you the faults in what you're doing. It lets you think about solutions. This is what I learned from an old friend of mine, a very good artist, Maurizio Gemelli (Kublay). We were always fighting, about almost anything. I'm not a good painter, and he used to trash my work roughly, but one day I came up with a decent portrait, and he still criticized it. When I asked him why he wouldn't acknowledge that it was still, overall, decent, he told me that for his own works he was always seeking someone who could give him negative critiques, and this was both surprising and enlightening.

03 February, 2008

Live coding competitions?

Livecoding is quite amazing, but mostly restricted to music as of now:

AA-Cell performance at The Loft, Australia

ChucK and Audicle IDE demo
Alex McLean blog (Vocable)
TopLap group WIKI

but there are some programs capable of live coding for graphics as well (most notably VVVV). Wouldn't it be great if someone organized live coding competition events (like the 1 vs 1 live design compos of cut'n'paste) for rendering?

I really love scripting languages and exploratory programming, especially when applied to graphics. Computation is art.

P.S. ok so obviously, this was another "cool links" like post ;)

Reverse Engineering

Reverse engineering rendering engines can be an excellent learning exercise (even if it's not legal in some countries). Most of the time the reversing process is itself interesting: as you delve into a renderer and apply your knowledge to abstract structures, you force yourself to work on the things you already know, and most of the time the map that you make is not the same one the original developer made, so you end up discovering new ideas.

My one and only tool for engine RE (used with 3dsMax).

For example, when one of the artists I was working with first saw a strangely colored/dithered screen-space texture in the video card memory when running Crysis (before reading the Crytek SIGGRAPH paper, of course), my mental process was more or less the following:
  1. One of the channels seems to be an ambient occlusion contribution.
  2. The dither seems to be regular. Most probably, it's a 4x4 block where each pixel in the block is mapped to a given sampling direction / strategy.
  3. De-interlacing the blocks in Photoshop confirmed that. But it seemed to me that each pixel was coupled with a given raytracing direction (that was wrong).
  4. Real raytracing is too expensive, and the effect seems to be done in post-processing; they are probably reconstructing geometrical information from the Z(W)-buffer. Some artifacts on narrow poles in the scene confirmed that.
  5. Raymarching on 2D heightfields is not really new, e.g. relief mapping.
  6. ...I guess I can modify a relief mapping shader to do ambient occlusion, by sampling the hemisphere around the normal of every screen pixel and raymarching the Z-buffer (a rough sketch of that idea follows this list)...
  7. After reading the SIGGRAPH paper about Crytek's screen space ambient occlusion I discovered that this is not the Crytek way (I did not reverse the Crysis pixel shader); probably mine is worse, but it was an exciting journey...
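For the curious, here's a rough CPU-side sketch of the idea in point 6 (my own reconstruction written for this post, not Crytek's technique nor the shader I tried; it assumes a linear depth buffer, a view-space normal buffer, and a simple pinhole camera with focal length f in pixels):

// Rough CPU-side depth-buffer raymarched ambient occlusion, my own take on point 6 above.
// Assumes: 'depth' holds linear view-space Z per pixel, 'normal' holds view-space normals.
#include <cmath>
#include <cstdlib>
#include <vector>

struct V3 { float x, y, z; };
static V3 Add(V3 a, V3 b) { V3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
static V3 Mul(V3 a, float s) { V3 r = { a.x * s, a.y * s, a.z * s }; return r; }
static float Dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// reconstruct the view-space position of a pixel from its linear depth
static V3 ViewPos(int x, int y, int w, int h, float f, float z) {
    V3 p = { (x - w * 0.5f) * z / f, (y - h * 0.5f) * z / f, z };
    return p;
}

float AmbientOcclusion(const std::vector<float>& depth, const std::vector<V3>& normal,
                       int w, int h, int px, int py, float f,
                       int numRays, int numSteps, float radius) {
    V3 p = ViewPos(px, py, w, h, f, depth[py * w + px]);
    V3 n = normal[py * w + px];
    int occluded = 0, total = 0;
    for (int r = 0; r < numRays; ++r) {
        // random direction, flipped into the hemisphere around the normal
        V3 d = { rand() / (float)RAND_MAX * 2 - 1, rand() / (float)RAND_MAX * 2 - 1,
                 rand() / (float)RAND_MAX * 2 - 1 };
        float len = std::sqrt(Dot(d, d));
        if (len < 1e-4f) continue;
        d = Mul(d, 1.0f / len);
        if (Dot(d, n) < 0.0f) d = Mul(d, -1.0f);
        ++total;
        // march along the ray, project each sample back to the screen, test against the depth buffer
        for (int s = 1; s <= numSteps; ++s) {
            V3 q = Add(p, Mul(d, radius * s / numSteps));
            int qx = (int)(q.x / q.z * f + w * 0.5f);
            int qy = (int)(q.y / q.z * f + h * 0.5f);
            if (qx < 0 || qy < 0 || qx >= w || qy >= h) break;
            if (depth[qy * w + qx] < q.z - 1e-3f) { ++occluded; break; }   // something is in front: occluded
        }
    }
    return total ? 1.0f - (float)occluded / total : 1.0f;
}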
Another nice tale comes from reversing Colin McRae: DiRT, but maybe I'll write about that later on (the interesting thing I discovered in that one is, as far as I can tell, that they merge all the meshes with the same vertex declaration in order to be able to draw them very fast all together without materials, for example when computing shadowmaps, and to draw them in separate pieces when the materials are needed, by binding some ad-hoc index buffers to the merged vertex buffer).

Pixel VS Vertex VS Screen

Many times, when asked to implement an effect (for example, fog), the rendering engineer has to choose in which space to operate. It's always a matter of balancing the pipeline (performance) and ease of implementation. I'll talk about "features" of a given space instead of pros and cons, as those features turn into pros or cons depending on what you're trying to do.

Object vertex shaders.
Features:
  • Dependent on the mesh, decoupled from the pixels (this means that you can have different shading densities on your model!).
  • Does not automatically LOD (but it's usually easy to implement LOD, lots of possible choices)
  • Can alter the geometry (and topology, on DX10 hardware, via geometry shaders).
  • Caching is dependent on the vertex ordering.
  • Culling is not automatic (but with occlusion queries it's no longer harder than the highly refined one provided by the rasterizer with z-buffering, early-z, hierarchical-z...).
  • Outputs are interpolated on the mesh surface in a linear (and perspective corrected, for texcoord interpolators) fashion.
  • Can access neighbors' data only on DX10 hardware (and on DX9.5, read: Xbox 360).
This is unfortunately becoming an infrequent choice, which puts too much pressure on the pixel units. Unified shader units kinda solve that, but still, doing things per vertex can be interesting. For example, normal mapping is way too frequently used: for some kinds of applications, where the normal map is sparse on the model and/or where you have to encode sharp edges (for example, cars in a racing title), adding detail to the mesh can be a better choice, because it adds complexity only where it's needed.

Material pixel shaders. Features:
  • Dependent on the pixels, decoupled from vertex complexity (usually).
  • Automatically LODs (as LOD space usually is screen space: smaller objects use fewer pixels, thus less shading).
  • More LOD can be applied (but it's not easy, e.g. scaling shading features for far away objects that can still fill a good percentage of the screen).
  • Overdraw is a big problem.
  • Can access neighbors' data (via derivative functions, limited to the currently drawn primitive).
  • Powerful access to external data (via textures and samplers, this is also true for vertex shaders in unified shader APIs/hardware, i.e. DX10).
After-effect (screen space) pixel shaders. Features:
  • Not coupled with objects (no need to cull, and that also means no overdraw).
  • Not coupled with materials, no need to write the same code in all the material shaders.
  • Outputs AND inputs are coupled with the screen pixels (reading data from a screen-sized buffer is hugely more expensive than getting the same data from vertex shader interpolators, as the inputs are usually far more numerous than the outputs and have to be written by the material pixel shaders too; that easily becomes a bandwidth problem).
  • Can randomly access neighbors' data.
  • Low-frequency effects can be subsampled (kinda easily).
  • Can be difficult to LOD (usually it's possible only if you have access to dynamic branching, with all its limits).
  • Does not work with antialiasing.
  • Can only "see" the first layer of visibility, so limited geometrical information can be reconstructed (the eye first-hit in raytracing terms; it could be solved with more buffers and depth peeling, but it's expensive).
  • Precision problems (in the inputs, especially with older hardware that does not have floating-point textures, and when reconstructing world-space positions from the z-buffer; tip: always use w-buffers, it's both faster and more accurate to reconstruct world space from them - see the sketch at the end of this post).
This one is becoming a more and more popular choice. I'm not a big fan of fully deferred shading, but many shading effects can be effectively moved from materials to post in a really efficient way.
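As a footnote on that w-buffer tip: with linear view-space depth stored per pixel, reconstructing the view-space (or world-space) position is just a multiply of the per-pixel view ray by the sampled depth, with no non-linear 1/z to undo. A small sketch, assuming a simple pinhole camera described by its vertical field of view and aspect ratio (all the names are mine):

// Reconstructing a view-space position from a w-buffer (linear view-space depth).
// With a z-buffer you would first have to undo the non-linear 1/z mapping; with w it's a multiply.
struct Float3 { float x, y, z; };

Float3 ViewPosFromW(float w,              // linear view-space depth sampled from the w-buffer
                    float u, float v,     // pixel position in [0,1] x [0,1]
                    float tanHalfFovY, float aspect) {
    // ray through the pixel at view-space z = 1
    Float3 ray;
    ray.x = (u * 2.0f - 1.0f) * tanHalfFovY * aspect;
    ray.y = (1.0f - v * 2.0f) * tanHalfFovY;
    ray.z = 1.0f;
    // scale by the linear depth: that's the whole reconstruction
    Float3 p = { ray.x * w, ray.y * w, ray.z * w };
    return p;
}

// world space is then camPos + camRight * p.x + camUp * p.y + camForward * p.z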

02 February, 2008

Raytracing/GI links

Opensource unbiased renderer, based on the excellent PBRT

OMPF forum (interesting also as a general programming resource)

Metropolis-Hastings and the Mandelbrot fractal

Concurrency

The more I delve into the problem, the more I understand that there is no unique solution. But there are a couple of models that are very interesting and usable.

Actors and futures, as seen in some nice languages.

High-level parallelism, usually parallel versions of functional iteration constructs on data. That's one part of the approach that is very frequently used in games (the other part being explicit threads for some main game components). A good example of this approach is the multithreaded transition of the Source engine at Valve Software (see Dragged Kicking and Screaming: Source Multicore).

Consumer-producer. Most of the time, in a render engine, you have two kinds of C-P situations to deal with:
Standard producer-consumer. E.g. a mesh data instance built on the CPU for a crowd system. This is the classic consumer-producer problem: the GPU will consume the data, but we don't know how many frames ahead the renderer thread can be (of course, there's usually a cap, which can be implemented by blocking the allocation function). AllocateFrameData(framenumber, size) and FreeAllFrameData(framenumber) is a nice API to have.
Indexed producer-consumer. E.g. to decouple simulation from rendering. In that case, we can have a fixed buffer of N entries and a global timestamp. Simulation locks a buffer in write mode at time T, overwriting the data of the least recently written buffer that is not locked. Rendering locks a buffer in read mode at time T, and the unlocked buffer nearest to the requested time is returned to it. This is quite like a producer-consumer, but not all the produced data will be consumed; a fixed buffer is allocated, which does not block (it could, if there are more concurrent threads than buffers, but not because the producer is too far ahead of the consumer) but where data can be overwritten before being consumed. A minimal sketch follows.
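Something along these lines (a toy sketch with hypothetical names; the mutex only protects the bookkeeping, the per-entry locked flag is what guards the data itself, and three entries are enough for one producer and one consumer):

// Indexed producer-consumer: a small ring of timestamped buffers.
// The simulation overwrites the oldest unlocked entry; the renderer reads the entry
// whose timestamp is nearest to the requested time.
#include <cmath>
#include <mutex>

template <typename T, int N = 3>
class IndexedBuffers {
public:
    IndexedBuffers() { for (int i = 0; i < N; ++i) { mTime[i] = -1.0; mLocked[i] = false; } }

    // producer: lock the least recently written buffer that is not locked
    T* LockWrite(double time, int* outIndex) {
        std::lock_guard<std::mutex> lock(mMutex);
        int best = -1;
        for (int i = 0; i < N; ++i)
            if (!mLocked[i] && (best < 0 || mTime[i] < mTime[best])) best = i;
        if (best < 0) return 0;                  // more concurrent threads than buffers
        mLocked[best] = true; mTime[best] = time; *outIndex = best;
        return &mData[best];
    }

    // consumer: lock the unlocked buffer whose timestamp is nearest to the requested time
    const T* LockRead(double time, int* outIndex) {
        std::lock_guard<std::mutex> lock(mMutex);
        int best = -1;
        for (int i = 0; i < N; ++i)
            if (!mLocked[i] && mTime[i] >= 0.0 &&
                (best < 0 || std::fabs(mTime[i] - time) < std::fabs(mTime[best] - time))) best = i;
        if (best < 0) return 0;                  // nothing produced yet, or everything is locked
        mLocked[best] = true; *outIndex = best;
        return &mData[best];
    }

    void Unlock(int index) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLocked[index] = false;
    }

private:
    std::mutex mMutex;
    T      mData[N];
    double mTime[N];
    bool   mLocked[N];
};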

Lockless (lockfree) data structures. Not generally applicable, but they can be a powerful tool when they can be used. Even if they are not one of the most useful techniques, they're still something to know, with all their drawbacks. The indexed P-C described above can be made lockless, for example; stacks are an easy data structure that can be made lockless; reference counting can also be lockless, but not in every kind of implementation (AFAIK, if you avoid having an extra level of indirection and keep the counters in the classes by deriving them from a common reference-counted base object, then you can't implement the lockfree version on most hardware platforms, as they do not come with an atomic double compare-and-swap; in that case a spinlock is usually the most reasonable solution).

Good comments, bad code

What is a good comment? One that eases the understanding of a block of code.
What is good code? Code that does not require any comment to be understood.

EDIT: As the brevity of this post seemed to cause some confusion, I'm not advocating the "no comments" practice. That's only laziness. Even literate programming is nice in my view (but I would not use it in a commercial project). The point is: you should comment every non-trivial issue, and you should code in a way that makes most issues trivial. A (bad) example follows:

// good comment, bad code:
assetmanager.loaddata(filename, true); // async load of player mesh

// good code:
assetmanager.load(playerMeshFile, MODE_ASYNC);

// good code, good comment:
assetmanager.load(playerMeshFile, MODE_ASYNC); // preloading data in the frontend, as the user has already chosen their player

XML Abuse

Nowadays, XML is very popular. It seems that every data-driven approach should be done using that magical format. And indeed, most of the time it's a good idea: XML has parsers for any programming language (SAX can be a nightmare, but it's nice) and has some really nice tools for transforming data.
But be aware of what your data is. Most of the time, data-driven is not about pure data, but about transformations of some data or bindings of objects to other objects. This kind of information is difficult to encode in XML, and most of the time you end up writing something that starts to look like a scripting language.

Hints of XML abuse:
  • Macros in XML strings. E.g. attribute="{database_object{index}.type}" or filename="{getpath:{system.currentpath}}{getfile:{system.projectname}}"
  • Logic in XML tags. E.g. "if" tags, "variable"-definition tags, "call" tags that evaluate other nodes with the current state of the parser...
Ask yourself: why am I using XML? Why am I parsing things, pulling them from data? Wouldn't it be better to use a scripting language to push data into the runtime instead? Remember that scripting languages are most of the time more than suited for expressing data initialization, and they provide you the power of expressing transformations and logic when you need them. Maya scene files are a MEL script that spawns all the objects in the scene. Also, scripting languages tend to be efficient, and usually come with a nice and fast bytecode compiler...

Functional programming in c#

Functional programming is becoming more popular and mainstream. C# is embracing it, mostly due to the recent inclusion of the LINQ technology (a SQL-like language extension to manage queries on objects), but still, it's a more than welcome addition. As usual with Microsoft, C# started as a copy of an existing technology (Java), but it has now been refined so much that not only has it, in my opinion, surpassed its master, but it is one of the most beautiful "mainstream" languages nowadays.

A few links (very interesting, but mostly useless):
A nice and self-contained reading about building data with closures.
Y combinator, in C#
Immutable data structures in C#

More immutability
LazyLoader class

LINQ raytracer


A few links (very interesting, with practical use):
C# 3 raytracer
Microsoft F#

Note that I've placed the "immutable data structures" links in the useless section. There is quite a bit of research in the C# community about immutability, mostly in order to ease parallel programming. I don't really like those kinds of approaches to the problem: when you drop an immutable data structure into an impure world, you still have to lock every time you need to "publish" a "new version" of your data. Those locks are simple, they can even be implemented in a lockless way, but then multiple concurrent updates are trouble and can easily cause livelocks for complex structures. And you have to lock on reads as well if you're not comfortable with the idea of reading possibly "old" data; but if that is not a problem, then you could even stick with imperative data structures and make copies of them. That buffering solution is the most used in games (between the two main threads, simulation and rendering).

Most impure functional languages work the other way around: they encourage a programming style that is impure in the implementation details but that, as a whole, does not have side effects, so you can drop impure code into a purely functional world.
I would rather have immutable interfaces to mutable data structures, i.e. C++ deep constness, to be able to avoid locking when I'm sure that all the threads are fed a read-only version of the data structure.
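Something like this, for instance (a toy sketch; note that C++ const is shallow through pointer members, so "deep" constness is really a discipline you enforce with the accessors you expose):

// Immutable interface over a mutable structure: reader threads only ever get a deep-const view,
// so they don't need to lock; only the owning (writer) thread mutates the data.
#include <cstddef>
#include <vector>

struct Particle { float x, y; };

class ParticleSystem {
public:
    // mutable interface, used only by the owning (simulation) thread
    void Add(const Particle& p) { mParticles.push_back(p); }
    Particle& At(size_t i) { return mParticles[i]; }

    // deep-const view handed out to reader threads: no way to mutate through it
    class ConstView {
    public:
        explicit ConstView(const ParticleSystem& s) : mSystem(s) {}
        size_t Count() const { return mSystem.mParticles.size(); }
        const Particle& At(size_t i) const { return mSystem.mParticles[i]; }
    private:
        const ParticleSystem& mSystem;   // const reference: read-only access all the way down
    };
    ConstView View() const { return ConstView(*this); }

private:
    std::vector<Particle> mParticles;
    // with pointer members, the const accessors would have to return const pointers/references
    // themselves, otherwise the constness would not be "deep"
};

// a reader thread holding a ConstView can't call Add() or obtain a non-const Particle&,
// so no locks are needed as long as the writer is not mutating during the read phase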

Decoupling the scene tree

Decoupling your code into isolated modules is a technique that should be well understood in these times of OO programming. All the classes in a module should minimize the dependencies on the classes of other modules. Yet it's still perceived in the wrong way, as a tool mostly to be used to improve code reusability and extensibility, or implemented in the wrong way, that's to say, in the wrong places.

Decoupling eases refactoring (local changes do not propagate), and everything changes, especially in games (requirements change, no design can be made totally upfront). It eases testing (local testing does not propagate), which again eases refactoring. But in some cases it can also be the best possible design to improve performance and reduce (art) iteration time.

Many engines (especially the ones you find in books) are centered around the scene tree, a collection of linked nodes that have a coordinate system, where the connections establish the relations between those systems: the child nodes are expressed in relation to the parent's system. Some nodes also have a volume (that usually includes the volume of their children, to form a BVH), so they can be culled for visibility, and are renderable, usually by containing some links to meshes and materials.

The engine then traverses the graph; if a node is visible, it gets rendered. This is the naive approach. Usually it's slow, as the traversal is not cache-friendly, especially as we're going to compute, per node, many different things on different data (visibility, material setup, mesh drawing). And when you face that problem, the first solution is to add caches, so you can precompute the traversal of all the static elements. That makes your code bloated, but it's not the only problem.

Having the scene tree coupled with all kinds of engine concepts makes changing it a nightmare. Usually it's built from artist-authored data: a file exported from a 3D application is loaded and processed by the engine pipeline to create the scene tree, and then the scene tree is serialized, as the exported data usually has to be heavily processed in order to create render-optimized, platform-specific objects, and so it's not something that we want to do at load time.

But what if we later want to change how a given object is rendered? We have to change a node implementation; let's say we add a new kind of node that is to be used in certain situations. To construct this new kind of node, we have to change the conversion pipeline as well, and reprocess all the converted assets! Coupling turns change into a nightmare.

Also, I wonder why we structured our rendering data around a tree. Most objects do not need the coordinate system relationships, as those are only useful for animated rigid objects with joints, which is not really the most common use case. And visibility computation may well need different relationships, hierarchical (BVH) or none at all, depending on the algorithm we want to implement.

Rendering is better suited to being described by pipelines. Renderable objects are grouped in lists; an update function is called to sync render object state with the current frame's simulation (game) data; a visibility function selects the visible objects in each list; a render function creates a list of basic renderable entities (mesh, material); then we sort those entities and build the command buffer.
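In code, the shape of that pipeline is roughly the following (just a skeleton, every type here is a hypothetical placeholder):

// Skeleton of a list/pipeline based renderer, as opposed to a scene tree traversal.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Mesh;       // platform-optimized geometry
struct Material;   // shaders + parameters
struct SimState;   // the current frame's game/simulation data
struct Camera;

struct DrawEntity { uint64_t sortKey; const Mesh* mesh; const Material* material; };

static bool ByKey(const DrawEntity& a, const DrawEntity& b) { return a.sortKey < b.sortKey; }

class RenderObject {
public:
    virtual ~RenderObject() {}
    virtual void Update(const SimState& sim) = 0;                    // sync with game data
    virtual bool IsVisible(const Camera& cam) const = 0;             // any culling scheme fits here
    virtual void Collect(std::vector<DrawEntity>& out) const = 0;    // emit (mesh, material) pairs
};

void RenderFrame(const std::vector<RenderObject*>& objects, const SimState& sim, const Camera& cam) {
    std::vector<DrawEntity> entities;
    for (size_t i = 0; i < objects.size(); ++i) {
        objects[i]->Update(sim);
        if (objects[i]->IsVisible(cam))
            objects[i]->Collect(entities);
    }
    // sort by rendertarget / pass / material / depth packed into the key, then submit
    std::sort(entities.begin(), entities.end(), ByKey);
    for (size_t i = 0; i < entities.size(); ++i) {
        // translate each entity into state changes + draw calls (i.e. build the command buffer)
    }
}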

Artist-authored data usually comes in the form of a scene tree, but that should only be processed in order to create renderable objects, not used as the basis for our engine. Plus, doing so, we can easily choose whether to serialize the renderable objects or the converted scene trees used to build them. If the second solution is taken, we can always change the renderable object type that is built for a given scene. Of course, serializing less comes with the extra cost of rebuilding more data, but it also makes the (painful) serialization process more important, as we can change more without impacting serialization. So be careful. Benchmark your IO speed and then design your serialization. Most of the time, you only need to save converted meshes (slow to process), textures, animations and materials. But NOT the containers of those things. So later, when the artists change a texture, you can simply reprocess THAT, and not the entire serialized scene.

Usually a render engine can be seen in layers: the first two places are taken by the hardware and the native APIs; then you build your data abstractions (meshes, shaders) and your render API (a cross-platform render device); then you build your engine on top of those (render algorithms, materials, effects, etc.). Choose wisely at which layer you want your serialization to operate.

The bottom line is that having less "powerful", less "generic" nodes, and decoupling more, gives us not only more flexibility but also faster iteration and better performance, as a renderable object can process converted data and order it in the most cache/algorithm/etc.-friendly way it needs.