25 February, 2009

Game multithreading laws

Personal note: My eeepc still has no keyboard (now it's totally dead), so I'm writing from my girlfriend's laptop... I should learn not to use netbooks in the tub.
I'll cut it short this time. Explicit multithreading is too hard. Actually I think it's the hardest thing in computer science.
Parallel programming articles are always a fascinating read (I strongly advise reading Joe Duffy's blog, and of course Herb Sutter's), but the truth is, when it comes to real work, you want to minimize your exposure to it.
And yet, doing games, you want to make your CPU go over its peak GFLOPS rating. How? These are my personal laws (not that there's anything revolutionary really, so don't be surprised if you're already following them):
  • Data parallelism is your only God. It will feed your long, starving pipelines, hide your latencies and vectorize your computation.
  • Embrace stream computing, love Map/Reduce, study ParallelFX, OpenMP and CUDA, and finally implement and use a ParallelFor primitive with a thread pool.
  • That shall be the only primitive you routinely use for multithreading.
  • Avoid explicit threads.
  • Avoid explicit locks/synchronization primitives.
  • Avoid all forms of data sharing.
  • Enforce the no-sharing rule. Use smart pointers and reference-counting base classes. Assert that shared smart pointers are always acquired/released on the same thread.
  • You don't need locks in your libraries, because you don't want to share. Re-entrancy is the key, if you can't achieve that, just say it, don't lock. Locks do NOT compose.
  • You don't need exotic parallel data structures or synchronization primitives, because you're not sharing.
  • The only time you have to think about synchronization shall be when communicating with the GPU.
  • You may not have more explicit threads than the fingers of one hand.
  • Any explicit thread will only depend on one other (i.e. ai->game->rendering).
  • At runtime there will be only one synchronization point per pair of threads: when communication happens, by passing an updated buffer of data to the thread that needs it. That should be done with a queue. Lock contention is practically absent.
  • Communication is always mono-directional: one thread always writes stuff for the dependent thread, which always reads it (or, one thread will own only one side of the communication queue).
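
To make the ParallelFor rule from the list a bit more concrete, here is a minimal sketch in C++11. Everything in it (the ThreadPool class, parallel_for, the grain parameter, the scaled_copy helper) is a hypothetical name made up for illustration, not the code of any engine or library. Note that the pool does use a mutex internally: the point is that the lock lives inside the one primitive, and the code calling it never sees it.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool: workers sleep until parallel_for queues chunks.
class ThreadPool {
public:
    explicit ThreadPool(unsigned workers) {
        for (unsigned i = 0; i < std::max(1u, workers); ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    // Calls body(i) once for every i in [begin, end), in chunks of `grain`
    // indices, and blocks until all of them are done. The body is assumed
    // to touch only data that belongs to index i (no sharing).
    void parallel_for(std::size_t begin, std::size_t end, std::size_t grain,
                      const std::function<void(std::size_t)>& body) {
        assert(grain > 0);
        if (begin >= end) return;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::size_t lo = begin; lo < end; lo += grain) {
                const std::size_t hi = std::min(end, lo + grain);
                tasks_.push_back([lo, hi, &body] {
                    for (std::size_t i = lo; i < hi; ++i) body(i);
                });
                ++pending_;
            }
        }
        wake_.notify_all();
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return quit_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // woken up only to quit
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();
            std::lock_guard<std::mutex> lock(mutex_);
            if (--pending_ == 0) done_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_, done_;
    std::deque<std::function<void()>> tasks_;
    std::vector<std::thread> threads_;
    std::size_t pending_ = 0;
    bool quit_ = false;
};

// Usage sketch: every index is written by exactly one chunk, so nothing is shared.
inline std::vector<float> scaled_copy(std::vector<float> data, float k, ThreadPool& pool) {
    pool.parallel_for(0, data.size(), 256, [&](std::size_t i) { data[i] *= k; });
    return data;
}
```

A real implementation would probably let the calling thread work on chunks too instead of just waiting, and would pick the grain size from the workload; this sketch skips both to stay short.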

Follow those rules, and you'll be happy looking at the guy who has implemented that super cool lockless actor based future system working all the weekends...
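
For the few explicit threads that remain, the one-way hand-off from the last two rules could look roughly like the following. It is only a sketch under my own assumptions: OneWayQueue, FrameData and run_frames are invented names, and the "game" and "rendering" work is faked with a few integers. The producer only ever calls push, the consumer only ever calls pop, and buffers are moved, never shared.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical one-directional queue: one thread owns push, another owns pop.
template <typename T>
class OneWayQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Blocks until the producer has handed over a value.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

using FrameData = std::vector<int>;  // stand-in for a per-frame buffer

// game -> rendering: the "game" side fills a buffer per frame and gives it
// away; the "render" side consumes it. Returns the sum of all consumed values.
inline long long run_frames(int frame_count) {
    OneWayQueue<FrameData> to_render;
    long long consumed = 0;  // written by the render thread only, read after join

    std::thread render([&] {
        for (int f = 0; f < frame_count; ++f) {
            FrameData frame = to_render.pop();
            for (int v : frame) consumed += v;
        }
    });

    for (int f = 0; f < frame_count; ++f) {
        FrameData frame(4, f);             // four entries, all equal to f
        to_render.push(std::move(frame));  // ownership moves to the consumer
    }

    render.join();
    return consumed;
}
```

The lock is held only for the push or pop itself, once per frame, which is why contention stays negligible.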

I will be writing more on this, focusing on how to make the render engine parallel, but it will take a while because my plan is to describe and then publish the code of an old (but not too old) engine I wrote for a series of articles that were supposed to appear in a magazine, but never did...

03 February, 2009

Offline

Finally I've found some time to read a few publications from recent and not-so-recent conferences...

CUDA this, GPU that, it seems that most of the effort is spent in finding ways of adapting old algorithms to the GPU, even in fields where the GPU computation model (at least, that of this generation of GPUs, who knows about the future) does not apply very well.

Dunno, maybe since I left university and started to work in the gaming industry, I've got too pragmatic... Still there are things worth reading. Approximating Dynamic Global Illumination in Image Space was really expected: SSAO ported to diffuse global illumination. Point-based approximate color bleeding by Pixar is more exciting: a realtime technique that gets implemented/mutated into the most high-end, and thus stable, offline rendering engine.

If you planned to impress your friends with some realtime fluids computed on the GPU, Real-Time Fluid Simulation using Discrete Sine/Cosine Transforms is a realtime, frequency space approach (like the uber famous Stable Fluids, by Jos Stam), with boundary conditions.

If you use spherical harmonics, and your game has day/night cycles, Efficient Spherical Harmonics Lighting with the Preetham Skylight Model might be nice, even if you'd probably have to update the skydome map anyway, and you still should have plenty of time to do it in a slow way over a number of frames...

I've already encountered the work of Ladislav Kavan (Dual quaternion skinning) and even exchanged a couple of mails with him while I was researching quaternion skinning for my crowd renderer. Nice guy, and very interesting research. Animation is a field that I don't really know in depth, but for sure it's showing some good progress, and it's still something where a lot of improvement is possible, even right now, in practical applications. Physics based is the future!

Ok, so, here is where I wanted to talk about how I was surprised to find this after I knew this, and how I was happy to see that people are actually investing in single-ray SIMD raytracing structures, instead of fast but useless ray-packet ones. Then I planned to crosslink two interesting posts from the level of detail blog, to show the work of Tobias Ritschel, to then conclude with Progressive Photon Mapping, explaining a lil bit the global illumination problem (the title of the post was offline as in offline rendering), the magnificent work of Eric Veach versus the practical intuition of Jensen, and how he looks to me more like a coder than a researcher, and explain the simplicity and beauty of a couple of his ideas (even if in general, I don't like photon mapping). As a postscript, I wanted to remark how bad papers are when they don't provide any insight on the downsides of the presented algorithm. In PPM there are a few that appear to be pretty obvious (the memory impact of keeping all that information on the hit points, which scales with the number of pixels in the image, dealing with aliasing, and how it compares to path tracing under less extreme conditions). But my EEEPC keyboard is failing, it started with the CTRL key long ago, and I've replaced it with the right Windows key... then a couple of function keys, and now the return and the arrows trigger the delete button... So I have to stop, anyway, it was really a boring post, I'll probably edit it or delete it later, on a decent computer...

02 February, 2009

DirectX texel offset reminder

DirectX evolved from a very bad library into a decently usable one. But under the hood, some bad habits still exist. Do you do post-processing effects? Or dynamically generate textures? Remember to align your vertices (subtracting half a pixel from them)!
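
As a reminder of what "subtracting half a pixel" means in practice, here is a tiny sketch of the arithmetic in plain C++ (no actual DirectX calls). The Vec2 struct and the two function names are made up for the example. It assumes the usual DirectX 9 conventions: pre-transformed vertices are in pixels with y pointing down, while clip space spans [-1, 1] across the render target with y pointing up, so half a pixel is 1/width horizontally and 1/height vertically.

```cpp
struct Vec2 {
    float x, y;
};

// Pre-transformed (screen-space) vertex, in pixels: shift by half a pixel
// so that texel centers land on pixel centers.
inline Vec2 d3d9_align_screen(Vec2 p) {
    return { p.x - 0.5f, p.y - 0.5f };
}

// Same correction for a clip-space vertex (e.g. a fullscreen quad at w = 1).
// Clip space covers 2 units over `width` pixels, so half a pixel is 1/width;
// the sign on y flips because clip-space y points up while screen y points down.
inline Vec2 d3d9_align_clip(Vec2 p, float width, float height) {
    return { p.x - 1.0f / width, p.y + 1.0f / height };
}
```

For a fullscreen pass this usually ends up as a constant offset added in the vertex shader rather than baked into the vertex buffer, so it survives render target resizes.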

This article is very helpful. Especially if you're trying to implement this...

Xbox 360, having a really cool DirectX 9.5 API, provides a means of correcting that with a renderstate, and DirectX 10 solved it too, so this only applies to DirectX 9.