Personal note: My eeepc still has no keyboard (now it's totally dead), so I'm writing from my girlfriend's laptop... I should learn not to use netbooks in the tub.
I'll cut it short this time. Explicit multithreading is too hard. Actually I think it's the hardest thing in computer science.
Parallel programming articles are always a fascinating read (I strongly recommend Joe Duffy's blog and, of course, Herb Sutter's), but the truth is that when it comes to real work, you want to minimize your exposure to it.
And yet, when making games, you want to push your CPU as close to its peak GFLOPS rating as you can. How? These are my personal laws (nothing revolutionary, really, so don't be surprised if you're already following them):
- Data parallelism is your only God. It will feed your long, starving pipelines, hide your latencies and vectorize your computation.
- Embrace stream computing, love Map/Reduce, study ParallelFX, OpenMP and CUDA, and finally implement and use a ParallelFor primitive backed by a thread pool.
- That shall be the only primitive you routinely use for multithreading.
- Avoid explicit threads.
- Avoid explicit locks and synchronization primitives.
- Avoid all forms of data sharing.
- Enforce the non-sharing rule. Use smart pointers and reference-counting base classes. Assert that shared smart pointers are always acquired and released on the same thread.
- You don't need locks in your libraries, because you don't want to share. Re-entrancy is the key; if you can't achieve it, just say so, don't lock. Locks do NOT compose.
- You don't need exotic parallel data structures or synchronization primitives, because you're not sharing.
- The only time you have to think about synchronization shall be when communicating with the GPU.
- You may not have more explicit threads than the fingers of one hand.
- Any explicit thread will depend on only one other (e.g. ai->game->rendering).
- At runtime there will be only one synchronization point, when communication happens by passing an updated buffer of data to the thread that needs it. That should be done with a queue. Lock contention is practically absent.
- Communication is always one-directional: one thread always writes stuff for the dependent thread, which always reads it (or, put differently, each thread owns only one side of the communication queue).
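To make the ParallelFor rule concrete, here is a minimal C++ sketch of such a primitive. The name `parallel_for`, its signature and the chunking scheme are assumptions made up for illustration, not code from any engine or library. For brevity it spawns its worker threads on every call; a real version would hand the chunks to a long-lived thread pool instead, as the rule asks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(i) for every i in [begin, end), splitting
// the range into contiguous chunks, one chunk per worker thread.
// Each index is visited exactly once, so body(i) may write to element i of
// an output array without any locking, as long as it touches nothing shared.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end, Body body,
                  unsigned workers = std::thread::hardware_concurrency()) {
    assert(begin <= end);
    if (begin == end) return;
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0

    const std::size_t count = end - begin;
    workers = static_cast<unsigned>(std::min<std::size_t>(workers, count));
    const std::size_t chunk = (count + workers - 1) / workers;  // round up

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = begin + w * chunk;
        if (lo >= end) break;  // rounding up can leave the last workers idle
        const std::size_t hi = std::min(end, lo + chunk);
        threads.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    // The only synchronization point: wait for every chunk to finish.
    for (auto& t : threads) t.join();
}
```

A call such as `parallel_for(0, positions.size(), [&](std::size_t i) { positions[i] += velocities[i]; });` (with `positions` and `velocities` standing in for whatever arrays you update per frame) is the whole multithreading interface the calling code sees: no threads, no locks, and no sharing beyond each iteration owning its own element.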
Follow those rules, and you'll be happy watching the guy who spent all his weekends implementing that super cool lockless, actor-based futures system...
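The one-way queue between explicit threads can be sketched in a similar spirit. Everything named here (`OneWayQueue`, `FrameData`, `run_pipeline`, and the "empty buffer means stop" convention) is hypothetical and only meant to illustrate the rule; it uses a plain mutex and condition variable, which should be fine when the lock is only taken once per handed-over buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical one-way channel: one thread only calls push(), the other
// thread only calls pop(). Values are moved through, never shared.
template <typename T>
class OneWayQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    // Blocks until the writer has pushed something.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

// Stand-in for the per-frame buffer the game thread hands to rendering.
using FrameData = std::vector<int>;

// The calling ("game") thread produces `frames` buffers, a "render" thread
// consumes them. Returns how many values the render thread received in total.
std::size_t run_pipeline(int frames) {
    assert(frames >= 0);
    OneWayQueue<FrameData> to_render;
    std::size_t consumed = 0;

    std::thread render([&] {
        for (;;) {
            FrameData frame = to_render.pop();
            if (frame.empty()) break;  // empty buffer is the stop signal
            consumed += frame.size();
        }
    });

    for (int f = 0; f < frames; ++f) {
        FrameData frame(100, f);           // built privately by the writer
        to_render.push(std::move(frame));  // ownership moves to the reader
    }
    to_render.push(FrameData{});  // tell the reader to stop
    render.join();                // after this, reading `consumed` is safe
    return consumed;
}
```

The writer never touches a buffer again after pushing it, so the queue itself is the only place where the two threads meet, and the lock is held just long enough to move one buffer in or out.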
I will write more on this, focusing on how to make the render engine parallel, but it will take a while: my plan is to describe and then publish the code of an old (but not too old) engine I wrote for a series of articles that was supposed to appear in a magazine but never did...