
28 April, 2009

Does it scale? Game threading laws part two

Someone is selling you a solution to a problem. A general problem: not something you want to solve just now, but now and tomorrow and the day after.

In that scenario, we should always ask this question first: does it scale? Note that this is not a question we are generally too concerned with.

Usually we want to know something else: we want to know if it is fast. We care about scalability only marginally; we know it's a good property, but only if it has no overhead, if it does not make what we are doing right now slower (or more complicated, to put it more generally).

But let's talk about the situations where we do care about that. Tools, for example: the scalability of tools is fundamental... But tools are not sexy, so let's say, threads...

I recently blogged about that, in a nutshell suggesting that the "right" solution is the parallel-for in a thread pool, or in general to look at data parallelism and stream computing. And I said those were "laws" to emphasize that you shouldn't be too concerned about fancier tools. Why?
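To make the "law" concrete, here is a minimal sketch of a parallel-for: split an index range into one contiguous chunk per worker so each core streams through its own slice of the data. (A real engine would reuse a persistent thread pool rather than spawning threads per call; the function name and chunking scheme here are just illustrative.)

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal parallel-for sketch: one contiguous chunk of [0, count)
// per worker thread, so each core walks its own range linearly.
template <typename Fn>
void parallel_for(size_t count, unsigned workers, Fn body) {
    workers = std::max(1u, workers);
    size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        size_t begin = w * chunk;
        size_t end = std::min(count, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([=] {
            for (size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();
}
```

Each index is written by exactly one thread, so no locks are needed; the data layout, not the synchronization, is doing the work.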

Because every time I see those fancy tools, since we're talking about perspectives on the future, frameworks to enable our computation to scale... I ask that question! Now beware, it's not that people who work on parallel data structures, lock-free or wait-free algorithms, immutability, software transactional memory, actors, futures, uniqueness types... it's not that they're not concerned with scalability! They are, and a lot: parallel computing is all about that.

But here comes the tricky part... scalability is limited by the bottleneck that hits you first. So you have to identify a bottleneck... in the future! That can be incredibly challenging; your best bet is to simulate your future workloads... but simulating them on future hardware that does not exist yet is complicated! Scalability is a big problem to tame.

Regarding threading though, we have a lot of evidence that the main problem will be (well really, already is) memory... And we have plenty of examples... GPUs are a great picture of what the future can be... REYES rendering was invented in the eighties, and it already embraced parallelism via coherency...


Now sometimes, as a rendering engineer, you hear stuff like "raytracing is easy to parallelize because each ray is independent from the others". So does that scale? It would seem so, we have independent rays, so we don't need locks! Independent rays... true, but you have to be very worried when you hear that. Independent? Do you mean that I don't have any coherency? And what about the memory? I don't care too much about locks, I care about bandwidth and latency!

Of course that's again something well known; in fact, coherent raytracing is not a new idea at all, and now most raytracers work on ray packets (which hopefully will be replaced with something better, allowing coherent traversal with independent rays, i.e. n-ary trees), but that's not the point...
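The grouping step behind packet tracing can be sketched in a few lines: bucket rays by the sign octant of their direction, so that rays in the same packet tend to walk the same parts of the acceleration structure and touch the same memory. (The struct layout and the octant heuristic are illustrative, not any particular raytracer's API.)

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// Encode the direction's sign octant as a 3-bit key: rays sharing a
// key traverse the scene in roughly the same order (coherent access).
uint32_t octant(const Ray& r) {
    return (r.dx < 0 ? 1u : 0u) | (r.dy < 0 ? 2u : 0u) | (r.dz < 0 ? 4u : 0u);
}

// Group rays into direction-coherent runs; consecutive rays then form
// packets that hit the same cache lines instead of thrashing memory.
void sort_into_packets(std::vector<Ray>& rays) {
    std::stable_sort(rays.begin(), rays.end(),
                     [](const Ray& a, const Ray& b) {
                         return octant(a) < octant(b);
                     });
}
```

Note that the rays are still logically independent; the sort buys nothing in terms of locking, only in terms of bandwidth and latency, which is exactly the point above.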

So back to threading...
those "laws" might seem restrictive, but they are not, when you factor in scalability, and when you realize that, at least in our context, it's all about memory. Really, everything else does not matter much, so it boils down to one thing: data parallelism.

Everything else is not only complicated, but it does not work at all! Now I don't mean that STM does not work; of course you can map data parallelism to that, or implement actors in a data-parallel way (that is a very smart idea if you're parallelizing your game... if your actors are simple and you have many more instances than types, then you don't even need fibers or other kinds of lightweight threading for them, no generators or coroutines, just a parallel-for over a list of message queues)...
But the underlying idea, and the one your framework should embrace, is still modeled with data-parallel threading...
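The data-parallel actor idea above can be sketched like this: all instances of one actor type live in an array, each with its own inbox, and a frame is just a (parallel) loop over the array where each worker drains only its own actors' inboxes, so no locks are needed. Messages sent during the frame land in a next-frame inbox that is swapped in afterwards. (The field names and the damage-message example are hypothetical, just to show the shape.)

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

struct Actor {
    int health = 100;
    std::vector<int> inbox;       // damage messages for this frame
    std::vector<int> next_inbox;  // filled by senders, read next frame
};

// One frame of data-parallel actor updates: disjoint chunks of the
// actor array per worker, so each actor's state and inbox are touched
// by exactly one thread. No locks, just layout.
void update_all(std::vector<Actor>& actors, unsigned workers) {
    workers = std::max(1u, workers);
    auto work = [&](size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i) {
            for (int dmg : actors[i].inbox) actors[i].health -= dmg;
            actors[i].inbox.clear();
        }
    };
    size_t chunk = (actors.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        size_t b = w * chunk, e = std::min(actors.size(), b + chunk);
        if (b < e) pool.emplace_back(work, b, e);
    }
    for (auto& t : pool) t.join();
    // Publish next frame's messages once all workers are done.
    for (auto& a : actors) std::swap(a.inbox, a.next_inbox);
}
```

The point is that the actor model here is a surface API; underneath, it's still a parallel-for over contiguous data.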

3 comments:

Anonymous said...

http://book.realworldhaskell.org/read/software-transactional-memory.html

Anonymous said...

"Everything else is not only complicated, but it does not work at all!", yeah right on, anything which uses any type of global memory allocation can be dropped into that bucket. The scale of this logic gets scary given that a majority of programs use some type of global memory allocation...

One other thing which should be added is that job ordering becomes highly important when going massively parallel in order to factor out fine grain locking (or collisions in atomic operations) into coarse job scheduling.

DEADC0DE said...

Well, in practice it's not hard. I've just finished a game that is one of the most multicore-friendly games we have across all our studios. And it wasn't hard at all.

Threading is hard. It's incredibly complicated, more than most people think. Luckily, none of that matters for us. Even if tomorrow someone comes and makes STM perfect, and lets us have one actor per class, all with no overhead, it won't really matter much.

You can thread everything and run slower than ever; when do I even have time to fetch the code to execute, if I'm always jumping around? It's not a problem of locking or not, it's a problem of memory... We're not writing server applications, we're not IO-bound, there is nothing that matters more than that...

Luckily, we have plenty of data to process. We spawn very few threads; two are the important ones, game and rendering. They communicate via a buffer, and you can call it an immutable data structure if you want (well, that's a little bit unfair; immutable data structures are like that but with a sort of copy-on-write, neat stuff, but hardly relevant). We have a thread pool, a lot of data with some dependencies, and that's it... Of course we can do more, it's really a matter of rethinking a lot of processes in a more stream-friendly way, but it's not a huge deal.
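The game/render split described here is essentially a double buffer: the game thread fills a "write" frame while the render thread consumes the previously published "read" frame, and the two swap at the frame sync point. Once published, the read buffer is never mutated, which is the "immutable-ish" property mentioned. A minimal sketch (illustrative layout; a real engine would guard the swap with events or fences):

```cpp
#include <utility>
#include <vector>

// Whatever the renderer consumes; contents are hypothetical.
struct FrameState {
    std::vector<float> transforms;
};

// Two frame buffers: the game writes one while the renderer reads the
// other. swap() flips the roles at the frame boundary.
struct DoubleBuffer {
    FrameState buffers[2];
    int write_index = 0;

    FrameState& write() { return buffers[write_index]; }
    const FrameState& read() const { return buffers[1 - write_index]; }
    void swap() { write_index = 1 - write_index; }  // at the sync point
};
```

The renderer never sees a half-written frame, and neither thread ever blocks on a fine-grained lock; the only synchronization is the once-per-frame swap.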

Again, there are some "advanced" techniques that are fancy and can be useful, fibers can be useful to further mask latency, if your data access pattern in a given computational kernel is not so linear, agents can be nice for some types of games, but that's not the main point...