
28 April, 2009

Does it scale? Game threading laws part two

Someone is selling you a solution to a problem. A general problem: not something you want to solve just now, but now, and tomorrow, and the day after.

In that scenario, we should always ask one question first: does it scale? Note that this is not a question we are, in general, too concerned with.

Usually we want to know something else: we want to know if it is fast. We care about scalability only marginally; we know it's a good property, but only if it does not have any overhead, if it does not make what we are doing right now slower (or more complicated, to put it more generally).

But let's talk about the situations where we do care about that. Tools for example, scalability of tools is fundamental... But tools are not sexy, so let's say, threads...

I recently blogged about that, in a nutshell suggesting that the "right" solution is the parallel-for in a thread pool, or in general to look at data parallelism and stream computing. And I called those "laws" to emphasize that you shouldn't be too concerned with fancier tools. Why?
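To make the "parallel-for in a thread pool" idea concrete, here's a minimal sketch (the name `parallel_for` and the chunking strategy are mine, not from any particular engine): split the index range into one contiguous chunk per worker and run the body on each index. A real thread pool would of course reuse its worker threads instead of spawning them per call.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Minimal parallel-for sketch: split [0, count) into one contiguous
// chunk per worker and run body(i) on every index of the chunk.
void parallel_for(size_t count, size_t num_workers,
                  const std::function<void(size_t)>& body)
{
    std::vector<std::thread> workers;
    size_t chunk = (count + num_workers - 1) / num_workers; // ceil division
    for (size_t w = 0; w < num_workers; ++w) {
        size_t begin = w * chunk;
        size_t end   = std::min(begin + chunk, count);
        if (begin >= end) break; // more workers than work
        workers.emplace_back([begin, end, &body] {
            for (size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : workers) t.join();
}
```

Note that the chunks are independent, so the body needs no locks as long as iteration `i` only touches data owned by index `i`.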

Because every time I see those fancy tools, as we're talking about perspectives on the future, frameworks to enable our computation to scale... I ask that question! Now beware: it's not that the people who work on parallel data structures, lock-free or wait-free algorithms, immutability, software transactional memory, actors, futures, uniqueness types... it's not that they're unconcerned with scalability! They are, a lot; parallel computing is all about that.

But here comes the tricky part... scalability is limited by whichever bottleneck you hit first. So you have to identify a bottleneck... in the future! That can be incredibly challenging; your best bet is to simulate your future workloads... but simulating them on future hardware that does not exist yet is complicated! Scalability is a hard problem to tame.

Regarding threading though, we have a lot of evidence that the main problem will be (well really, already is) memory... And we have plenty of examples... GPUs are a great picture of what the future can be... REYES rendering was invented in the eighties, and it already embraced parallelism via coherency...


Now sometimes, as a rendering engineer, you hear things like "raytracing is easy to parallelize because each ray is independent from the others". So does that scale? It would seem so: we have independent rays, so we don't need locks! Independent rays... true, but you have to be very worried when you hear that. Independent? Do you mean that I don't have any coherency? And what about memory? I don't care too much about locks; I care about bandwidth and latency!
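A tiny illustration of that point (a toy example, not raytracing code): both loops below compute the same sum with zero locks, yet one walks memory coherently and the other strides through it, paying in bandwidth and latency. "No locks" tells you nothing about which one you'll get.

```cpp
#include <cstddef>
#include <vector>

const size_t N = 512; // a small N*N grid, stored row-major

// Coherent walk: consecutive accesses touch consecutive memory,
// so each cache line fetched is fully used.
long sum_row_major(const std::vector<long>& m) {
    long s = 0;
    for (size_t r = 0; r < N; ++r)
        for (size_t c = 0; c < N; ++c)
            s += m[r * N + c];
    return s;
}

// Incoherent walk: same data, same result, but every access jumps
// N elements ahead, wasting most of each cache line fetched.
long sum_col_major(const std::vector<long>& m) {
    long s = 0;
    for (size_t c = 0; c < N; ++c)
        for (size_t r = 0; r < N; ++r)
            s += m[r * N + c];
    return s;
}
```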

Of course that's again something well known; in fact, coherent raytracing is not a new idea at all, and now most raytracers work on ray packets (which hopefully will be replaced with something better, allowing coherent traversal with independent rays, i.e. n-ary trees), but that's not the point...

So back to threading...
Those "laws" might seem restrictive, but they are not, once you factor in scalability, and once you realize that, at least in our context, it's all about memory. Really, everything else does not matter much, so it boils down to one thing: data parallelism.

Everything else is not only complicated, it does not work at all! Now I don't mean that STM does not work; of course you can map data parallelism onto that, or implement actors in a data-parallel way (that is a very smart idea if you're parallelizing your game... if your actors are simple and you have many more instances than types, then you don't even need fibers or other kinds of lightweight threading for them, no generators or coroutines, just a parallel-for over a list of message queues)...
But the underlying idea, the one your framework should embrace, is still modeled with data-parallel threading...
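A sketch of that last idea (hypothetical code, just to show the shape): actors as plain data, each with its own message queue, delivered between frames. The update loop is trivially a parallel-for, since each actor touches only its own state and its own inbox.

```cpp
#include <vector>

// Hypothetical data-parallel "actor": plain state plus a per-actor
// message queue. Messages are delivered between updates, so the
// update phase never shares mutable data across actors.
struct Actor {
    int state = 0;
    std::vector<int> inbox; // messages queued for this actor

    void update() {
        for (int msg : inbox) state += msg; // consume all pending messages
        inbox.clear();
    }
};

void update_all(std::vector<Actor>& actors) {
    // In a real engine this loop would be the parallel-for: the
    // iterations are independent, so no fibers, coroutines, or locks.
    for (auto& a : actors) a.update();
}
```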

20 April, 2009

Economy is not that bad...

...if we still have money to waste on crappy tech: one, and two. My prediction: in a couple of months they'll both be pretty much dead.

Now some more interesting links, a lil bit of old and less-old school hacking, enjoy.

http://aggregate.org/MAGIC/
http://graphics.stanford.edu/~seander/bithacks.html
http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html
http://home.hejl.com/HD/
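To give a taste of what's in those pages, here are two classics in the style of the Stanford bit-twiddling collection (reproduced from memory, so check the originals for the canonical forms):

```cpp
#include <cstdint>

// Round a 32-bit value up to the next power of two: smear the highest
// set bit down into all lower positions, then add one.
uint32_t next_pow2(uint32_t v) {
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}

// Isolate the lowest set bit: in two's complement, -x flips every bit
// above the lowest set one, so x & -x leaves only that bit.
uint32_t lowest_set_bit(uint32_t x) { return x & (~x + 1); }
```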

26 March, 2009

Garbage collection, again

Recently I discovered this forum, Molly Rocket. A friend of mine told me to have a look: people were trashing one of my articles! Kinda cool, I thought, but unfortunately it turned out to be mostly misunderstandings due to my bad writing.
Well, that's not the point. The point is that that forum has plenty of smart people helping less experienced ones. I stumbled into a post about garbage collection and wrote the usual stuff about it and its merits, compared to explicit allocation and reference counting. And... of course, I got told that those are just the usual arguments, not enough to persuade the big guys.
Ok then, let's write again... about garbage collection!

So let's picture the usual scenario. It's C++, and so you start writing code to manage memory (at this point, you're dealing with memory, so hopefully you already have a coding standard, maybe some tools to enforce it, and you know that by default, C++ is wrong).

You probably start writing your custom allocators, to help debugging, the usual stuff. You wrap the default allocator with some padding before and after to detect stomps, add tracing functionality to detect leaks and fragmentation, handle alignment, and so on.
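The guard-padding trick could look something like this (a minimal sketch; a real version would also track alignment, callstacks for leak tracing, and report stomps instead of just returning a flag):

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Debug-allocator sketch: wrap malloc with a known guard value before
// and after the payload, so memory stomps can be detected on free.
// Block layout: [size][guard][payload...][guard]
static const uint32_t kGuard = 0xDEADBEEF;

void* debug_alloc(size_t size) {
    char* block = (char*)std::malloc(sizeof(size_t) + 2 * sizeof(uint32_t) + size);
    std::memcpy(block, &size, sizeof(size_t));
    std::memcpy(block + sizeof(size_t), &kGuard, sizeof(uint32_t));
    std::memcpy(block + sizeof(size_t) + sizeof(uint32_t) + size,
                &kGuard, sizeof(uint32_t));
    return block + sizeof(size_t) + sizeof(uint32_t); // hand out the payload
}

// Frees the block; returns true if both guards are intact (no stomp).
bool debug_free(void* payload) {
    char* block = (char*)payload - sizeof(uint32_t) - sizeof(size_t);
    size_t size;
    std::memcpy(&size, block, sizeof(size_t));
    uint32_t before, after;
    std::memcpy(&before, block + sizeof(size_t), sizeof(uint32_t));
    std::memcpy(&after, block + sizeof(size_t) + sizeof(uint32_t) + size,
                sizeof(uint32_t));
    bool intact = (before == kGuard && after == kGuard);
    std::free(block);
    return intact;
}
```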

Explicit allocations do not compose, they don't really work with OO, so you implement reference counting, maybe deriving your classes from a reference-counted base, adding smart-pointer classes (and hoping that you don't have to interface them with a similar system in a third-party library).
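The "reference-counted base plus smart pointer" shape is roughly this (a bare-bones sketch: no thread safety, no weak pointers, both of which a shipping version would need):

```cpp
// Intrusive reference counting: objects derive from a counted base,
// and a smart-pointer wrapper bumps the count on copy, drops it on
// destruction, and deletes the object when the count hits zero.
struct RefCounted {
    int refs = 0;
    virtual ~RefCounted() {}
    void add_ref() { ++refs; }
    void release() { if (--refs == 0) delete this; }
};

template <typename T>
class RefPtr {
    T* p;
public:
    explicit RefPtr(T* raw = nullptr) : p(raw) { if (p) p->add_ref(); }
    RefPtr(const RefPtr& o) : p(o.p) { if (p) p->add_ref(); }
    RefPtr& operator=(const RefPtr& o) {
        if (o.p) o.p->add_ref(); // add first, so self-assignment is safe
        if (p) p->release();
        p = o.p;
        return *this;
    }
    ~RefPtr() { if (p) p->release(); }
    T* operator->() const { return p; }
    T& operator*() const { return *p; }
};
```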

Ok, you're set, you start writing your game. And fragmentation is a problem! Ok, no fear, everyone has faced that, you know what to do. You start to make separate memory pools; luckily you already knew about that, and tagged all your allocations with a category: rendering, animation, AI, physics, particles. It was so useful for enforcing memory budgets during the project!
So now it's only a matter of redirecting some of these categories to different pools, possibly with different allocation strategies.
Off we go... and it works! That's how everyone solves the problem...

But! It's painful!
And this is the best-case scenario, where you have total control, you made all the right choices, and you don't have to link external libraries that use other strategies.
You have to size all your pools for the worst case. And then streaming comes into the equation. Streaming rocks, right?
You need finer and finer control over allocations, splitting heaps, creating per-class pools.

You realize that what really counts is object lifetime. The most useful thing is to classify allocations as per-frame (usually a linear allocator that automatically frees everything at the end of the frame, double-buffered if the memory has to be consumed by the GPU...), short-lived (i.e. temporary objects), medium-lived (i.e. level resources) and permanent.
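The per-frame linear allocator is about as simple as an allocator gets; a sketch (for simplicity it aligns relative to the start of its buffer, and a double-buffered version would just keep two of these and swap them each frame):

```cpp
#include <cstddef>

// Per-frame linear allocator: every allocation bumps an offset, and
// "freeing" is a single reset at the end of the frame.
class FrameAllocator {
    char*  buffer;
    size_t capacity;
    size_t offset = 0;
public:
    FrameAllocator(char* mem, size_t size) : buffer(mem), capacity(size) {}

    void* alloc(size_t size, size_t align = 16) {
        size_t p = (offset + align - 1) & ~(align - 1); // round offset up
        if (p + size > capacity) return nullptr;        // out of frame memory
        offset = p + size;
        return buffer + p;
    }

    void reset() { offset = 0; } // frees everything at end of frame
    size_t used() const { return offset; }
};
```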

You realize that if you go on and on, and split allocations so every class has its own pool, and you size the pools for the worst case, you're wasting a lot of memory; but in the end you don't need to manage allocations anymore. You can simply use a circular pool and overwrite old instances with new ones: if the pool is correctly sized, living instances won't ever get overwritten!
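The circular pool really is that degenerate; a sketch (note the whole point: there is no free, no tracking, the only "management" left is sizing N past the worst-case number of live instances):

```cpp
#include <cstddef>

// Circular pool: a fixed ring of N slots. acquire() hands out the next
// slot, wrapping around and silently overwriting the oldest one. If N
// exceeds the worst-case live count, a live slot is never reused.
template <typename T, size_t N>
class CircularPool {
    T slots[N];
    size_t next = 0;
public:
    T* acquire() {
        T* slot = &slots[next];
        next = (next + 1) % N; // wrap: the oldest slot gets recycled
        return slot;
    }
};
```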

Something is wrong. And what's with the idea of object lifetime anyway? Is there a better solution? A more correct, generic answer? Something that should be used as a better default? Well, well...

25 March, 2009

Optimization again, from Steve Yegge

Everything Yegge writes is well worth a read, and most of the time I agree with him. The following is extracted from this talk.

“OK: I went to the University of Washington and [then] I got hired by this company called Geoworks, doing assembly-language programming, and I did it for five years. To us, the Geoworkers, we wrote a whole operating system, the libraries, drivers, apps, you know: a desktop operating system in assembly. 8086 assembly! It wasn't even good assembly! We had four registers! [Plus the] si [register] if you counted, you know, if you counted 386, right? It was horrible.

I mean, actually we kind of liked it. It was Object-Oriented Assembly. It's amazing what you can talk yourself into liking, which is the real irony of all this. And to us, C++ was the ultimate in Roman decadence. I mean, it was equivalent to going and vomiting so you could eat more. They had IF! We had jump CX zero! Right? They had "Objects". Well we did too, but I mean they had syntax for it, right? I mean it was all just such weeniness. And we knew that we could outperform any compiler out there because at the time, we could!

So what happened? Well, they went bankrupt. Why? Now I'm probably disagreeing – I know for a fact that I'm disagreeing with every Geoworker out there. I'm the only one that holds this belief. But it's because we wrote fifteen million lines of 8086 assembly language. We had really good tools, world class tools: trust me, you need 'em. But at some point, man...

The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?

This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.

Because we couldn't see how the system worked anymore!

Small systems are not only easier to optimize, they're possible to optimize. And I mean globally optimize.

So when we talk about performance, it's all crap. The most important thing is that you have a small system. And then the performance will just fall out of it naturally.”

P.S. That talk is about dynamic languages, and it shows some pretty cool stuff. Without any doubt, the progress JavaScript compilers are making is incredible. There's plenty of neat stuff, and you can even do graphics with it.
But I have to say, I would recommend none of those for games, especially console ones, especially outside the scripting realm (that's to say, asset loading, and stuff that is mostly about parameters set with a bit of logic rather than code executed thousands of times per frame; for those tasks, look no further than Lua, it's really the best solution as of now, even if JavaScript's JSON is tempting).
I do believe that the future, the near future when we'll dump C++ for good and move to another language + C for low-level stuff, will not be about dynamic languages but about more modern static ones.

18 March, 2009

At last! Double-click highlighting in VS!

I’ve finally found a VS plugin that emulates the word highlighting of the excellent (and open-source) Notepad++.

Basically, whenever you double-click to select a word, all the occurrences of that word in the source code are automatically highlighted.

Download and install it now (RockScroll)

Oooh, just noticed, this is my 100th post! Ok, so here's a bonus link:
http://www.vis.uni-stuttgart.de/~hopf/pub/Fosdem_2009_r600demo_Slides.pdf :P

Update: oh shit, RockScroll rocks so much that now I miss its code thumbnails when I have to use Notepad++! Aaaargh!



Update: new project! http://code.google.com/p/metalscroll/