Once you have sorted out the concurrency problems for your CPU threads, established a way to safely generate data on the CPU for the GPU, and set up your object pipelines, you will end up hitting the problem of multithreaded draw calls.
At the moment, this is not possible on any mainstream platform: you have to issue all the draw calls from a single thread that owns the rendering device. The usual solution to this problem is to bake command buffers (display lists in OpenGL terminology) on the non-rendering threads and then pass them to the rendering thread, which draws them.
The problem with this approach is that you can't sort/optimize/whatever primitives between threads: all the draw calls are baked and just copied into the main ring buffer. Of course you can always organize your rendering objects in lists that are bound to a given render buffer/pass and process those lists in parallel. That way you can be fairly sure there is no optimization to be had beyond what you can already perform within a single list. A problem arises when you have a few big lists and all the other ones are much smaller: in that case, processing one list per thread does not give you optimal load balancing. So other solutions may be more suitable, depending on the context.
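To put a number on the load balancing problem, here is a tiny back-of-the-envelope sketch. The list sizes and the thread count are made up for illustration, and it assumes that the cost of a list is simply proportional to the number of items in it.

```python
# Hypothetical per-pass list sizes (number of items to process); not real data.
list_sizes = [9000, 300, 200, 100]
num_threads = 4

# One list per thread: the frame has to wait for the thread that got the biggest list.
one_list_per_thread = max(list_sizes)

# Lower bound if the same work could be split evenly across all the threads.
ideal = sum(list_sizes) / num_threads

print(one_list_per_thread, ideal)
```

With these numbers three of the four threads sit idle for most of the frame, which is why a finer-grained way of sharing the work can pay off.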
A very common one is to have some higher-level rendering data, usually meshes with materials and all the context needed to draw that data. Those primitives/contexts are added by the various threads to a render queue that is then sorted (by render target, pass, material, etc.) and used to generate state changes and draw calls. The state API is hidden and used only by the rendering thread. Using a lock-free stack helps.
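A minimal sketch of the sort-and-translate step of such a render queue could look like the following. Everything here is an assumption made for illustration: the `RenderItem` fields, the `flush` function and the command names are hypothetical, and the submission from multiple threads (the lock-free stack) is left out, with a plain list standing in for the queue.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RenderItem:
    # Hypothetical high-level submission: integer ids for the state, a name for the mesh.
    render_target: int
    render_pass: int
    material: int
    mesh: str


def flush(queue: List[RenderItem]) -> List[Tuple[str, object]]:
    """Rendering thread only: sort the queue and emit state changes and draws."""
    commands: List[Tuple[str, object]] = []
    target = render_pass = material = None
    ordered = sorted(queue, key=lambda i: (i.render_target, i.render_pass, i.material))
    for item in ordered:
        # Emit a state change only when the state actually differs from the current one.
        if item.render_target != target:
            target, render_pass, material = item.render_target, None, None
            commands.append(("set_target", target))
        if item.render_pass != render_pass:
            render_pass, material = item.render_pass, None
            commands.append(("set_pass", render_pass))
        if item.material != material:
            material = item.material
            commands.append(("set_material", material))
        commands.append(("draw", item.mesh))
    return commands
```

Because the items are sorted before translation, two meshes that share a material end up next to each other and pay for a single material switch, no matter which thread submitted them.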
Another interesting solution is the one employed by Capcom's MT engine (Lost Planet). It's like a cross-platform command buffer API, where the commands carry hints on their ordering (render target, pass, etc.) and are issued in parallel by multiple threads, then sorted within each thread, gathered and merge-sorted together in the rendering thread, and finally converted into actual draw calls. This is something of a hybrid between a high-level submission API and a native command buffer API: you can still do every kind of inter-thread optimization, but in a very fast way, without hiding the state API and with only a simple translation of the commands to the native API ones in the main thread.
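The sort-per-thread, merge-in-the-render-thread idea can be sketched roughly as below. This is not Capcom's actual API: the key layout in `make_key`, the `record` and `build_frame` functions and the tuple format are all hypothetical, and the Python thread pool only stands in for the engine's worker threads.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def make_key(render_target, render_pass, material):
    # Hypothetical layout of the ordering hint: 8 bits target, 8 bits pass, 16 bits material.
    return (render_target << 24) | (render_pass << 16) | material


def record(objects):
    """Worker thread: issue (key, command) pairs for its objects, then sort them locally."""
    commands = [(make_key(t, p, m), name) for (t, p, m, name) in objects]
    commands.sort()
    return commands


def build_frame(per_thread_objects):
    # Each worker produces its own sorted run of commands, independently of the others.
    with ThreadPoolExecutor() as pool:
        sorted_runs = list(pool.map(record, per_thread_objects))
    # Rendering thread: merge the already sorted runs, then translate them
    # one by one (here the "translation" just keeps the command name).
    return [name for _key, name in heapq.merge(*sorted_runs)]
```

Since the final order depends only on the keys (with ties broken by the command itself), the merged stream comes out the same no matter how the worker threads were scheduled.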
Uhm, I don't quite know which platforms count as major for you, but since a command buffer is just a memory resource, you can easily fill it from multiple threads provided you synchronize them properly. If you don't want to do that, you can fill separate command buffers and issue jumps to them in the main command buffer at the end of the frame. Either way, it's possible (of course, the rendering order is not exactly specified if the lock timing changes, but if you want a perfectly consistent drawing order, you can't use concurrency for drawing even with ideal HW).
Well, I don't want to be rough, but maybe reading the full post would have helped, especially where I say "...The usual solution to this problem is to bake command buffers..." and then "...The problem with this approach...".
And the only problem with multitasking anyway is exactly having to sync threads to access shared memory resources, so the idea of filling a shared command buffer from many threads by syncing them is really lame (and you don't even need a command buffer if you want to sync: in that case you can just hand the rendering device over to the thread that needs to issue the draw calls).
Last but not least, that idea will also, as you notice, end up with a non-deterministic rendering order, something that I would really like to avoid in my code.
My point is, again, that if you want a deterministic draw order, there's no way you can work without a main render thread that finally synchronizes everything, so the whole concurrency thing here is slightly moot.
I won't comment on other insults, you're sooo nice.
I'd like to know more about this important topic. It's OK that there is a little heat, it makes some light.
Graphics Hardware '08
How will artists get to use all those transistors?
"How will artists get to use all those transistors?"
that's an interesting question... I will write a post sometime