The more I delve into the problem, the more I understand that there is no unique solution. But there are a couple of models that are very intresting and usable.
Actors and futures, as seen in some nice languages.
High-level parallelism, usually parallel version of functional iteration constructs on data. That's one part of the approach that is very frequently used in games (the other part being explicit threads for some main game components). A good example of this approach is the multithreaded transition of the source engine at Valve software (see Dragged Kicking and Screaming: Source Multicore)
Consumer-producer. Most of the times, in a render engine, you have two kinds of C-P situations to deal with:
Standard producer-consumer. I.E. a mesh data instance built in CPU for a crowd system. This is the consumer-producer problem, the GPU can will consume the data, but we don't know how many frames ahead (of course, there's usually a cap, that can be implemented by blocking the allocation function) the renderer thread can be. AllocateFrameData(framenumber, size) and FreeAllFrameData(framenumber) is a nice API to have.
Indexed producer-consumer. I.E. to decouple simulation from rendering. In that case, we can have a fixed buffer of N entries and a global timestamp. Simulation locks a buffer in write mode, at time T, overwriting data of the least recently written buffer that is not locked. Rendering locks a buffer in read mode at time T, and the nearest buffer to the requested time, that is not locked is returned to it. This is quite like a producer-consumer, but not all the produced data will be consumed, a fixed buffer is allocated, that does not lock (it could, if there are more concurrent threads than buffers, but not because the producer is too much ahead of the consumer) but where the data can be overwritten before being consumed.
Lockless (lockfree) data structures. Not generally applicable, but they can be a powerful tool, when they can be used. Even if they are not one of the most useful techniques, it's still something to be known, with all thier drawbacks. The indexed P-C described above can be made lockless for example, stacks are an easy data structure that can be lockless, reference counting also can be lockless, but not in every kind of implementation (AFAIK if you avoid having an extra level of indirection, and keep the counters in the classes by deriving them with a common reference counted base object, then you can't have implement the lock free version on most hardware platforms, that do not come with atomic double compare and swap, in that case a spinlock is the most reasonable solution, usually)
Standard producer-consumer. I.E. a mesh data instance built in CPU for a crowd system. This is the consumer-producer problem, the GPU can will consume the data, but we don't know how many frames ahead (of course, there's usually a cap, that can be implemented by blocking the allocation function) the renderer thread can be. AllocateFrameData(framenumber, size) and FreeAllFrameData(framenumber) is a nice API to have.
Indexed producer-consumer. I.E. to decouple simulation from rendering. In that case, we can have a fixed buffer of N entries and a global timestamp. Simulation locks a buffer in write mode, at time T, overwriting data of the least recently written buffer that is not locked. Rendering locks a buffer in read mode at time T, and the nearest buffer to the requested time, that is not locked is returned to it. This is quite like a producer-consumer, but not all the produced data will be consumed, a fixed buffer is allocated, that does not lock (it could, if there are more concurrent threads than buffers, but not because the producer is too much ahead of the consumer) but where the data can be overwritten before being consumed.
Lockless (lockfree) data structures. Not generally applicable, but they can be a powerful tool, when they can be used. Even if they are not one of the most useful techniques, it's still something to be known, with all thier drawbacks. The indexed P-C described above can be made lockless for example, stacks are an easy data structure that can be lockless, reference counting also can be lockless, but not in every kind of implementation (AFAIK if you avoid having an extra level of indirection, and keep the counters in the classes by deriving them with a common reference counted base object, then you can't have implement the lock free version on most hardware platforms, that do not come with atomic double compare and swap, in that case a spinlock is the most reasonable solution, usually)
No comments:
Post a Comment