06 December, 2008

Which future? (plus, a shader optimization tip)

Update: poll closed. Results:

CPU and GPU convergence - 36
One to one vertex-pixel-texel - 7
Realtime raytracing - 14
More of the same - 9
We don't need any of that - 14

We have a clear winner. To my surprise, the realtime raytracing hype did not have a big influence on this, nor did Carmack with his id Tech 5 and "megatextures" (clipmaps), or Jon Olick with his voxel raycaster... IBM's Cell CPU and Intel's Larrabee seem to lead the way... We'll see.
---
Which future? Well, in the long term, I don't know. I don't know anything about it at all. In general our world is way too chaotic to be predictable. Luckily.


I'm happy to leave all the guesswork to hardware companies (this article by AnandTech is very interesting).

But we can see the near future clearly: there are a lot of technologies, ideas, trends and promising research... Of course we don't know which ones will be really successful, but we can see some options and, with enough experience, even make some predictions. Educated guesses.

But what is even more interesting to me is not knowing which ones will succeed; that's not too exciting, I'll know anyway, it's a matter of time. I'd like to know which ones should succeed, which ones we are really waiting for. That's why I'm going to add a poll to this blog, and ask you.

I'm not writing a lot, as the game I'm working on is taking a lot of my time now, so this blog is not as followed as it used to be, and running a poll right now might not be the brightest idea ever. But I've never claimed to be bright...

Before enumerating the "possible futures" I've chosen for the poll, a quick shader tip, so I don't feel too guilty about asking without giving anything back. Ready? Here it comes:

Shader tip: struggling to optimize your shader? Just shuffle your code around! Most shader compilers can't find the optimal register allocation. That's why, for complicated shaders, sometimes even removing code leads to worse performance. You should always try to pack your data together, to give the compiler all the possible hints, not to use more than you need (e.g. don't push a vector3 of grayscale data) and to follow all those kinds of optimization best practices. But after you have done everything right, also try just shuffling the order of computations. It makes so much difference that you'd better structure your code so you have many separate blocks with clear dependencies between them...
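To see why instruction order alone can matter, here is a toy sketch in Python (not shader code, and not how any real shader compiler works). It assumes a made-up model where a "program" is a list of (destination, sources) pairs and counts how many values are alive at the same time for two orderings of the same computation. The function name and the two example programs are hypothetical, made up only for illustration.

```python
def peak_live_values(program, outputs):
    """Return the maximum number of values alive at the same time.

    program: list of (dest, sources) tuples, in execution order.
    outputs: names that must survive until the end of the program.
    This is a toy model: one value = one register, no spilling, no vectors.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for i, (_, sources) in enumerate(program):
        for s in sources:
            last_use[s] = i
    for name in outputs:
        last_use[name] = len(program)

    live = set()
    peak = 0
    for i, (dest, _) in enumerate(program):
        # The destination is written while the sources are still held.
        live.add(dest)
        peak = max(peak, len(live))
        # Release every value whose last reader was this instruction.
        live = {v for v in live if last_use.get(v, -1) > i}
    return peak


# Same math, r = t0*t1 + t2*t3, written in two different orders.
# An empty source list stands for a fetch (e.g. a texture read).
fetch_first = [
    ("t0", []), ("t1", []), ("t2", []), ("t3", []),
    ("s0", ["t0", "t1"]),
    ("s1", ["t2", "t3"]),
    ("r", ["s0", "s1"]),
]
interleaved = [
    ("t0", []), ("t1", []),
    ("s0", ["t0", "t1"]),
    ("t2", []), ("t3", []),
    ("s1", ["t2", "t3"]),
    ("r", ["s0", "s1"]),
]

print(peak_live_values(fetch_first, ["r"]))   # 5
print(peak_live_values(interleaved, ["r"]))   # 4
```

In this toy model the interleaved version needs one register less for the very same result. On real hardware the picture is murkier (grouping fetches can help hide latency, for instance), which is one more reason to actually try different orderings and measure.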

Ok, let's see the nominations now:

Realtime raytracing.
Raytracing is cool. So realtime raytracing is cool squared! It looks good on a Ph.D. thesis, it's an easy way to use multithreading and an interesting way to use SIMD and CUDA. You can implement it in a few lines of code, it looks good in any programming language, it lets you render all kinds of primitives, and it is capable of incredibly realistic imagery.
It's so good that some people started to think that they need it, even if they don't really know why... There's a lot of hype around it, and a lot of ignorance (e.g. the logarithmic-complexity myth: raytracing is not faster than rasterization, not even in the asymptotic limit. There are some data structures, applicable to both, that can make visibility queries faster under some conditions, even if building such structures is usually complicated).

CPU and GPU convergence, no fixed pipeline. Larrabee, the DirectX 11 compute shaders, or even the PlayStation 3 Cell processor... GPUs are CPU wannabes and vice versa. Stream programming, massive multithreading, functional programming and side effects. Are we going towards longer pipelines, programmable but fixed in their role, or are we going towards a new, unified computing platform that could be programmed to do graphics, among other things? Software rendering again? Exciting, but I'm also scared by its complexity; I can hardly imagine people optimizing their triangle fillers again.

One to one vertex-pixel ratio (to texel).
Infinite detail. Jon Olick (id Software, Siggraph 2008) and Jules Urbach (ATI Cinema 2.0 and OTOY engine) generated a lot of hype as they used raytracing to achieve that goal. But the same can be done right now with realtime tessellation, which is also the way that DirectX 11 took. REYES could be interesting too; for sure the current quad-based rasterization pipelines are a bottleneck with high geometrical densities.
Another problem is how to generate and load the data for all that detail. Various options are available (acquisition from real world data and streaming, procedural generation, compression), each with its own set of problems and tradeoffs.

Just more of the same. More processing power, but nothing revolutionary. We don't need such a revolution: there are still a lot of things that we can't do due to lack of processing power and the rising number of pixels to be pushed (HD at 60 Hz is not easy at all). Also, we still have to solve the uncanny valley problem, and this should shift our attention, even as rendering engineers, outside the graphics realm (see this neat presentation from NVision08).

And the last one, which I had to add to have something to vote for myself:

Don't care, we don't need more hardware.
The next revolution won't be about algorithms or hardware, but about tools and methodologies. We are just now beginning to discover what colors are (linear color rendering, gamma correction etc.), we still don't know what normals are (filtering, normal mapping), and we're far from having a good understanding of the BRDF, of its parameters, of physically based rendering. We make too many errors, and add hacks and complexity in order to correct them visually. Complexity is going out of control, we have too much of everything and iteration times that are too slow. There's much to improve to really enter a "next" generation.


4 comments:

levelofdetail said...

It is my experience, at least with ATI hardware/drivers, that premature optimization at the shader level has no or negative performance impact. It is even worth trying to turn off fxc optimizations and examining registers used, alu:tex ratios, etc using something like GPUShaderAnalyzer. I would definitely recommend that you never try to vectorize yourself in the shader.

But I have heard of getting better compiled shaders by rearranging instructions on other hardware/drivers.

DEADC0DE said...

lod: Reordering is unpredictable on PC, as the final shader depends on the driver and the graphics card. On consoles, on the other hand, it is not.
I don't really understand your comment on vectorization. What do you mean? Properly arranging operations into vectors is a good practice, and also hardware vendors recommend it.

Robin said...

future graphics generations. I agree that the next step is algorithmic - realtime GI isn't about more flops, it's about directing the flops we've got to the right buckets. Surfels are just the beginning, and Surfels don't care whether they're rasterized or raytraced. It's all about the visibility form factor and the cosine term.

Ultrano said...

My uneducated guess:
Somewhat more of the same, but with improvements on memory-bandwidth.
- Larger caches
- single-cycle copy of vtx-attrib onto varyings.
- maybe tiles, to do more tri-setups than the current 1-per-cycle.
- more aggressive z-culling, keep the depth-buffer in cache, reduce units back from 3x3 to 2x2; facilitate fast rough occlusion queries
- introduce StaticMesh objects, that contain VB/shader/texture/OBB binds - and do rough occlusion-culling with the OBB before any vb/sh/tex setup and rasterizing.
- let textures in textureArrays have different sizes and formats.