
27 March, 2011

Stable Cascaded Shadow Maps - Ideas

Stable CSM intro

A "stable" cascade is nothing more than a fixed projection of your entire world onto a giant virtual texture, of which we render, each frame, a fixed-size window that fits around the projection of the view frustum, making sure we always slide this window by an integral number of texels between frames.
As the window has to fit the frustum in every orientation, one way to determine its size is to fit the frustum slice in a bounding sphere and size the window using that sphere's radius (the sphere's diameter, unlike the frustum's projected bounds, is rotation-invariant).
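The sphere fit and the texel snapping can be sketched as follows (a minimal Python sketch; the function names, and the cheap choice of the corner average as the sphere center, are mine, not from any particular engine):

```python
import math

def cascade_window(frustum_corners_ws, shadowmap_res):
    """Fit a bounding sphere around the 8 world-space corners of a cascade's
    frustum slice, and size the ortho window from its radius so the window
    fits the slice under any camera rotation.
    (Averaging the corners is cheap and stable, not the minimal sphere.)"""
    cx = sum(p[0] for p in frustum_corners_ws) / 8.0
    cy = sum(p[1] for p in frustum_corners_ws) / 8.0
    cz = sum(p[2] for p in frustum_corners_ws) / 8.0
    radius = max(math.dist((cx, cy, cz), p) for p in frustum_corners_ws)
    diameter = 2.0 * radius                  # rotation-invariant window size
    texel_size = diameter / shadowmap_res    # world units per shadowmap texel
    return (cx, cy, cz), diameter, texel_size

def snap_to_texels(center_ls, texel_size):
    """Snap the window center (already in light space) to whole texels, so the
    window slides by an integral number of texels each frame and the shadows
    don't shimmer as the camera moves."""
    return (math.floor(center_ls[0] / texel_size) * texel_size,
            math.floor(center_ls[1] / texel_size) * texel_size)
```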

Implementing CSM, especially on consoles, is not that easy. For an open-world game you'll notice that you need quite a lot of resolution to get decent results, and cascade rendering can quickly become a problem. On the 360, from what I've seen, resolving big shadowmaps from EDRAM to shared memory is very expensive too, so it becomes important to pack the shadowmaps aggressively. Some random "good" ideas are:
  • Render shadows to a deferred shadow buffer, enabling the possibility of rendering one cascade at a time. It also makes it way easier to cross-fade cascades, and makes it possible to render shadows at half resolution (which is a good idea... upsample with bilateral filtering or similar). It's possible to use hi-stencil and hi-z (on PS3, also the depth range) in various ways to accelerate this.
  • Tune cascade shadow filtering to try to match the filter's world-space size across the different resolutions (that is, use smaller kernels on the far cascades).
  • Shadow a pixel using the best cascade that contains it, instead of relying on the frustum split planes (this makes it a bit harder to fade between cascades, but not too much). Use scissors or clipping planes to avoid rendering stuff into the coarser cascades that was already rendered in the previous ones. Microsoft has a pair of nice articles on this.
  • Compute the light near/far planes to be tight around the frustum, but avoid culling objects in front of the near plane (a.k.a. "pancaking": clamp depth in the vertex shader; it's not a big deal as the projection is orthographic, but it can screw up self-shadowing of the clamped objects, so you need to give a bit of "buffer" space to the near plane). The downside is more raster pressure, as hi-z will not reject the objects that are compressed onto the near plane... you can solve that either by giving the pancaked objects a small linear depth range, or by marking and testing stencil/hi-stencil where they get drawn.
  • Cull small objects aggressively from the distant cascades. Avoid rendering objects in the far cascades if they were rendered completely in the previous ones.
  • Pack your shadowmaps! Do not render things behind the frustum, and maximize the area in front of it! This and this article have some good ideas. You can also pack two shadowmaps into a two-channel 16-bit target if double-depth fill is not giving you a big speedup.
Still, after doing all this, you might end up needing more performance...
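As an illustration of the last packing trick above, here is a hypothetical sketch of storing two cascades' depths in one two-channel 16-bit target (plain Python lists standing in for the GPU surface; the function names are made up):

```python
def pack_depths(depth_a, depth_b):
    """Quantize two [0,1] float depth maps to 16 bits each and interleave them
    into one two-channel target (modeled here as a list of (r, g) uint16
    pairs), so both cascades resolve in a single surface."""
    to_u16 = lambda d: min(65535, int(d * 65535.0 + 0.5))
    return [(to_u16(a), to_u16(b)) for a, b in zip(depth_a, depth_b)]

def unpack_depth(packed, channel):
    """Recover one cascade's normalized depth from the packed target
    (channel 0 or 1)."""
    return [texel[channel] / 65535.0 for texel in packed]
```

The cost is the loss of double-speed depth-only rendering and a quantized depth, which is usually fine for an orthographic shadow range.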

Crysis

I'm playing Crysis 2. Nice game; it starts a bit weak, with a too-forced story, but it improves A LOT later on. Graphically it's great, as I'm sure you've all noticed. OK, long story short: I still probably love Modern Warfare and Red Dead a bit more, but it does not disappoint. Somehow the art direction of Crysis 2 looks a bit "hyperrealistic" to me most of the time, with a very soft and exaggerated ambient fill, accentuated even more by the huge bloom. But well, technically it is impressive, and it is surely a good game.

Now of course if you're a rendering engineer, first thing you do with such a game is to walk slowly everywhere and check out the rendering techniques. And so did I. Some notes:
  • LODs pop noticeably; small objects are faded out pretty aggressively. Still, during "normal" gameplay it's not too evident.
  • DOF is pretty smart. It seems to filter with a "ring" pattern that I guess is both an optimization and a way to simulate bokeh. It looks like what you get from a catadioptric (mirror) lens, which is also reasonable because most lenses will have a sharp out-of-focus either before or after the focal plane, as the bokeh shape on one side is the inverse of the other (so if a lens has a nice gaussian-like out-of-focus behind the focal plane, it will get a harsh negative-gaussian one in front). It also manages to blur objects in front of the focal plane correctly; kudos for that.
  • Huge screenspace bloom/lens flares.
  • Motion blur (camera only?)
  • Decent post-filtering AA, albeit with some defects (ghosting of objects in motion); not the best I've seen, but good.
  • Shadows. Stable CSM. A weird circular filter is applied to them. No fading between cascades. A dithering pattern that seems to be linked to the light space. Far cascades are updated every other frame.
OK, so the last item caught my attention. How do you do that? Well, it's not that hard if you think about it. If you observe the update of the CSM, you'll notice that even when you rotate the view your far cascades move only by a few texels, so we could just add a bit of border space there and assume that updating these cascades every other frame won't create problems.

Caching

But what if we want to be accurate? Well, it turns out it's not really hard at all! We know which window we rendered last frame, and where we should render this frame. Most of the new frame is already present in the last one; we could just shift that data into the right place.

It turns out we don't even need that: if we want to apply this incremental update only once and then re-render, we can just shift the "zero" of the shadowmap UVs and wrap. We still need to render the new data and resolve it, but that's only a border a few texels wide! Even culling the objects down to the ones that fall in that border is really trivial.
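The wrapped ("toroidal") addressing amounts to a bit of bookkeeping, which can be sketched like this (a Python sketch; origin and shift are in whole texels, and the names are mine):

```python
def toroidal_update(origin, shift, res):
    """Shift the shadowmap 'zero' by (dx, dy) texels with wraparound, and
    report which physical columns/rows of the cached map are invalidated:
    only that L-shaped border must be re-rendered and resolved.
    origin/shift are (x, y) texel pairs, res is the map size in texels."""
    dx, dy = shift
    new_origin = ((origin[0] + dx) % res, (origin[1] + dy) % res)
    # When the window moves right (+dx), the new right-edge data lands in the
    # texels that used to hold the old window's leftmost dx columns, and
    # symmetrically for the other directions.
    if dx >= 0:
        dirty_cols = [(origin[0] + i) % res for i in range(dx)]
    else:
        dirty_cols = [(new_origin[0] + i) % res for i in range(-dx)]
    if dy >= 0:
        dirty_rows = [(origin[1] + i) % res for i in range(dy)]
    else:
        dirty_rows = [(new_origin[1] + i) % res for i in range(-dy)]
    return new_origin, dirty_cols, dirty_rows

def wrapped_texel(origin, local_xy, res):
    """Map a window-local texel coordinate to its wrapped physical texel;
    in the shader this is just a frac()/wrap on the shadowmap UVs."""
    return ((origin[0] + local_xy[0]) % res, (origin[1] + local_xy[1]) % res)
```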

Really, we could do an incremental update for every cascade... forever! If it wasn't for two things: moving objects, and the fact that we can't fix our cascade's (light) near/far z, as to maximize depth resolution we usually need to fit it each frame (or so).

We could alleviate the latter problem by having the "shifting" shader also re-range the "cached" last-frame data into the new near/far range. The moving-objects problem can be solved by rendering them into a separate buffer, or into a copy of the buffer. Both solutions, though, need more memory and bandwidth (resolve time on 360), so they can be good only if that is not already a major bottleneck (that is, if you packed your cascades well).
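The re-ranging itself is just a linear remap, since an orthographic projection stores depth linearly in light-space z (a sketch, with hypothetical names):

```python
def rerange_depth(d_old, old_nf, new_nf):
    """Remap a cached normalized depth from the old light near/far range to
    the new one. Valid because an orthographic projection stores depth
    linearly (no perspective divide)."""
    n0, f0 = old_nf
    n1, f1 = new_nf
    z = n0 + d_old * (f0 - n0)        # back to light-space distance
    d_new = (z - n1) / (f1 - n1)      # renormalize into the new range
    return min(1.0, max(0.0, d_new))  # clamp: outside the new range -> pancake
```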

26 March, 2011

Debugging DirectX9 is so stressful!

I've been working on console games for the past five years, so I don't know much about PC tech these days. Now I'm on a console/PC title, and I just had to debug the PC build.

Oh. My. God.

I got so stressed I actually almost got sick that night. And then, yes, it turned out to be a really small bug that I could totally have debugged more easily without hooking up these tools anyway.

PIX for Windows is a joke, but it's still the best tool I've tried... It's a bit better on DX10/11 (faster refresh).

NVidia PerfHUD is rather useless (even if I hear it's better than PIX for profiling, which I believe, as PIX currently is unable to do any profiling at all), and the Intel GPA thing did not seem to really work at all (it took 20 minutes just to load the capture and it gave me some weird results; even so, it looks better than PIX, so it's promising I guess). Update: newer versions of GPA seem to work fine, and actually it's now my preferred DX9 tool!

ApiTrace is a new tool which might be good... I had a look at one of the early versions, which did not work for DX9; now it seems to have added support for all the APIs...

ATI has a GPU PerfStudio thing which is decent, but it has deprecated DX9; the current version is for DX10/11 only.

For some things I would even say the old 3D Ripper DX and DXExplorer are better tools!

I really fucking hope that the new NVidia Parallel Nsight (a.k.a. Nexus) and ATI GPU PerfStudio 2 are great; I could not try them, as they're DX10/11 only and I'm currently on DX9. Overall, it really shows how much the industry is committed to PC these days...

15 March, 2011

DOF Test

[Image: DOF test, scaled to 300% to ease viewing]
It does motion blur as well. Guess how many ms on 360 :)

09 March, 2011

Do you have "failed builds"?

Sometimes we are so used to our industry's workflows that we "accept" things that are terribly wrong without questioning them anymore. It's like when some media outlets start broadcasting false facts, or misusing words, and slowly the wrong becomes right.

What does it mean that a "build failed"? The entire build failed, catastrophically? Not even a single source file compiled? Or maybe it's only a bit of the frontend that did not compile? Or a single art asset? It's like saying that a car does not work just because the air conditioning does not turn on.

Ban the "broken build" concept. Ban "the game crashes". The audio system failed? Well, I guess we have a build of the game without audio. The rendering crashes? Well, I guess we have to disable that (and maybe use a minimal "debug" version instead, i.e. animation skeletons and collision meshes).

A game is a complex collection of components. So why, if just one component does not work, do we consider the entire thing "bad"? Decouple, my friend!

07 March, 2011

Tell the internet that you're not a moron...

...because it will assume you are. Especially if you work on a franchise iteration and you change anything.

Fight Night Champion went from 60fps to 30fps. The most generous reaction among professional reviewers was that it was a step back, done in order to have better lighting and graphics. Most of the general internet public (or the part of it that is vocal on forums and in the comment sections of websites) just took it as a downgrade impacting both graphics and gameplay.
Screenshot stolen from:
http://imagequalitymatters.blogspot.com/2011/02/tech-analysis-fight-night-champion-back.html

Of course, none of this is true. Fight Night Round 4 was already a game with very highly rated graphics; there would have been no need to impact the fluidity of the gameplay in order to have even better lighting.

The lighting was designed from day zero to run at 60fps; going to 30 in gameplay does not really buy us much there, as the worst-case performance scenarios were the non-interactive sequences, which were 30fps in Round 4 too.

At a given point during pre-production, we started building tests for 30fps gameplay, first as videos in After Effects (adding motion blur via optical flow); then, after these proved to be interesting, we went for an in-game prototype and blind testing.

Most of our testers and producers liked the gameplay of the 30fps-with-motion-blur version better than the 60fps one. Note that the game itself still runs at 60 (120Hz for the physics). Even our users think the same: most did notice that the punches now "feel" more powerful and the game more "cinematic".

The motion blur implementation itself is extremely good, blurring correctly outside the skinned characters' silhouettes. To the point that when we photoshopped the blur effect into some early screenshots, we were not really able to achieve as good an effect as the real in-game one.

Still, when you release technical details that no one really understands, people just assume that you're a moron and that they know better. They like the feel of the new game better, but they hate the "30fps" rating... This is just one example, but it happens all the time, for every feature that you change.

Bottom line? Change things, but be bold about them and take responsibility. Show what you've done and why; show people that you're not a moron, that you tried everything they are thinking of, plus more, and made your choices for real, solid reasons. Otherwise the internet will just assume you're a moron...

P.S. This is just my view as a developer who cares about quality and makes choices in order to maximize quality. To tell the truth, I don't think quality matters to a company per se; what matters is the kind of quality that sells. That's to say, you can do all the blind testing in the world and be 100% sure that a given choice is the best quality-wise, then you go out and people just don't buy it. Quality does not always sell; sometimes the worst choice is the most popular (and by popular I mean in sales, not in internet chatter; e.g. see how much hate there is for Call of Duty on the net, and how much it sells). Now, for that side of things, that's to say marketing, I don't know anything. Obviously FNC shipped at 30fps, so marketing thought it was OK, but I don't have any data nor experience. This other blog post might shed some light...