Search this blog

Showing posts with label Rendering tutorials. Show all posts
Showing posts with label Rendering tutorials. Show all posts

10 July, 2016

SIGGRAPH 2015: Notes for Approximate Models For Physically Based Rendering

This is a (hopefully temporary) hosting location for the course notes Michal's Iwanicki and I drafted for our presentation at the Physically Based Shading Course last year.

I'm publishing them here because they were mentioned a lot in our on-stage presentation, in fact, we meant that mostly as a "teaser" of the notes but we are still not able to bring them out of the "draft" stage, despite the effort of everyone involved (us and the course organizers, to whom goes our gratitude for the hard work involved in making such a great event happen).

It also doesn't help that in an effort to show an overall methodology, we decided to collate more than a year of various research efforts (which happened independently, for different purposes) into this big document. I still have to work more on my summarization skills.

02 July, 2016

Unity 101 - From zero to shaders in no time

Disclaimer:

I'm actually no Unity expert, as I started to look into it more seriously for a course I taught, but I have to say, it looks like it could be right now one of the best solutions for a prototyping testbed. 

This post is not meant as a rendering (engineering) tutorial, it's written for people who know rendering, and want to play with Unity, e.g. to prototype effects and techniques.

Introduction:

I really liked Nvidia's FXComposer for testing out ideas, and I still do, but unfortunately that product has been deprecated for years. 

Since then I started playing with MJP's framework by adding functionality that I needed (and later he added himself), and there are a couple of other really good frameworks out there by skilled programmers, but among the full-fledged engines, Unity seems to be the best choice right now for quick prototyping.

The main reason I like Unity is its simplicity. I can't even begin to get around Unreal or CryEngine, and I don't really care about spending time learning them. Unity on the other hand is simple enough that you can just open it and start poking around, which is really its strength. People often obsess too much on details of technology. Optimization and refinement are relatively easy, it's the experimental phase which we need to do quickly!

Unity basics:

There are really only three main concepts you need to know:

1) A project is made of assets (textures, meshes...). You just drag and drop files into the project window, and they get copied to a folder with a bit of metadata describing how to process them. All assets are hot-reloaded. Scripts (C# or JavaScript code) are assets as well!

2) Unity employs a scene-graph system (you can also directly emit draws, but for now we'll ignore that). You can drag meshes into the scene hierarchy and they will appear in the game and the editor view, and create lights, cameras and various primitives.



The difference between the two is that the game is seen through a game camera, the editor can freely roam, and when you are in game view you can change object properties (if you're paused), but these changes don't persist (aren't serialized in the scene), while when you are in the editor changes are persistent.



3) Unity uses a component system for everything. A C# script just defines a class (with the same name as the script file) which inherits from "MonoBehaviour" and can implement certain callbacks.
All the public class members are automatically exposed in the component UI as editable properties (and C# annotations can be used to customize that) and serialized/deserialized with the scene.



A component can be attached to any scene object (a camera, a mesh, a light...) and can access/modify its properties, and perform actions at given times (before rendering, after, before update, on scene load, on component enable, when drawing debug objects and so on and so forth...)



Components can really freely change anything in the scene, as there is a way of finding objects by name, type and so on, and can also create new objects and so on. The performance characteristics of doing some operations is sometimes... surprising, and in real games you might need to cache/pool certain things, but for prototyping it's irrelevant.

Shaders & Rendering:

From the rendering side things are similarly simple. Perhaps the most complex aspect for someone unfamiliar with it, is the shader system. 

As most engines, Unity has a shader system that allows for automatic generation of shader permutations (e.g. the forward renderer needs as permutation per light type and shadow) and it also needs to handle different platforms (it can cross-compile HLSL to GLSL).
It achieves that with a small DSL for shader description "ShaderLab", and the shader code is actually embedded into it. 
Unity has also other ways of making shaders without touching HLSL, and a "surface shader" system that allows to avoid writing VS ans PS, but these are not really that interesting for a rendering engineer, so I won't cover them :)

ShaderLab has functionality to set render state and declare shader parameters, with the latter automatically reflected in the Material UI, when a material binds to a given shader. I won't go into a detailed description of this system, because once you see a ShaderLab shader things should be pretty obvious, but I'll provide some examples at the end.

For geometry materials, the procedure is quite simple: you'll need ShaderLab shader (.shader) asset, a material asset that is bound to it, and then you can just assign it to a mesh (drag and drop) and everything should work.

Unity supports three rendering systems (as of now): VertexLit (which is really a forward renderer without multipass and up to eight lights per object - some parts of the Unity docs say this is deprecated, but it seems it's going to live at least as a shader type), Forward (multipass, one light at a time - this rendering mode actually coexists with VertexLit) and Deferred (as in "shading", there is also a legacy system that does "deferred lighting" but that's actually deprecated). 
The shader has to declare for which system it's coded and the way Unity passes the lighting information to the shader changes based on that.

For post-effects materials you'll need both a shader and a component. The component will be a C# script that gets attached to the camera and triggers rendering in the OnRenderImage callback. In the script one can create programmatically a material, binding it to the shader and settings its parameters, so there's no need to have a separate material asset. 
The rendering API exposed by Unity is really minimal, but it's super easy to create rendertargets and draw fullscreen quads. Unity automatically chains post-effects if there are multiple components overriding OnRenderImage, and callback provides a source and destination rendertarget, so the chain is completely transparent to the scripts.

Fore more advanced effects, there is support for drawing and creating meshes (including their vertex attributes), drawing immediate geometry (lines and so on, usually for debugging) and even doing "procedural draws" (draws with no mesh data attached, vertices are assumed to be pulled from a buffer) and dispatching compute shaders.

It's also possible to access the g-buffer when using the deferred renderer and sample shadowmaps manually, but there is no provision for changing the way either are created, and no real access to any of the underlying graphics API (unless you write C++ plugins).

Last but not least, on PC Unity integrates with RenderDoc and Visual Studio for easy debugging, which is really a nice perk.

All this is best explained with code, so, if you care to try, -->here<-- b=""> is a bunch of fairly well commented (albeit very basic / mostly wrong in terms of rendering techniques) sample shaders I hastily made to learn Unity myself before I started teaching the course.

15 November, 2015

Lecture slides for University of Victoria


An introduction to what is like to be a rendering engineer or researcher in a AAA gamedev production, and why you might like it. Written for a guest lecture to computer graphics undergrads at University of Victoria.

As most people don't love when I put things on scribd, for now this is hosted from my dropbox.

15 August, 2015

Siggraph 2015 course

As some will have noticed, Michal Iwanicki and I were speakers (well I was much more of a listener, actually) in the physically based shading course this year (thanks again to both Stephens for organizing and invitingus) presenting a talk on approximate models for rendering, while our Activision colleague and rendering lead of Sledgehammer showed some of his studio's work on real world measurements used in Advanced Warfare.

Before and after :)
If you weren't at Siggraph this year or you missed the course, fear not, we'll publish the course notes soon (I need to do some proof reading and adding bibliography) and the course notes are "the real deal", as in twenty minutes we couldn't do much more than a teaser trailer on stage.

I wanted to write though about the reasons that motivated me to present in the course that material, give some background. Creating approximations might be time consuming sometimes, but it's often not that tricky, per-se I don't think it's the most exciting topic to talk about. 
But it is important, and it is important because we are still too often too wrong. Too many times we use models that we don't completely understand, that are exact under assumptions we didn't investigate and for which we don't know what perceptual error they cause.

You can nowadays point your finger at any random real time rendering technique, really look at it closely, compare with ground truth, and you're more likely than not to find fundamental flaws and simple improvements through approximation.

This is a very painful process, but necessary. PBR is like VR, it's an all or nothing technique. You can't just use GGX and call it a day. Your art has to be (perceptually) precise, your shadows, your GI, your post effect, there is a point where everything "snaps" and things just look real, but it's very easy to be just a bit off and ruin the illusion. 
Worse, errors propagate non-locally as artists try to compensate for our mistakes in the rendering pipeline by skewing the assets to try to reach as best as they can a local minimum.

Moreover, we are also... not helped I fear by the fact that some of these errors are only ours, we commit them in application, but the theory in many cases is clear. We often got research from the seventies and the eighties that we should just read more carefully. 
For decades in computer graphics we rendered images in gamma space, but there isn't anything for a researcher to publish about linear spaces, and even today we largely ignore what colorspaces really are and what we should use, for example.

We don't challenge the assumptions we work with.

A second issue I think is sometimes it's just neater to work with assumptions that it is to work on approximations. And it is preferable to derive our math exactly via algebraic simplifications, the problem is that when we simplify by imposing an assumption, its effects should be measured precisely.

If we consider constant illumination, and no bounces, we can define ambient occlusion, and it might be an interesting tool. But in reality it doesn't exist, so when is it a reasonable approximation? Then things don't exactly work great, so we tweak the concept and create ambient obscurance, which is better, but to a degree even more arbitrary. Of course this is just an example, but note: we always knew that AO is something odd and arbitrary, it's not a secret, but even in this simple case we don't really know how wrong it is, when it's more wrong, and what could be done to make it measurably better.

You might say that even just finding the errors we are making today and what is needed to bridge the gap, make that final step that separates nice images from actual photorealism, is a non-trivial open problem (*).
It's actually much easier to implement a many exciting new rendering features in an engine than to make sure that we got even a very simple and basic renderer is (again perceptually) right. And on the other hand if your goal is photorealism it's surely better to have a very constrained renderer in very constrained environments that is more accurate than a much fancier one used with less care.

I was particularly happy at this Siggraph to see that more and more we are aware of the importance of acquired data and ground truth simulations, the importance of being "correct", and there are many researchers working to tackle these problems that might seem even to a degree less sexy than others, but are really important.

In particular right after our presentations Brent Burley showed, yet again, a perfect mix of empirical observations, data modelling and analytic approximations in his new version of Disney's BRDF, and Luca Fascione did a better job I could ever do explaining the importance of knowing your domain, knowing your errors, and the continuum of PBR evolution in the industry.

P.S. If you want to start your dive into Siggraph content right, start with Alex "Statix" Evans amazingly presentation in the Advances course: cutting edge technology presented through a journey of different prototypes and ideas. 
Incredibly inspiring, I think also because the technical details were sometimes blurred just enough to let your own imagination run wild (read: I'm not smart enough to understand all the stuff... -sadface-). 
Really happy also to see many teams share insights "early" this year, before their games ship, we really are a great community.

P.S. after the course notes you might get a better sense of why I wrote some of my past posts like:

(*) I loved the open problems course, I think we need it each year and we need to think more at what we really need to solve. This is can be a great communication channel between the industry, the hardware manufactures and the academia. Chose wisely...

28 March, 2015

Being more wrong: Parallax corrected environment maps

Introduction

A follow-up to my article on how wrong we do environment map lighting, or how to get researchers excited and engineers depressed. 
Here I'll have a look at the errors we incur when we want to adopt "parallax corrected" (a.k.a. "localized" or "proxy geometry") pre-filtered cube-map probes, a technique so very popular nowadays.

I won't explain the base technique here, for that please refer to the following articles:

Errors, errors everywhere...

All these are in -addition- to the errors we commit when using the standard cubemap-based specular lighting.


1) Pre-filter shape

Let's imagine we're in an empty rectangular room, with diffuse walls. In this case the cubemap can be made to accurately represent radiance from the room.
We want to prefilter the cubemap to be able to query irradiance in a fast way. What shape does the filter kernel have?
  • The cubemap is not at infinite distance anymore -> the filter doesn't depend only on angles!
  • We have to look at how the BRDF lobe "hits" the walls, and that depends on many dimensions (view vector, normal, surface position, surface parameters)
  • Even in the easy case where we assume the BRDF lobe to be circularly symmetric around the reflection, and we consider the reflection to hit a wall perpendicularly, the footprint won't be exactly identical to one computed only on angles.
  • More worryingly, that case won't actually happen often, the BRDF will often hit a wall, or many walls, at an angle, creating an anisotropic footprint!
  • Pre-filtering "from the center", using angles, will skew the filter size near the cube vertices, but unlike infinite cubemaps, this is not exactly justified in this case, it optimizes for a single given point of view (query position)
  • Moreover! This is not radiance emitted from some magic infinitely distant environment. If we consider geometry, even through a proxy, we should then consider how that geometry emits radiance. Which is a 2D function (spherical). So we should bake a 4D representation, e.g. a cube map of spherical harmonic coefficients...
The pre-filter kernel for parallax-corrected cubes should be seen more as -a- kernel we applied and know...
It doesn't have a direct, one-to-one relationship with the material roughness... We can try, knowing we have a prefiltered cube, to approximate what fetch or fetches best approximate the actual BRDF footprint on the proxy geometry.

This problem can be seen also from a different point of view:
  • Let's assume we have a perfectly prefiltered cube for a given surface location in space (query point or "point of view"). 
  • Let's compute a new cubemap for a different point in space, by re-projecting the information in the first cubemap to the new point of view via the proxy geometry (or even the actual geometry for what matters...).
  • Let's imagine the filter kernel we applied at a given cubemap location in the original pre-filter. 

How will it become distorted after the projection we do to obtain the new cubemap? This is the distortion that we need to compensate somehow...

This issue is quite apparent with rougher objects near the proxy geometry, it results in a reflection that looks sharper, less rough than it should be, usually as we underfilter compared to the actual footprint.
A common "solution" is to not use parallax projection as the surfaces get rougher, which creates lighting errors.

I made this BRDF/plane intersection visualization
while working on area lights, the problem with cubemaps is identical

2) Visibility

In most real-world applications, the geometry we use for the parallax-correction (commonly a box) is doesn't match exactly the real world geometry. Environment with all perfectly rectangular, perfectly empty rooms might be a bit boring. 
As soon as we place an object on the ground, its geometry won't be captured by the reflection proxy, and we will be effectively raytracing the reflection past it, thus creating a light leak.

This is really quite a hard problem, light leaks are one of the big issues in rendering, they are immediately noticeable and they "disconnect" objects. Specular reflections in PBR tend to be quite intense, and so it's not easy even to just occlude them away with standard methods like SSAO (and of course considering only occlusion would be per se an error, we are just subtracting light).

An obvious solution to this issue is to just enrich somehow the geometrical representation we have for parallax correction, and this could be done in quite a lot of ways, from having richer analytic geometry to trace against, to using signed distance fields and so on.
All these ideas are neat, and will produce absolutely horrible results. Why? Because of the first problem we analyzed! 
The more complex and non-smooth your proxy geometry is, the more problems you'll have pre-filtering it. In general if your proxy is non-convex your BRDF can splat across different surfaces at different distances and will horribly break pre-filtering, resulting in sharp discontinuities on rough materials.
Any solution to this that wants to use non-convex proxies, needs to have a notion of prefiltered visibility, not just irradiance, and the ability of doing multiple fetches (blending them based on the prefiltered visibility)

A common trick to partially solve this issue is to "renormalize" the cube irradiance based on the ratio between the diffuse irradiance at the cube center and the diffuse irradiance at the surface (commonly known via lightmaps). 
The idea is that such ratio would express somewhat well how different (due to occlusions/other reflections) how intense the cubemap would be if it was baked from the surface point.
This trick works for rough materials, as the cubemap irradiance gets more "similar" to diffuse irradiance, but it breaks for sharp reflections... Somewhat ironically here the parallax cubemap is "best" with rough reflections, but we saw the opposite is true when it comes to filter footprint...

McGuire's Screen Space Raytracing

3) Other errors

For completeness, I'll mention here some other relatively "minor" errors:
  • Interpolation between reflection probes. We can't have a single probe for the entire environment, likely we'll have many that cover everything. Commonly these are made to overlap a bit and we interpolate while transitioning from one to another. This interpolation is wrong, note that if the two probes reprojected identically at a border between them, we wouldn't need to interpolate to being with...
  • These reflection proxies capture only radiance scattered only at a specific direction for each texel. If the scattering is not purely diffuse, you'll have another source of error.
  • Baking the scattering itself can be complicated, without a path tracer you risk to "miss" some light due to multiple scattering.
  • If you have fog (atmospheric scattering), its influence has to be considered, and it can't really be just pre-baked in the probes correctly (it depends on how much fog the reflection rays traverses, and it's not just attenuation, it will scatter the reflection rays altering the way they hit the proxy)
  • Question: what is the best point inside the proxy geometry volume from which to bake the cubemap probe? This is usually hand authored and artists tend to place it as possible away from any object (this could be a heuristic indeed, easy to implement).
  • Another way of seeing parallax-corrected probes is to treat think of them really as textured area lights

A common solution to mitigate many issues is to use screen space reflections (especially if you have the performance to do so, fading to baked cubemap proxies only where the SSR doesn't have data to work.
I won't delve into the errors and issues of SSR here, it would be off-topic, but beware of having the two methods represent the same radiance. Even when that's done correctly, the transition between the two techniques can be very noticeable and distracting, it might be better to use one or the other based on location.

From GPU-Based Importance Sampling.

Conclusions

If you think you are not committing large errors in your PBR pipeline, you didn't look hard enough. You should be aware of many issues, most of them having a real, practical impact and you should assume many more errors exist that you haven't discovered yet.
Do your own tests, compare with real-world, be aware, critical, use "ground truth" simulations.

Remember that in practice artists are good at hiding problems and working around them, often asking to have non-physical adjustment knobs they will use to tuning down/skew certain effects.
Listen to these requests as they probably "hide" a deep problem with your math and assumptions.

Finally, some tips on how to try solve these issues:
  • PBR is not free from hacks (not even offline...), there are many things we can't derive analytically. 
  • The main point of PBR is that now we can reason about physics to do "well motivated" hacks. 
  • That requires having references and ground truth to compare and tune.
    • A good idea for this problem is to write an importance sampled shader that does glossy reflections via many taps (doing the filtering part in realtime, per shaded point, instead of pre-filtering).
    • A full raytraced ground truth is also handy, and you don't need to recreate all the features of your runtime engine...
  • Experimentation requires fast iteration and a fast and accurate way to evaluate the error against ground truth.
  • If you have a way of programmatically computing the error from the realtime solution to the ground truth, you can figure out models with free parameters that can be then numerically optimized (fit) to minimize the error...

24 January, 2015

Notes on G-Buffer normal encodings

Tonight on Masterchef, we'll try to cook a G-Buffer encoding that is:
  • Fast when implemented in a shader
  • As compact as possible
  • Makes sense under linear interpolation (hardware "blendable", for pixel-shader based decals)
    • So we assume no pixel sync custom blending. On a fixed hardware like console's GPUs it would be probably possible to sort decals to not overlap often in screen-space and add waits so read-write of the same buffer never generates data races, but it's not easy.
  • As stable as possible, and secondarily as precise as possible.
Normal encodings are a well studied topic, both in computer science and due to their relation with sphere unwrapping and cartographic projections of the globe. Each of the aforementioned objectives, taken singularly, is quite simple to achieve.

Nothing is faster than keeping normals in their natural representations, as a three component vector, and this representation is also the one that makes most sense under linear interpolation (renormalizing afterwards of course). 
Unfortunately storing three components means we're wasting a lot of bits, as most bit combinations will yield something that is not a normal vector, so most encodings are unused. In fact we're using just a thin surface of a sphere of valid encodings inside a cubic space of minus one to one components (in 32bit floating point we'll get about 51bits worth of normals out of the 3x32=96bits storage).

If we wanted to go as compact as possible, Crytek's "best fit" normals are one of the best possible representations, together with the octahedral projection (which has a faster encoding and almost the same decoding cost).
Best fit normals aren't the absolute optimal, as they chose the closest representation among the directions contained in the encoding cube space, so they are still constrained to a fixed set of directions, but they are quite great and easy to implement in a shader.
Unfortunately these schemes aren't "blendable" at all. Octahedral normals have discontinuities on half of the normal hemisphere which won't allow blending even if we considered the encoding square to wrap around in a toroidal topology. Best fit normals are all of different lengths, even for very close directions (that's they key for not wasting encode space), so really there is no continuity at all.

Finally, stability. That usually comes from world-space normal encodings, on the account that most games have a varying view more often than they have moving objects with a static view. 
World-space two-component encodings can't be hardware "blendable" though, as they will always have a discontinuity on the normal sphere in order to unwrap it.
Object-space and tangent-space encodings are hard because we'll have to store these spaces in the g-buffer, which ends up taking encode space
.

View-space can allow blending of two-component encodings by "hiding" the discontinuity in the back-faces of objects so we don't "see" it, but in practice with less than 12 bits per component you end up seeing wobbling on very smooth specular objects as not only normals "snap" into their next representable position as we strafe the camera, but there is also a "fighting" of the precise view-to-world transform we apply during decoding with the quantized normal representation. As we don't have 12bit frame-buffer formats, we'll need to use two 16-bit components, which is not very compact.

So you get quite a puzzle to solve. In the following I'll sketch two possible recipes to try to alleviate these problems.

- Improving view-space projection.

We don't have a twelve bits format. But we do have a 10-10-10-2 format and a 11-11-10 one. The eleven bits floats don't help us much because of their uneven precision distribution (and with no sign bit we can't even center the most precision on the middle of the encoding space), but maybe we can devise some tricks to make ten bits "look like" twelve.

In screen-space, if we were ok never to have perfectly smooth mirrors, we could add some noise to gain one bit (dithering). Another idea could be to "blur" normals while decoding them, looking at their neighbours, but it's slow and it would require some logic to avoid smoothing discontinuities and normal map details, so I didn't even start with that.

A better idea is to look at all the projections and see how they do distribute their precision. Most studies on normal encodings and sphere unwrapping aim to optimize precision over the entire sphere, but here we really don't want that. In fact, we already know we're using view-space exactly to "hide" some of the sphere space, where we'll place the projection discontinuity.

It's common knowledge that we need more than a hemisphere worth of normals for view-space encodings, but how much exactly, and why? 
Some of it is due to normal-mapping, that can push normals to face backwards, but that's at least questionable, as these normal-map details should have been hidden by occlusion, where they an actual geometric displacement. 
Most of the reason we need to consider more than an hemisphere is due to the perspective projection we do after view-space transform, which makes the view-vector not constant in screen-space. We can see normals around the sides of objects at the edges of the screen that point backwards in view-space.

In order to fight this we can build a view matrix per pixel using the view vector as the z-axis of the space and then encode only an hemisphere (or so) of normals around it. This works, and it really helps eliminating the "wobble" due to the fighting precisions of the encode and view-space matrix, but it's not particularly fast.

- Encode-space tricks.

Working with projections is fun, and I won't stop you from spending days thinking about them and how to make spherical projections work only on parts of the sphere or how to extend hemispherical projections to get a bit more "gutter" space, how to spend more bits on the front-faces and so on. Likely you'll re-derive Lambert's azimuthal equal-area projection a couple of times in the process, by mistake.

What I've found interesting though after all these experiments, is that you can make your life easier by looking at the encode space (the unit square you're projection the normals to) instead, starting with a simple projection (equal-area is a good choice):
  1. You can simply clip away some normals by "zooming" in the region of the encode space you'll need. 
  2. You can approximate the per-pixel view-space by shifting the encode space so per-pixel the view-vector is its center.
    • Furthermore, as your view vector doesn't vary that much, and as your projection doesn't distort these vectors that much, you can approximate the shift by a linear transform of the screen-space 2d coordinate...
  3. Most projections map normals to a disk, not the entire square (the only two I've found that don't suffer from that are the hemispherical octahedral and the polar space transform/cylindrical projection). You can then map that disk to a square to gain a bit of precision (there are many ways to do so).
  4. You can take any projection and change its precision distribution if needed by distorting the encode space! Again, many ways to do so, I'd recommend using a piecewise quadratic if you go that route as you'll need to invert whatever function you apply. You can either distort the square by applying a function to each axis, or do the same shifting normals in the disc (easy with the polar transform).
    • Note that we can do this not only for normal projections, but also for example to improve the sampling of directions in a cubemap, e.g. for reflections...
Lambert Equal Area

Same encode, with screen-space based "recentering"
Note how the encoding on the flat plane isn't constant anymore

- Encoding using multiple bases.

This recipe comes from chef Alex Fry, from Frostbite's kitchen. It was hinted at by Sebastien Lagarde in his Siggraph presentation but never detailed. This is how I understood it, but really Alex should (and told me he will) provide a detailed explanation (which I'll link here eventually).

When I saw the issues stemming from view-space encodings one of my thoughts was to go world-space. But then you reason a second and realize you can't, because of the inevitable discontinuities. 
For a moment I stopped thinking about a projection that double-covers rotations... But then you'll need to store a bit to identify which of the projections you are using (kinda like dual parabolic mapping for example) and that won't be "blendable" either. 
We could do some magic if we had wrapping in the blending but we don't, and the blending units aren't going to change anytime soon I believe. So I tossed this away.

And I was wrong. Alex's intuition is that you can indeed use multiple projections and you'll need to store extra bits to encode which projection you used... But, these bits can be read-only during decal-blending! 
We can just read which projection was used and project the decal normals in the same space, and live happily ever after! The key is of course to always chose a space that hides your projection discontinuities.


2-bits tangent spaces

3-bits tangent spaces

4-bits tangent spaces
There are many ways to implement this, but to me one very reasonable choice would be to consider the geometric normals when laying down the g-buffer, and chose from a small set of transforms the one with its main axis (the axis we'll orient our normal hemisphere towards during encoding) closest to the surface normal.

That choice can then be encoded in a few bits that we can stuff anywhere. The two "spare" bits of the 10-10-10-2 format would work, but we can "steal" some bits from any part of the g-buffer (even maintaining the ability to blend that component, by using the most significant bits, reading them in (as we need for the normal encoding anyways) and outputting that component remaining bits (that need to be blended) while keeping the MSB the same across all decals.

I'll probably share some code later, but a good choice would be to use the bases having one axis oriented as the coordinates of the vertices of a signed unitary cube (i.e. spanning minus to plus one), for which the remaining two tangent axes are easy to derive (won't need to re-orthonormalize based on an up vector as it's usually done).

Finally, note how this encode gives us a representation that plays nicely with hardware blending, but it's also more precise than a two-component 10-10 projection if done right. Each encoding space will in fact take care of a subset of the normals on a sphere, namely all the normals closest to its main axis. 
Thus, we know that the subsequent normal projection to a two component representation have to take care only of a subset of all possible normals (a hemisphere worth of them, for decals to be able to blend, plus the solid angle of the largest "cell" spanned by the tangent spaces on the sphere), and more spaces (and more bits) we have less normals we'll have to encode for each (e.g. with only one extra bit, the TS would divide the sphere in two halves, and we'll need in each to encode a sphere worth of normals to allow decals to blend, which would be a bad idea). 
So, if we engineer the projection to encode only the normals it has to, we can effectively gain precision as we add bits to the space selection. 

Conversely beware of encoding more normals than needed, as that can create creases as you move from one space to another on the sphere, when applying normal maps, as certain points will have more "slack" space to encode normals and some others less, you'll have to make sure you won't end up pushing the normals outside the tangent-space hemisphere at any surface point.

- Conclusions & Further references:

Once again we have a pretty fundamental problem in computer graphics that still is a potential source of so much research, and that still is not really solved.
If we look carefully at all our basic primitives, normals, colours, geometry, textures and so on, there is still a lot we don't know or we do badly, often without realizing the problems.
The two encoding strategies I wrote of are certainly a step forward, but still in 10 bits per components you will see quantization patterns on very smooth materials (sharp highlights, mirror-like reflections are the real test case for this). And none of the mappings is optimal, there are still opportunities to stretch and tweak the encoding to use its bits better. 
Fun times ahead.


A good test case with highly specular materials, flat surfaces, slowly rotating objects/view

03 September, 2014

Notes on real-time renderers

This accounts only for well-established methods, there are many other methods and combinations of methods I'm not covering. It's a sort of "recap".

Forward
Single pass over geometry generates "final" image, lights are bound to draw calls (via uniforms), accurate culling of light influence on geometry requires CSG splits. Multiple lights require either loops/branches in the shaders or shader permutations.
  • Benefits
    • Fastest in its baseline case (single light per pixel, "simple" shaders or even baked lighting). Doesn't have a "constant" up-front investment, you pay as you go (more lights, more textures...).
    • Least memory necessary (least bandwidth, at least in theory). Makes MSAA possible.
    • Easy to integrate with shadowmaps (can render them one-at-a-time, or almost)
    • No extra pass over geometry
    • Any material, except ones that require screen-space passes like Jimenez's SS-SSS
  • Issues
    • Culling lights on geometry requires geometrical splits (not a huge deal, actually). Can support "static" variations of shaders to customize for a given rendering case (number/type of lights, number of textures and so on) but "pays" such optimization with combinatorial explosion of shader cases and many more draw calls.
    • Culling dynamic lights can't be efficiently done. Scripted lights along fixed paths can be somewhat culled via geometry cutting, but fully dynamic lights can't efficiently cut geometry in runtime, just be assigned to objects, thus wasting computation.
    • Decals need to be multipass, lit twice. Alternatively, for static decals mesh can be cut and texture layering used (more shader variations), or for dynamic decals color can be splatted before main pass (but that costs an access to the offscreen buffer regardless or not a decal is there).
    • Complex shaders might not run optimally. As you have to do texturing and lighting (and shadowing) in the same pass, shaders can require a lot of registers and yield limited occupancy. Accessing many textures in sequence might create more trashing than accessing them in separate passes.
    • Lighting/texturing variations have to be dealt with dynamic branches which are often problematic for the shader compiler (must allocate registers for the worst case...), conditional moves (wasted work and registers) or shader permutations (combinatorial explosion)
    • Many "modern" rending effects require a depth/normal pre-pass anyways (i.e. SSAO, screen-space shadows, reflections and so on). Even though some of these can be faded out after a preset range and thus they can work with a partial pre-pass.
    • All shading is done on geometry, which means we pay all the eventual inefficiencies (e.g. partial quads, overdraw) on all shaders.

Forward+ (light indexed)
Forward+ is basically identical to forward but doesn't do any geometry split on the scene as a pre-pass, it relies on tiles or 3d grids ("clustered") to cull lights in runtime.
  • Benefits
    • Same memory as forward, more bandwidth. Enables MSAA.
    • Any material (same as forward)
    • Compared to forward, no mesh splitting necessary, much less shader permutations, less draw calls.
    • Compared to forward it handles dynamic lights with good culling.
  • Issues
    • Light occlusion culling requires a full depth pre-pass for a total of two geometrical passes. Can be somehow sidestepped with a clustered light grid, if you don't have to end up splatting too many lights into it.
    • All shadowmaps need to be generated upfront (more memory) or splatted in screen-space in a pre-pass.
    • All lighting permutations need to be addressed as dynamic branches in the shader. Not good if we need to support many kinds of light/shadow types. In cases where simple lighting is needed, still has to pay the price of a monolithic ubershader that has to consider any lighting scenario.
    • Compared to forward, seems a steep price to pay to just get rid of geometry cutting. Note that even if it "solved" shader permutations, its solution is the same as doing forward with shaders that dynamic branch over light types/number of lights and setting these parameters per draw call.

Deferred shading
Geometry pass renders a buffer of material attributes (and other proprieties needed for lighting but bound to geometry, e.g. lightmaps, vertex-baked lighting...). Lighting is computed in screenspace either by rendering volumes (stencil) or by using tiling. Multiple shading equations need either to be handled via branches in the lighting shaders, or via multiple passes per light.
  • Benefits
    • Decouples texturing from lighting. Executes only texturing on geometry so it suffers less from partial quads, overdraw. Also, potentially can be faster on complex shaders (as discussed in the forward rendering issues).
    • Allows volumetric or multipass decals (and special effects) on the GBuffer (without computing the lighting twice).
    • Allows full-screen material passes like analytic geometric specular antialiasing (pre-filtering), which really works only done on the GBuffer, in forward it fails on all hard edges (split normals), and screen-space subsurface scattering.
    • Less draw calls, less shader permutations, one or few lighting shaders that can be hand-optimized well.
  • Issues
    • Uses more memory and bandwidth. Might be slower due to more memory communication needed, especially on areas with simple lighting.
    • Doesn't handle transparencies easily. If a tiled or clustered deferred is used, the light information can be passed to a forward+ pass for transparencies.
    • Limits materials that need many different material parameters to be passed from geometry to lighting (GBuffer), even if shader variation for material in a modern PBR renderer tends not to be a problem.
    • Can't do lighting computations per object/vertex (i.e. GI), needs to pass everything per pixel in the GBuffer. An alternative is to store baked data in a voxel structure.
    • Accessing lighting related textures (gobos, cubemaps) might be less cache-coherent.
    • In general it has lots of enticing benefits over forward, and it -might- be faster in complex lighting/material/decal scenarios, but the baseline simple lighting/shading case is much more expensive.
Notes on tiled/clustered versus "stenciled" techniques
On older hardware early-stencil was limited to a single bit, so it couldn't be used both to mark the light volume and distinguish surface types. Tiled could be needed as it allowed more material variety by categorizing tiles and issuing multiple tile draws if needed.

On newer hardware tiled benefits lie in the ability of reducing bandwidth by processing all lights in a tile in a single shader. It also has some benefits for very small lights as these might stall early in the pipeline of the rasterizer, if drawn as volumes (draws that generate too little PS work). 

In fact most tile renderers demos like to show thousands of lights in view... But the reality is that it's still tricky to afford many shadowed lights per pixel in any case (even on nextgen where we have enough memory to cache shadowmaps), and unshadowed, cheap lights are worse than no lighting at all.

Often, these cheap unshadowed lights are used as "fill", a cheap replacement for GI. This is not an unreasonable use case, but there are better ways, and standard lights, even when diffuse only, are actually not a great representation of indirect radiance. 
Voxel, vertex and lightmap bakes are often superior, or one could thing of special fill volumes that can take more space, embedding some radiance representation and falloff in them.

In fact one of the typical "deferred" looks that many games still have today is characterized by "many" cheap point lights without shadowing (nor GI, nor gobos...), creating ugly circular splotches in the scene. 
Also tiled/clustered makes dynamic shadows somewhat harder, as you can't render one shadowmap at a time...

Tiled and clustered have their reasons, but demo scenes with thousands of cheap point lights are not one. Mostly they are interesting if you can compute other interesting data per tile/voxel.
You can still get a BW saving in "realistic" scenes with low overlap of lighting volumes, but it's a tradeoff between that and accurate per-quad light culling you get from a modern early-z volume renderer.

Deferred lighting
I'd say this is dwindling technique nowadays compared to deferred shading.
  • Benefits
    • It requires less memory per pass, but the total memory traffic summing all passes is roughly the same. The former used to be -very- important due to the limited EDRAM memory on xbox 360 (and its inability to render outside EDRAM).
    • In theory allows more material "hacks", but it's still very limited. In fact I'd say the material expressiveness is identical to deferred shading, but you can add "extra" lighting in the second geometry pass. On deferred shading that has to be passed along using extra GBuffer space.
    • Allows shadows to be generated and used per light, instead of all upfront like deferred shading/forward+
  • Issues
    • An extra geometric pass (could be avoided by using the same GBuffer as deferred shading, then doing lighting and compositing with the textures in separate fullscreen passes - but then it's almost more a variant of DS than DL imho)
Some Links: