Search this blog

Showing posts with label Stupid rendering tricks. Show all posts
Showing posts with label Stupid rendering tricks. Show all posts

12 April, 2023

Half baked and a half: A small update.

Previously: C0DE517E: Half baked: Dynamic Occlusion Culling

Trying the idea of using the (incrementally accumulated) voxel data to augment the reprojection of the previous depth buffer.

Actually, I use here a depth from five frames ago (storing them in a ring buffer) - to simulate the (really worst-case) delay we would expect from CPU readbacks.

Scene and the final occlusion buffer (quarter res):


Here is the occlusion buffer, generated with different techniques. Top: without median, Bottom: with. Left to right: depth reprojection only, voxel only, both. 

Note that the camera was undergoing fast rotation, you can see that the reprojected depth has a large area along the bottom and left edges where there is no information.


Debug views: accumulated voxel data. 256x256x128 (8mb) 8bit voxels, each voxel stores a 2x2x2 binary sub-voxel. 


The sub-voxels are rendered only "up close", they are a simple LOD scheme. In practice, we can LOD more, render (splat) only up close and only in areas where the depth reprojection has holes.

Note that my voxel renderer (point splatter) right now is just a brute-force compute shader that iterates over the entire 3d texture (doesn't even try to frustum cull). 
Of course that's bad, but it's not useful for me to improve performance, only to test LOD ideas, memory requirements and so on, as the real implementation would need to be on the CPU anyways.

Let's go step by step now, to further illustrate the idea thus far.

Naive Z reprojection (bottom left) and the ring buffer of five quarter-res depth buffers:


Note the three main issues with the depth reprojection:
  1. It cannot cover the entire frame, there is a gap (in this case on the bottom left) where we had no data due to camera movement/rotation.
  2. The point reprojection undersampled in the areas of the frame that get "stretched" - creating small holes (look around the right edge of the image). This is the primary job of the median filter to fix, albeit I suspect that this step can be fast enough that we could also supersample a bit (say, reproject a half-res depth into the quarter res buffer...)
  3. Disocclusion "holes" (see around the poles on the left half of the frame)
After the median filter (2x magnification). On the left, a debug image showing the absolute error compared to the real (end of frame) z-buffer. 

The error scale goes from yellow (negative error - false occlusion) to black (no error) to cyan (positive error - false disocclusion. Also, there is a faint yellow dot pattern marking the areas that were not written at all by the reprojection.

Note how all the error right now it "positive" - which is good:


My current hole-filling median algorithm does not fix all the small reprojection gaps, it could be more aggressive, but in practice right now it didn't seem to be a problem.

Now let's start adding in the voxel point splats:


And finally, only in the areas that still are "empty" from either pass, we do a further dilation (this time, a larger filter, starting from 3x3 but going up to 5x5, taking the farthest sample)


We get the entire frame reconstructed, with an error that is surprisingly decent.

A cute trick: it's cheap to use the subvoxel data, when we don't render the 2x2x2, to bias the position of the voxel point. Just a simple lookup[256] to a float3 with the average position of the corresponding full subvoxels for that given encoded byte.

This reasoning could be extended to "supervoxels", 64 bits could and should (data should be in Morton order, which would result in an implicit, full octree) encode 2x2x2 8 bit voxels... then far away we could splat only one point per 64bit supervoxels, and position it with the same bias logic (create an 8bit mask from the 64bits, then use the lookup).




10 April, 2023

From the archive: Notes on GGX parallax correction.

As for all my "series" - this might very well the first and last post about it, we'll see. I have a reasonable trove of solutions on my hard-drive that were either shipped, but never published, not even shipped or were, shipped, "published" but with minimal details, as a side note of bigger presentations. Wouldn't it be a shame if they spoiled?

Warning! All of what I'm going to talk about next probably is not very meaningful if you haven't been implementing parallax-corrected cubemaps before (or rather, recently), but if you did, it will (hopefully) all make sense.
This is not going to be a gentle introduction to the topic, just a dump of some notes...

Preconvoluted specular cubemaps come with all kinds of errors, but in the past decade or so we invented a better technique, where we improve the spatial locality of the cubemap by using a proxy geometry and raycasting. 

Typically the proxy geometry is rectangular, and the technique is known as parallax-corrected specular cubemaps. This better technique comes with even more errors built-in, I did a summary of all of the problems here, back in 2015.

From Seb. Lagarde (link above)

The following is an attempt to solve one of the defects parallax correction introduces, by retrofitting some math I did for area lights to see if we can come up with a good solution.

Setup is the following: We have a cubemap specular reflection probe somewhere, and we want to use that to get the specular from a location different from the cube center. 
In order to do so, we trace a reflection ray from the surface to be shaded to the scene geometry, represented via some proxies that are easy to intersect, then we look the reflection baked in the probe towards the intersection point.

The problem with this setup is illustrated below. If you think of the specular lobe as projecting its intensity on the surfaces of the scene, you get a given footprint, which will be in general discontinuous (due to visibility) and stretched.

Think of our specular lobe like shining light from a torch on a surface.


Clearly, when we baked the cubemap, we were moving the torch in a given way, from the cubemap center all around. When we query though, we are looking for the lobe that a torch would create on the scene from the shaded point, towards the reflection direction (or well, technically not as a BRDF is not a lobe around the mirror reflection direction but you know that with preconvolved cubemaps we always approximate with "Phong"-like lobes).


By using the cubemap information, we get a given projected kernel which in general doesn't match -at all- the kernel that our specular lobe on the surface projects.
There is no guarantee that they are even closely related, because they can be at different distances, at different angles and "looking" at different scene surfaces (due to discontinuities).

Now, geometry is the worst offender here. 

Even if the parallax proxy geometry is not the real scene, and we use proxies that are convex (boxes, k-dops...), naively intersecting planes to get a "corrected" reflection lookup clearly shows in shading at higher roughness, due to discontinuities in the derivatives.

From youtube - note how the reflected corners of the room appear sharp, are not correctly blurred by the rough floor material.

The proxy geometry becomes "visible" in the reflection: as the ray changes plane, it changes the ratio of correction, and the plane discontinuity becomes obvious in the final image. 

This is why in practice intersecting boxes is not great, and you'd have to find some smoother proxy geometry or "fade" out the parallax correction at high roughness. To my knowledge, everyone (??) does this "by eye", I'm not aware of a scientific approach, motivated in approximations and errors.

Honestly today I cannot recall what ended up shipping at the time, I think we initially had the idea of "fading" the parallax correction, then I added a weighting scheme to "blend" the intersection (ray parameter) between planes, and I also "pushed away" the parallax planes if we are too near them.

In theory you could intersect something like a rounded box primitive, control the rounding with the roughness parameter, and reason about Jacobians (derivatives, continuity of the resulting filtering kernel, distortion...) but that sounds expensive and harder to generalize to k-dops.

The second worst "offender" with parallax correction is the difference in shape of the specular lobes, the precomputed one versus the "ideal" one we want to reconstruct, that happens even when both are projected on the same plane (i.e. in absence of visibility discontinuities).

The simplest correction to make is in the case where the two lobes are both perpendicular to a surface, the only difference being the distance to it.

This is relatively easy as increasing the distance looks close enough to increasing the roughness. Not exactly the same, but close enough to fit a simple correction formula that tweaks the roughness we fetch from the cubemap based on the ratio between the cubemap-to-intersection distance and the surface-to-intersection one:


From this observation we know we can use numerical fitting and precomputation to find a correction factor from one model to another. 
Then, we can take that fitted data and either using a lookup for the conversion or we can find an analytic function that approximates it.

This methodology is what I described at Siggraph 2015 and have used many times since. Formulate an hypothesis: this can be approximated with that. Use brute force to optimize free parameters. Visualize the fitting and end results versus ground truth to understand if the process worked or if not, why not (where are the errors). Rinse and repeat.

Here you can see the first step. For every roughness (alpha) and distance, I fit a GGX D lobe with a new alpha', here adding a multiplicative scaling factor and an additive offset (subtractive, really, as the fitting will show).


Why we use an additive offset? Well, it helps with the fitting, and it should be clear why, if we look at the previous grid. GGX at high roughness has long tail that turns "omnidirectional", whilst a low roughness lobe that is shining far away from a plane does not exhibit that omnidirectional factor.


We cannot use it though, we employ only to help the fitting process find a good match. Why? Well, first, because we can't express it with a single fetch in a preconvolved cubemap mip hierarchy (we can only change the preconvolved lobe by a multiplicative factor), but also note that it is non-zero only in the area where the roughness maxes out (we cannot get rougher than alpha=1), and in that area there is nothing really that we can do.

Of course, next we'd want to find an analytic approximation, but also make sure everything is done in whatever exact association there is from cubemap mip level to alpha, ending up with a function that goes from GGX mip selection to adjusted GGX mip selection (given the distance). 
This is really engine-dependent, and left as an exercise to the reader (in all honesty, I don't even have the final formulas/code anymore)

Next up is to consider the case where the cubemap and the surface are not perpendicular to the intersection plane (even keeping that to be just a plane, so again, no discontinuities). Can we account for that as well?

To illustrate the problem, the following shows the absolute value of the cosine of the angle of the intersection between the reflection direction and the proxy planes in a scene.


This is much harder to fit a correction factor for. The problem is that the two different directions (the precomputed one and the actual one) can be quite different.
Same distance, one kernel hits at polar angle Pi/3,0, the second -Pi/3,Pi/3. How do you adjust the mip (roughness) to make one match the other?


One possible idea is to consider how different is the intersection at an angle and the corresponding perpendicular one.
If we have a function that goes from angle,distance -> an isotropic, perpendicular kernel (roughness', angle=0, same distance) then we could maybe go from the real footprint we need for specular to an isotropic footprint, and from the real footprints that we have in the cubemap mips to the isotropic and search for the closest match between the two isotropic projections.

The problem here is that really, with a single fetch/isotropic kernel, it doesn't seem that there a lot to gain by changing the roughness as function of the angle. 

In the following, I grapth projections at an angle compared to perpendicular lobe (GGX D term only). 
All graphs are with alpha = 0.1, distance = plane size (so it's equivalent to the kernel at the center of a prefiltered cubemap when you ignore the slant). 

Pi/6 - the two lobes seem "visually" very close:


At Pi/2.5 we get a very long "tail" but note that the width of the central part of the kernel seems still to fit the isotropic fetch without any change of roughness.


Now here "seems to fit" really doesn't mean much. What we should do is to look at rendered results, compare to ground truth / best effort (i.e. using sampling instead of prefiltering, whilst still using the assumption of representing radiance with the baked, localized cubemap), and if we want to then use numerical methods, do so with an error measure based on some perceptual metric.

And this is what I did, but failed to find any reasonable correction, keeping the limitation of a single fetch. The only hope is to turn to multiple fetches, and optimize the preconvolution specifically to bake data that is useful for the reconstruction, not using a GGX prefiltering necessarily.

I suspect that actually the long anisotropic tail created by the BRDF specular lobe is not, visually, an huge issue. 
The problem that what we get is (also) the opposite, from the point of view of the reconstruction, we get tails "baked" into the prefiltered cube at arbitrary angles (compared to the angles we need for specular on surfaces), and these long tails create artifacts.

To account for that, the prefiltering step should probably take directly into account the proxy geometry shape. I.e. if these observations are correct, they point towards the idea that parallax-corrected cubemaps should be filtered by a fixed distance (relative to projected texel size), perpendicular to the proxy plane kernel. 

That way when we query the cubemap we have only to convert the projected specular kernel to a kernel perpendicular to the surface (which would be ~ the same kernel we get at that roughness and same distance, just perpendicular), and then look in the mip chain the roughness that gives us a similar prefiltered image, by doing a distance-ratio-to-roughness adjustment as described in the first part of this text. 



24 August, 2019

Misunderstanding Multilayering (Diffuse-Specular Energy Conservation)

- Introduction to the problem

In our last episode, "misunderstanding multiscattering", we saw how to create a multiscattering BRDF mostly by intuition. We used the concept of directional albedo (a.k.a. directional-hemispherical reflectance) to normalize a specular BRDF and from there we derived a very simple closed-form approximation for GGX energy conservation.

We also showed that the directional albedo is the response of our BRDF in a white light furnace at different normal to view angles, and we typically have that information available in tabular form as it's a key component of the "split sum" approximation to the integrals needed for image-based lighting on contemporary BRDFs. Great!

This time, we'll show how the very same data, ideas, and intuitions, can be used to build multilayered BRDFs, and in particular to couple matte-specular models (diffuse lobes to specular lobes). 
Like the previous article, this is not anything particularly novel, this isn't Siggraph worthy material, just some notes on the subject.

Let's start, as always, with the problem, with a very quick recap of things everyone reading this will probably already know (my flight has been delayed a couple of hours, so yes, we have time for recaps).

Under the framework of geometrical optics we look for light interactions with materials at surfaces where the index of refraction changes. We assume that light travels in a medium of uniform IOR (e.g. air), and then eventually hits a surface with a different IOR and either gets reflected out of the surface, or refracted into it.
In fact in real-time rendering we only consider the air->material interface, we don't handle nested objects (e.g. air->glass->water->glass->air) even when we render transparent materials or multilayered materials. Which is wrong, but it's just one of the many ways we are still totally wrong (and yet might or might not be right by not caring). Even for offline path-tracers, this is not entirely trivial, depending on how you do light transport.

We then consider, for a small but not infinitesimal patch of our surface, the statistics of how probable are these reflections and refraction in function of the light incident angle, a given outgoing angle we want to measure, and the surface patch normal, and these statistics create a BRDF "lobe".

But BRDFs though don't typically have a single lobe: we usually have "metals" and "dielectrics" and in the latter case we have a specular and a diffuse lobe. How comes? Well, because we don't really consider a single interaction, that was sort-of a lie. We do create a lobe with the light that is scattered from the surface interaction, but we also have to consider the light that gets refracted into the surface. In the case of metals, the energy that goes into the material somehow disappears (becomes heat or maybe fairy dust who knows - physics) and we effectively have a single lobe. 
For dielectrics, though the refracted light keeps hitting molecules inside what's essentially a participating medium, and eventually taking random paths it comes back out from the surface and we model that as a diffuse lobe. 

Diffuse lobes are reasonable when the "bouncing around" is such that it effectively randomizes the direction at which the light comes out, but still happens in such a small space that we consider the light coming out effectively at the same point it came in. 
If that's not true, and the scattering distances are bigger, we have what we call "subsurface scattering" effects. If the direction is not randomized "enough", then instead we have transparent materials, and somewhere in the middle, we have what we usually address as participating media.

Any time that we consider multi-layered materials we have to model the way the light that is refracted by a layer reaches the next. At the very least, we have to know at least how much energy is "handled" by a given layer/lobe and how much is "passed down" to the subsequent lobes, as if we gave the same input light to all the layers we would sum the interactions up and potentially end up with more energy coming out of a material than it came in!
In theory we should understand much more than that, namely how the light travels in a given layer and reaches another one (its statistical distribution over directions and space), but for diffuse lobes found in dielectrics we can imagine that it doesn't matter much (anyways the light is going to be randomized...), so right now we just want to know how much energy survives the topmost specular lobe of a dielectric (gets refracted as opposed to reflected).

This might seem a lot of handwaving. And it is! This is the point of this post -and- the previous one!

We know that we have certain problems in our math, but should we care? How much should we care and why? Does it matter for the goal of generating photorealistic images easily? Remember, this is our goal, not fixing physics. Then again if we wanted to look at the physics there are a ton of assumptions that we are making anyways.

Left to right: Specular only, Specular+Diffuse, Specular+Diffuse*(1-Fresnel)
It turns out that this problem matters a lot, much more in fact than the energy loss in non-multiscattering specular lobes. The multiscattering issue was small, happened only at high roughnesses and it was entirely fixable by artists tweaking specular albedo (typically controlled by Fresnel f0 parameter) a bit.

Tweaking albedo is fine because it still keeps the materials decoupled from lighting, the main issue we wanted to fix by embracing physically-based rendering models, and it's doable because the tweaks needed are small and entirely in the range of realistic albedos (which don't ever go to f0 = 1 for metals, and of course even less so for dielectrics).

Not having any energy conservation between specular and diffuse instead creates overly bright materials that can look fine in a certain scene but will glow unnaturally under different conditions, and it's quite hard for artists to make sure that the lobes are tuned in a way that doesn't produce more energy than they should.
In fact, most real-time rendering today works without multiscattering BRDFs, but (hopefully) it always considers some way of balancing specular and diffuse.

- Solutions and pretty images

Now we know the problem -and- we know it matters, so we are justified in spending some time to investigate further. How? Well, as we did with the previous post, we bring back our friend the white furnace test. Pretty much any time we want to check for energy conservation, we take our materials and put them in a furnace!


As for the previous time though, just putting a material in a furnace doesn't tell us that much. Yes, it's already apparent how the middle image looks unnaturally bright, but that's not a great test. What we want is to setup our scene so we expect the material to reflect back exactly all the light that comes in, no matter which path the light takes in the material and across the microfacets, and then see if we get more light than we expect or less.
We have to think, what is that is absorbing light in our materials? The specular reflects light out or refracts it in, it doesn't absorb, so all the energy loss is when we go "inside" the material, in the dielectrics case, in the diffuse layer. It stands to reason then that if we set our diffuse albedo to one, regardless of the specular parameters, we should be perfectly white in a white furnace. Let's see:

Comparing no energy conservation to (1-Fresnel) with a fully white diffuse albedo.
Bingo! We see now that just adding diffuse with no attempt at energy conservation does indeed result in energy being created. And surprisingly it looks like that the simple idea of using (1-F) as a multiplication factor to normalize diffuse is indeed doing a very good job, in fact, it looks alright if it wasn't for the grazing angles... Why is that?

(1-Fresnel) at varying roughness (from 0.1 to 0.9)
Same as above, but using a non-multiscattering Specular instead of the simple one presented in the last post.
Well if we think about it, it's fairly obvious. For the terms we commonly use, at the incident angle (N=V=L) the shadowing term has no effect and the NDF takes a constant value regardless of the roughness parameter, Fresnel dominates. At grazing angles, the shadowing term starts to matter, and there is where we see our simple normalization breaking down.

What we can try to fix this? If you read the previous post, it should be obvious by now. We have something that should be white in a furnace, it isn't, let's make it! We know how much energy the specular lobe will scatter back in the furnace, for any roughness, Fresnel f0, and viewing angle, this again is the split-sum table we use for environment map lighting. We know that a Lambert diffuse is correctly normalized, so it will scatter back all incoming energy if the albedo is set to one. So how much do we have to scale the lobe for the sum of the specular plus diffuse to be one, with an arbitrary specular and a unitary albedo diffuse? Obviously just by 1-E, where E is what we get from the split-sum table!

(1-E) energy conservation test, notice the more correct grazing angles.
Note: some artifacts remain due to approximations in the Directional Albedo table I used.
Furnace fixed! And note, this works regardless of if we're using or not the multiscattering specular BRDF, as a non-multiscattering one will just behave as it's refracting more light to the diffuse layer, which in our case will still bounce it all back out. And, if we're using the simple renormalization for multiscattering presented in the previous post, we don't even need to compute a new table for directional albedo, as it's easy to derive how to analytically modify the output of the single scattering table to accommodate for the multiscattering correction (I'll leave this as an exercise for the reader, it's trivial and might motivate you to actually look at the definition of directional albedo...)

As it was for the multiscattering normalization idea, this is wrong and we can see immediately that it's wrong from the equation, even if the error won't show in the furnace, because we don't respect reciprocity (we consider only ndotv and not ndotl). It's even intuitive, as we don't seem to consider at all that the light going from the specular layer to the diffuse, scattering inside the material and eventually coming out, has to cross again the specular interface as it comes out!

Before we delve further into this, I want though to stop for a second and check out what we are doing (again, same as last time). Let's plot the 1-E multiplicative factor we're applying to diffuse and see what happens:
(1-E) normalization compared to (1-Fresnel) at varying roughness,
for a non-multiscattering specular.

Same as above, but using simple multiscattering specular. 
This is interesting! Albeit we can couple diffuse with and without a multiscattering specular, the plots for the multiscattering case are much simpler, so much so that it's easy to derive, by hand, a function that would approximate them. Fun!

Approximation: mix( (vec3(1,1,1) - fresnel(f0,ndotv) ), vec3(1) - f0, roughness )
And now, at last, let's have a look at a more correct solution that respects reciprocity and see if that matters or not for real-time rendering. This is quite hard to know intuitively, so we'll have just to implement something and check.

Luckily this is nothing new, in 2001 Kelemen and László Szirmay-Kalos published "A Microfacet Based Coupled Specular-Matte BRDF Model with Importance Sampling", and we can pretty much copy and paste their equation, which unsurprisingly ends up looking very close to the way we adjust for reciprocity in specular multiscattering (in fact Kulla and Conty cite the afore mentioned KSK paper in theirs).

Without further ado:

Left to right: Simple EC, Fresnel EC, KSK EC.
Simple Multiscattering GGX, Roughness 0.1.
Same as above but for 0.3 roughness GGX.

And some more images. The difference between (1-F) and (1-E) can be seen only at low roughness. Difference is even less pronounced for table-based (1-E), the approximation showed above and the full KSK method.

1-Fresnel; Roughness 0.1 to 0.9, GGX with simple multiscattering.
Approximated 1-E.
Table based 1-E.
KSK.

12 August, 2019

Misunderstanding Multiscattering

Today we will build a state-of-the-art multiscattering GGX BRDF for physically-based rendering. 

Already done you say? Yes, you're right, by people who understand maths and physics, that's cheating. We instead will try to do the same trying to learn as little as possible...

If you want to do this the right way instead, these are some excellent articles:
Ok, ready? 

Let's start from your vanilla contemporary GGX with all the correlated-Smith fixings. The following sequence of images is generated with Disney's BRDF explorer environment lighting mode. They show a metallic GGX withy varying roughness, from 0.3 to 0.9, using the common parametrization alpha = roughness squared.
Note: please always indicate the parametrization you are using in your articles and presentations. There are so many variants out there. Here I use one of the most common ones, even if, for what is worth probably the best idea is alpha = roughess^4, which is more perceptually linear and also manages to be a great approximation for the one-over-square-root Blinn-Phong Gloss to GGX mapping many engines prefer.


We can notice how at high roughness the material looks noticeably darker. This is annoying, and smart people have worked hard to find and fix this issue. 

Let's compare the image above with Kulla/Conty's Multiscattering GGX BRDF.


Fixed! 

Now, we could try to understand how this solution work, the math is already there. And you probably should, it's both interesting and useful to understand other related problems. But, let's try not to...

First of all, we should ask why is this a problem? How do we know it is a problem? And also, if it is a problem, how big of a problem it is?

Let's start at the top. Why do we like "PBR"? It's not because we like physics, at least, I don't... And if we did like physics, this would be a small error to focus on, considering that we are surely committing worse sins in our end-to-end image pipeline...

Computer graphics is not predictive rendering. We don't try to simulate physics to make accurate simulations, that's not a goal (and the people who care about physics are an order of magnitude ahead of us). We make pretty pictures.

At a given point, we noticed that using some physics to make pretty pictures allowed for easier workflows, decoupling materials from lighting, reducing the number of hacks and parameters, allowing to use libraries of real materials and so on. 
We thought artists would be happier and be able to work more efficiently, while still being able to create a lot of different scenes. It took a while to move over all our workflows, but it was the right call.

So, when we look for right and wrong, we have to start with art and artists. If there is a need we can identify there, then we can look at physics to see if the answers are there. In this case, we can say that yes, indeed we have a (small) problem. 
It would be nice if our material parameters were orthogonal, and having things go darker as we change roughness is not ideal.

Let's move to step two then. Can physics help? Is this darkening physically correct, or not? How do we test? Enter the furnace! Let's put our "GGX coated" metallic object in a uniformly lit environment and see what happens:


Light hits our surface, hits our microfacets. The microfacets reflect back some light, and refract some other, according to Fresnel. If we're assuming a metal people told us that the refracted light, that goes "inside" the surface, is quickly absorbed and never comes out (becomes heat). 

It's reasonable that even in a furnace our metallic object has a color, as some of the energy will be absorbed. But what if we take our microfacets and make them always reflect all the light, no absorption? 
This means we need to set our f0 to 1 (remember, Fresnel is what controls how much light the microfacets scatter), let's try that and see what happens:


The object is still not white. Something's wrong! Ok, the inquisitive reader might still say - how do you know it's wrong. Perhaps certain directions scatter more light and others less, so we still see some shading even if all the light eventually comes out... 
If we think of BRDF and lobes it's not easy to get correct intuition. Let's instead think of how a light path, starting from the camera, looks like. It might hit some of the microfacets, one or many times, bounce around then eventually escape and connect to the furnace environment which will always be a emitting a given contant energy.
How much of that energy reaches the camera? All of it! Because regardless of which and how many microfacets we hit, all the energy is reflected and none is absorbed.

Now we have an intuition of why the image above should be white, it isn't, so we have a problem in our math...

If you studied BRDFs you know that in microfacet model we have a masking-shadowing visibility function that models which microfacets are occluded by others. What we don't typically model though is the fact that these occlusions are themselves microfacets, so the light should bounce around and eventually get out, not just be discarded. 
This is what the multiscattering models model and fix, and indeed if we did put in a furnace Kulla and Conty's GGX that we have shown before, it would generate a boring and correct white image for a fully reflective material, regardless of its roughness.

Kulla's model is not trivial though, and in many cases is likely not worth using to fix such a relatively minor problem. So. Can we be more ignorant? What if we knew how much light our BRDF gives out in a furnace, for a given roughness and viewing angle (fixing then again f0 = 1)? Could we just take that value and normalize the BRDF with it? 

Spoiler alert, we can. And not only we can but it's also trivial to do, because we already have this "furnace" value in most modern engines, in the look-up tables used for the popular split-sum image based lighting approximation. 


The split-sum table boils down the "BRDF in a furnace" (also known as directional albedo or directional-hemispherical reflectance) to a scale and bias (add) factor to be applied to the Fresnel f0 value.
In our case, we want to normalize considering f0=1 so all we do is to scale our BRDF lobe by one over bias(roughness,ndotv) + scale(roughness,ndotv). This is the result:


Indeed, we're getting some energy back at high roughness, and if we tested this in a furnace it would come up white, correctly, at f0 = 1. But it's also different from Kulla's, in particular, color is not quite as saturated in the rough materials. How comes? Well again, if you studied this problem already (cheater!) you know the answer.

Proper multiscattering adds saturation because as light hits more and more microfacets before escaping the surface, we pick up more color (raising a color to a power results in a more saturated color). But so how can this be physically wrong, but still correct in our test? 

Well, it's simple, really. By not simulating this extra saturation we are still energy conserving, but we changed the meaning of our BRDF parameters. The f0 "meaning" in our "ignorant" multiscattering BRDF is not the same as Kulla's, it results in a different albedo, but the BRDF itself is still energy conserving, it's just a different parametrization. 
Most importantly, I'd argue it's a better parametrization! Then again, remember our objectives. We don't do physics for physics' sake, we do it to help our production.

We wanted to make our parameters more orthogonal so that artists don't need to artificially "brighten" our BRDF at high roughness. If we went with the "more correct" solution (about that, this recent post by Narkowicz is a great read) we would add a different dependency, now roughness instead of darkening our materials it makes them more saturated, which would kind of defeat the purpose. 
You can imagine some scenarios where this might be desirable, but I'd say it's almost always wrong for our use-cases.

If we wanted to simulate the added saturation, there are a few easy ways. Again, going for the most "ignorant" (simple) we can just scale the BRDF by 1+f0*(1/(bias(roughness,ndotv) + scale(roughness,ndotv)) - 1), resulting in the following:


Now we are close to Kulla's solution. This is still not entirely correct if you wonder why Kulla's approximation is better, as it might not show in images, it's because ours doesn't respect reciprocity.
This might be important for Sony Imageworks, as certain offline path-tracing light transport algorithms do require it, but it's fairly irrelevant for us.

Ok, so now that we have found an approximation we're happy with we can (and should) go a step further and see what we are really doing. Yes, we have a formula, but it depends on some look-up tables. 
This isn't a big issue as we need these tables around for image-based lighting anyways, but it would be an extra texture fetch and in general it's always important to double-check our math, so let's visualize the 1/(bias(roughness,ndotv) + scale(roughness,ndotv)) function we're using:


It looks remarkably simple! Indeed, it's so simple it has a trivial approximation, you don't even need any sophisticated tool to find it so I'll just show it: 1 + 2*alpha*alpha * ndotv. This fits very well:

Approximation compared to the correct normalization factor (gray surface)

You can see that there is a bit of error in the furnace test. We could improve by doing a proper polynomial fit - yes, it turns out "2" and "1" in the formula above are not exactly the best constants. 
Actually defining "best" would be a problem all on its own, because doing a simple mean-square minimization on the normalization function doesn't really make a lot of sense (we should care about end visuals, perceptive measures, which angles matter more and so on), and I think we already spent too much time for such a small fix.
Moreover, the actual rendered images are really hard to distinguish from the table-based solution.

What is interesting is to see what could we achieve if we wanted to be even simpler, dropping the dependency from ndotv. It turns out in this case 1 + alpha*alpha does the trick decently as well.
This means that we're just applying a multiplicative factor to our BRDF to brighten it at high roughness, which just makes a lot of sense.

This of course adds more error and it starts to show as the BRDF shape changes, we get more energy at grazing angles on rough surfaces, but it might just be good enough depending on your needs:

Under-exposed to highlight that with the simpler approximation sometimes we lose light, sometimes we add.