How do we generate, use and manage mipmaps? The devil is in the details. I want to talk about this because today at work we had a couple of problems with them, and because they are another nice example of how doing things without knowing exactly what you're doing, from a mathematical perspective, can cause subtle (or not so subtle) problems.
Mipmap levels
On modern hardware (GPUs) mipmaps are chosen according to UV gradients. For each texture fetch instruction, the GPU computes the rate of change of the texture coordinates at the shaded point, using finite differences. That's why GPUs always operate on 2x2 pixel quads, and also why dynamic branching in pixel shaders can't, in the most general case, conditionally jump over texture fetch instructions (it usually can if the fetch is indexed by coordinates obtained directly from the vertex shader's interpolated outputs, as the gradients of those outputs are always available).
Those gradients are an estimate of the footprint of the currently shaded pixel (or sample) in texture space: it's like taking our square pixel, projecting it onto the triangle and transforming that projection into texture space. Note that if our texture coordinates are computed per vertex and then linearly interpolated over the triangle (with no perspective correction), that linear estimate is exactly our footprint.
If the sampler is not using anisotropic filtering, the mipmap level is chosen by considering the gradient vector with the highest magnitude and finding the mipmap level whose texel size best fits that vector. If the sampler is using anisotropic filtering, the mipmap is chosen according to the smaller of the two vectors, and a number of samples is averaged along the direction of the longer one. This is needed because, if the two gradients have radically different sizes, considering only the bigger one results in overblurring, as we sample a mipmap that is too coarse. Even if we bias the GPU's mipmap computation to choose a less coarse level we can't fix the problem: we just trade overblurring for aliasing. If the pixel footprint is not round, we need anisotropic filtering.
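As a concrete illustration (a minimal C++ sketch of my own, not the exact formula any specific GPU uses), this is roughly how a sampler could turn the two texel-space gradients into a mipmap level, and into a sample count when anisotropic filtering is enabled:

```cpp
#include <algorithm>
#include <cmath>

// Screen-space derivatives of the texture coordinates, already scaled by the
// texture size, i.e. expressed in texels: (du/dx, dv/dx) and (du/dy, dv/dy).
struct UvGradients { float dudx, dvdx, dudy, dvdy; };

// Without anisotropic filtering: fit the mip level to the *longest* axis of the footprint.
float isotropicLod(const UvGradients& g)
{
    float lenX = std::sqrt(g.dudx * g.dudx + g.dvdx * g.dvdx);
    float lenY = std::sqrt(g.dudy * g.dudy + g.dvdy * g.dvdy);
    float footprint = std::max(std::max(lenX, lenY), 1.0f); // clamp: never below level 0
    return std::log2(footprint);
}

// With anisotropic filtering: fit the mip level to the *shortest* axis and
// distribute extra samples along the longest one (up to maxAniso).
void anisotropicLod(const UvGradients& g, int maxAniso, float& lod, int& numSamples)
{
    float lenX  = std::sqrt(g.dudx * g.dudx + g.dvdx * g.dvdx);
    float lenY  = std::sqrt(g.dudy * g.dudy + g.dvdy * g.dvdy);
    float minor = std::max(std::min(lenX, lenY), 1.0f);
    float major = std::max(std::max(lenX, lenY), 1.0f);
    float ratio = std::min(major / minor, (float)maxAniso);
    numSamples  = (int)std::ceil(ratio);
    lod = std::log2(major / ratio); // equals log2(minor) when the ratio is not clamped
}
```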
Optimizing samplers
Mipmapping is a win-win technique on modern GPUs. It improves quality, by avoiding texture aliasing, and it improves performance, by ensuring that texture fetches from nearby pixels land on nearby texels in the chosen mipmap level. But care has to be taken. Whether we use anisotropic filtering or not, we want to minimize the anisotropy of our gradients: without anisotropic filtering to reduce blurring, and with it also to improve performance, since anisotropic filtering on some GPUs is dynamic and takes more samples when needed.
What causes anisotropy in the texture gradients? It depends on how the mapping is seen from the current viewing direction, so it's a product of the mapping itself and of the viewing angle between the triangle and the camera. We usually have total control over the first factor, as texture coordinates are most of the time authored by artists (UV mapping), and limited control over the second. What we should try to do is UV map our objects in a way that minimizes texture stretching.
For most objects, which have no "preferred" viewing angle, that simply means the mapping has to be uniform: a triangle in the UV map has to keep a shape similar to its 3D version. Isotropic sampling will then work well for front-facing triangles, and blurriness will only show at grazing angles. Failing to achieve this can lead to terrible results; yesterday I "fixed" a severely blurred texture simply by asking the artists to resize it from 1024x1024 to 512x1024, as that size (even if smaller!) resulted in better gradients.
But this also leads to an interesting optimization for objects that do have a preferred viewing angle (in texture space), like roads in a racing game, where the asphalt is usually mapped in a way that follows the road (i.e. the U coordinate is always parallel to the road and the V is perpendicular to it) and is mostly seen at a given angle. In those cases we can easily compute the optimal U/V mapping ratio (to minimize gradient anisotropy) either by analytic means or by visualizing and analyzing an image that displays the gradient ratio, as in the sketch below.
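As a back-of-the-envelope example of the "analytic means" (my own sketch, under the simplifying assumption of a flat road seen at a fixed angle): the pixel footprint on the road is stretched along the view direction by roughly 1/sin(angle), so the texel density along the road can be reduced by about sin(angle) relative to the density across it before any anisotropy shows up.

```cpp
#include <cmath>

// Assumed scenario: a flat, road-like surface mapped with U along the road and
// V across it, typically viewed at 'angleRadians' between the view direction
// and the road plane. The screen pixel footprint on the road is stretched
// along the view direction (roughly along U) by ~1/sin(angle).
//
// Returns the suggested ratio (texels per meter along the road) /
// (texels per meter across the road) that evens out the two UV gradients.
float suggestedDensityRatio(float angleRadians)
{
    return std::sin(angleRadians); // e.g. ~0.5 at 30 degrees: half the texels along the road
}

// Data-driven alternative: measure the average footprint extents (in texels)
// along U and V from a debug visualization of the gradients, and derive the
// same rescaling factor for the along-the-road texel density.
float suggestedDensityRatioFromGradients(float avgTexelsAlong, float avgTexelsAcross)
{
    return avgTexelsAcross / avgTexelsAlong; // < 1 means: use fewer texels along the road
}
```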
Usually in those cases anisotropic filtering is still required (even if we optimize for a given viewing direction, eventually our player will do something wrong and the car will end up perpendicular to the track, pessimizing our carefully optimized gradients), but better gradients still lead to better quality and performance (on some GPUs trilinear filtering can also be tuned, and a performance gain can be obtained by limiting the regions where trilinear filtering is used, but that's mostly a hack that trades quality for performance).
Mipmap generation (1)
Usually each level of a mipmap chain is half the size, in both dimensions, of the previous level. This means that mipmapped textures cost only a third more memory than non-mipmapped ones (a 4/3 factor). It also makes mipmap generation trivial: for each level, we compute a texel by taking the mean of a 2x2 texel block in the preceding mipmap level. It's so easy and fast that we can also do it for runtime-generated (render-to-texture) textures. And we do. And it is wrong.
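For reference, here is that trivial 2x2 averaging in a minimal C++ sketch (single channel, float texels, power-of-two sizes assumed):

```cpp
#include <vector>

// One mip level of a single-channel float texture, row-major.
struct Image {
    int width = 0, height = 0;
    std::vector<float> texels;
    float at(int x, int y) const { return texels[y * width + x]; }
};

// Build the next mip level by averaging each 2x2 block of the previous one.
// This is the ubiquitous "box filter": exact for the little-squares model,
// not for point samples of a bandlimited function.
Image downsampleBox2x2(const Image& src)
{
    Image dst;
    dst.width  = src.width  / 2;
    dst.height = src.height / 2;
    dst.texels.resize(dst.width * dst.height);
    for (int y = 0; y < dst.height; ++y)
        for (int x = 0; x < dst.width; ++x)
            dst.texels[y * dst.width + x] =
                0.25f * (src.at(2 * x,     2 * y)     + src.at(2 * x + 1, 2 * y) +
                         src.at(2 * x,     2 * y + 1) + src.at(2 * x + 1, 2 * y + 1));
    return dst;
}
```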
That mean corresponds exactly to computing, for each new texel, the integral of the original texture over the texel's footprint, if we interpret the texture as a two-dimensional function made of square steps (the "little squares" model). Under that model it is an exact resampling: it's analytical, and no aliasing occurs. The problem is the model itself. Our texture does not encode that function, and it's easy to see it's not what we want: that function, magnified, shows every texel as a big square step. Steps are discontinuous, and discontinuities are a source of infinite frequencies in the Fourier domain. Moreover, when we sample the texture we don't usually interpret it as made of steps (unless we're point filtering); we use bilinear interpolation. So we have to change our framework: our texels are not little squares, they are sample points!
Boring, long digression
Let's introduce a bit of point sampling theory. When we take uniform (equally spaced) samples of a function, we can exactly reconstruct that function from the samples if it does not contain frequencies that are too high. How high? Up to a limit that is directly proportional to the sampling density, called the Nyquist limit.
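In formulas (this is just the standard sampling theorem, stated here for reference): if we sample $f$ with spacing $T$ and $f$ contains no frequencies at or above $\frac{1}{2T}$ (the Nyquist limit), then the samples determine $f$ exactly via sinc interpolation:

$$
f(x) = \sum_{n=-\infty}^{+\infty} f(nT)\,\operatorname{sinc}\!\left(\frac{x - nT}{T}\right),
\qquad \operatorname{sinc}(t) = \frac{\sin(\pi t)}{\pi t}.
$$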
Unfortunately, in our rendering work we are most of the time sampling a geometrical scene, and that scene contains discontinuities; those discontinuities yield infinite frequencies, so we can't sample it in a way that captures all the information we need. Some frequencies will mix with others, in an unpleasant way called aliasing. There is an easy way to fight aliasing: prefilter the function we're sampling to cut the offending frequencies. In other words, we want to take our source function in frequency space and multiply its spectrum by a step (box) function to cut everything above a given limit. Multiplying in frequency space corresponds to convolution in image space, and the box in frequency corresponds to a sinc function in space.
We run into two problems. The first is that the representation we have of our function is not in any way analytically filterable. The second is that the sinc function has infinite support. So this approach is not directly usable. What we do instead to fight aliasing is take many samples, to push our Nyquist limit as far as possible, then reconstruct from those samples a bandlimited function by convolving them with an approximate version of the sinc (linear and cubic approximations exist, as well as more complex windowed sincs), and then use for our pixel the integral of that reconstructed function over the pixel's footprint (in practice it's as simple as weighting the samples by the filter function centered on the pixel center and taking their mean). Or, if we can take non-uniform samples, we can trade aliasing for noise by using random sampling positions (or better, quasi-random, or randomized quasi-random ones).
Mipmap generation (2)
We can see our texture as a point-sampled version of a bandlimited function, containing frequencies only up to the Nyquist limit. If so, to resample it we have to reconstruct the function from our samples and then evaluate it at the new sample locations. The bilinear filtering used by GPU samplers approximates the function with a linear (tent) kernel. The nice thing is that for that linear kernel, taking the average is still the correct analytical resampling we need, so mipmap generation by averaging is coherent with what the GPU does when sampling. But it also introduces overblurring.
The bottom line is that using better resizing methods, even if they're not coherent with how the GPU interprets the samples, leads to sharper mipmaps and better quality.
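As one possible example of a "better resizing method" (a sketch, not the only choice: Kaiser or other windowed sincs are common too), here is a 2x reduction with a Lanczos-2 kernel, applied on rows and then on columns; it gives noticeably sharper and less aliased mip levels than the plain 2x2 box, at the cost of a little ringing near hard edges:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Lanczos-2 kernel: a windowed sinc, with small negative lobes.
static float lanczos2(float t)
{
    t = std::fabs(t);
    if (t < 1e-6f) return 1.0f;
    if (t >= 2.0f) return 0.0f;
    const float pi = 3.14159265358979f;
    return (std::sin(pi * t) / (pi * t)) * (std::sin(pi * t * 0.5f) / (pi * t * 0.5f));
}

// Downsample a 1D signal by 2, using the Lanczos-2 kernel stretched to the new
// sampling rate (8 source texels of support). A 2D mip level is produced by
// running this on every row, then on every column of the result.
std::vector<float> downsampleByTwoLanczos2(const std::vector<float>& src)
{
    const int srcSize = (int)src.size();
    const int dstSize = srcSize / 2;
    std::vector<float> dst(dstSize);
    for (int j = 0; j < dstSize; ++j)
    {
        const float center = 2.0f * j + 1.0f; // destination texel center, in source texel units
        float sum = 0.0f, weightSum = 0.0f;
        for (int i = -4; i <= 3; ++i)
        {
            int   s = 2 * j + 1 + i;                     // candidate source texel
            float w = lanczos2(((float)s + 0.5f - center) * 0.5f);
            s = std::min(std::max(s, 0), srcSize - 1);   // clamp at the borders
            sum += w * src[s];
            weightSum += w;
        }
        dst[j] = sum / weightSum; // renormalize so constant signals stay constant
    }
    return dst;
}
```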
But that's not the only thing we have to be aware of when generating mipmaps. We also have to take care of what kind of data the texture stores. Is it a color? Colors are usually stored in a non-linear, gamma-corrected space. That's fine: it optimizes the use of the available bits by storing colors in a space that is visually (roughly) linear. But it's bad when we have to perform computations on them, because our math wants to work on numbers that are uniformly spaced. So when we generate mipmaps for color textures we should take care of the degamma/gamma problem, as we do in shaders and samplers (see the Valve publications, "Post Processing in The Orange Box", or the post that started it all, at least for me, by Peter Shirley).
Failing to correctly process gamma in mipmap generation usually produces mipmaps that are darker than they should be.
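A minimal sketch of the gamma-aware version of the 2x2 average (using a plain 2.2 power curve for brevity; real sRGB has a linear segment near black, and production code would use exact sRGB conversions or the hardware's sRGB reads and writes):

```cpp
#include <cmath>

// Decode a gamma-encoded value in [0,1] to linear light (approximate sRGB).
static float toLinear(float c) { return std::pow(c, 2.2f); }
// Re-encode a linear value back into the storage space.
static float toGamma(float c)  { return std::pow(c, 1.0f / 2.2f); }

// Average a 2x2 block of gamma-encoded texels the "right" way:
// decode, average in linear space, re-encode. Averaging the encoded
// values directly is what produces the typical too-dark mip levels.
float averageGammaCorrect(float a, float b, float c, float d)
{
    float linear = 0.25f * (toLinear(a) + toLinear(b) + toLinear(c) + toLinear(d));
    return toGamma(linear);
}
```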
What if they contain other data? Of course, similar care has to be taken. Normal maps are hard (see this paper and this one too). At the very least we should make sure that the normals do not change their mean length across the mipmap levels (i.e. keep them normalized if all we want is unit directions), but that does not solve all the problems. Normals are directions on a sphere; a normal map is a function from the texture plane to the sphere, and resampling it is not trivial. The easiest approach is to think of the normals as the gradients of another function, which we can obtain by solving a Poisson problem (that's actually quite fast to do in practice), then resample that function and compute our gradients from it. That's also a good way to think about every operation involving normal data (see the GDC 2008 slides "Care and Feeding of Normal Vectors" by Jörn Loviscach).
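And a sketch of the bare minimum for normal maps mentioned above: average the 2x2 block and renormalize, so every mip level still stores unit-length directions (this deliberately ignores the Poisson/heightfield route and the tricks that reuse the lost length as a roughness term, Toksvig-style):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 normalize(const Vec3& v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len < 1e-6f) return Vec3{0.0f, 0.0f, 1.0f}; // degenerate: fall back to "up"
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Downsample a 2x2 block of (already decoded, i.e. in [-1,1]) normals:
// average the vectors, then renormalize so the coarser level still stores
// unit directions. The length lost in the average is exactly the information
// that Toksvig-style tricks reuse as a roughness/specular attenuation term.
Vec3 downsampleNormals(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& d)
{
    Vec3 avg{ 0.25f * (a.x + b.x + c.x + d.x),
              0.25f * (a.y + b.y + c.y + d.y),
              0.25f * (a.z + b.z + c.z + d.z) };
    return normalize(avg);
}
```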
Final remarks
Computer graphics is both easy and hard. You will always get an image, but if you don't know what you are doing and why, subtle errors will creep in. They probably won't be filed as bugs, but they will compromise quality or performance, and worse, artists will try to fix them by tweaking their assets, producing images that are usually fine but still don't look quite right. Most of the time we blame our cheap mathematical models for those failures and look for more complex tools, or worse, we introduce hacks and tweaks on top of those models to bend them into something that looks better but is even more wrong.
We should start right: do the right maths first, and only then blame our cheap models and look for something else, always knowing why we are doing it and building on a solid base.
10 comments:
Also, with normal maps I've read of storing a specular value in the alpha channel that gets much lower in the smaller mipmaps.
This way the high-frequency details that cannot be represented in the smaller mipmaps more correctly appear as "soft".
Unfortunately I don't know where to find the paper/presentation/slides offhand. It was the NVIDIA presentation that also recommended that, if you know the destination resolution and target geometry, you can tweak the color of your mipmaps to do fog; the example was green "fog" looking down a brown sewer pipe.
Nick, you're not referring to the "Toksvig factor" paper, are you?
Hi!
I agree, mipmaps are certainly one of the easiest ideas in computer graphics, but much more complicated than they look. As you seem to be a "mipmap guru", I would like to ask you some questions, as neither at GameDev nor at GPGPU have they managed to answer me yet.
I'm trying to implement a paper about illumination that makes heavy use of mipmaps. Until now, I've only used them for simple texturing (generate mipmaps, and let OpenGL do the rest).
But here it's much more complicated. Here is the idea: I have to render into several textures the normals, positions and intensities from the light's point of view, plus an additional parameter: a validity parameter (1 means valid, 0 invalid) that is stored in an alpha channel (the paper says it is better to store it in an alpha value rather than an A8 texture). So the validity will be stored, for instance, in the alpha channel of the intensity texture (which is R16G16B16A16F).
Level 0 (i.e. the base level) has validity 1 for EACH of the pixels. And then comes the problem. The paper says that I have to generate each sublevel with hardware-accelerated mipmap generation, which is quite easy, and then add the validity parameter in the fragment shader with a bottom-up algorithm.
I haven't found much about that on the Internet, but since I can't add a value to a uniform texture, I decided to create my own mipmaps on the GPU, though I really don't know if I'm doing it right:
1) I bind each base level as a uniform sampler2D.
2) I attach the next levels as several render targets and then perform MRT. Then, in the fragment shader, I store into each texture (i.e. the next level) the average of the four fragments, and I compute the validity in the same fragment shader.
The validity is quite simple to compute: for instance, I compute the average of the four sub-pixel positions and compare each of those pixels with that average; if one of them differs by more than a threshold, it is invalid, otherwise it's valid.
By doing this, I think I can generate my mipmaps efficiently AND store the validity in the same pass (one pass per level).
Here is a snippet of the fragment shader for one texture : (I put it on MegaUpload) : http://www.megaupload.com/fr/?d=QWU3PEZC
But I really don't know if I'm doing it well, or if there are better ways to do it, since efficiency is really important: I have to generate those mipmaps every frame, and I have not one but three textures to generate (positions, intensity, normals).
Thanks for your help and sorry for the English and the very long post!
Unfortunately I haven't really understood the algorithm you're trying to implement.
If you give me a link to the paper, maybe I can tell you something more.
What I can say for sure is that "I attach the next levels as several render targets and then perform MRT" makes no sense: you can't use MRT across mipmap levels, as MRT targets all have to be the same size.
The usual way to generate mipmaps on the GPU is to bind a level as a sampler and the next one as the render target, until you have processed all the levels. It's useful when you don't have automatic mipmap creation or when, for example, you have to create mipmaps of shadowmaps (which require a min or max filter, not an average).
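For reference, a rough OpenGL-flavoured C++ sketch of that loop (my own illustration: shader setup and error checking omitted; `downsampleProgram` and `drawFullScreenQuad` stand in for your own reduction shader and full-screen pass):

```cpp
#include <algorithm>
#include <GL/glew.h> // or your extension loader of choice

void drawFullScreenQuad(); // placeholder: your own full-screen triangle/quad pass

// Build the mip chain of 'tex' one level at a time: level n-1 is bound for
// reading (restricted with BASE/MAX_LEVEL so the level being written is never
// sampled), level n is attached to an FBO as the render target.
void generateMipsOnGpu(GLuint tex, GLuint fbo, int baseWidth, int baseHeight,
                       GLuint downsampleProgram)
{
    int levels = 1;
    for (int s = std::max(baseWidth, baseHeight); s > 1; s /= 2) ++levels;

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glUseProgram(downsampleProgram);
    glBindTexture(GL_TEXTURE_2D, tex);

    for (int level = 1; level < levels; ++level)
    {
        // Restrict sampling to the previous level only.
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, level - 1);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL,  level - 1);
        // Render into the level we are building.
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, tex, level);
        glViewport(0, 0, std::max(baseWidth >> level, 1), std::max(baseHeight >> level, 1));
        drawFullScreenQuad();
    }

    // Restore the full mip range for normal sampling.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, levels - 1);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}
```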
Hi,
Sorry for not having been clear!
"The usual way to generate mipmaps on the GPU is to bind a level as a sampler and the next one as the rendertarget, until you have processed all the levels."
This is what I wanted to say :). When I mentioned MRT, it was because I create several textures at once (I have three textures, so I bind texture1 to GL_TEXTURE0, then texture2 and texture3, and then do MRT with gl_FragData[0] being the next level of texture one, gl_FragData[1] the next level of texture two...).
Here is the algorithm: it's a new algorithm for real-time indirect illumination and it is, I have to say, really clever. I understand it really well, but I have difficulties implementing it: http://ki-h.com/archive/KiH-VC08-LP.pdf
In fact, I read it again and it says: "You can store this value in an A8 texture" (see 3.2).
I think it will be simpler to:
1) Create three R16G16B16 buffers (base level).
2) Do hardware-accelerated mipmap generation.
3) Then fill an A8 texture with the validity... It will be a lot of texture fetches :D.
The next step will be to understand how to correctly access mipmap levels inside the fragment shader. There isn't much about that on the Internet :/
I'll read the paper when I have the time, now I'm busy building my furniture (IKEA stuff).
For accessing a given LOD level, if you need it per-pixel you have to use tex2Dlod, which depending on your hardware can have a performance penalty (some GPUs implement it by emitting 1-pixel quads with ad-hoc generated gradients to match the LOD). If you need to force a mipmap level per object instead, you can set the min/max mipmap levels in the sampler state.
Thank you ;).
Good luck with your IKEA furniture :).
Hi again :)
I think I did it correctly this time! So, what I did is:
1) Generate my textures and generate hardware-accelerated mipmaps, so this is very fast.
2) Next, I fill a validity texture with that shader : http://www.megaupload.com/fr/?d=1NFERP2M
For now I use an RGBA16F texture because I can render to it directly, but later an A8 texture will be sufficient because I have only one value to store (0 or 1).
I draw each validity texture and it seems quite coherent: low levels are mostly white (because more valid), while high levels are nearly all black. I think I've used the mipmapped texture fetches correctly :) (the third parameter of texture2D is the mipmap level).
Ah yes, sorry, I told you the HLSL way of doing things. I really hate the OpenGL shading language; under OpenGL I use NVIDIA Cg, which is waaay better in my opinion.
I prefer GLSlang, because it integrates really easily with OpenGL :).
Now I'm trying to traverse the mipmap tree, which is much more difficult :/