
Showing posts with label Stupid rendering tricks. Show all posts

10 December, 2013

Never again: point lights

Distant, point, spotlight, am I right? Or maybe you can merge point and spot into an uberlight. No.
Have you ever actually seen a point light in the real world? It's very rare, isn't it? Even bare bulbs don't exactly project uniformly over the hemisphere...
If you're working with a baked-GI solution that might not affect you much: you can start with a point, construct a light fixture around it and have GI take care of the rest. But even in a baked world you'll have analytic lights most often, and in deferred it's even worse. How many games show "blobs" of light due to points being scattered around? Too many!
With directional lights and spots we can circumvent the issue somewhat by adding "cookies", 2d projected textures. With points we could use cube textures, but in practice I've seen too many games not doing it (and authoring could be simpler than hand-painting cubes...).
During Fight Night (the boxing game) one little feature we had was light from camera flashes, which was interesting because you could clearly see (for a fraction of a second) the pattern they made on the canvas (journalists are all around the ring), and that was the first time I noticed how much point lights suck.
The solution was easy, really: I created a mix of a point and a distant light, which gave a nice directional gradient to the flash without the cone shape of a spot. You can think of the light as being a point, with the "directional" part being a function that makes the emission non-constant over the hemisphere.


It's a multiply-add. Do it. Now!

Minimum-effort "directional" point
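In code the mix really is just a multiply-add on a cosine term. A minimal CPU-side sketch (function name and parameterization are mine, not from any shipped code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Modulate a point light's intensity by a "distant" direction: lerp between
// constant emission (pure point) and the cosine with the light's axis.
// lightAxis and toPoint are assumed to be unit vectors.
float DirectionalPointScale(const float lightAxis[3], const float toPoint[3],
                            float directionality /* 0 = pure point, 1 = fully directional */)
{
    float cosA = lightAxis[0]*toPoint[0] + lightAxis[1]*toPoint[1] + lightAxis[2]*toPoint[2];
    float c = std::max(cosA, 0.0f); // clamp to the emitting hemisphere
    // lerp(1, c, directionality), i.e. a single multiply-add
    return c * directionality + (1.0f - directionality);
}
```

In a shader this collapses to a single `mad` on top of the usual point-light distance attenuation.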


Another little trick that I employed (which is quite different) is to "mix" point and directional in terms of the incoming light normal at the shaded point (biasing point-light normals towards a direction), at the time an attempt to create lights that were somehow "area", softer than pure points. But that was really a hack...
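For completeness, that normal-biasing hack could look like this (a sketch with made-up names; the idea is just to pull the per-pixel light vector towards a fixed direction before shading):

```cpp
#include <cassert>
#include <cmath>

// Bias the (unit) surface-to-light vector towards a fixed (unit) direction,
// renormalizing so downstream shading code still sees a unit vector.
void BiasedLightVector(const float toLight[3], const float biasDir[3], float k, float out[3])
{
    float v[3] = { toLight[0] + k*biasDir[0],
                   toLight[1] + k*biasDir[1],
                   toLight[2] + k*biasDir[2] };
    float len = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    out[0] = v[0]/len; out[1] = v[1]/len; out[2] = v[2]/len;
}
```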
Nowadays you might have heard of IES lights (see this and this for example), which are light emission profiles, often measured by light manufacturers (and which can be converted to cubemaps, by the way).
I would argue -against- them. I mean, sure, if you're going towards a cubemap-based solution, have that as an option; but IES profiles are really meaningful if you have to go out in the real world and buy a light to match a rendering you did of an architectural piece. If you are modeling fantasy worlds there is no reason to make your artists go through a catalog of light fixtures just to find something that looks interesting. What is the right IES profile for a point light inside a barrel set on fire?

A more complicated function

A good authoring tool would be, imho, just a freehand curve that gets baked into a simple 1d texture (in realtime, please, let your artists experiment interactively), indexed with the dot product between the light direction and (shaded point - light position).
If you want to be adventurous, you can take a tangent vector for the light and add a second dot product and lookup. And add the ability to color the light as well; a lot of lights have non-constant colors too, go around and have a look (i.e. direct light vs light reflected out of the fixture or passing through a semi-transparent material...).
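As a sketch, the baked curve is nothing more than a small table sampled with that dot product (table size and contents are illustrative; on the GPU this would be a 1d texture fetch with hardware filtering):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A light profile baked from an artist's freehand curve into a small 1d table,
// indexed by the cosine between the light's axis and the light-to-point direction.
struct LightProfile1D
{
    std::vector<float> table; // baked curve samples

    float Sample(float cosAngle) const
    {
        // remap [-1,1] to the table range so the profile covers the whole sphere
        float u = (cosAngle * 0.5f + 0.5f) * (float)(table.size() - 1);
        size_t i0 = (size_t)u;
        size_t i1 = std::min(i0 + 1, table.size() - 1);
        float t = u - (float)i0;
        return table[i0] * (1.0f - t) + table[i1] * t; // linear filtering
    }
};
```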

1d lookups are actually -better- than cubemap cookies, because if you look at real-world examples, many fixtures generate very sharp discontinuities in the light output, which are harder (require much more resolution) to capture in a cubemap...
Exercise left for the reader: bake the light profile approximating a GI solution, automatically adapting it to the environment the light was "dropped" in...


29 July, 2013

Tiny HDR writer

It's really annoying to see the lack of simple (i.e. one function) writers for HDR image formats, so I wrote my own Radiance HDR (RGB-exponent) and floating-point TIFF writers (forgive the sin of using std::string as a vector of bytes; it was done because a certain file-write function in the framework used strings).

They are rather ignorant, as you will see from the source, and rather slow too... Making them clever is left as an exercise to the reader; let's say this is really nothing more than documentation on the formats themselves, or better, on the minimal subset of the formats you need to know in order to write out HDR data.

Also, as the TIFF routine writes raw float data, you might want to convert it into an "inline" operation (i.e. a BeginTiff, PushFloat, EndTiff kind of interface), which is simple enough, especially if you move the IFD before the image data... It would also be much easier if it wrote the endianness in the header based on your current platform's output byte order, avoiding the byte-by-byte writing it does now.

Update: Aras Pranckevičius tweeted his EXR writer, so I was wrong, there was at least one simple HDR writer out there already. Also, EXR is more widespread than floating-point TIFF, and even easier... Partially related: Jon Olick has neat single-file JPEG and MPEG writers, handy (and I'm sure everybody knows about stb_image and stb_image_write, but just in case...)!


// http://paulbourke.net/dataformats/tiff/ and http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf
// Not all programs support floating-point TIFFs, this was tested reading it back using Picturenaut and HDRShop
static std::string EncodeFloatTIFF(unsigned int w, unsigned int h, float* RGBdata, unsigned int floatsPerPixel = 4)
{
 assert(floatsPerPixel>=3); // we write only three floats (RGB) but support larger strides
 
 std::string outData;
 
 unsigned int image_size_bytes = w*h*3 * sizeof(float);
 outData.reserve(image_size_bytes + 500); // 500 is some slack for headers etc, I should compute it exactly... :)
 
 // Header
 outData.push_back(0x4d); outData.push_back(0x4d); // First two chars specify MM for big endian TODO - convert to little to make it easier on x86
 outData.push_back(0); outData.push_back(42); // Tiff version ID 
 
 unsigned int IFD_offset = 8 + image_size_bytes; // IFD table usually follows image
 outData.push_back((IFD_offset & 0xff000000) >> 24);
 outData.push_back((IFD_offset & 0xff0000) >> 16);
 outData.push_back((IFD_offset & 0xff00) >> 8);
 outData.push_back(IFD_offset & 0xff);
 
 // Image data
 for (unsigned int y=0; y<h; y++) 
 {
  for (unsigned int x=0; x<w; x++) 
  {
   unsigned int f = 0;
   for(; f< 3; f++,RGBdata++)
   {
    uint32_t floatAsInt = *reinterpret_cast<uint32_t*>(RGBdata);
    outData.push_back((floatAsInt & 0xff000000) >> 24);
    outData.push_back((floatAsInt & 0xff0000) >> 16);
    outData.push_back((floatAsInt & 0xff00) >> 8);
    outData.push_back(floatAsInt & 0xff);
   }
   for(; f<floatsPerPixel; f++)
    RGBdata++;
  }
 }
 
 // IFD Tags
 unsigned int NUM_IFD = 12;
 
 assert(outData.size() == IFD_offset);
 outData.push_back(0);
 outData.push_back(NUM_IFD); // Number of tags
 
 outData.push_back(1); outData.push_back(0); // -- width tag
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back((w & 0xff00) >> 8); outData.push_back(w & 0xff); // value
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(1); // -- height tag
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back((h & 0xff00) >> 8); outData.push_back(h & 0xff); // value
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(3); // -- compression
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(1); // none
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(6); // -- photometric interpretation
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(2); // RGB
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(0x12); // -- orientation
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(1); // 
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(0x15); // -- samples per pixel
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(3); // three samples (RGB)
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(0x16); // -- rows per strip
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back((h & 0xff00) >> 8); outData.push_back(h & 0xff); // value
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(0x17); // -- strip byte count (total size)
 outData.push_back(0); outData.push_back(4); // long format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back((image_size_bytes & 0xff000000) >> 24);
 outData.push_back((image_size_bytes & 0xff0000) >> 16);
 outData.push_back((image_size_bytes & 0xff00) >> 8);
 outData.push_back(image_size_bytes & 0xff);
 
 outData.push_back(1); outData.push_back(0x1c); // -- planar configuration
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(1); // single image plane
 outData.push_back(0); outData.push_back(0); // padding (as we specified short value)
 
 outData.push_back(1); outData.push_back(0x11); // -- strip offset
 outData.push_back(0); outData.push_back(4); // long format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(1); // single value
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(8); // image starts right after the 8-byte header
 
 outData.push_back(1); outData.push_back(2); // -- bits per sample
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(3); // three values
 unsigned int BPS_offset = 8 + image_size_bytes + 2 + (NUM_IFD * 12) + 4; // offset to data (as data is > 4 bytes)
 outData.push_back((BPS_offset & 0xff000000) >> 24);
 outData.push_back((BPS_offset & 0xff0000) >> 16);
 outData.push_back((BPS_offset & 0xff00) >> 8);
 outData.push_back(BPS_offset & 0xff);
 
 outData.push_back(1); outData.push_back(0x53); // -- sample format
 outData.push_back(0); outData.push_back(3); // short format
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(3); // three values
 unsigned int SF_offset = BPS_offset + 3*2; // offset to data (as data is > 4 bytes)
 outData.push_back((SF_offset & 0xff000000) >> 24);
 outData.push_back((SF_offset & 0xff0000) >> 16);
 outData.push_back((SF_offset & 0xff00) >> 8);
 outData.push_back(SF_offset & 0xff);
 
 outData.push_back(0); outData.push_back(0); outData.push_back(0); outData.push_back(0); // IFD END
 
 // bits per sample data
 assert(outData.size() == BPS_offset);
 outData.push_back(0); outData.push_back(8*sizeof(float)); outData.push_back(0); outData.push_back(8*sizeof(float)); outData.push_back(0); outData.push_back(8*sizeof(float));
 
 // sample format data (1 = uint, 2 = sint, 3 = float)
 assert(outData.size() == SF_offset);
 outData.push_back(0); outData.push_back(3); outData.push_back(0); outData.push_back(3); outData.push_back(0); outData.push_back(3);
 
 return outData;
}

static std::string EncodeRadianceHDR(unsigned int w, unsigned int h, float* RGBdata, unsigned int floatsPerPixel = 4)
{
 assert(floatsPerPixel >= 3); // we write only three floats (RGB) but support larger strides
 
 // Key-Value pairs after RADIANCE are optional
 //const char header[] = "#?RADIANCE\nEXPOSURE=1\nGAMMA=2.2\nFORMAT=32-bit_rle_rgbe\n\n";
 const char header[] = "#?RADIANCE\nFORMAT=32-bit_rle_rgbe\n\n";
 
 std::string outData;
 outData.reserve(w*h*4 + sizeof(header) + 200); // 200 is some slack...
 
 std::vector<unsigned char> scanline[4];
 scanline[0].resize(w); scanline[1].resize(w); scanline[2].resize(w); scanline[3].resize(w);
 
 outData.append(header, sizeof(header)-1);
 outData.append("-Y ", 3); outData += std::to_string(h);
 outData.append(" +X ", 4); outData += std::to_string(w);
 outData.push_back('\n');
 
 for(unsigned int y=0; y<h; y++)
 {
  // RLE header
  // TODO looking at stb_image there seems to be also a non-RLE line mode, which we should use as we don't really encode RLE here,
  // but I'm not sure that the way stb_image decodes the line header is standard-compliant...
  outData.push_back(2); outData.push_back(2);
  outData.push_back((unsigned char)((w>>8) & 0xff));
  outData.push_back((unsigned char)(w & 0xff));
  
  for(unsigned int x=0; x<w; x++)
  {
   unsigned char encodedPixel[4];
   float r = RGBdata[0], g = RGBdata[1], b = RGBdata[2];
   //r /= 179.0; g /= 179.0; b /= 179.0;
   
   double maxV = r;
   if(maxV < g) maxV = g;
   if(maxV < b) maxV = b;
   
   if(maxV < std::numeric_limits<double>::epsilon())
   {
    encodedPixel[0] = encodedPixel[1] = encodedPixel[2] = encodedPixel[3] = 0;
   }
   else
   {
    int e;
    maxV = frexp(maxV, &e) * 256.0/maxV;
    encodedPixel[0] = (unsigned char)(maxV * r);
    encodedPixel[1] = (unsigned char)(maxV * g);
    encodedPixel[2] = (unsigned char)(maxV * b);
    encodedPixel[3] = (unsigned char)(e + 128);
   }
   
   scanline[0][x] = encodedPixel[0];
   scanline[1][x] = encodedPixel[1];
   scanline[2][x] = encodedPixel[2];
   scanline[3][x] = encodedPixel[3];
   
   RGBdata += floatsPerPixel;
  }
  
  // For simplicity, write everything as if it were not RLE...
  for(unsigned int line=0; line<4; line++)
  {
   auto scanIter = scanline[line].begin();
   auto scanEnd = scanline[line].end();
   while(scanIter < scanEnd)
   {
    size_t remaining = scanEnd - scanIter;
    // the top bit of the length byte, if set, would indicate a RLE run, we want to avoid that
    unsigned char toWrite = remaining>127 ? 127 : (unsigned char)remaining;
    outData.push_back(toWrite); // length of the "non run" data
    outData.append((char*)&scanIter[0], (size_t)toWrite);
    scanIter += (size_t)toWrite;
   }
  }
 }
 
 return outData;
}
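The heart of the format is the shared-exponent RGBE encoding used above. A standalone round-trip sketch (helper names are mine), handy to sanity-check the writer:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Encode a linear RGB triple into the shared-exponent RGBE byte format used by
// Radiance .hdr files: a common exponent for all channels, 8-bit mantissas.
static void FloatToRGBE(float r, float g, float b, uint8_t rgbe[4])
{
    float maxV = r > g ? (r > b ? r : b) : (g > b ? g : b);
    if (maxV <= 1e-32f)
    {
        rgbe[0] = rgbe[1] = rgbe[2] = rgbe[3] = 0;
        return;
    }
    int e;
    float scale = std::frexp(maxV, &e) * 256.0f / maxV;
    rgbe[0] = (uint8_t)(r * scale);
    rgbe[1] = (uint8_t)(g * scale);
    rgbe[2] = (uint8_t)(b * scale);
    rgbe[3] = (uint8_t)(e + 128); // biased exponent shared by all channels
}

static void RGBEToFloat(const uint8_t rgbe[4], float rgb[3])
{
    if (rgbe[3] == 0) { rgb[0] = rgb[1] = rgb[2] = 0.0f; return; }
    float scale = std::ldexp(1.0f, (int)rgbe[3] - 128 - 8); // 2^(e-128) / 256
    rgb[0] = rgbe[0] * scale;
    rgb[1] = rgbe[1] * scale;
    rgb[2] = rgbe[2] * scale;
}
```

Precision is relative to the brightest channel (a mantissa step of 1/256 of the max), which is why RGBE is tiny but good enough for most HDR imagery.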

14 July, 2013

DX11: GPU "printf"

So, first a little "announcement": I'm crafting a small DX11 rendering framework in my spare time. I want to have it open-sourced, and it's based on MJP's excellent SampleFramework11.
The goal is to provide an environment roughly as fast to iterate upon as FX Composer was (I consider it dead now...) but for programmers, without being a "shader editor".
If you're interested in collaborating, send me an email at c0de517e (it's a gmail account) with a brief introduction; there is an interesting list of things to do.

That said, this is a little bit of functionality Maurizio Cerrato and I have been working on over a couple of days: a "printf"-like function for pixel (and compute) shaders. It all started when chatting with Daniel Sewell (a brilliant guy, he was my rendering lead on Fight Night): he made me notice that, working on compute shaders, a neat way to debug them was to display all kinds of interesting visualizations by having geometry shaders "decode" buffers and emit lines.

if(IsDebuggedPixel(input.PositionSS.xy)) DebugDrawFloat(float2(ssao, bloom.x), clipPos);
The astute readers will at this point have already figured it all out: PS and CS support append buffers, so a "printf" only has to append some data to a buffer that you can later convert to lines in a geometry shader.

You could emit such data for each PS invocation and later sift through it to display what you need in a meaningful way, but that would be quite slow (and at that point you might want to consider just packing everything into some MRT outputs). The idea behind append buffers is to do the work only for a handful of invocations (e.g. screen positions: if the current sv_position equals the pixel to "debug", then GPU printf...).

In order to keep everything snappy we also minimize the structure size we use in the append buffer; you can't really printf strings, and the debugger so far supports only one to three floats with color and position, or lines. Lines are where we started, really: our struct contains two end-points, a color (index) and a flag which distinguishes lines from float printfs. Floats just reinterpret one of the endpoints as the data to print.

This append buffer then gets fed to a VS/GS that is invoked twice per append buffer element (via draw indirect; you need to multiply the count by two in a small CS. Remember, you can't emit the start/end vertices as two separate append calls because their order is not deterministic: the vertices would end up all mixed in the buffer!), and the GS emits extra lines if we're printing floats, to display a small line-based font.

If you're thinking that this is lame, well, it is. There are certain limitations on the number of primitives the GS can emit that effectively limit the number of digits you can display, and you have to be careful about that. I "optimized" the code to display the most digits possible, which unfortunately gives you a very low-precision 3-float printf and higher-precision 2-float and 1-float versions (you could though call the 1-float version three times... as there the ordering of the three calls doesn't matter).

Keeping the same number of printed digits, the point has to float...
Why not use a bitmap font instead? Glad you asked. Laziness, partially justified by the fact that I didn't want to have two different append buffers, one for lines and one for fonts, as append buffers are a scarce resource on DX11. But it's a very lame justification, because there are plenty of workarounds left for the reader: you could filter the append buffer into two drawcalls in a compute shader, or even draw lines as quads, which would probably be better anyway!

Anyhow, together with shader hot-reloading (which everybody has, right?), this is quite a handy trick. Bonus: on a similar note, have a look at this shadertoy snippet by my coworker Paul Malin... brilliant guy!

Some code, without doubt full of bugs:

Snippet from the CPU/C++ side, drawing the debug lines...
void ShaderDebugDraw(ID3D11DeviceContext* context, const Float4x4& viewProjectionMatrix, const Float4x4& projMatrix)
{
    SampleFramework11::PIXEvent marker(L"ShaderDebug Draw");
 
    context->CopyStructureCount(AppendBufferCountCopy, 0, AppendBuffer.UAView);
 
    // We need a compute shader to write BufferCountUAV, as we need to multiply CopyStructureCount by two 
    ID3D11ShaderResourceView* srViews[] = { AppendBuffer.SRView };
    ID3D11UnorderedAccessView* uaViews[] = { AppendBufferCountCopyUAV };
    UINT uavsCount[] = { 0 };
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);
    context->CSSetShader(DebugDrawShader.AcquireCS(), NULL, 0);
    context->Dispatch(1,1,1);
    context->CSSetShader(NULL, NULL, 0);
    uaViews[0] = NULL;
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);
 
    // Set all IA stage inputs to NULL, since we're not using it at all.
    void* nulls[D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT] = { NULL };
 
    context->IASetVertexBuffers(0, D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT, (ID3D11Buffer**)nulls, (UINT*)nulls, (UINT*)nulls);
    context->IASetInputLayout(NULL);
    context->IASetIndexBuffer(NULL, DXGI_FORMAT_UNKNOWN, 0);
 
    // Draw debug lines
    srViews[0] =  AppendBuffer.SRView;
    context->VSSetShaderResources(0, 1, srViews);
    context->GSSetShaderResources(0, 1, srViews);
    context->GSSetShader(DebugDrawShader.AcquireGS(), NULL, 0);  
    context->VSSetShader(DebugDrawShader.AcquireVS(), NULL, 0);
    context->PSSetShader(DebugDrawShader.AcquirePS(), NULL, 0);
 
    shaderDebugDrawDataVS.Data.ViewProjection = viewProjectionMatrix;
    shaderDebugDrawDataVS.Data.Projection = projMatrix;
    shaderDebugDrawDataVS.ApplyChanges(context);
    shaderDebugDrawDataVS.SetVS(context, 0);
 
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_LINELIST);
    context->DrawInstancedIndirect(AppendBufferCountCopy, 0);
[...]

This is roughly how the shader library looks for emitting debug lines/debug numbers from pixel shaders
struct ShaderDebugLine
{
 float3 posStart;
 float3 posEnd;
 uint color;
 uint flag;
};
 
cbuffer ShaderDebugData : register(b13)
{
 float2 debugPixelCoords;
 float2 oneOverDisplaySize;
 int debugType;
};
void DebugDrawFloat(float3 number,  float3 pos, int color = 0, uint spaceFlag = SHADER_DEBUG_FLAG_2D)
{
 ShaderDebugLine l;
 l.posStart = pos;
 l.color = color;
 l.posEnd = number;
 l.flag = SHADER_DEBUG_PRIM_FLOAT3|spaceFlag;
 ShaderDebugAppendBuffer.Append(l);
}
float2 SVPosToClipspace(float2 svPos, float2 oneOverDisplaySize) { return (svPos * oneOverDisplaySize) * float2(2,-2) + float2(-1,1); }
 
bool IsDebuggedPixel(float2 svPos)
{
 // This is a bit tricky because it depends on the MSAA pattern
 
 if(debugType == 1)
  return dot(abs(debugPixelCoords - svPos + float2(0.5,0.5)), 1.0.xx) <= 0.01f;
 else if(debugType == 2)
  return dot(abs(svPos % float2(100,100)), 1.0.xx) <= 1.01f;
 else return false;
}

And finally, the VS/GS/CS shaders needed to draw the debug buffer emitted from the various PS executions:
static const int DigitFontOffsets[] =
{
 0, 8, 10, 20, 30, 38, 48, 58, 62, 72, 82, 84, 86
};
 
static const float DigitFontScaling = 0.03;
static const float DigitFontWidth = 0.7 * DigitFontScaling; // The font width is 0.5, but we add spacing
static const int DigitFontMaxLinesPerDigit = 5;
static const float2 DigitFont[] =
{
 /* 0 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
 /*1*/
 float2(0.5f, 0.f), float2(0.5f, -1.f),
 /*2*/
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, -1.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /*3*/
 float2(0.f, 0.f), float2(0.5f,0.f), float2(0.5f,0.f), float2(0.5f,-0.5f),
 float2(0.5f,-0.5f), float2(0.f,-0.5f), float2(0.5f,-0.5f), float2(0.5f,-1.f),
 float2(0.5f,-1.f), float2(0.f,-1.f),
 /*4*/
 float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, 0.f), float2(0.5f, -0.5f), float2(0.5f, -1.f),
 /*5*/
 float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, -1.f), float2(0.f, 0.f), float2(0.5f, 0.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /*6*/
 float2(0.f, 0.f), float2(0.f, -1.f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, -1.f), /* avoidable */ float2(0.f, 0.f), float2(0.5f, 0.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /*7*/
 float2(0.5f, 0.f), float2(0.5f, -1.f), float2(0.5f, 0.f), float2(0.f, 0.f),
 /* 8 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
 float2(0.f, -0.5f), float2(0.5f, -0.5f),
 /*9*/
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, 0.f),
 float2(0.5f, -1.f), float2(0.f, -1.f),
 /*-*/
 float2(0.5f, -0.5f), float2(0.f, -0.5f),    
 /*.*/
 float2(0.8f, -0.9f), float2(0.9f, -1.f),
};
 
cbuffer ShaderDebugDrawData : register(b0)
{
 float4x4 Projection;
 float4x4 ViewProjection;
};
 
struct vsOut
{
 float4 Pos : SV_Position;
 float3 Color : TexCoord0;
};
 
StructuredBuffer<ShaderDebugLine> ShaderDebugStructuredBuffer : register(t0);
RWBuffer<uint> StructureCount : register(u1);
 
void DebugDrawDigit(int digit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{  
 for (int i = DigitFontOffsets[digit]; i < DigitFontOffsets[digit+1] - 1; i+=2)
 {
  vsOut p;
  p.Color = color;
 
  p.Pos = pos + float4(DigitFont[i] * DigitFontScaling, 0, 0);
  GS_Out.Append(p);
 
  p.Pos = pos + float4(DigitFont[i +1] * DigitFontScaling, 0, 0);
  GS_Out.Append(p);
 
  GS_Out.RestartStrip();
 }
}
 
float4 DebugDrawIntGS(int numberAbs, uint numdigit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{
 while(numdigit > 0)
 {
  DebugDrawDigit(numberAbs % 10u , pos, GS_Out, color);
  numberAbs /= 10u;
  --numdigit;
  pos.x -= DigitFontWidth;
 }
 
 return pos;
}
 
void DebugDrawFloatHelperGS(float number, float4 pos, inout LineStream<vsOut> GS_Out, float3 color, int totalDigits)
{
 float numberAbs = abs(number);
 uint intPart = (int)numberAbs; 
 uint intDigits = 0;
 
 if(intPart > 0)
  intDigits = (uint) log10 ((float) intPart) + 1;
 
 uint fractDigits = max(0, totalDigits - intDigits);
 
 // Get the fractional part 
 uint fractPart = round(frac(numberAbs) * pow(10, (fractDigits-1)));
 
 // Draw the fractional part
 pos = DebugDrawIntGS(fractPart, fractDigits, pos, GS_Out, color * 0.5 /* make fractional part darker */);
 
 // Draw the .
 pos.x -= DigitFontWidth * 0.5;
 DebugDrawDigit(11, pos, GS_Out, color);
 pos.x += DigitFontWidth * 0.25;
 
 // Draw the int part
 if (numberAbs > 0)
 {
  pos = DebugDrawIntGS(intPart, intDigits, pos, GS_Out, color);
  if (number < 0)
   DebugDrawDigit(10 /* draw a minus sign */, pos, GS_Out, color);
 }
}
 
vsOut VS(uint VertexID : SV_VertexID)
{
 uint index = VertexID/2;
 
 uint col = ShaderDebugStructuredBuffer[index].color;
 uint flags = ShaderDebugStructuredBuffer[index].flag;
 
 float3 pos;
 if((VertexID & 1)==0) // we're processing the start of the line
  pos = ShaderDebugStructuredBuffer[index].posStart;
 else // we're processing the end of the line
  pos = ShaderDebugStructuredBuffer[index].posEnd;
 
 vsOut output = (vsOut)0;
 output.Color = ShaderDebugColors[col];
 
 if(flags & SHADER_DEBUG_FLAG_2D)
  output.Pos = float4(pos.xy,0,1);
 else if (flags & SHADER_DEBUG_FLAG_3D_VIEWSPACE)
  output.Pos = mul( float4(pos.xyz,1.0) , Projection);
 else // we just assume SHADER_DEBUG_FLAG_3D_WORLDSPACE otherwise
  output.Pos = mul( float4(pos.xyz,1.0) , ViewProjection);
 
 return output;
}
 
[numthreads(1,1,1)]
void CS(uint3 id : SV_DispatchThreadID)
{
  StructureCount[0] *= 2;
  StructureCount[1] = 1;
  StructureCount[2] = 0;
  StructureCount[3] = 0; 
}
 
float4 PS(vsOut input) : SV_Target0
{
 return float4(input.Color, 1.0f);
}
 
// Worst case we print 3 floats... 4 digits per float plus we need 4 vertices for the . and -, and another four 4 for the cross
[maxvertexcount(3 * (4*(2*DigitFontMaxLinesPerDigit)+4) + 4)]
void GS(line vsOut gin[2], inout LineStream<vsOut> GS_Out, uint PrimitiveID : SV_PrimitiveID)
{
 // We'll get two vertices, one primitive, out of the VS for each element in ShaderDebugStructuredBuffer...
 // TODO: we could avoid reading ShaderDebugStructuredBuffer if we passed the number flag along from the VS
 ShaderDebugLine dbgLine = ShaderDebugStructuredBuffer[PrimitiveID];
 
 // If we got a line, then just re-emit the line coordinates
 if((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_LINE)
 {
  GS_Out.Append(gin[0]);
  GS_Out.Append(gin[1]);
  GS_Out.RestartStrip();
 
  return;
 }
 
 float4 pos = gin[0].Pos;
 
 // Draw cross
 vsOut p;
 p.Color = gin[0].Color;
 
 p.Pos = pos + float4(DigitFontWidth*0.5,0,0,0);
 GS_Out.Append(p);
 p.Pos = pos + float4(-DigitFontWidth*0.5,0,0,0);
 GS_Out.Append(p);
 GS_Out.RestartStrip();
 
 p.Pos = pos + float4(0,DigitFontWidth*0.5,0,0);
 GS_Out.Append(p);
 p.Pos = pos + float4(0,-DigitFontWidth*0.5,0,0);
 GS_Out.Append(p);
 GS_Out.RestartStrip();
 
 // Draw the numbers, as lines
 pos += float4(0,-DigitFontWidth*1.5,0,0);
 float3 number = gin[1].Pos.xyz;
 
 if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT1)
 {
  // Less floats drawn means we can afford more precision without exceeding maxvertexcount
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 12);
 }
 else if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT2) 
 {
  // Less floats drawn means we can afford more precision without exceeding maxvertexcount, 12/2 = 6 digits
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 6);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 6);
 }
 else //if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT3)
 {
  // 3*4 we draw 12 digits here...
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 4);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 4);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.z, pos, GS_Out, gin[0].Color, 4);
 }
}

01 April, 2013

Space Marine did it first!

You know, I don't usually post links to news, but all the guys behind Space Marine worked so hard and were so amazing that I have to do this shameless plug. I think you get more attached to a product when people really work their asses off, and they are super smart, and in the end sales are not great... Oh well...

Nowadays a few people are doing "medium range" ambient occlusion using top-down projected and blurred depth buffers. No one credited Space Marine, and I think very honestly so, as we didn't publish much at all on it. Still, it might be worth a second look at the slides I've pushed, as SM's technique still has a few tricks that I didn't see in the others around so far, with tiling to keep the update times small and depth peeling to handle interiors and areas with multiple heights.

Shadowmaps and cascades rant/thoughts...
On an only slightly related note, and to add some "novel" content to this post, I was wondering for a bit about shadowmaps. We tried a couple of ways of caching them, but in SM they failed.
Simply updating some cascades every other frame didn't work with self-occlusion of dynamic objects, and re-rendering dynamics (and more advanced methods) failed because the bandwidth required to move shadowmaps around was huge on 360/PS3.

What I don't remember anymore is whether we tried to solve the self-shadowing problem by accessing the cascades for dynamic objects using the position they had at the previous frame (when the cascade was computed). My memory is very bad (that's partially why I keep this blog...), I'll have to ask my then-coworkers about this. If we didn't try, I was dumb. If we did, I wonder why it failed. Food for thought; maybe I'll post an update on this later on. As far as I gathered, Crytek didn't do this in their every-other-frame update in Crysis 2.

Update: I see the catch. Space Marine did "splat" the shadows in screen space, for good reasons. And if you do so, you reconstruct the position of the objects to be shadowed using the current frame's depth buffer from a depth prepass (in our case, from the GBuffer pass, being a deferred renderer), so there is no easy way to implement this.
There are ways, like stenciling and using an MRT containing last frame's world positions... which could have later been used for motion blur vectors (which we did compute), so it's not crazy even in that scenario. But I'm quite sure now we didn't try all this; for how bad my memory is, I would have remembered such a large change :)

Bonus hint: always point your SSAO towards the sky...

10 February, 2013

Color blindness and videogames

After PC Gamer published this article on color blindness and games, I was curious to see what could be done, and how, to better serve the 5% of gamers (8% among males) affected by this deficiency. The original article didn't link any research or implementation hints, but came with a note saying it should be trivial to do... As most games nowadays do some form of color correction, often via volume color textures, you would think it's not hard to bake a global color transform that better maps the RGB space into what colorblind persons can see.

Well, indeed, it is. Most papers reference as a starting point a project report by Fidaner, Lin and Ozguven, "Analysis of Color Blindness", which derives a simple linear transform: go from RGB to LMS color space, simulate color blindness by losing one of the receptors in LMS space, compute the difference between the two images, and feed that difference back by adding colors that can still be perceived.
The algorithm is so simple it's easier to read the paper than my summary of it. Past this simple mapping, all the research I've found improves on it by adjusting for the characteristics of the image you need to convey, which is not only more expensive but also possibly not ideal in our case, as the image contents change frame to frame. I wonder if a nonlinear transform could improve the situation, but I haven't found much about static, global color transforms other than the aforementioned work.

Further notes:
  1. The website daltonize.org has implementations of the linear transform in many languages.
  2. Some papers seem to do the RGB->LMS (and vice versa) conversion in gamma space (including the Analysis of Color Blindness one), while others don't. The confusion, I guess, comes from the fact that there are many RGB spaces, not only sRGB with its roughly gamma 2.2 transfer function. From what Wikipedia says, CIEXYZ to LMS is a linear transform, and keeping in mind that sRGB to CIEXYZ is not, we have to gamma/degamma. I've also found this paper (which comes with source code) that makes the conversion between sRGB and LMS an even less trivial matter.
  3. There are some variants of the original daltonization algorithm. In particular, this paper, proposes (among other things which are less relevant) the use of a modified error matrix (formula n.5).
  4. If you wanted to spare some GPU cycles, it's possible to feedback the error term computed by the daltonization in other post-effects, to locally enhance contrast of areas between two similar colors.
    1. This paper illustrates the concept.
    2. You could feedback into an existing, suitable effect (i.e. bloom)
    3. You could trade off a post-processing step for this; for example, remove DOF or motion blur and add an "unsharp mask" filter guided by the error. One way of doing this is to compute an unsharp mask, in a single pass, for both the regular and colorblind-simulated colors, and then feed the error (contrast loss) back into the image.
  5. A color-blind simulation mode could, at the very least, help UI designers with their job.
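To make note 1 concrete, here's a minimal sketch of the daltonization transform in plain Python, using the protanopia matrices as published on daltonize.org (treat the exact coefficients as assumptions, and note that per note 2 this should really operate on linearized RGB):

```python
def matvec(m, v):
    # 3x3 matrix times 3-vector
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

RGB2LMS = [[17.8824, 43.5161, 4.11935],
           [3.45565, 27.1554, 3.86714],
           [0.0299566, 0.184309, 1.46709]]
# Precomputed inverse of RGB2LMS
LMS2RGB = [[0.0809444479, -0.130504409, 0.116721066],
           [-0.0102485335, 0.0540193266, -0.113614708],
           [-0.000365296938, -0.00412161469, 0.693511405]]
# Protanopia simulation: the lost L cone response is rebuilt from M and S
PROTAN = [[0.0, 2.02344, -2.52581],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
# Error redistribution: push the invisible red difference into G and B
SHIFT = [[0.0, 0.0, 0.0],
         [0.7, 1.0, 0.0],
         [0.7, 0.0, 1.0]]

def simulate_protanopia(rgb):
    # RGB -> LMS, drop the L receptor, LMS -> RGB
    return matvec(LMS2RGB, matvec(PROTAN, matvec(RGB2LMS, rgb)))

def daltonize(rgb):
    # Feed back what the viewer can't see, into channels they can
    err = [c - s for c, s in zip(rgb, simulate_protanopia(rgb))]
    shift = matvec(SHIFT, err)
    return [min(max(c + s, 0.0), 1.0) for c, s in zip(rgb, shift)]
```

Baking `daltonize` through every entry of an existing color-correction volume texture would give you the global transform for free at runtime.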

03 November, 2012

Addendum to Mathematica and Skin Rendering

Update: there was a typo in the shader code, I didn't update the screenshot...

A shader code snippet following my work on Penner's scattering approximation. I'm actually rather sure it can be done better and for cheaper, especially if you "clamp" the radius range to a narrower window: currently it goes too broad on one side, and trying to fall back to a standard, non-wrapped N dot L for large radii isn't really useful. But I've already given the Mathematica code, so you can do better...

// Cubic fit of the pre-integrated scattering lookup, evaluated per
// RGB channel. NdL is the (unclamped) N dot L, r the scattering radius.
float3 PSSFitFunction(float NdL, float r)
{
   // Per-channel fit coefficients, each pair linear in r
   float3 a0 = {0.0605, 0.2076, 0.2243};
   float3 a1 = {0.0903, 0.1687, 0.2436};
   float3 a2 = {-0.0210, -0.0942, -0.1116};
   float3 a3 = {0.6896, 0.6762, 0.6480};
   float3 a4 = {-0.1110, -0.5023, -0.6703};
   float3 a5 = {0.8177, 0.9119, 0.9209};

   // Wrapped-lighting term: a cubic in NdL whose shape depends on r
   float3 t = NdL.xxx * (a0*r+a1) + (a2*r+a3);
   // Fade back towards the standard saturate(NdL) as the radius grows
   float3 fade = saturate(a4*r+a5);

   return t*t*t * fade + saturate(NdL) * (1-fade);
}

07 October, 2012

Supersampling and antialiasing distance fields

Just found this note cleaning up my stuff, thought I might as well post it...

We had some issues with using signed distance fields for font rendering and antialiasing. The idea is conceptually similar to running a two-dimensional marching squares and then computing the area of the resulting convex polygon.

If you can't (or don't want to) read my horrible note, the "algorithm" samples four taps of the distance field (use ddx/ddy of the UVs for the width) on a (unit) square, and then walks the vertices and edges of the square counterclockwise (first a vertex, then an edge, then the next vertex, and so on).

The polygon is constructed by taking the coordinates of all the vertices that are inside the distance field and computing an intersection point for all the edges that are between two in-out vertices. The polygon area (which equals the coverage) is computed incrementally using the determinant method. All unrolled in a shader.
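A sketch of that computation, in Python for readability (the shader does the same, unrolled, with its four taps; the corner order and the in/out convention here are my assumptions):

```python
def coverage(d):
    # d: signed distances at the four corners of a unit square,
    # in counterclockwise order: (0,0), (1,0), (1,1), (0,1).
    # Negative distance = inside the shape.
    corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    poly = []
    for i in range(4):
        j = (i + 1) % 4
        if d[i] <= 0.0:
            poly.append(corners[i])          # vertex is inside: keep it
        if (d[i] <= 0.0) != (d[j] <= 0.0):   # edge crosses the boundary
            t = d[i] / (d[i] - d[j])         # linear zero crossing
            x = corners[i][0] + t * (corners[j][0] - corners[i][0])
            y = corners[i][1] + t * (corners[j][1] - corners[i][1])
            poly.append((x, y))
    # Area of the resulting convex polygon via the determinant (shoelace)
    area = 0.0
    for i in range(len(poly)):
        j = (i + 1) % len(poly)
        area += poly[i][0] * poly[j][1] - poly[j][0] * poly[i][1]
    return abs(area) * 0.5
```

Fully inside gives coverage 1, fully outside 0, and a straight edge through the middle of the square gives exactly 0.5.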

Jason Peng, an incredibly talented UBC student implemented this at Capcom Vancouver. He tells me it worked :)


P.S. You'll notice that I write counterclockwise and then draw all the arrows clockwise... :) Stupid me.

17 March, 2012

DirectX9 vs depth resolve

GDC 2012 is over and I bet there are a lot of interesting new techniques to discuss and analyze. I haven't started that work yet (actually, I still have to catch up with a few other conferences first, from Siggraph Asia to Eurographics; I've not been doing my homework much lately), but what I love when I read papers is not really the application of a given concept (which in scientific papers is often presented in such a biased way that it's hard even for experts to really understand the merits of a given implementation), but finding new ideas that can spark new applications.

One such paper was for sure the Variance Shadow Maps paper by Donnelly and Lauritzen, which taught statistically illiterate people like me (I know probability, but I'm really a novice when it comes to statistics) a couple of things about means, averages and how these can be used in computer graphics.

After reading that paper, a few notions should stick in your mind. The average of comparisons versus the comparison of averages is one, which applies directly to occlusion; but more in general, that you can summarize a population of samples with a few statistics, which are nicely additive.

If that is the case, then you can apply what you learned in other contexts, with varying degrees of success. And this brings us to the lame technique of the month... As you might know, DirectX 9 is a bitch when it comes to depth buffer resolves. Never mind reading MSAA depth samples: even accessing depth information is pretty much hopeless, unless you want to do hardware PCF shadows. So you end up writing your depth to an R32F colour target, and as you don't have MSAA sample access even in that case, if you want to do depth-aware effects like SSAO or soft particles, you're in a lot of pain.

Assuming you don't want to pay the ridiculous price of a depth prepass into an extra R32F rendertarget, only to then throw it away without being able to use it as early-depth priming for your next MSAA passes, your depth samples will be summed and averaged, while what you really want is a min-max buffer or something like that.

Something like that... We have an average... We want to know something about the samples behind that average... Mhm... Variance to the rescue? Let's try... This is a small test I did in a few hours, so it's not going to be cool. The whole point of this post is not to show a cool implementation, but to stress that you should learn meta-techniques, not fixate on implementations...

So here it is. I had a grayscale scene (shading = depth, because I'm lazy) and on top of it I laid down two "depth fog" passes with a very, very sharp falloff (the green and red thingies). That's because a sharp function applied to the depth is what stresses the worst case of fetching the average depth at object boundaries instead of the correct one.

This is what it looks accessing the MSAA R32F depth, averaged:

Notice the horrible edges...

This is how it looks computing mean and variance into a 16bit ARGB buffer, then offsetting the depth used in the fog computation by some function of the variance. This "emulates" a min-max buffer where we choose only one of the two endpoints, which works for many effects (i.e. preferring the foreground).
Better, even in the jpeg compressed image... Even better, we could derive two depths from the variance, compute the fog with both endpoints, and average, as that would be more correct even if we had a full min-max buffer...
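In sketch form (Python standing in for the shader logic): the pixel shader writes z and z*z, the hardware MSAA resolve averages both because they're plain sums, and the depth-aware effect biases the mean by the standard deviation:

```python
import math

def resolve_moments(depth_samples):
    # What the hardware resolve gives us if we wrote z and z*z into the
    # MSAA target: both moments are plain averages, so they resolve right.
    n = len(depth_samples)
    m1 = sum(depth_samples) / n
    m2 = sum(z * z for z in depth_samples) / n
    return m1, m2

def biased_depth(m1, m2, prefer_foreground=True):
    # Standard deviation of the samples behind this resolved pixel
    # (max() guards against tiny negative variance from rounding)...
    sigma = math.sqrt(max(m2 - m1 * m1, 0.0))
    # ...used to push the depth toward one end of the distribution,
    # emulating a min (or max) depth buffer.
    return m1 - sigma if prefer_foreground else m1 + sigma
```

With two samples per pixel this recovers the exact min (or max); with more, it's only a bias, but one that behaves sanely at edges.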
Detail... Good enough :)

03 February, 2012

Normalmaps everywhere/2

First followup to this: http://c0de517e.blogspot.com/2012/02/normalmaps-everywhere.html

Nothing smart really, it's just a small Mathematica playground I made to visualize and experiment with normalmaps, mipmaps and phong, I'm posting it in case you want to play too!

http://www.scribd.com/doc/80425295/Normalmaps-mipmaps-Mathematica-playground

Just to be clear, this is something to have fun and experiment with; of course, if all I wanted was to fit Phong (or any other rotation-invariant BRDF, or parts of one), I could have done it better in many other ways (first of all, by considering a single parameter, the dot between the normal and the reflection vector/half vector, instead of working on the hemisphere). Hopefully I'll have something more "serious" coming out on the topic soon... eventually :)

01 February, 2012

Normalmaps everywhere

You are modeling a photorealistic face. Your artists want to be able to have detail, detail, detail. Skin pores. Sweat droplets. Thin wrinkles. What do you do?

If your answer is to give them high-res normalmaps and some cool, sharp specular, think twice. 

Normalmaps are not about thin, very high frequency detail! Especially NOT when rendered by our current antialiasing techniques.

Let's think about a high fidelity model, for example, millions of triangles. It's too expensive to render in realtime, but we can imagine it looking nice. It will have aliasing issues in regions of high frequency detail, like the lashes on a human face, so it requires many samples to avoid that artifact. We bake normalmaps onto a low-poly base mesh, and all our problems disappear. Right? Now it's fast and it doesn't shimmer. Cool. How? How come that aliasing magically disappeared? Isn't it a bit fishy?

Normalmaps: a few pixels wide detail
Normalmaps suffer from two main issues.

The first one is that they are not really that good at capturing surface variations. Imagine some thin steps, or square ridges on a flat surface. The normals in tangent space will be all "up", but hey, we had some dramatic discontinuities (the steps): where are they? Of course, you lost them in the resolution of your normalmap. Now, steps are not representable no matter what, obviously (discontinuity = infinite frequency = no resolution will ever be enough), but you might think: ok, I create a high resolution normalmap, add a bit of bevel around these edges so the normalmap captures them, et voilà, problem solved. Right?

This introduces us to the second problem, which puts the nail in the coffin: normalmaps don't antialias well. They are textures, so on the GPU they are antialiased by some sort of pre-filtering (i.e. mipmaps). We will always get one normal per shaded sample (and MSAA does not give you more shading samples, only depth/geometry ones; unless we do crazy stuff in the shader that I've never seen done, but maybe it would be a cool idea to try...). Thus it takes little zooming out for our normals to disappear and become flat again, due to the averaging done by filtering. It takes actually _really_ little.

You might want to think about this a bit, and I could make lots of examples, but in the end normalmaps won't cut it. By their nature of being a derivative surface measure, they take quite a bit of space to encode details, and they can reproduce them faithfully only if there is no minification: if a feature takes five texels in a normalmap, you might get good results while on screen that feature still takes five pixels; when you start zooming out, it will fade to flat normals again.
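A tiny numeric illustration of why prefiltering normals loses the lighting (all numbers made up: two bump facets tilted 30 degrees apart, a Phong-style lobe, light and view straight down the average direction, and 2D vectors because they're enough to show the effect):

```python
import math

def spec(n, h, exponent):
    # Phong-style specular lobe for unit 2D vectors n (normal), h (half vector)
    return max(n[0] * h[0] + n[1] * h[1], 0.0) ** exponent

h = (0.0, 1.0)                      # half vector pointing straight "up"
exponent = 64.0
a = math.radians(30.0)
n1 = (math.sin(a), math.cos(a))     # bump facet tilted +30 degrees
n2 = (-math.sin(a), math.cos(a))    # bump facet tilted -30 degrees

# Ground truth (what supersampling does): light each facet, then average
ground_truth = 0.5 * (spec(n1, h, exponent) + spec(n2, h, exponent))

# What mipmap prefiltering does: average the normals first, light once.
# (n1 + n2) / 2, renormalized, is exactly "up" here.
mip_result = spec((0.0, 1.0), h, exponent)
```

The prefiltered texel shines at full intensity while the true average response is nearly zero: averaging and the nonlinear lighting simply don't commute, which is the mismatch that encoding the lost detail into the BRDF fixes.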

Unfortunately, sometimes artists fail to recognize that, and they "cheat" themselves into thinking that certain features are wider than they are, because if they modeled them realistically they wouldn't show. Thus, we get faces with huge pores and thick lashes or facial hair.
Or models covered with detail in the three-to-five pixel frequency range that do not exhibit any finer detail geometry-wise (other than the geometry edges), creating a very weird look where the frequencies of colourmaps, normalmaps and geometry do not match, almost like a badly cast plastic toy whose geometry could not hold the detail well.
It also doesn't help that this issue is largely absent in offline rendering, where proper supersampling is able to resolve subpixel detail from normalmaps correctly, so there is some mismatch in the experience when the same technique is used with different rendering algorithms.

A normalmapping example... from google images (sorry)
So what? 
How can we solve this? By now some ideas should be circulating around your head. Prefiltering versus postfiltering (supersampling). Averaging some quantities and then performing some operations (lighting), versus performing operations on the individual quantities and then averaging? These problems should be familiar; one day I'll write something about it, but always remember that a given rendering technique is not so much useful per se as it is as part of a "generic" idea that we can apply... For example, a similar-sounding problem could be shadow filtering... and VSM...

But let's step back again a second. The problem with mipmapping basically means that, far enough away, normalmaps will always suck, right? What we really want is to find a means to capture the average of the illumination that the bunch of tiny normals we can't express at a given miplevel would have created. Some sort of "baking"...

Mhm, but this averaging of illumination... isn't that exactly what a material, a shader, a BRDF does? You know, the whole "there is no diffuse material": really it's just that if you zoom close enough, a lot of tiny fragments in the material are all rough and oriented in different directions and thus scatter light everywhere, in a "diffuse" way... We ignore these geometrical details because they are way too small to capture, and we build a statistical model around them! Sounds promising. What could we use?

Well, lots of things, but two of them are _really common_ and underused, and you have them already (most probably) so why not...

Occlusion maps: pixel-sized detail
One great way to encode geometric detail, maybe surprisingly, is to use occlusion. The sharper, the smaller a depression is, the more occluded it will be.

And occlusion is really easy to encode. It also mipmaps "right", because it's something we apply _after_ lighting; it's not a measure we need to feed into the lighting. Thus the average in a mipmap still makes sense, as we don't need to transform it by a function that we can't distribute over the mipmap averaging...
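That linearity claim is easy to check with toy numbers: occlusion multiplies the already-lit result, so filtering the AO commutes with applying it (assuming the lighting over the footprint is uniform), unlike the normals-into-lighting case:

```python
ao_texels = [0.2, 0.9, 0.6, 0.7]   # fine-mip occlusion values (made up)
lit = 3.0                          # the lit result the AO multiplies

# Ground truth: apply occlusion per texel, then filter down
truth = sum(lit * ao for ao in ao_texels) / len(ao_texels)

# What mipmapping does: filter the AO into the next mip first,
# then apply it once after lighting
mip_ao = sum(ao_texels) / len(ao_texels)
filtered = lit * mip_ao
```

Both paths land on the same value, which is exactly why an occlusion map survives minification where a normalmap doesn't.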

Digital Emily demonstrating again specular occlusion

Also, chances are that a lot of your detail will come from the specular, as specular is more high-frequency than diffuse. And chances are that you already have a specular map around... Profit!

Ok, but what if we have geometric details which cause sharp lighting discontinuities but no occlusion? For example, round rivets on a metal tank's body. Or transparent sweat beads on a man's forehead. What could we do then?

Exponent maps: specular detail to the subpixel
The two examples I just made are convenient. Round, highly specular shapes. What does that do, lighting wise?

Photographers are known to use the reflection of the lights in the eyes to help them reverse engineer the lighting in a photo. These round shapes will do something similar: they capture the light from any direction.

And as they are highly specular, they will create a strong reflection if there is any strong light anywhere in the hemisphere they can see. This strong reflection, even if it's small, will still be significantly visible even when we look at the feature from far enough away that our samples can't resolve the spherical shape itself.

So that is interesting. In a way, this surface which has a high Phong exponent and creates sharp highlights, when seen from far away, acts as a surface with a much lower exponent, one that shines if a light is "in view" of the surface, and not only at very narrow viewing angles. Is this surprising, though? Not really, right? We already know that in reality our materials are made of tiny, purely specular fragments that average together to create what we model with Phong exponents or whatever other equation.

Sweat on a face though is usually better modeled, for the typical viewing distances found in videogames, as streaks of lower exponents in the material "gloss" map, instead of using normalmaps. This also completes the answer to the original question, at the beginning of the post...

Conclusion
So that's it, right? Tricks for artists to express detail, done.

Well, sort-of. Because what we said so far kinda works only at a given scale (the occlusion does not). But what if I need to express a material at all scales. What if I want to zoom close enough to see the sweat droplets, and then pan backwards? I would need a model where up close the sweat layer uses normalmaps and a high specular exponent, but far away the normals get "encoded" in the BRDF by lowering the gloss exponent instead...

Well. Yes. We can do exactly that! Luckily, we have mipmapping which lets us encode different properties of a material at different scales. We don't have to straight average from the top miplevel in order to generate mipmaps... We can get creative!

And indeed, we are getting creative. LEAN and CLEAN mapping and, more to the point, CLEAN translated into exponent maps. Or transitioning from one model to an entirely different one, for oceans or trees...


Now, if you were following, you should also understand why I mentioned VSM and shadows... One idea could be to extract variance from the hardware filtering operation (a general idea which is _always_ cool, I'll show that in another post... soon) and use that variance somehow to understand how big the bundle of normals we have to express is.


While I was chatting about all this with a coworker, he brought to my attention the work COD: Black Ops did (I guess I was distracted during Siggraph...), which is exactly along these lines (kudos to them, I love COD rendering)... They directly use this variance to modify the gloss amount, and bake that into the mipmaps, so nothing has to be computed at runtime: during mipmap generation, they take the variance of the normals in a given mip texel and use it to modify the gloss... Simple implementation, great idea!
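This is essentially Toksvig's trick (one of the references at the end of this post); a sketch of the mip-bake, with a Phong exponent standing in for whatever gloss parametrization you use:

```python
import math

def bake_gloss_mip(normals, base_exponent):
    # Average the fine-mip unit normals that fall into one mip texel
    ax = sum(n[0] for n in normals) / len(normals)
    ay = sum(n[1] for n in normals) / len(normals)
    az = sum(n[2] for n in normals) / len(normals)
    # The shortened length of the average encodes the normal variance:
    # 1.0 = all aligned, shorter = more spread
    na = math.sqrt(ax * ax + ay * ay + az * az)
    # Toksvig's formula: turn that spread into a lowered effective exponent
    return na * base_exponent / (na + base_exponent * (1.0 - na))
```

Perfectly aligned normals keep the base exponent untouched; a spread bundle gets its gloss knocked down, so the far mips automatically render as the rougher material they really represent.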

But it's also a matter of educating artists, and I was really happy to see that one of our artists here proposed to drop normalmaps, now that our game is rendering from a more open viewpoint, because they were not able to provide good detail (either too noisy with too little filtering, or too flat), and resorted to specular maps instead for the purpose.

You might even think that you could take all your maps and a BRDF, and compute maps and mipmaps for the same BRDF, or for another BRDF re-fitted to best match the lighting of the original at a given scale... And this is really what I was working on... but this article is already too big and my Mathematica example is too stupid, so I'll probably post that in a follow-up article. Stay tuned...

Sneak peek
P.S. From twitter, people reminded me of these other references, which I admit I knew about but didn't read yet (shame on me!):
http://www.cs.columbia.edu/cg/normalmap/index.html
http://developer.nvidia.com/content/mipmapping-normal-maps
http://blog.selfshadow.com/2011/07/22/specular-showdown/
Also, an addendum, when thinking about aliasing: what about specular aliasing due to geometric detail? I actually started looking into that before thinking about all this. Differential operators in the shaders could be used to evaluate curvature, for example, but I found it hard to get into trouble because of geometric normals even on the dense models I was using (you really need very high exponents), so I dropped that and went on to analyze normalmaps instead.