14 July, 2013

DX11: GPU "printf"

So, first a little "announcement": I'm crafting a small DX11 rendering framework in my spare time. I want to open-source it, and it's based on MJP's excellent SampleFramework11.
The goal is to provide an environment roughly as fast to iterate on as FXComposer was (I consider it dead now...), but for programmers, without being a "shader editor".
If you're interested in collaborating, send me an email at c0de517e (it's a gmail account) with a brief introduction; there is an interesting list of things to do.

That said, this is a little bit of functionality that Maurizio Cerrato and I put together in a couple of days: a "printf"-like function for pixel (and compute) shaders. It all started while chatting with Daniel Sewell (a brilliant guy, he was my rendering lead on Fight Night). He pointed out that, while working on compute shaders, he had found a neat way to debug them: display all kinds of interesting debug visualizations by having geometry shaders "decode" buffers and emit lines.

if(IsDebuggedPixel(input.PositionSS.xy)) DebugDrawFloat(float2(ssao, bloom.x), clipPos);
The astute readers will at this point have already figured it all out. PS and CS support append buffers, so a "printf" only has to append some data to a buffer that you can later convert to lines in a geometry shader.

You could emit such data for every PS invocation and later sift through it and display what you need in a meaningful way, but that would be quite slow (and at that point you might want to consider just packing everything into some MRT outputs). The idea behind using append buffers is to do the work only for a handful of invocations (e.g. selected screen positions: if the current SV_Position equals the pixel to "debug", then GPU printf...).

In order to keep everything snappy we also minimize the size of the structure we use in the append buffer. You can't really printf strings; so far the debugger supports only one to three floats (with color and position) or lines. Lines are where we started, really: our struct contains two end-points, a color (index) and a flag which distinguishes lines from float printfs. Floats just reinterpret one of the end-points as the data to print.
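
For the CPU side, here is a minimal sketch of a struct mirroring that record, handy for computing the structured buffer stride. The names `ShaderDebugLineCPU` and `MakeFloatRecord` are made up for this example, and the flag value is left to the caller because the actual flag constants are not listed in this post. It assumes the tight packing of structured buffer elements (no 16-byte padding of the float3s).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU mirror of the shader-side ShaderDebugLine record.
struct ShaderDebugLineCPU {
    float posStart[3];    // line start, or the position where the number is printed
    float posEnd[3];      // line end, or up to three floats to print
    std::uint32_t color;  // index into a color table
    std::uint32_t flag;   // primitive type and coordinate space bits
};

// Two float3s plus two uints: 32 bytes per record.
static_assert(sizeof(ShaderDebugLineCPU) == 32, "unexpected record size");
static_assert(offsetof(ShaderDebugLineCPU, posEnd) == 12, "unexpected padding");

// Builds a "printf" record: the payload simply reuses the end-point slot.
ShaderDebugLineCPU MakeFloatRecord(const float* number, const float* pos,
                                   std::uint32_t color, std::uint32_t flag) {
    assert(number != nullptr && pos != nullptr);
    ShaderDebugLineCPU record = {};
    for (int i = 0; i < 3; ++i) {
        record.posStart[i] = pos[i];
        record.posEnd[i] = number[i];
    }
    record.color = color;
    record.flag = flag;
    return record;
}
```

`sizeof(ShaderDebugLineCPU)` is what you would pass as the structure byte stride when creating the buffer.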

This append buffer then gets fed to a VS/GS that is invoked twice for each element in the append buffer, via draw indirect. You need to multiply the count by two in a small CS. Remember, you can't emit the start/end vertices as two separate append calls, because the order of appends is not deterministic and the vertices would end up all mixed in the buffer! The GS emits extra lines if we're printing floats, to display a small line-based font.
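
To make the "multiply by two" step concrete, here is a small CPU-side sketch of the four values the indirect draw reads. `MakeLineDrawArgs` is a made-up name for illustration; on the GPU the same patching is done in place by a one-thread compute shader. The field order is the one `DrawInstancedIndirect` expects.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Arguments read by DrawInstancedIndirect, in order:
// VertexCountPerInstance, InstanceCount, StartVertexLocation, StartInstanceLocation.
std::array<std::uint32_t, 4> MakeLineDrawArgs(std::uint32_t appendCount) {
    // Guard against overflow when doubling the count.
    assert(appendCount <= UINT32_MAX / 2u);
    // Every appended record becomes one line, i.e. two vertices.
    return {{ appendCount * 2u, 1u, 0u, 0u }};
}
```

With three records in the append buffer this yields 6, 1, 0, 0: six vertices, one instance, no offsets.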

If you're thinking that this is lame, well, it is. There are limits on the number of primitives the GS can emit, which effectively limit the number of digits you can display, and you have to be careful about that. I "optimized" the code to display as many digits as possible, which unfortunately gives you a very low-precision 3-float printf and higher-precision 2-float and 1-float versions (you could, though, call the 1-float version three times... as there the ordering of the three calls doesn't matter).
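
The digit budget is easy to check on the CPU. The sketch below redoes the arithmetic behind the `maxvertexcount` expression used in the geometry shader further down, assuming at most 5 line segments per digit, 4 extra vertices per float for the point and the minus sign, 4 for the cross, and 7 scalars per output vertex (a float4 position plus a float3 color). All the names are mine, for illustration only.

```cpp
#include <cassert>

constexpr int kMaxLinesPerDigit = 5;                   // mirrors DigitFontMaxLinesPerDigit
constexpr int kVerticesPerDigit = 2 * kMaxLinesPerDigit;
constexpr int kPointAndSignVertices = 4;               // one segment each for '.' and '-'
constexpr int kCrossVertices = 4;                      // two segments marking the position

// Worst-case number of vertices emitted when printing `floats` numbers.
constexpr int VerticesForPrintf(int floats, int digitsPerFloat) {
    return floats * (digitsPerFloat * kVerticesPerDigit + kPointAndSignVertices)
         + kCrossVertices;
}

// The shader budgets for 3 floats with 4 digits each.
constexpr int kMaxVertexCount = VerticesForPrintf(3, 4);

// D3D11 caps the GS output at 1024 scalars per invocation.
constexpr int kScalarsPerVertex = 7;
static_assert(kMaxVertexCount * kScalarsPerVertex <= 1024, "GS output too large");

// Largest digit count per float that still fits in the budget.
int MaxDigitsPerFloat(int floats) {
    assert(floats >= 1 && floats <= 3);
    int digits = 0;
    while (VerticesForPrintf(floats, digits + 1) <= kMaxVertexCount) {
        ++digits;
    }
    return digits;
}
```

The budget works out to 136 vertices, which leaves room for 12 digits with one float, 6 with two and 4 with three.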

Keeping the same number of printed digits, the point has to float...
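
Here is a rough CPU reference of that digit split, useful for checking what a given value should look like on screen. `FormatFixedDigits` is a hypothetical helper, not part of the shader code, and it makes two assumptions of its own: the fractional part is truncated rather than rounded, and, like the shader, it prints no leading zero when the integer part is zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cmath>
#include <string>

// totalDigits are shared between the integer and the fractional part,
// so the more integer digits a value has, the fewer decimals it gets.
std::string FormatFixedDigits(float number, int totalDigits) {
    assert(totalDigits >= 0);
    const double numberAbs = std::fabs(static_cast<double>(number));
    const std::uint32_t intPart = static_cast<std::uint32_t>(numberAbs);

    // Count the digits of the integer part (zero counts as no digits).
    int intDigits = 0;
    for (std::uint32_t v = intPart; v > 0; v /= 10) ++intDigits;
    const int fractDigits = std::max(0, totalDigits - intDigits);

    // Scale the fraction so that fractDigits decimals become an integer.
    double scale = 1.0;
    for (int i = 0; i < fractDigits; ++i) scale *= 10.0;
    std::uint64_t fractPart =
        static_cast<std::uint64_t>((numberAbs - intPart) * scale);

    // Fill the decimals right to left, keeping leading zeros.
    std::string fractText(static_cast<std::size_t>(fractDigits), '0');
    for (int i = fractDigits - 1; i >= 0; --i) {
        fractText[static_cast<std::size_t>(i)] =
            static_cast<char>('0' + fractPart % 10);
        fractPart /= 10;
    }

    std::string text;
    if (numberAbs > 0.0) {
        if (number < 0.0f) text += '-';
        if (intPart > 0) text += std::to_string(intPart);
    }
    return text + "." + fractText;
}
```

With four digits, 3.14159 comes out as "3.141" while 123.456 comes out as "123.4": same digit count, different position of the point.
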
Why not use a bitmap font instead? Glad you asked. Laziness, partially justified by the fact that I didn't want to have two different append buffers, one for lines and one for fonts, as append buffers are a scarce resource on DX11. But it's a very lame justification, because there are plenty of workarounds, left as an exercise for the reader: you could filter the append buffer into two draw calls in a compute shader, or even draw lines as quads, which would probably be better anyway!

Anyhow, together with shader hot-reloading (which everybody has, right?), this is quite a handy trick. Bonus: on a similar note, have a look at this shadertoy snippet by my coworker Paul Malin... brilliant guy!

Some code, without doubt full of bugs:

Snippet from the CPU/C++ side, drawing the debug lines...
void ShaderDebugDraw(ID3D11DeviceContext* context, const Float4x4& viewProjectionMatrix, const Float4x4& projMatrix)
{
    SampleFramework11::PIXEvent marker(L"ShaderDebug Draw");
    // Copy the hidden append counter into the first UINT of the indirect arguments buffer
    context->CopyStructureCount(AppendBufferCountCopy, 0, AppendBuffer.UAView);
    // We need a compute shader to write BufferCountUAV, as we need to multiply CopyStructureCount by two 
    ID3D11ShaderResourceView* srViews[] = { AppendBuffer.SRView };
    ID3D11UnorderedAccessView* uaViews[] = { AppendBufferCountCopyUAV };
    UINT uavsCount[] = { 0 };
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);
    context->CSSetShader(DebugDrawShader.AcquireCS(), NULL, 0);
    // The dispatch was missing from the listing: one thread is enough to patch the four draw arguments
    context->Dispatch(1, 1, 1);
    context->CSSetShader(NULL, NULL, 0);
    uaViews[0] = NULL;
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);
    // Set all IA stage inputs to NULL, since we're not using it at all.
    context->IASetVertexBuffers(0, D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT, (ID3D11Buffer**)nulls, (UINT*)nulls, (UINT*)nulls);
    context->IASetIndexBuffer(NULL, DXGI_FORMAT_UNKNOWN, 0);
    // The GS consumes "line" primitives, so the topology has to be a line list
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_LINELIST);
    // Draw debug lines
    srViews[0] = AppendBuffer.SRView;
    context->VSSetShaderResources(0, 1, srViews);
    context->GSSetShaderResources(0, 1, srViews);
    context->GSSetShader(DebugDrawShader.AcquireGS(), NULL, 0);
    context->VSSetShader(DebugDrawShader.AcquireVS(), NULL, 0);
    context->PSSetShader(DebugDrawShader.AcquirePS(), NULL, 0);
    shaderDebugDrawDataVS.Data.ViewProjection = viewProjectionMatrix;
    shaderDebugDrawDataVS.Data.Projection = projMatrix;
    shaderDebugDrawDataVS.SetVS(context, 0);
    context->DrawInstancedIndirect(AppendBufferCountCopy, 0);
}

This is roughly how the shader library looks for emitting debug lines/debug numbers from pixel shaders
struct ShaderDebugLine
{
 float3 posStart;
 float3 posEnd;
 uint color;
 uint flag;
};
cbuffer ShaderDebugData : register(b13)
{
 float2 debugPixelCoords;
 float2 oneOverDisplaySize;
 int debugType;
};
void DebugDrawFloat(float3 number,  float3 pos, int color = 0, uint spaceFlag = SHADER_DEBUG_FLAG_2D)
{
 ShaderDebugLine l;
 l.posStart = pos;
 l.color = color;
 l.posEnd = number; // floats reuse the end-point slot as their payload
 l.flag = SHADER_DEBUG_PRIM_FLOAT3|spaceFlag;
 // The append call was missing from the listing. ShaderDebugAppendBuffer is an assumed name
 // for the AppendStructuredBuffer<ShaderDebugLine>, whose declaration is not shown here.
 ShaderDebugAppendBuffer.Append(l);
}
float2 SVPosToClipspace(float2 svPos, float2 oneOverDisplaySize) { return (svPos * oneOverDisplaySize) * float2(2,-2) + float2(-1,1); }
bool IsDebuggedPixel(float2 svPos)
{
 // This is a bit tricky because it depends on the MSAA pattern
 if(debugType == 1) // a single pixel: SV_Position sits at the pixel center, hence the 0.5 offset
  return dot(abs(debugPixelCoords - svPos + float2(0.5,0.5)), 1.0.xx) <= 0.01f;
 else if(debugType == 2) // one pixel every 100 along both x and y
  return dot(abs(svPos % float2(100,100)), 1.0.xx) <= 1.01f;
 else return false;
}
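
If you want to sanity-check the pixel selection and the screen-to-clip-space mapping without a GPU, a CPU transcription is straightforward. This is only a sketch: `Float2` is a stand-in type I made up here, and `IsDebuggedPixel` covers just the single-pixel mode (debugType 1), taking the debugged pixel as a parameter instead of reading it from a constant buffer.

```cpp
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };  // stand-in for the HLSL float2

// SV_Position is in pixels with y pointing down; clip space is [-1,1] with y up.
Float2 SVPosToClipspace(Float2 svPos, Float2 oneOverDisplaySize) {
    assert(oneOverDisplaySize.x > 0.0f && oneOverDisplaySize.y > 0.0f);
    return { svPos.x * oneOverDisplaySize.x * 2.0f - 1.0f,
             svPos.y * oneOverDisplaySize.y * -2.0f + 1.0f };
}

// True when svPos is the center of the pixel with integer coordinates debugPixelCoords.
bool IsDebuggedPixel(Float2 svPos, Float2 debugPixelCoords) {
    const float dx = std::fabs(debugPixelCoords.x - svPos.x + 0.5f);
    const float dy = std::fabs(debugPixelCoords.y - svPos.y + 0.5f);
    return dx + dy <= 0.01f;
}
```

On a 1024x512 target the top-left corner maps to (-1, 1) and the center of the screen to (0, 0), and the pixel (10, 20) is matched by an SV_Position of (10.5, 20.5).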

And finally, the VS/GS/CS shaders needed to draw the debug buffer emitted from the various PS executions:
static const int DigitFontOffsets[] =
{
 0, 8, 10, 20, 30, 38, 48, 58, 62, 72, 82, 84, 86
};
static const float DigitFontScaling = 0.03;
static const float DigitFontWidth = 0.7 * DigitFontScaling; // The font width is 0.5, but we add spacing
static const int DigitFontMaxLinesPerDigit = 5;
static const float2 DigitFont[] =
{
 /* 0 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
 /* 1 */
 float2(0.5f, 0.f), float2(0.5f, -1.f),
 /* 2 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, -1.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /* 3 */
 float2(0.f, 0.f), float2(0.5f,0.f), float2(0.5f,0.f), float2(0.5f,-0.5f),
 float2(0.5f,-0.5f), float2(0.f,-0.5f), float2(0.5f,-0.5f), float2(0.5f,-1.f),
 float2(0.5f,-1.f), float2(0.f,-1.f),
 /* 4 */
 float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, 0.f), float2(0.5f, -0.5f), float2(0.5f, -1.f),
 /* 5 */
 float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, -1.f), float2(0.f, 0.f), float2(0.5f, 0.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /* 6 */
 float2(0.f, 0.f), float2(0.f, -1.f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
 float2(0.5f, -0.5f), float2(0.5f, -1.f), /* avoidable */ float2(0.f, 0.f), float2(0.5f, 0.f),
 float2(0.f, -1.f), float2(0.5f, -1.f),
 /* 7 */
 float2(0.5f, 0.f), float2(0.5f, -1.f), float2(0.5f, 0.f), float2(0.f, 0.f),
 /* 8 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
 float2(0.f, -0.5f), float2(0.5f, -0.5f),
 /* 9 */
 float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
 float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, 0.f),
 float2(0.5f, -1.f), float2(0.f, -1.f),
 /* - (glyph 10) */
 float2(0.5f, -0.5f), float2(0.f, -0.5f),
 /* . (glyph 11) */
 float2(0.8f, -0.9f), float2(0.9f, -1.f)
};
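cbuffer ShaderDebugDrawData : register(b0)
{
 float4x4 Projection;
 float4x4 ViewProjection;
};
struct vsOut
{
 float4 Pos : SV_Position;
 float3 Color : TexCoord0;
};
// Read-only view of the append buffer. It is bound as a shader resource in slot 0 of the VS and GS, hence a t register
StructuredBuffer<ShaderDebugLine> ShaderDebugStructuredBuffer : register(t0);
// Indirect draw arguments, patched by the CS below
RWBuffer<uint> StructureCount : register(u1);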
void DebugDrawDigit(int digit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{
 for (int i = DigitFontOffsets[digit]; i < DigitFontOffsets[digit+1] - 1; i+=2)
 {
  vsOut p;
  p.Color = color;
  p.Pos = pos + float4(DigitFont[i] * DigitFontScaling, 0, 0);
  GS_Out.Append(p);
  p.Pos = pos + float4(DigitFont[i +1] * DigitFontScaling, 0, 0);
  GS_Out.Append(p);
  GS_Out.RestartStrip(); // every font segment is an independent line
 }
}
float4 DebugDrawIntGS(int numberAbs, uint numdigit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{
 // Digits are drawn right to left, least significant first
 while(numdigit > 0)
 {
  DebugDrawDigit(numberAbs % 10u , pos, GS_Out, color);
  numberAbs /= 10u;
  pos.x -= DigitFontWidth;
  numdigit--;
 }
 return pos;
}
void DebugDrawFloatHelperGS(float number, float4 pos, inout LineStream<vsOut> GS_Out, float3 color, int totalDigits)
{
 float numberAbs = abs(number);
 uint intPart = (uint)numberAbs; 
 uint intDigits = 0;
 if(intPart > 0)
  intDigits = (uint) log10 ((float) intPart) + 1;
 // Subtract as signed ints, otherwise the clamp to zero never triggers
 uint fractDigits = (uint)max(0, totalDigits - (int)intDigits);
 // Get the fractional part, scaled so that fractDigits decimals become an integer.
 // The clamp keeps rounding from carrying over into the integer part (e.g. 0.9999 -> 999, not 1000)
 float fractScale = pow(10.0, fractDigits);
 uint fractPart = (uint)min(round(frac(numberAbs) * fractScale), fractScale - 1);
 // Draw the fractional part
 pos = DebugDrawIntGS(fractPart, fractDigits, pos, GS_Out, color * 0.5 /* make fractional part darker */);
 // Draw the .
 pos.x -= DigitFontWidth * 0.5;
 DebugDrawDigit(11, pos, GS_Out, color);
 pos.x += DigitFontWidth * 0.25;
 // Draw the int part
 if (numberAbs > 0)
 {
  pos = DebugDrawIntGS(intPart, intDigits, pos, GS_Out, color);
  if (number < 0)
   DebugDrawDigit(10 /* draw a minus sign */, pos, GS_Out, color);
 }
}
vsOut VS(uint VertexID : SV_VertexID)
{
 uint index = VertexID/2;
 uint col = ShaderDebugStructuredBuffer[index].color;
 uint flags = ShaderDebugStructuredBuffer[index].flag;
 float3 pos;
 if((VertexID & 1)==0) // we're processing the start of the line
  pos = ShaderDebugStructuredBuffer[index].posStart;
 else // we're processing the end of the line
  pos = ShaderDebugStructuredBuffer[index].posEnd;
 vsOut output = (vsOut)0;
 output.Color = ShaderDebugColors[col];
 if(flags & SHADER_DEBUG_FLAG_2D)
  output.Pos = float4(pos.xy,0,1);
 else if(flags & SHADER_DEBUG_FLAG_3D_VIEWSPACE) // this condition was lost from the listing, the flag name is assumed
  output.Pos = mul( float4(pos,1.0) , Projection);
 else // we just assume SHADER_DEBUG_FLAG_3D_WORLDSPACE otherwise
  output.Pos = mul( float4(pos,1.0) , ViewProjection);
 return output;
}
[numthreads(1,1,1)]
void CS(uint3 id : SV_DispatchThreadID)
{
  // Turn the copied append count into the arguments of DrawInstancedIndirect
  StructureCount[0] *= 2; // VertexCountPerInstance: two vertices per appended element
  StructureCount[1] = 1;  // InstanceCount
  StructureCount[2] = 0;  // StartVertexLocation
  StructureCount[3] = 0;  // StartInstanceLocation
}
float4 PS(vsOut input) : SV_Target0
{
 return float4(input.Color, 1.0f);
}
// Worst case we print 3 floats... 4 digits per float plus we need 4 vertices for the . and -, and another 4 for the cross
[maxvertexcount(3 * (4*(2*DigitFontMaxLinesPerDigit)+4) + 4)]
void GS(line vsOut gin[2], inout LineStream<vsOut> GS_Out, uint PrimitiveID : SV_PrimitiveID)
{
 // We'll get two vertices, one primitive, out of the VS for each element in ShaderDebugStructuredBuffer...
 // TODO: we could avoid reading ShaderDebugStructuredBuffer if we passed the number flag along from the VS
 ShaderDebugLine dbgLine = ShaderDebugStructuredBuffer[PrimitiveID];
 // NOTE: the branch conditions were lost from this listing. SHADER_DEBUG_PRIM_LINE, _FLOAT1 and _FLOAT2 are assumed names, by analogy with SHADER_DEBUG_PRIM_FLOAT3
 if(dbgLine.flag & SHADER_DEBUG_PRIM_LINE)
 {
  // If we got a line, then just re-emit the line coordinates
  GS_Out.Append(gin[0]);
  GS_Out.Append(gin[1]);
  GS_Out.RestartStrip();
  return;
 }
 float4 pos = gin[0].Pos;
 // Draw cross
 vsOut p;
 p.Color = gin[0].Color;
 p.Pos = pos + float4(DigitFontWidth*0.5,0,0,0);
 GS_Out.Append(p);
 p.Pos = pos + float4(-DigitFontWidth*0.5,0,0,0);
 GS_Out.Append(p);
 GS_Out.RestartStrip();
 p.Pos = pos + float4(0,DigitFontWidth*0.5,0,0);
 GS_Out.Append(p);
 p.Pos = pos + float4(0,-DigitFontWidth*0.5,0,0);
 GS_Out.Append(p);
 GS_Out.RestartStrip();
 // Draw the numbers, as lines
 pos += float4(0,-DigitFontWidth*1.5,0,0);
 float3 number = dbgLine.posEnd; // the floats to print live in the end-point slot
 if(dbgLine.flag & SHADER_DEBUG_PRIM_FLOAT1)
  // Less floats drawn means we can afford more precision without exceeding maxvertexcount
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 12);
 else if(dbgLine.flag & SHADER_DEBUG_PRIM_FLOAT2)
 {
  // Less floats drawn means we can afford more precision without exceeding maxvertexcount, 12/2 = 6 digits
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 6);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 6);
 }
 else
 {
  // 3*4 we draw 12 digits here...
  DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 4);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 4);
  pos.y -= DigitFontWidth * 2;
  DebugDrawFloatHelperGS(number.z, pos, GS_Out, gin[0].Color, 4);
 }
}
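
One cheap safeguard when editing the line font: check on the CPU that no glyph needs more segments than the per-digit budget allows. The sketch below copies the values of the DigitFontOffsets table above into a hypothetical `kDigitFontOffsets` array; the helper names are made up for this example.

```cpp
#include <cassert>

// Digits 0-9 plus '-' (glyph 10) and '.' (glyph 11).
const int kGlyphCount = 12;

// Start index of each glyph in the vertex table, plus one end sentinel.
const int kDigitFontOffsets[kGlyphCount + 1] =
    { 0, 8, 10, 20, 30, 38, 48, 58, 62, 72, 82, 84, 86 };

// Every line segment takes two consecutive vertices.
int LinesInGlyph(int glyph) {
    assert(glyph >= 0 && glyph < kGlyphCount);
    return (kDigitFontOffsets[glyph + 1] - kDigitFontOffsets[glyph]) / 2;
}

// The widest glyph decides how many vertices must be budgeted per digit.
int MaxLinesPerGlyph() {
    int maxLines = 0;
    for (int glyph = 0; glyph < kGlyphCount; ++glyph) {
        const int lines = LinesInGlyph(glyph);
        if (lines > maxLines) maxLines = lines;
    }
    return maxLines;
}
```

With the current table the maximum is 5 segments, which is the value DigitFontMaxLinesPerDigit is set to; adding a more complex glyph would require bumping both that constant and the maxvertexcount budget.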


Anonymous said...

"I want to have it opensourced, and it's based on MJP's excellent SampleFramework11."

Btw did you publish your code on any public repository?
What about SampleFramework11, is it open-sourced too? I could not find any repo on MJP's site.

DEADC0DE said...

SampleFramework11 is not on a repo, but you can download it with any of MJP's samples, and it includes an MIT license.

My additions are on a private repo, way too messy to make them public yet.