
25 June, 2014

Oh envmap lighting, how do we get you wrong? Let me count the ways...

Environment map lighting via prefiltered cubemaps is very popular in realtime CG.

The basics are well known:
  1. Generate a cubemap of your environment radiance (a probe, even offline or in realtime).
  2. Blur it with a cosine hemisphere kernel for diffuse lighting (irradiance) and with a number of phong lobes of varying exponent for specular. The various convolutions for phong are stored in the mip chain of the cubemap, with rougher exponents placed in the coarser mips.
  3. At runtime we fetch the diffuse cube using the surface normal and the specular cube using the reflection vector, forcing the latter fetch to happen at a mip corresponding to the material roughness.
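To make step 3 concrete, here's a minimal sketch (in Python, purely for illustration) of the vector math and the roughness-to-mip mapping; the linear mapping and the mip count are assumptions, as each engine derives its own mapping from the lobes it prefilters with:

```python
def reflect(v, n):
    """Mirror the (unit) view vector v around the (unit) normal n."""
    d = sum(vi * ni for vi, ni in zip(v, n))
    return tuple(2.0 * d * ni - vi for vi, ni in zip(v, n))

def roughness_to_mip(roughness, mip_count):
    # Assumed linear mapping from material roughness to the mip of the
    # prefiltered specular cube; real engines fit this to their lobes.
    return roughness * (mip_count - 1)

# Runtime shading then amounts to something like:
#   diffuse  = sample_cube(irradiance_cube, N)
#   specular = sample_cube_lod(specular_cube, reflect(V, N),
#                              roughness_to_mip(roughness, mips))
```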
Many engines stop at that, but a few extensions emerged (somewhat) recently, notably "localized" probes, warped by intersecting the lookup against a proxy shape (e.g. a box) placed in the scene, and the preintegration of the non-distribution parts of the BRDF into a separate correction factor.
Especially the last extension allowed a huge leap in quality and applicability; it's so nifty it's worth explaining for a second.

The problem with Cook-Torrance BRDFs is that they depend on three functions: a distribution function that depends on N.H, a shadowing function that depends on N.H, N.L and N.V, and the Fresnel function that depends on N.V.

While we know we can somehow solve functions that depend on N.H by fetching a prefiltered cube in the reflection direction (not really the same thing, but the same kind of difference as there is between the Phong and Blinn specular models), anything that depends on N.V would add another dimension to the preintegrated solution (requiring an array of cubemaps), and we wouldn't know what to do with N.L at all, as we don't have a single light vector in environment lighting.

The cleverness of the solution that was found can be explained by observing the BRDF and how its shape changes when manipulating the Fresnel and shadowing components.
You should notice that the BRDF shape, and thus the filtering kernel on the environment map, is mostly determined by the distribution function, which we know how to tackle. The other two components don't change the shape much, but scale it and "shift" it away from the H vector.
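To see why the distribution term dominates the lobe shape, here's one common choice of D (GGX/Trowbridge-Reitz, with the usual alpha = roughness^2 remapping) as a Python sketch; it peaks at N=H and falls off quickly, which is exactly the kind of radially symmetric kernel a prefiltered mip can encode:

```python
import math

def d_ggx(n_dot_h, roughness):
    # GGX / Trowbridge-Reitz normal distribution function.
    a2 = (roughness * roughness) ** 2  # alpha^2, with alpha = roughness^2
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)
```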

So we can imagine an approximation that integrates the distribution function with a preconvolved cubemap mip pyramid, and relegates the other components to a scaling term by preintegrating them against an all-white cubemap, ignoring how the lighting is actually distributed.
And this is the main extension we employ today: we correct the cubemap, preintegrated only with the distribution lobe, with a (very clever) biasing factor.
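That scale-and-bias against a white environment is what gets preintegrated. As a reference point, here is a Python port of Dimitar Lazarov's published analytic fit (the constants are from that fit, widely reproduced in the Call of Duty and Unreal course notes; treat this as one possible approximation, as many engines bake a 2D lookup texture instead):

```python
def env_brdf_approx(roughness, n_dot_v):
    # Analytic approximation of the preintegrated Fresnel/shadowing
    # terms: returns (scale, bias) such that
    #   specular ~= prefiltered_env * (F0 * scale + bias)
    c0 = (-1.0, -0.0275, -0.572, 0.022)
    c1 = (1.0, 0.0425, 1.04, -0.04)
    r = [roughness * a + b for a, b in zip(c0, c1)]
    a004 = min(r[0] * r[0], 2.0 ** (-9.28 * n_dot_v)) * r[0] + r[1]
    return (-1.04 * a004 + r[2], 1.04 * a004 + r[3])
```

Sanity check: at roughness 0 and N.V = 1 the scale is nearly 1 and the bias nearly 0, i.e. specular collapses to almost pure F0 times the mirror reflection, as expected.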

All good, and it works. But now, is all this -right-? Obviously not! I won't offer solutions here (just yet), but can you count the ways we're wrong?
  1. First and foremost the reflection vector is not the half-vector, obviously.
    • The preconvolved BRDF expresses a radially symmetric lobe around the reflection vector, but a half-vector BRDF is not radially symmetric at grazing angles (when H != N): it becomes stretched.
    • It's also different from its reflection-vector-based counterpart when R=H=N, but there it can be adjusted with a simple constant roughness modification (just remember to do it!).
  2. As we said, Cook-Torrance is not based only on a half-vector lobe.
    • We have a solution that works well but it's based only on a bias, and while that accounts for the biggest difference between using only the distribution and using the full CT formulation, it's not the only difference.
    • Fresnel and shadowing also "push" the BRDF lobe so it doesn't reach its peak value on the reflection direction.
  3. If we bake lighting from points close enough that perspective matters, then discarding position dependence is wrong. 
    • It's true that it is perceptually hard for us to judge where lighting comes from when we see a specular highlight (good!), but for reflections of nearby objects the error can be easy to spot.
    • We can employ warping as we mentioned, but then the preconvolution is warped as well.
    • If for example we warp the cubemap by considering it representing light from a box placed in the scene, what we should do is to trace the BRDF against the box and see how it projects onto it. That projection won't be a radially symmetric filtering kernel in most cases.
    • In the "box" localized environment map scenario the problem is closely related to texture card area lights.
  4. We disregard occlusions.
    • Any form of shadowing of the preconvolved environment lighting that just scales it down is wrong, as occlusion should happen before prefiltering.
    • Still -DO- shadow environment map lighting somehow. A good way is to use screen-space (or voxel-traced) computed occlusion by casting a cone emanating from the reflection vector, even if that's done without considering roughness for the cone size, or somehow precomputing and baking some form of directional occlusion information.
    • Really this is still due to the fact that we use the envmap information at a point that is not the one from which it was baked.
    • Another good alternative to try to fix this issue is renormalization as shown by Call of Duty.
  5. We don't clip the specular lobe to the normal-oriented hemisphere.
    • So even for purely radially symmetric BRDFs around the reflection vector (Phong), in an environment without occlusion, the approximations are not correct.
    • Not clipping is similar to the issues we have integrating area lights (where we should clip the area light when it dips below the surface horizon, but for the most part we don't).
    • This is expected to have a Fresnel-like effect: we are messing up the grazing angles.
    • A possible correction would be to skew the reflection vector away from the edges of the hemisphere, and shrink it (fit it to the clipped lobe).
  6. We disregard surface normal variance.
    • Forcing a given miplevel (texCubeLod) is needed, as mips in our case represent different lobes at different roughnesses, but that means we don't antialias that texture considering how the normals change inside the footprint of a pixel (note: some hardware gets that wrong even with regular texCube fetches).
    • The solution here is "simple" as it's related to the specular antialiasing we do by pushing normal variance into specular roughness.
    • But that line of thought, no matter the details, is also provably wrong (still -do- that). The problem is closely related to the "roughness modification" solution for spherical area lights and it suffers from the same issue: the proper integral of the BRDF with a normal cone is flatter than what we get at any roughness of the original BRDF.
    • Also, the footprint of the normals won't be a cone with a circular base, and even what we get with the finite difference ddx/ddy approximation would be elliptical.
  7. Bonus: compression issues for cubemaps and dx9 hardware.
    • Older hardware couldn't properly do bilinear filtering across cubemap edges, leading to visible artifacts, which some corrected by making sure the edge texels were the same across faces.
    • What most don't consider, though, is that if we use a block-compression format on the cubemap (DXT, BCn and so on) there will be discontinuities between blocks, which will make the edge texels different again. Compressors in these cases should be modified so the edge blocks share the same reference colors.
    • Adding borders is better.
    • These techniques are relevant also for hardware that does bilinear filtering across cubemap edges, as that might be slower... Also, avoid using the very bottom mips...
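The box-localized ("warped") lookup from point 3 can be sketched as follows; the proxy-box intersection and probe re-aiming are the standard published approach, but note, as said above, that the correct filtering kernel after this warp would no longer be radially symmetric (function and parameter names here are illustrative):

```python
def box_project(pos, refl, box_min, box_max, probe_pos):
    # Intersect the reflection ray (pos + t*refl) with the proxy box,
    # then re-aim the cubemap fetch direction from the probe center
    # towards the hit point.
    t_exit = []
    for i in range(3):
        if refl[i] == 0.0:
            t_exit.append(float("inf"))  # ray parallel to this slab
        else:
            t1 = (box_max[i] - pos[i]) / refl[i]
            t2 = (box_min[i] - pos[i]) / refl[i]
            t_exit.append(max(t1, t2))   # exit distance per slab
    dist = min(t_exit)
    hit = tuple(pos[i] + refl[i] * dist for i in range(3))
    return tuple(hit[i] - probe_pos[i] for i in range(3))
```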
I'll close with some links that might inspire further thinking:
#physicallybasedrenderingproblems #naturesucks

16 June, 2014

Bonus round: languages, metaprogramming and terseness

People always surprise me. I didn't expect a lot out of my latest post but instead it spawned many very interesting comments and discussion over twitter and a few Reddit threads.

I didn't want to talk about languages really, I did that a lot in the past already. I wanted to make a few silly examples of why we code in C++ and what could be needed to move us out of it (and only us: gamedev, systems, low-level, AAA console folks, not the programming world in general, which often already ditched C++ and is sometimes even perfectly ok with OO). But instead I got people being really passionate about languages and touting the merits of D, Rust, Julia (or even Nimrod and Vala).

Also to my surprise I didn't get any C++-related flames, nobody really trying to prove that C++ is the best possible language given the constraints, or arguing the virtues of OO and so on. It really seems most agree today, and we're actually ready and waiting for the switch!

Anyhow, I wanted to write an addendum to the post because it's related to the humanistic POV I tried to make, talking about language design.

- Beauty and terseness

Some people started talking about meta-programming and in general expressiveness and terseness. I wanted to offer a perspective on how I see some language concepts in terms of what I do.

In theory, beauty in programming is simplicity and expressiveness, and terseness. To a degree programming itself can be seen as data compression, so the more powerful our statements are, the more they express, the more they compress, the better.
But obviously this analogy goes only so far, as we wouldn't consider programming in an LZ-compressed code representation to be beautiful, even if it would truly be more terse (and it would be a form of meta-programming, even).

That's obviously because programming languages are not only made to be expressive, but also understandable by the meat bags that type them in, so there are trade-offs. And I think we have to be very cautious with them.

Take meta-programming for example, it allows truly beautiful constructions, the ability of extending your language semantics and adapting it to your domain, all the way to embedded DSLs and the infamous homoiconicity dear to lispers. 
But as a side effect, the more you dabble in that, the more your language statements lose meaning in isolation (to the Lisp extreme, where there is almost no syntax left to carry any meaning), and that's not great.

There might be some contexts where a team can accept to build upon a particular framework of interpretation of statements, get trained in it, and know that in that context a given statement has a given meaning.
To a degree we all need to get used to a codebase before really understanding what's going on, but anything that adds burden to the mental model of what things mean is a very hard trade.

For gamedev in particular it's quite important not only that A = B/C means A = B/C, but also that it is executed in a fixed way. Perhaps at times we overemphasize the need for control (and for example often have to debate to persuade people that GC isn't evil, lack of control over heap allocation is) because of a given cultural background, but undeniably the need does exist.

[Small rant] Certainly I would not be happy if B/C meant something silly like concatenating strings. I could possibly even tolerate "+" for that, because it's so common it has become a natural semantic, maybe stretching it even "|" or "&". But "/" would be completely fucked up. Unless you're a Boost programmer and are furiously masturbating on the thought of how pretty it is to use "/" to concatenate paths, because directory dividers are slashes and you feel so damn smart.

That's why most teams won't accept metaprogramming in general and will allow C++ templates only as generics, for collections. And will allow operator overloading only for numeric types and basic operations.
...And why they don't like references if they are non-const (the argument being that a mutable reference parameter to a function can change the value of a variable without that change being syntactically highlighted at the call site; a better option would be an "out" annotation like C#'s or HLSL's). ...And why they don't like anything that adds complexity to the resolution rules of C++ calls, or exceptions, or the auto-generated operators of C++ classes, and thus should also stay away from r-value references.

- Meatbags

For humans certain constraints are good, they are actually liberating. Knowing exactly what things will do allows me to reason about them more easily than languages that require lots of context and mental models to interpret statements. That's why going back to C is often as fun as discovering python for the first time.

Now of course YMMV, and there are situations where we truly can tolerate more magic. I love dabbling with Mathematica, and even if most of the time I don't exactly know how the interpreter will chain rules to compute something, it works, and even when it doesn't I can iterate quickly and try workarounds until it does.
Sometimes I really need to know more and there is when you open a can of worms, but for that kind of work it's fine, it's prototyping, it's exploratory programming, it's fun and empowering and productive. And I'm fine not knowing and just kicking stuff, clicking buttons until things work in these contexts, not everybody has to know how things work all the way to the metal and definitely not all the time, there are places where we should just take a step back...
But I wouldn't write an AAA console game engine that way. I need to understand, to be able to have a precise, simple mental model that is "standard" and understood by everybody on a project, even new hires. C++ is already too complex for this to happen, and that's one of the big reasons we "subset" it into a manageable variant enforced by standards and linters.

Abstractions are powerful, but we should be aware of their mental cost, and maybe counter it with simple implementations and great tools (e.g. not making templates impossible to debug like they are in C++), so that when they fail it doesn't feel like you're digging into a compiler.

Not all language refinements impose a burden, so it's not that there can't be a language more expressive than C that is still similarly easy to understand; but many language decisions do come with a tradeoff, and I find the ones that loosen the relationship between syntax and semantics particularly taxing.

As the infamous Gang of Four wrote (and I feel dirty citing a book I'm so averse to): "highly parameterized software is harder to understand and build than more static software".

That's why I advocate for increased productivity to seek interactivity and zero-iteration times, live-coding and live-inspection over most other language features these days.

And before going to metaprogramming I'd advocate seeking solutions (if needed) in better type systems. C++ functors are just lambdas and first-class functions done wrong, parametric polymorphism should be bounded, auto_ptr is a way to express linear types, and so on... Bringing functionality into the language is often better than having a generic language that can be extended in custom ways. Better for the meatbags and for the machine (tools, compilers and so on).

That said, every tool is just that, a tool. Even when I diss OOP it's not that having OO per se is evil; really, a tool is a tool, and the evil starts when people reason in certain ways and code in certain ways.
Sometimes the implementation of a given tool is also particularly, objectively broken. If you only know metaprogramming from C++ templates, which were just an ignorant attempt at generics gone very wrong (and one that is still not patched today: concepts were rejected, and anyway I don't trust them to be implemented in a sane way), then you might be overly pessimistic.

But with great power often comes great complexity in really knowing what's going on, and sane constraints are often an undervalued tool; we assume that fewer constraints will be a productivity win, and that's not true at all.

- Extra marks: a concrete example

When I saw Boost::Geometry I was so enraged I wanted to blog about it, but it's really so hyperbolically idiotic that I decided to take the high road and ignore it - Non ragioniam di lor, ma guarda e passa ("let us not speak of them, but look, and pass on").

As an example I'll post here a much more reasonable use case someone showed me in a discussion. I actually have no qualms with this code; it's not even metaprogramming (just parametric types and unorthodox operator overloading) and could be appropriate in certain contexts, so it's useful for showing some tradeoffs.

va << 10, 10, 255, 0, 0, 255;

Can you guess what that does? I can't, so I would go and look at the declaration of va for guidance.

vertex_array < attrib < GLfloat, 2 >, attrib < GLubyte, 4 > > va;

Ok so now it's clear, right? The code is taking numbers and packing into an interleaved buffer for rendering. I can also imagine how it's implemented, but not with certainty, I'd have to check. The full code snippet was:

vertex_array < attrib < GLfloat, 2 >, attrib < GLubyte, 4 > > va;

va << 10, 10, 255, 0, 0, 255; // add vertex with attributes (10, 10) and (255, 0, 0, 255)
// ...
va.draw(GL_TRIANGLE_STRIP);

This is quite tricky to implement in a simpler C-style C++ also because it hits certain deficiencies of C, the unsafe variadic functions and the lack of array literals. 
Let's try, one possibility is:

VertexType vtype = {GLfloat, 2, GLubyte, 4};
void *va = MakeVertexArray(vtype);

AddVertexData(&va, vtype, 10, 10, 255, 0, 0, 255, END);
Draw(va, vtype);

But that's still quite a bit magical at the call-site, not really any better. Can we improve? What about:

VertexType vtype = {GLfloat, 2, GLubyte, 4};
void *va = MakeVertexArray(vtype);

va = AddFloatVertexData(va, 10, 10);
va = AddByteVertexData(va, 255, 0, 0, 255);
Draw(va);

or better (as chances are that you want to pass vertex data as arrays here and there):

VertexType vtype = {GLfloat, 2, GLubyte, 4};
void *va = MakeVertexArray(vtype);

float vertexPos[] = {10, 10};
byte vertexColor[] = {255, 0, 0, 255};
va = AddVertexData(va, vertexPos, array_size(vertexPos));
va = AddVertexData(va, vertexColor, array_size(vertexColor));
Draw(va);

And that's basically plain old ugly C! How does this fare?

The code is obviously more verbose, true. But it also tells exactly what's going on without the need of -any- global information, in fact we're hardly using types at all. We don't have to look around or to add comments, and we can exactly imagine from the call site the logic behind the implementation. 
It's not "neat" at all, but it's not "magical" anymore. It's also much more "grep-able" which is a great added bonus.

And now try to imagine the implementation of both options. How much simpler and smaller will the C version be? We gained clarity both at the call site and in the implementation, using a much less "powerful" paradigm!

Other tradeoffs could be made, templates without the overloading would already be more explicit, or we could use a fixed array class for example to pass data safely, but the C-style version scores very well in terms of lines of code versus computation (actual work done, bits changed doing useful stuff) and locality of semantics (versus having to know the environment to understand what the code does).

An objection could be that the templated and overloaded version is faster, because it statically knows the sizes of the arrays, the types and so on, but it's quite moot. The -only- reason the template knows is that it's inline, and it -must- be. The C-style version offers the option of being inline for performance, or not, if you don't need it and don't want the bloat.

It's true that the fancy typed C++ version is more safe, and it is equally true that such safety could be achieved with a better type system. Not -all- language innovations carry a burden on the mental model of program execution.

Already in C99, for example, you could use variable-length arrays and compound literals to somewhat ameliorate the situation, but really, a simple solution that would go a long way would be array literals and support for knowing the size of arrays passed to functions (an optional implicit extra parameter).

Note that I wrote this in C++, but it's not about C++: even in metaprogramming environments that are MUCH better than C++'s, like Lisp with hygienic macros, metaprogramming should be used with caution.
In Lisp it's easy to introduce all kinds of new constructs that look nifty and shorten the code, but each one you add is one more piece of foreign syntax that is local to a context and that people have to learn and recognize. Not to be abused.
Also, this is valid for any abstraction: abstraction is always a matter of trade-offs. We sometimes forget this, and start "loving" the "beauty" of generic code even when it's not actually the best choice for a problem.

14 June, 2014

Where is my C++ replacement?

Nowadays I can safely say the OO fad, at least for the slice of the programming world I deal with, is over.
Not that we're not using classes anymore (and why shouldn't we), but most good studios don't think in OOP terms, and thanks to a few high-profile programmers who spoke up (more amusing reads in the "The rest and the C++ box of chocolate" section here) people are thinking about what programs do (transform data) instead of how to create hierarchies.
I can't remember the last time someone dared to ask about Design Patterns at a coding interview (or anywhere). Good.

Better yet, not only has OOP been under attack, but C++ as well. Metaprogramming via C++ templates? Not cool. Boost? Laughed at. I wouldn't be surprised if even Alexandrescu now thought policies (via C++ templates) were crazy...
And not only do we subset C++ into a manageable, almost-sane language (via coding standards and linters), but more and more people are even going back to a C-like C++ style.

So it begs the question: if we're so unhappy about OO and even recognize many of the faults of C++, where is the replacement? Why are we still all using C++?
I wrote a big, widely followed post on programming languages back in 2011 and I haven't updated it yet because I don't feel too much has changed...

Addendum: I didn't really mean to discuss language features, just success and adoption in my field and some of the reasons I believe are behind it. But there was something I wanted to add when it comes to languages and I wrote it here

- Engineers should know about marketing

And people. And entrepreneurship. Really. I'll be writing some of the same considerations I've expressed in my last post about graphics APIs, but it's not a surprise, because they are universal.

So, let's do it again. How close are "C++ replacements" of being viable for us? What do we want from a new language?
- Solve pain (big returns). Oh, a new multi-paradigm, procedural, object-oriented, functional, generic language with type inference and dependent types? Cool! How does it make me happier? How does it solve my problems?
- Don't create pain (low investment). Legacy is a wall for the adoption of any new language. How easy is your new language to integrate in my workflow? Does it work with my other languages? Tools? IDEs?

Now, armed again with this obvious metric, let's see how some languages fare from the perspective of rendering/AAA videogames...

- D language

D should be the most obvious candidate as a C++ replacement. D is an attempt at a C++ "done right", learning from C++ mistakes, complexity issues, bad defaults and so on while keeping the feeling of a "systems" language, C-like, compiled.
It's not a "high-performance" language (in the sense of numerical HPC), even if it does, at least, support 128-bit SIMD as part of the -standard- library, so in that respect it's an evolution; but, like C++, it is relatively low-overhead on top of C.

So why doesn't it fly (at least not yet)? Well, in my opinion the problem is that nowadays "fixing" C++ is not quite enough to make people switch. We already largely "fixed" C++ by writing sane libraries, by having great compilers and IDEs, by detecting issues with linters and so on.

Yes, it would be great to have a language without so many pitfalls, but we've worked around most of them. What does D do that our own "fixed" C++ subsets don't?
Garbage collection, which is important for modularity but which "systems" programmers hate (mostly out of prejudice and ignorance, really). Better templates, for a community which is quite (rightfully) scared of meta-programming.

It doesn't even make adoption too hard: there are a number of compilers out there, even an LLVM-based one (which also guarantees good platform support for the future), Visual Studio integration, and it can natively call C functions with no overhead (though not C++ in general, which is an understandable decision).

It's good. But not a compelling enough reason to switch. It quite clearly aims to be used for -any- code that C++ is used for, by being prettier. That's like trying to replace eBay with a new site that has the same business plan as eBay but a better interface (and no marketing)...

It almost seems made in the belief that you can do something better and then people will flock to it because, well, it's better. But things almost never go that way. Successful languages solve a need for some people; they often start with a focused niche of adopters and then, if they're lucky, they expand.
Java, JavaScript, Perl, Python all started in such a way. Some languages arguably did succeed at being "just better" (or anyhow started from scratch to replace some other), but these had huge groups pushing them, like Microsoft did with C#.

- Rust

Rust departs from C++ more than D does, and many people are looking at it with some hope that it could be the systems language of the future. It's still in its early stages of development (v0.10 as of today), but it starts well by having a big, bold target: concurrency and safety, with low overhead via an ingenious type system.

The latter has attracted the interest of gamedevs (even if today, in its early implementation, Rust is not super fast): while most type-safe languages have to rely on garbage collection, Rust does without, employing a more complex static type system instead.

It's very interesting but for the time being and the foreseeable future for us (game/rendering programmers) Rust's aim is not so enticing.

We solved concurrency with a bunch of big parallel_fors over large data arrays, and some dependencies between the jobs carrying such loops.
We don't share data, we process arrays with very explicit flows, and we already know how to do this quite well. This organization is also quite important for performance: a bunch of incoherent jobs would not use resources nearly as well.
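The shape of that pattern, as a toy Python sketch (real engines run this over native job systems with no per-chunk allocation, not thread pools; the chunk size and worker count here are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(kernel, data, workers=4, chunk=1024):
    # Split a flat array into chunks and run the same kernel over each;
    # no mutable state is shared between chunks, and results keep order.
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda c: [kernel(x) for x in c], chunks))
    return [x for c in mapped for x in c]
```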

If we needed something "more" for less predictable computations (AI... gameplay...) we could employ messages (actors), but that kind of async computing is much slower. C++ doesn't make any of this trivial (of course!) but once it's up and running we don't have much to fear (that's also why fancy models like transactional shared memory are, I think, completely irrelevant to us).

Safety could be a bit more interesting, as a safer type system could save us some time, provided it doesn't end up increasing complexity. But even if it's true that we sometimes have to chase horrific bugs, considering that we're working in the least safe language in the world, I'd say we're not doing badly.
Or maybe we are; but just think about all the times you considered a big refactoring to make the code safer, and didn't manage to justify it well enough in terms of returns... And that's a much less ambitious thing than changing language!

I'd like to maintain a database of bugs in our industry (time spent, bug category and so on) to data-mine. Many people are "scared" of allocation- and memory-related ones, but to be honest I wonder how much impact they have, armed with a good debugging allocator (logging, guard pages, pattern and canary checking and so on).

Maybe certain games do care more about safety (e.g. online servers), and maybe I'm biased being a rendering engineer: our code has (or should have) simple data flows, and the really hard bugs are usually related to hardware (e.g. synchronization with the GPU).
Not that we wouldn't love to have Rust's benefits; I simply don't think they are important enough to pay the price of a new language.

Nonetheless, it's a very interesting language to follow, and it's still in its early stages, so I might change my mind.

- Golang

Go is somehow similar to Rust, at least insofar as they are both C++ replacements born "out of the web" (even if Go was conceived mostly for server-side work, while Rust's first application aims to be a browser), but it could be a bit more interesting because of one of its objectives.

In many ways it's not a great language (especially right now) but it is promising.

On one hand it's quite a bit simpler, with a much more familiar type system (also due to the fact that it doesn't try to enforce memory safety without a GC), so it requires a smaller investment, not quite as ground-breaking, but very practical.

On the other hand it has at least one very enticing core design feature for us: it's built for fast iteration, explicitly, and that is, finally, something we do really strongly care about in our day to day work!

We go to great lengths to avoid long iteration times, and C++ is so terrible in that respect that we even sacrifice performance for scripting, or worse, for "data-driven" logic (not data-driven programming, but logic, that is to say, data that doesn't express a Turing-complete language and yet expresses some of the logic we need, usually requiring some very badly written interpreter of sorts).

It's also backed by a huge corporation, so it solved the "early adopters" issue easily.

Yet, as it stands now, there is still too much friction for us to consider it: it doesn't quite work in our environments, it has slow C interop, and moreover most of its language features are not relevant enough for us, to the degree that just using C would not be much different in terms of expressiveness.

It's a nifty, simple language that has a strong backing and will probably succeed, but hardly for us, even if in principle it starts going somewhere we really need languages to go...

- Irrelevance...

That's a big problem, and a substantial reason why I think we haven't found a C++ replacement.

It's not that all new languages misunderstand what's needed for success; rather, most languages that do understand it are just interested in other fields.

The web really won: Python, JavaScript (and the many languages built on top of it), Go, Rust, Ruby, Java (and the many languages built on top of the JVM).

If you look around, the key is not to find a C++ replacement; that has already widely happened in many performance-critical fields. It's to find our C++ replacement, for our field, which doesn't see much language activity anymore.

Application languages also left us behind. C# is great as a language: clean, advanced, fast iteration, modern support for tools (reflection, code generation, annotations...), and the one that flirted with games most closely...
But it just seems that nobody is -really- concerned about making a static compiler for (most of) it that has the performance guarantees (contracts on stack allocation, value passing, inlining...) and the (zero-cost) interoperability we'd like, for it to really fly.

High-performance computing does many of the same things we do: going wide with parallel instructions (SIMD), threads, GPUs. But HPC languages are hardly concerned with meshing with C/C++ at all; they are not low-overhead systems languages.
When you have to process arrays of thousands of elements, even the cost of interpreting each operation (which is then executed wide) is not important, so HPC languages tend to be much higher-level than we'd like.

Also, even when they are well integrated with C (e.g. C++ AMP and OpenMP, or the excellent ISPC; Julia is also worth a look), HPC takes care of small computational kernels, which we already know how to code well, all the way down to assembly, so we're not too concerned about that.
Maybe in the future this will shift, if we see an actual need to target heterogeneous architectures with a single code base, but right now that doesn't seem too important.

Maybe mobile app development will save us, the irony. Not that I'm advocating Swift right now but it's certainly interesting that we see much more language dynamism there.

- In a perfect world...

How could a language really please us? What should the next C++ even look like to make us happy? C++ was a small set of macros on top of C that added a feature that people at the time wanted, OO. What's the killer feature for us, today?

Nice is not enough. D is nice. Rust has lots of nice features and we can debate a lot about nice language features we'd like to have, and things that should be fixed, and I do enjoy that and I do love languages.

But, I don't think that's how change happens, it doesn't happen because something is simply better. Not even if it's much better, not in big fields with lots of legacy (and not if "better" doesn't necessarily translate to making lots more money as well or spending lots less).

As engineers we sometimes tend to underestimate just how much something has to be convenient in order to be adopted. It's not only the technical plane (not at all). It's not only the tools, the code legacy, the documentation.
When all these are done, there is still the community to take care of, the education, what your programmers know and what the programmers you want to hire know... And when you have all these in line, you still need to overcome people's laziness, biases, irrationality (all defects I partake in myself). 
And even if all is there, you simply might not have the resources to pay the cost, even if the investment is positive in the long run, or, which is actually harder, be able to prove that such an investment will make more money!

It's a mountain. That's why C++ survives for us.

Back to the beginning, cost/return, how can we find a disruptive change in that equation? I think for us a new language can succeed only if it fulfills two requirements.

One is to be very low-cost, preferably "free", like C++ was (C with Classes). Compiling down to C++ is a good option to have; it makes us feel safe. That's why C++ supersets and subsets are already very popular today: we lint, we parse, we code-generate... reflection, static checking, enforcement of language subsets, extensions...

The other is to be so compelling for our use cases that we can't do without it. And in our industry that means, I think, something that saves orders of magnitude in effort, time and money.
We're good with performance already, even if we have to sweat for it without standard vector types or good standard libraries and so on. 
We don't care (IMHO) enough about safety, and we're getting better at achieving it with tools and static checkers anyway. Not concurrency, which we've mostly solved. Not even simplicity, because we can already "simplify" our work by ignoring complex stuff... But productivity, that is my bet.

- Speed of light

If I have to point at what is most needed for productivity, I'd say interactivity. Interactive visualization, manipulation, REPLs, exploratory programming, live-coding.

That's so badly needed in our industry that we often just pay the cost of integrating Lua (or craft other scripts), but that can work only in certain parts of the codebase...

Why did Lua succeed? It's a scripting language! Why aren't we hot-swapping D instead? We sacrificed runtime performance. For what? For both productivity and cost!
Lua is easy(-ish... with some modifications...) to integrate; maybe other languages could be as easy, but crucially, Lua being a portable interpreter guarantees it will work on any platform that supports C (or we can fix it to work, easily). And Lua is productive: it allows interactive coding, and it's even better than hot-reloading C++ in terms of iteration. 

Among the languages that are "safe", guaranteed to work on all our platforms (even in the future), that interop with C easily, and that allow live-coding, Lua is the fastest, so we picked it. Not for any language feature (actually the language itself is not really ideal, and it heap-allocates a lot). It could have been GW-BASIC for all we cared about the syntax...
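To make the "iteration" point concrete, here's a minimal sketch of the kind of hot-reload machinery engines wrap around Lua. Everything here is my own illustration (the function name, the polling scheme); only the general technique, polling the script's modification time once per frame and re-running it through the interpreter on a change, is what typical integrations do.

```c
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/* Returns 1 if the file at 'path' changed since the mtime recorded in
   *last_mtime (updating it), 0 otherwise. A real engine would call this
   once per frame and, on a hit, re-run the script through the
   interpreter, e.g. luaL_dofile(L, path). */
static int script_needs_reload(const char *path, time_t *last_mtime)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                      /* missing file: nothing to reload */
    if (st.st_mtime != *last_mtime) {  /* changed since last check?       */
        *last_mtime = st.st_mtime;
        return 1;
    }
    return 0;
}
```

`luaL_dofile` is the real Lua C API call; the polling helper and the per-frame loop around it are hypothetical. The point is how little code buys you live iteration once the language runs through a portable interpreter.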

A language that meshes well with C/C++ codebases, that we can trust to be available on all platforms (the option of C/C++ codegen is a way to ensure that), and that offers fast iteration will succeed in our field. 
In fact I would gladly give up any of the C++11 features (even the few decent ones) for modules (preferably dynamic, but even static ones would increase code malleability), but of course the committee is a sad joke today, so they'd rather just add complexity to one of the most arcane languages out there.

I really think iteration time is the key, and approaching interactivity is a game changer. I would take any language, regardless of the details, if it's interactive. In fact I do: as a rendering engineer, I love shader programming, even though shader languages are not great and neither are their tools, just because shaders are trivial to hot-swap.
It's such a disruptive advantage, and it's really the only thing that I can think of that is compelling enough for us to pay the price of a new language.

My best hope nowadays is LLVM, which seems more and more poised to be the common substrate for systems programming across platforms (Windows is still not the best target, though work is in progress). 
That could enable low-cost adoption of new languages, well integrated with C/C++ compilers and libraries, the same way JS is now the web's common substrate for a lot of languages (or the JVM is for server stuff).

03 June, 2014

Rate my API

Metal, Mantle, OpenGL's AZDO, GL|ES, DirectX 12... Not to mention the "secret" console ones. It's good to be a graphics API these days... And everybody is talking about them.
As I love to be "on trend" now you get my take on all this from hopefully a slightly different perspective.

To be honest, I initially wrote an article as a rant in reaction to the excellent post on modern OpenGL by Aras (Unity 3d) but then after Metal and some twitter chats I became persuaded I should write something a bit more "serious". Or at least, try...

- When is a graphics API sexy?

Various smart people are talking in nice detail about the technical merits of certain API design decisions (e.g. Ryg and Timothy's exchanges on OpenGL: original, reply, re:re: and another one), so I won't add to that right now. I want instead to cast these discussions in a different and, to me, more relevant light. What do we really want from an API (or really any piece of software)? 

First and foremost, we will consider adopting a technology only if it's useful. It might seem obvious, but apparently it's not. How many times have you seen projects that don't really work, yet spend time on aesthetic improvements?

Ease of use, documentation, great design, simplicity. All these attributes are completely irrelevant if the software doesn't do some compelling work. We can learn undocumented stuff, we can write our own tools, we are engineers and if there is something we need and there is a road open to obtaining it, we can achieve what we need. Of course we'd rather prefer not to endure pain, but pain is better than just not being able to do what we need to. Easy is better than hard, but hard is better than impossible.

Of course, once we have something that does something we could be interested in, whether we'll adopt it or not depends on how much we want it divided by how hard it is to achieve. Cost/benefit, unsurprisingly. To recap, we want an API that is:
  • Working. It's actually implemented somewhere, and the implementation actually works. If it's written on paper but not reliably deployed, we can safely ignore its existence. This is really part of "useful", or of the benefits, but it's important enough that I'd like to remark on it here.
  • Useful. Does something that we need, in a market we're interested in. If I'm a AAA company and you make a great API that enables incredible graphics on a device that sold ten thousand units, that's not useful. If you provide a big speed improvement on a platform that is not a performance bottleneck for my products, that's not so useful either. And so on.
  • Easy. Do I need to change my entire engine or workflow to adopt this API? That's the most pressing question. Then comes documentation and support. Then tools. Then, in general, how nice the API design is. APIs usually work in a realm that is well separated from the rest of the software; if your API requires me to sacrifice a (virtual) goat each time I call it, it's probably still not going to make my whole project bad, because it's not going to "spread". If the bad API design does "spread" to the engine or the entire software, then that's changing the workflow, so it goes back to the first, most important attribute.

Now we can go through a few of the graphics APIs on the table these days and see how they fare (in my humble opinion) according to this (obvious but sometimes forgotten) metric.

- OpenGL and AZDO

OpenGL has a long history: once upon a time it was winning the graphics API war, then it started to lose ground, and by the time DirectX9 was around pretty much all games had switched (a good history lesson was posted a while ago on Stack Overflow).
That didn't stop the downward spiral, to the point that around the time DirectX11 came (2008, shipped with Windows 7 in 2009), even multi-platform CG software (Maya, Max and the likes) moved to DirectX on Windows as the preferred frontend.
OpenGL took years to catch up with DirectX11 through a variety of patches (g-truc's reviews are awesome) and even longer to see robust implementations of these concepts. Still today, driver quality and the number of extensions supported vary wildly across vendors and OSes (some examples here). Ironically (and to make things worse), the platform where OpenGL has the best drivers across vendors today is Windows (which doesn't even ship with OpenGL drivers by default, only an ancient OpenGL 1.1-to-Dx layer), while OSX, which is the best use case for OpenGL in many ways, has drivers that tragically lag behind (but at least they are guaranteed to be updated with the OS!).

But, for all the faults it has, today OpenGL is offering something very worth considering, which is what cool people call AZDO, "Approaching Zero Driver Overhead" (instanced rendering on steroids): a way to reduce draw-call overhead by orders of magnitude by shifting the responsibility of working with resources from the CPU, which generates commands that set said resources into the command buffer, to the GPU, which in this model follows a few indirections starting from a single pointer to tables of resources in memory.

To a degree AZDO is more a solution "around" OpenGL: rather than fixing OpenGL by creating an API that allows fast multithreaded command buffer generation, it provides a way to draw with minimal API/driver intervention.
In a way it's a feat of engineering genius: instead of waiting for OpenGL to evolve its multithreading model, it found a minimal set of extensions to work around it. On the other hand, this will probably further delay the multithreading changes...
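The "tables of resources in memory" idea can be made concrete with a sketch. The struct below mirrors the DrawElementsIndirectCommand layout that GL's glMultiDrawElementsIndirect consumes (part of the GL 4.3 / ARB_multi_draw_indirect machinery AZDO relies on); the filling function and its naming are my own illustration of how a CPU side might batch many sub-meshes into one indirect buffer.

```c
#include <stdint.h>

/* Mirrors GL's DrawElementsIndirectCommand: one entry per draw, packed
   into a GL_DRAW_INDIRECT_BUFFER. The driver is touched once per
   glMultiDrawElementsIndirect call, not once per draw. */
typedef struct {
    uint32_t count;          /* index count for this draw              */
    uint32_t instanceCount;  /* setting 0 lets the GPU "skip" a draw   */
    uint32_t firstIndex;     /* offset into the shared index buffer    */
    uint32_t baseVertex;     /* offset into the shared vertex buffer   */
    uint32_t baseInstance;   /* often reused as a resource-table index */
} DrawElementsIndirectCommand;

/* Fill 'n' commands for sub-meshes laid out back to back in one big
   index buffer -- the kind of batching AZDO encourages. */
static void fill_commands(DrawElementsIndirectCommand *cmds, int n,
                          const uint32_t *index_counts)
{
    uint32_t first = 0;
    for (int i = 0; i < n; ++i) {
        cmds[i].count         = index_counts[i];
        cmds[i].instanceCount = 1;
        cmds[i].firstIndex    = first;
        cmds[i].baseVertex    = 0;
        cmds[i].baseInstance  = (uint32_t)i; /* index into resource tables */
        first += index_counts[i];
    }
}
```

In real code the array would be uploaded to a buffer object and submitted with a single glMultiDrawElementsIndirect call, with the shaders using gl_DrawID or baseInstance to index bindless texture/constant tables; that submission path is the part I'm only describing, not showing.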

Results seem great. The downside of this approach is that all the modern competitors (DirectX12, Mantle, XBox One and PS4 libGNM) allow both reducing CPU work by offloading state binding to GPU indirection and fast CPU command buffer generation via multithreading and lower-level concepts, which map to more "conventional" engine pipelines a bit more easily. There is also a question of whether the more indirect approach is always the fastest (i.e. when dealing with draws that generate little GPU work), but that's still up for debate (as AZDO is very new and I'm not aware of comparisons pitting it against the other approach).

For AAA games. Today, for most companies, this means consoles first, Windows second, anything else much less important. For these games, having more performance on a platform that is not the primary authoring one and that is often not a performance bottleneck, at the cost of significant engine changes, doesn't seem attractive at all (and with no debug tools, little documentation and so on...), especially considering that DirectX12 is coming: an alternative that promises to be as good but easier and better supported, and that will also target Xbox One, thus covering two of the three target platforms.

A notable exception though are free-to-play games hugely popular in Asia that are not only usually Windows exclusive, but where Windows XP is still very relevant, which means no DirectX11 and even less DirectX12. For these games I guess OpenGL could be a great option today.
Note also that AZDO is currently not fully supported on Intel hardware (no bindless, MDI software-emulated), so you'll probably need a fallback renderer as well, as Intel hardware is quite interesting for games at the lower end.

For applications. Most CGI applications are worst-case scenarios for GPU efficiency: they tend to do lots of draws with very little actual work (wireframe drawing, little culling), and in not very optimized ways as well, due to having to work with editable, unoptimized data and often also carrying legacy code, or code not written with the best GPU performance in mind. 
Also, shipping on multiple platforms is the norm, while working across multiple vendors is less of a concern: NVidia has the golden share among CG studios and Intel is completely out of the picture, so even NVidia/Linux alone is probably a compelling enough target to consider "modern OpenGL" there, and even more so as Windows would benefit as well.
These things considered, I would expect modern OpenGL to be something most applications will move towards, even if it might take significant effort to do so.


- Mantle

AMD's Mantle is a clear example of a nice, good, easy API (exaggerated, but interesting praise here) that fails (in my opinion) to be really useful for shipped titles. On the technical level there's nothing to complain about; it seems very reasonable and well done. 

For AAA games. Today Mantle works only on Windows with AMD hardware. That's too little, especially when DirectX12 is coming and AZDO is an alternative too. While it's most probably easier to deploy than AZDO (and I bet AMD is going to be willing to help, even if right now there might be no tools and so on), it's also much less useful. Worse even if you consider that even on AMD hardware, only certain CPU/GPU combinations are CPU limited.
It simply covers too little ground. I hoped at the beginning that AMD would come out sooner, and with a PS4 layer as well, thus getting deployed by many projects that were looking for an easier way to target PS4 than figuring out libGNM. It didn't happen, and that I think is the end of it. Some people were thinking it could have been a new cross-vendor standard, but that will -never- happen.
They did though score Frostbite's support, which pretty much means all EA games. But I would be very surprised if they didn't have to pay for that, and I wonder how long it will last (as supporting it, like supporting any platform, is still a cost)...

For applications. It's a bit more interesting there: if you remove the consoles from your targets, you're increasing the share occupied by Windows. Also, it's not unreasonable to think that Mantle could be ported to Linux. Unfortunately though, NVidia is more popular than AMD among CG studios, and that pretty much kills it.

For the people. There is something though that needs to be praised a lot: AMD also has lots of great, public documentation about the workings of its GPUs (Intel is not bad either; NVidia is absolutely terrible, a sad joke) and tools that show the actual GPU shader code (i.e. shader disassembly), which is really great as it allows everybody to talk and share their findings without fearing NDAs.
This creates a positive ecosystem where everybody can work "close to the metal", and Mantle is part of that. Historically, it just happens that the more people are able to hack, the more amazing things get created. See what happened after twenty years of C64 hacking (some examples here).
I expect all graphics researchers to focus on GCN from now on.

- DirectX12

It's hard to criticize DirectX11, especially if you consider that it was presented in 2008 and what the state of the other APIs was at that point. It changed everything: it mapped better to modern GPU concepts, introduced Tessellation and Compute Shaders, looks great and easy, is reasonably well documented and supported, and it's very successful.

Arguably DirectX9 had better tools (VSGD is horrible AND they killed Pix, which was actually working fine), but that's hardly a fault of 11 and rather due to the loss of interest in PC gaming; nowadays things are getting much better. Consider that only now are we really starting to play with Compute Shaders, for example, because next-gen consoles arrived, but we've had them for five years! It was so ahead of its time that it needed only rather minor updates in 11.1 and 11.2.

The only big issue with 11 is that Microsoft wants to make things simpler than they really should be, for no great reason. So 11 shipped with certain "contracts" in its multithreading model that don't seem really useful or needed, but hugely impacted the performance of multithreaded drivers, to the point where multithreading is useful only if your application, and not the driver, is the bottleneck. 
If your code is fast enough, multithreaded Dx11 will actually be slower than single-threaded, which is clearly an issue. I suspect it could still be technically possible to carve "fast paths" for applications swearing not to exercise the API in certain ways, but probably it was simply not important enough for the PC gaming market, and now 12 is coming, probably just in time...

For everything Microsoft. DirectX won on Windows and it also ships on Microsoft consoles. I can't comment much on 12, as it's not finished yet. It will hardly be displaced on Windows though, especially for games.

- Metal

Metal is Apple's Mantle. In my very personal, biased poll of the reactions on my twitter feed, it has not been received with the same enthusiasm as AMD's initiative. Some explained to me it's because Mantle promised to be a multi-vendor API while Metal didn't. Oh Apple, outclassed at marketing by AMD; you don't know how to appeal to the engineers. Next time say it's "designed to be open"...

I've also seen many people complaining this is foul play designed only to create vendor lock-in, a mere marketing move. I don't agree, and if you think it's only marketing then you should prove that it's possible to write an equally fast driver on today's OpenGL|ES.
I believe that's not technically possible, and I believe OpenGL|ES is plagued by many of the same defects as desktop OpenGL, only much worse, as it has no AZDO and it ships on platforms that are very resource constrained, where performance and efficiency matter even more!
It would have probably been possible to carve fast paths and patch ES with proprietary extensions, which would have been a bit friendlier to the ecosystem (extensions often get incorporated into the standard down the line), but if it reaches the point where most of the rendering would go through extensions, what's the point, really?

Actually, this might be for the best even for the overall ecosystem, as it's a bigger kick in the nuts than anything else could have been; and when many vendors on Android are shipping drivers that are just the -worst- software ever, and Khronos shows itself to be slow to evolve and ridden with politics, a hard kick is what's most needed.
It's very new as we speak and I haven't had an in-depth look into it, so I might edit this section later on.

For games. iOS still has the golden share of mobile gaming, with many more exclusives and games shipping "first" on that system than on the competitor, but the gap is not huge. Also, most games are still 2d and not too demanding on the hardware, so for a lot of people a degree of portability will matter more than an order-of-magnitude improvement in drawcall performance. 
But for the games that do care about performance, Metal is just great: iOS is big enough that even if your game is not exclusive, it's very reasonable to think about spending money to implement unique features that make your game nicer on it. 
It's true that Metal won't be available on older Apple hardware, but Apple has always succeeded in giving people reasons to update both their software and their hardware, so that's probably not a big concern.

- Conclusions

Learn AZDO, play with Mantle, ship with DirectX.

If you're doing an indie title do use a rendering library or engine (I keep pointing at https://github.com/bkaradzic/bgfx but it's just an example) so you'll still ship with the best API for each platform and with the least amount of headaches. If you really love toying with the graphics API directly then I guess a flavor of OpenGL that is supported across platforms could be nice (3.3 if you care about Intel/Linux right now).

If a market is interesting enough for a given application and the vendors there decide on their own API, like it's happening for Metal and happened for DirectX, I'd welcome that.

The problem with many of the APIs we're seeing is not that they divide the market, but that they try to do so in segments that are too small and uninteresting to specifically target. If for example Linux decided on its own 3d API for games I doubt that would be at all interesting...
If AMD had shipped Mantle on consoles and PC, it could have been a big enough segment to target; PC-only is not. If NVidia GameWorks offered a compelling solution on consoles, guess what, it would see bigger adoption as well, while right now I suspect it will be used only on projects where NVidia is directly involved.

Most projects already have to ship with an abstraction layer of sorts, and many of these are available; in practice the idea of using OpenGL directly to ship products across platforms doesn't exist (except for very small projects and some research code).
It's always better to have to write (or use via third-party libraries) lower-level code on things that we understand than to fight with very opaque, wildly different implementations of a supposedly standard API. 

In fact, I bet that practically no gamedev (or a very tiny number) even knows the basics of what a driver does and why certain API decisions led to slow CPU performance. Also, the number of people not using third-party game engines, especially for indie work, is dwindling.

In theory a single API is better; in practice, today, it isn't, and that's why the emergence of these low-level libraries is not just a marketing plot but actually a reasonable technical solution.