14 June, 2014

Where is my C++ replacement?

Nowadays I can safely say the OO fad, at least for the slice of the programming world I deal with, is over.
Not that we're not using classes anymore (and why should we not), but most good studios don't think OOP and thanks to a few high-profile programmers who spoke up (more amusing reads in the "The rest and the C++ box of chocolate" section here) people are thinking about what programs do (transform data) instead of how to create hierarchies.
I can't remember last time someone dared to ask about Design Patterns at a coding interview (or anywhere). Good.

Better yet, not only OOP has been under attack, but C++ as well. Metaprogramming via C++ templates? Not cool. Boost? Laughed at. I wouldn't be surprised if Alexandrescu even thought policies (via C++ templates) are crazy...
And not only do we subset C++ into a manageable, almost-sane language (via coding standards and linters), but more and more people are even going back to a C-like C++ style.

So it begs the question. If we're so unhappy about OO and even recognize many of the faults of C++, where is the replacement? Why are we still all using C++?
I wrote a big, followed post on programming languages back in 2011 and I haven't updated it yet because I don't feel too much has changed...

Addendum: I didn't really mean to discuss language features, just success and adoption in my field and some of the reasons I believe are behind it. But there was something I wanted to add when it comes to languages and I wrote it here

- Engineers should know about marketing

And people. And entrepreneurship. Really. I'll be writing some of the same considerations I've expressed in my last post about graphics APIs, but it's not a surprise, because they are universal.

So, let's do it again. How close are "C++ replacements" to being viable for us? What do we want from a new language?
- Solve pain (big returns). Oh, a new multi-paradigm, procedural, object-oriented, functional, generic language with type inference and dependent types? Cool! How does it make me happier? How does it solve my problems?
- Don't create pain (low investment). Legacy is a wall for the adoption of any new language. How easy is your new language to integrate in my workflow? Does it work with my other languages? Tools? IDEs?

Now, armed again with this obvious metric, let's see how some languages fare from the perspective of rendering/AAA videogames...

- D language

D should be the most obvious candidate as a C++ replacement. D is an attempt at a C++ "done right", learning from C++ mistakes, complexity issues, bad defaults and so on while keeping the feeling of a "systems" language, C-like, compiled.
It's not a "high-performance" language (in the sense of numerical HPC, even if it does, at least, support 128bit SIMD as part of the -standard- library, so in that respect it's an evolution) but, like C++, is relatively low-overhead on top of C.

So why doesn't it fly (at least yet)? Well, in my opinion the problem is that nowadays "fixing" C++ is not quite enough to switch. We already "fixed" C++ largely by writing sane libraries, by having great compilers and IDEs, detecting issues with linters and so on.

Yes, it would be great to have a language without so many pitfalls, but we worked around most of them. What does D do that our own "fixed" C++ subsets don't? 
Garbage Collection, which is important for modularity but "systems" programmers hate (mostly out of prejudice and ignorance, really). Better templates to a community which is quite (rightfully) scared of meta-programming.

It doesn't even make adoption too hard: there are a number of compilers out there, even an LLVM-based one (which guarantees good platform support also for the future), there is Visual Studio integration, and it can natively call C functions with no overhead (but not C++ in general, even if it's an understandable decision).

It's good. But not a compelling (enough) reason to switch. It quite clearly aims to be used for -any- code that C++ is used for by being prettier. That's like trying to replace eBay with a new site that has the same business plan as eBay but with a better interface (and no marketing)...

It almost seems to be made thinking that you can do something better and then people will flock to it because well, it's better. But things almost never go this way. Successful languages solve a need for some people and they often start with a focused niche of adopters and then if they're lucky they expand.
Java, JavaScript, Perl, Python, all started in such a way. Some languages arguably did succeed at being "just better" (or anyhow started from scratch to replace some others), but these had huge groups pushing behind them, like Microsoft did with C#.

- Rust

Rust departs from C++ more than D and many people are looking at it with some hope it could be the systems language of the future. It's in its early stages of development still (v 0.10 as of today) but it starts well by having a big bold target: concurrency and safety, with low overhead via an ingenious type system.

The latter attracted the interest of gamedevs (even if today, in its early implementation, Rust is not super fast), as while most type-safe languages have to rely on Garbage Collection, Rust does without, employing a more complex static type system instead.

It's very interesting but for the time being and the foreseeable future for us (game/rendering programmers) Rust's aim is not so enticing.

We solved concurrency with a bunch of big parallel_for over large data arrays and some dependencies between a bunch of jobs carrying such loops.
We don't share data, we process arrays with very explicit flows and we know how to do this quite well already. Also, this organization is quite important for performance, a bunch of incoherent jobs would not use resources quite as well.
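To make the "big parallel_for over large data arrays" idea concrete, here is a minimal sketch in standard C++. It is not how any particular engine does it: `parallel_for` and `scale_all` are hypothetical names, and a real job system would reuse a thread pool and track dependencies between jobs instead of spawning threads per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, count) into contiguous chunks and runs
// body(begin, end) for each chunk on its own thread, then waits for all.
template <typename Body>
void parallel_for(std::size_t count, std::size_t workers, Body body) {
    assert(workers > 0);
    const std::size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        threads.emplace_back([=] { body(begin, end); });
    }
    for (std::thread& t : threads) t.join();
}

// Example job: every worker writes only its own slice of `out`, so the
// data flow is explicit and no locks are needed.
std::vector<float> scale_all(const std::vector<float>& in, float k,
                             std::size_t workers) {
    std::vector<float> out(in.size());
    parallel_for(in.size(), workers, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = in[i] * k;
    });
    return out;
}
```

Calling `scale_all(values, 2.0f, 4)` would process the array in up to four slices. The point of the shape is that nothing is shared between workers except read-only input.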

If we needed something "more" for less predictable computations (AI... gameplay...) we could employ messages (actors), but that kind of async computing is much slower. C++ doesn't make any of this trivial (of course!) but once it's up and running we don't have much to fear (that's also why fancy models like transactional shared memory are, I think, completely irrelevant to us).

Safety could be a bit more interesting, as a safer type system could save us some time, if it doesn't end up increasing complexity. But, even if it's true that sometimes we have to chase horrific bugs, considering that we're working on the least safe language in the world, I'd say we're not doing badly.
Or maybe we are, but just think about all the times you considered a big refactoring to make the code more safe, and didn't manage to justify it well enough in terms of returns... And that's a much less ambitious thing than changing language!

I'd like to maintain a database of bugs (time spent, bug category and so on) in our industry to data-mine. Many people are "scared" of allocation and memory-related ones, but to be honest I wonder how much impact they have when we're armed with a good debugging allocator (logging, guard pages, pattern and canary checking and so on).
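As a rough illustration of the "pattern and canary checking" part, here is a small sketch of a debugging allocator on top of `malloc`. All names (`debug_alloc`, `debug_check`, `debug_free`, the canary and fill values) are made up for the example, and it leaves out logging and guard pages, which need OS-specific calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical layout of one allocation:
//   [Header (size + front canary)][user bytes ...][back canary]
constexpr std::uint32_t kCanary = 0xDEADC0DEu;
constexpr unsigned char kFreshPattern = 0xCD;  // fill for new memory
constexpr unsigned char kFreedPattern = 0xDD;  // fill just before freeing

struct alignas(std::max_align_t) Header {
    std::size_t size;      // user-visible size in bytes
    std::uint32_t canary;  // front canary, sits right before the user bytes
};

void* debug_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + size + sizeof(kCanary));
    if (!raw) return nullptr;
    new (raw) Header{size, kCanary};
    unsigned char* user = static_cast<unsigned char*>(raw) + sizeof(Header);
    std::memset(user, kFreshPattern, size);  // makes uninitialized reads visible
    std::memcpy(user + size, &kCanary, sizeof(kCanary));
    return user;
}

// Returns true when both canaries around `ptr` are intact.
bool debug_check(const void* ptr) {
    assert(ptr != nullptr);
    const unsigned char* user = static_cast<const unsigned char*>(ptr);
    const Header* h = reinterpret_cast<const Header*>(user - sizeof(Header));
    if (h->canary != kCanary) return false;  // underrun
    std::uint32_t back = 0;
    std::memcpy(&back, user + h->size, sizeof(back));
    return back == kCanary;                  // false means overrun
}

// Checks the block, stamps the freed pattern, then releases it.
// Returns whether the block was intact at free time.
bool debug_free(void* ptr) {
    if (!ptr) return true;
    const bool intact = debug_check(ptr);
    unsigned char* user = static_cast<unsigned char*>(ptr);
    Header* h = reinterpret_cast<Header*>(user - sizeof(Header));
    std::memset(user, kFreedPattern, h->size);
    std::free(h);
    return intact;
}
```

A write one byte past the end of a block flips the back canary, so `debug_check` (or the check inside `debug_free`) reports it at a known point instead of as a random crash much later.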

Maybe certain games do care more about safety (e.g. online servers) and maybe I'm biased being a rendering engineer: our code has (should have) simple data flows and really hard bugs are usually related to hardware (e.g. synchronization with GPU).
Not that we would not love to have Rust's benefits, I simply don't think they are important enough to pay the price of a new language.

Nonetheless, it's a very interesting one to follow, and it's still in its early stages, so I might change my mind.

- Golang

Go is somewhat similar to Rust, at least as far as they are both C++ replacements born "out of the web" (even if Go was designed mostly for server-side stuff while Rust's first application aims to be a browser), but it could be a bit more interesting because of one of its objectives.

In many ways it's not a great language (especially right now) but it is promising.

On one hand it's quite a bit simpler, with a much more familiar type system (also due to the fact that it doesn't try to enforce memory safety without a GC), so it requires a smaller investment, not quite as ground-breaking, but very practical.

On the other hand it has at least one very enticing core design feature for us: it's built for fast iteration, explicitly, and that is, finally, something we do really strongly care about in our day to day work!

We go to great lengths to avoid long iteration times, and C++ is so terrible in that respect that we even sacrifice performance with scripting or worse with "data-driven" logic (not data-driven programming, but logic, that's to say with data that doesn't express a Turing-complete language but yet expresses some of the logic that we need, usually requiring some very badly written interpreter of sorts).

It's also backed by a huge corporation, so it solved the "early adopters" issue easily.

Yet, as it stands now there is still too much friction for us to consider it: it doesn't quite work in our environments, it has a slow C interop and moreover most of its language features are not too relevant for us to a degree where just using C would be not much different in terms of expressiveness.

It's a nifty, simple language that has a strong backing and will probably succeed, but hardly for us, even if in principle it starts going somewhere we really need languages to go...

- Irrelevance...

That's a big problem, a substantial reason why I think we didn't find a C++ replacement.

It's not that all new languages don't understand what's needed for success, but most languages that do understand that are just interested in other fields. 

Web really won. Python, Javascript (and the many languages built on top of it), Go, Rust, Ruby, Java (and the many languages built on top of the JVM).

If you look around, the key is not to find a C++ replacement; that already widely happened in many performance-critical fields. It's to find our C++ replacement, for our field, which doesn't see much language activity anymore.

Application languages also left us behind, C# is great as a language, clean, advanced, fast iteration, modern support for tools (reflection, code generation, annotations...) and the one that flirted with games most closely... 
But it just seems that nobody is -really- concerned about making a static compiler for (most of) it that has the performance guarantees (contracts on stack, value-passing, inlining...) and the (zero cost) interoperability we'd like for it to really fly.

High-performance computing does many of the same things we do, going wide with parallel instructions (SIMD), threads, GPUs. But HPC languages are hardly concerned with meshing with C/C++ at all; they are not low-overhead systems languages.
When you have to process arrays of thousands of elements, even the cost of interpreting each operation that will then be executed wide is not important, so HPC languages tend to be much higher-level than we'd like.

Also, even when they are well integrated with C (e.g. C++AMP and OpenMP or the excellent ISPC; Julia is also worth a look), HPC takes care of small computational kernels which we know well how to code even all the way down to assembly, so we're not too concerned about that.
Maybe in the future this will shift if we see an actual need of targeting heterogeneous architectures with a single code base, but right now that seems not too important.

Maybe mobile app development will save us, the irony. Not that I'm advocating Swift right now but it's certainly interesting that we see much more language dynamism there.

- In a perfect world...

How could a language really please us? What should the next C++ even look like to make us happy? C++ started out as "C with Classes", a thin preprocessor layer on top of C that added a feature that people at the time wanted, OO. What's the killer feature for us, today?

Nice is not enough. D is nice. Rust has lots of nice features and we can debate a lot about nice language features we'd like to have, and things that should be fixed, and I do enjoy that and I do love languages.

But, I don't think that's how change happens, it doesn't happen because something is simply better. Not even if it's much better, not in big fields with lots of legacy (and not if "better" doesn't necessarily translate to making lots more money as well or spending lots less).

As engineers we sometimes tend to underestimate just how much something has to be convenient in order to be adopted. It's not only the technical plane (not at all). It's not only the tools, the code legacy, the documentation.
When all these are done, there is still the community to take care of, the education, what your programmers know and what the programmers you want to hire know... And when you have all these in line you still need to overcome people's laziness, biases, irrationality (all defects I partake in myself).
And even if all is there you simply might not have the resources to pay for the cost, even if the investment is positive in the long run, or, which is actually harder, be able to prove that such investment will make more money!

It's a mountain. That's why C++ survives for us.

Back to the beginning, cost/return, how can we find a disruptive change in that equation? I think for us a new language can succeed only if it fulfills two requirements.

One is to be very low-cost, preferably "free", like C++ was (C with Classes). Compiling down to C++ is a good option to have, it makes us feel safe. That's why C++ supersets and subsets are already very popular today: we lint, we parse, we code-generate... reflection, static-checking, enforcing of language subsets, extensions...

The other is to be so compelling for our use cases that we can't do without it. And in our industry that means, I think, something that saves orders of magnitude in effort, time and money.
We're good with performance already, even if we have to sweat and we don't have standard vectors or good standard libraries and so on.
We don't care (IMHO) enough about safety, which we are becoming better at achieving with tools and static checkers. Not concurrency, which we solved. Not even simplicity, because we can already "simplify" our work by ignoring complex stuff... But productivity, that is my bet.

- Speed of light

If I have to point at what is most needed for productivity, I'd say interactivity. Interactive visualization, manipulation, REPLs, exploratory programming, live-coding.

That's so badly needed in our industry that we often just pay the cost of integrating Lua (or craft other scripts), but that can work only in certain parts of the codebase...

Why did Lua succeed? It's a scripting language! Why aren't we hot-swapping D instead? We sacrificed runtime performance, to what? To both productivity and cost!
Lua is easy(-ish... with some modifications...) to integrate, maybe other languages could be as easy but crucially Lua being a portable interpreter guarantees it will work on any platform that supports C (or we can fix it to work, easily). And Lua is productive, allows interactive coding, it's even better than hot-reloading C++ in terms of iteration. 

Among the languages that are "safe", guaranteed to work with all our platforms (even in the future), that interop with C easily and that allow live-coding, Lua is the fastest, so we picked it. Not for any language feature (actually the language itself is not really ideal and it heap-allocates a lot). It could have been gwbasic I think, for what we cared about the syntax...

A language that meshes well with C/C++ codebases, that we can trust in its availability on all platforms (the option of a C/C++ codegen is a way to ensure that) but that offers fast iteration will succeed in our field. 
In fact I would gladly give up any of the C++11 features (even the few decent ones) for modules (preferably dynamic, but even static would increase code malleability), but of course the committee is a sad joke today, so they'd rather just add complexity to one of the most arcane languages out there.

I really think iteration time is the key, and approaching interactivity is a game changer. I would take any language, regardless of the details, if it's interactive. In fact I do, as a rendering engineer, I love shader programming even if shader languages are not great and their tools are not great, just because shaders are trivial to hot-swap.
It's such a disruptive advantage, and it's really the only thing that I can think of that is compelling enough for us to pay the price of a new language.

My best hope nowadays is LLVM, which seems more and more poised to be the common substrate for systems programming across platforms (Windows is still not the best target though, but work is in progress).
That could enable low-cost adoption of new languages, well integrated with C/C++ compiler and libraries, the same as JS is now the web common substrate for a lot of languages (or JVM is for server stuff).

03 June, 2014

Rate my API

Metal, Mantle, OpenGL's AZDO, GL|ES, DirectX 12... Not to mention the "secret" console ones. It's good to be a graphics API these days... And everybody is talking about them.
As I love to be "on trend" now you get my take on all this from hopefully a slightly different perspective.

To be honest, I initially wrote an article as a rant in reaction to the excellent post on modern OpenGL by Aras (Unity 3d) but then after Metal and some twitter chats I became persuaded I should write something a bit more "serious". Or at least, try...

- When is a graphics API sexy?

Various smart people are talking in nice detail about the technical merits of certain API design decisions (e.g. Ryg and Timothy's exchanges on OpenGL: original, reply, re:re: and another one) so I won't add to that right now. I want instead to cast this discussion from a different and to me more relevant point of view. What do we really want from an API (or really any piece of software)?

First and foremost, we will consider adopting a technology if it's useful. It might seem obvious but apparently it's not. How many times have you seen projects that don't really work, yet spend time on aesthetic improvements?

Ease of use, documentation, great design, simplicity. All these attributes are completely irrelevant if the software doesn't do some compelling work. We can learn undocumented stuff, we can write our own tools, we are engineers and if there is something we need and there is a road open to obtaining it, we can achieve what we need. Of course we'd rather prefer not to endure pain, but pain is better than just not being able to do what we need to. Easy is better than hard, but hard is better than impossible.

Of course, once we have something that does something we could be interested in, whether we'll adopt it or not depends on how much we want it divided by how hard it is to achieve it. Cost/Benefit, unsurprisingly. To recap we want an API that is:
  • Working. Is actually implemented somewhere, the implementation actually works. If it's written on paper but it's not reliably deployed, we can safely ignore its existence. This is actually part of "useful" or of the benefits, but it's important enough I'd like to remark it here.
  • Useful. Does something that we need, in a market we're interested in. If I'm a AAA company and you make a great API that enables incredible graphics on a device that sold ten thousand units, that's not useful. If you provide a big speed improvement on a platform where my products are not performance-bound, that's not so useful either. And so on.
  • Easy. Do I need to change my entire engine or workflow to adopt this API? That's the most pressing question. Then comes documentation and support. Then tools. Then in general how nice the API design is. APIs usually work in a realm that is well-separated from the rest of the software: if your API requires me to sacrifice a (virtual) goat each time I have to call it, it's probably still not going to make all my project bad, it's not going to "spread". If the bad API design "spreads" to the engine or the entire software then that's changing the workflow, so it goes back to the first, most important attribute.

Now we can go through a few graphics APIs on the table these days and see how they fare (in my humble opinion) according to this (obvious but sometimes forgotten) metric.

- OpenGL and AZDO

OpenGL has a long history: once upon a time it was winning the graphics API war, then it started to lose ground, and by the time DirectX9 was around pretty much all games had switched (a good history lesson was posted a while ago on stackoverflow).
That didn't stop the downward spiral, to the point that around the time DirectX11 came (2008, shipped with Windows 7 in 2009) even multi-platform CG software (Maya, Max and the likes) moved to DirectX on Windows as the preferred frontend.
OpenGL took years to catch up to DirectX11 with a variety of patches (g-truc reviews are awesome) and even longer to see robust implementations of these concepts. Still today the driver quality and the number of extensions supported vary wildly across vendors and OSes (some examples here). Ironically (and to make things worse) the platform where OpenGL has the best drivers across vendors today is Windows (which doesn't even ship with OpenGL drivers by default, only an ancient OpenGL 1.1 to Dx layer), while OSX, which is the best use-case for OpenGL in many ways, has drivers that tragically lag behind (but at least they are guaranteed to be updated with the OS!).

But, for all the faults it has, today OpenGL is offering something very worth considering, which is what cool people call AZDO (instance rendering on steroids): a way to reduce draw-call overhead by orders of magnitude by shifting the responsibility of working with resources from the CPU, generating commands that set said resources into the command buffer, to the GPU, that in this model follows a few indirections starting from a single pointer to tables of resources in memory.
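To give a feel for what "tables of resources in memory" means on the CPU side, here is a small sketch that builds the two arrays such a renderer would upload. It makes no OpenGL calls. `DrawElementsIndirectCommand` mirrors the field layout OpenGL documents for `glMultiDrawElementsIndirect`; everything else (`Mesh`, `Object`, `PerDraw`, `build_batch`) is hypothetical, and using `baseInstance` as an index into the per-draw table is just one common trick, assumed here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as the command struct consumed by
// glMultiDrawElementsIndirect (five tightly packed 32-bit values).
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical scene description.
struct Mesh   { std::uint32_t indexCount; std::uint32_t firstIndex; std::int32_t baseVertex; };
struct Object { std::uint32_t mesh; std::uint32_t material; std::uint32_t transform; };

// Hypothetical per-draw record a shader would fetch from a GPU buffer,
// instead of the CPU binding a material and transform before each draw.
struct PerDraw { std::uint32_t material; std::uint32_t transform; };

struct IndirectBatch {
    std::vector<DrawElementsIndirectCommand> commands;  // one entry per draw
    std::vector<PerDraw> perDraw;                       // indexed by baseInstance
};

IndirectBatch build_batch(const std::vector<Mesh>& meshes,
                          const std::vector<Object>& objects) {
    IndirectBatch batch;
    batch.commands.reserve(objects.size());
    batch.perDraw.reserve(objects.size());
    for (const Object& obj : objects) {
        assert(obj.mesh < meshes.size());
        const Mesh& m = meshes[obj.mesh];
        const std::uint32_t drawIndex =
            static_cast<std::uint32_t>(batch.perDraw.size());
        // No per-draw state change is recorded: the command only carries the
        // geometry range plus an index used to look up everything else.
        batch.commands.push_back(
            {m.indexCount, 1u, m.firstIndex, m.baseVertex, drawIndex});
        batch.perDraw.push_back({obj.material, obj.transform});
    }
    return batch;
}
```

In a real renderer the `commands` array would live in a buffer bound to `GL_DRAW_INDIRECT_BUFFER` and the whole batch would go out in a single `glMultiDrawElementsIndirect` call, which is where the draw-call savings come from.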

To a degree AZDO is more a solution "around" OpenGL: rather than fixing OpenGL by creating an API that allows fast multithreaded command buffer generation, it provides a way to draw with minimal API/driver intervention.
In a way it is a feat of engineering genius: instead of waiting for OpenGL to evolve its multithreading model it found a minimal set of extensions to work around it. On the other hand this probably will further delay the multithreading changes...

Results seem great. The downside of this approach is that all other modern competitors (DirectX12, Mantle, XBox One and PS4 libGNM) allow both to reduce CPU work by offloading state binding to GPU indirection and to generate command buffers quickly on the CPU via multithreading and lower-level concepts, which map to more "conventional" engine pipelines a bit more easily. There is also a question about whether the more indirect approach is always the fastest (e.g. when dealing with draws that generate little GPU work) but that's still up for debate (as AZDO is very new and I'm not aware of comparisons pitting it against the other approach).
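The "fast CPU command buffer generation via multithreading" model can be sketched without any real graphics API. In the toy below each worker records into its own command list and a single thread then submits the lists in a fixed order. `Command`, `record_parallel` and `submit_in_order` are hypothetical names, and a real API records opaque GPU packets rather than a draw id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical, API-agnostic stand-in for a recorded GPU command.
struct Command { std::uint32_t drawId; };
using CommandList = std::vector<Command>;

// Each worker records a contiguous range of draws into its own list,
// so recording needs no locks and no shared driver state.
std::vector<CommandList> record_parallel(std::uint32_t drawCount,
                                         std::uint32_t workers) {
    assert(workers > 0);
    std::vector<CommandList> lists(workers);
    std::vector<std::thread> threads;
    const std::uint32_t chunk = (drawCount + workers - 1) / workers;
    for (std::uint32_t w = 0; w < workers; ++w) {
        const std::uint32_t begin = std::min(w * chunk, drawCount);
        const std::uint32_t end = std::min(begin + chunk, drawCount);
        threads.emplace_back([&lists, w, begin, end] {
            for (std::uint32_t d = begin; d < end; ++d)
                lists[w].push_back({d});
        });
    }
    for (std::thread& t : threads) t.join();
    return lists;
}

// Submission stays single-threaded: lists are appended in worker order,
// which keeps the final draw order deterministic.
CommandList submit_in_order(const std::vector<CommandList>& lists) {
    CommandList queue;
    for (const CommandList& l : lists)
        queue.insert(queue.end(), l.begin(), l.end());
    return queue;
}
```

This is the part that maps easily onto a "conventional" engine: the existing per-object submission loop is split across workers, and only the cheap final submit is serialized.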

For AAA games. Today for most companies this means consoles first, Windows second, anything else is much less important. For these games having more performance on a platform that is not the primary authoring one and that is not often a performance bottleneck, at the cost of significant engine changes doesn't seem attractive at all (and with no debug tools, little documentation and so on...), especially considering that DirectX12 is coming, an alternative that promises to be as good but easier, better supported and that will also target Xbox One, thus covering two of the three target platforms.

A notable exception though are free-to-play games hugely popular in Asia that are not only usually Windows exclusive, but where Windows XP is still very relevant, which means no DirectX11 and even less DirectX12. For these games I guess OpenGL could be a great option today.
Note also that AZDO is currently not fully supported on Intel hardware (no bindless, MDI software emulated) so you'll probably need a fallback renderer as well, as Intel hardware is quite interesting for games at the lower end.

For applications. Most CGI applications are the worst-case scenarios for GPU efficiency, they tend to do lots of draws with very little actual work (wireframe drawing, little culling) and in not very optimized ways as well due to having to work with editable, unoptimized data and often also carrying legacy code or code not thought to achieve the best GPU performance. 
Also, shipping on multiple platforms is the norm while working across multiple vendors is less of a concern: NVidia has the golden share among CG studios and Intel is completely out of the picture. Even NVidia/Linux alone is probably a compelling enough target to consider "modern OpenGL" there, and even more so as Windows would benefit as well.
These things considered I would expect modern OpenGL to be something most applications will move towards, even if it might be a significant effort to do so.

- Mantle

AMD's Mantle is a clear example of a nice, good, easy API (exaggerated, but interesting praise here) that fails (in my opinion) to be really useful for shipped titles. On the technical level there's nothing to complain about, it seems very reasonable and well done.

For AAA games. Today Mantle works only on Windows with AMD hardware. That's not much, especially when DirectX12 is coming and AZDO is an alternative too. While it's most probably easier to deploy than AZDO (and I bet AMD is going to be willing to help, even if right now there might be no tools and so on), it is also much less useful. Even worse if you consider that even on AMD hardware only certain CPU/GPU combinations are CPU limited.
It simply covers too little ground, I hoped at the beginning that AMD would come out sooner and with a PS4 layer as well, thus getting deployed by many projects that were looking for an easier way to target PS4 than figuring out libGNM. It didn't happen and that I think is the end of it. Some people were thinking it could have been a new cross-vendor standard, but it will -never- happen.
They did though score with Frostbite's support which pretty much means all EA games. But I would be very surprised if they didn't have to pay for that, and wonder how long it will last (as it's still a cost to support it, as it is supporting any platform)...

For applications. It's a bit more interesting there, as if you remove the consoles from your target then you're increasing the surface occupied by Windows. Also it's not unreasonable to think that Mantle could be ported to Linux. Unfortunately though NVidia is more popular than AMD for CG studios and that pretty much kills it.

For the people. There is something though that needs to be praised a lot: AMD also has lots of great, public documentation about the working of its GPUs (Intel is not bad as well, NVidia is absolutely terrible, a sad joke) and tools that show the actual GPU shader working (i.e. shader disassembly) which is really great as it allows everybody to talk and share their findings without fearing NDAs.
This creates a positive ecosystem where everybody can work "close to the metal" and Mantle is part of that. Historically it just happens that the more people are able to hack, the more amazing things get created. See what happened after twenty years of C64 hacking (some examples here).
I expect all graphic researchers to focus on GCN from now on.

- DirectX12

It's hard to criticize DirectX11, especially if you consider that it was presented in 2008 and what was the state of the other APIs at that point. It changed everything, mapping better to modern GPU concepts, introduced Tessellation and Compute Shaders, looks great and easy, is reasonably well documented and supported, and it's very successful.

Arguably DirectX9 had better tools (VSGD is horrible AND they killed Pix that was actually working fine), but that's hardly a fault of 11 and rather due to the loss of interest in PC gaming; nowadays things are getting much better. Consider that only now we're really starting to play with Compute Shaders for example, because next-gen consoles arrived, but we've had them for five years now! It was so ahead of its time that it needed only rather minor updates in 11.1 and 11.2.

The only, big issue with 11 is that Microsoft wants to make things simpler than they really should be, for no great reason. So 11 shipped with certain "contracts" in its multithreading model that don't seem really useful or needed but hugely impacted the performance of multithreaded drivers to the point where multithreading is useful only if your application and not the driver is the bottleneck. 
If your code is fast enough, multithreaded Dx11 will actually be slower than single-threaded, which is clearly an issue. I suspect it could still be technically possible to carve "fast paths" for applications swearing not to exercise the API in certain ways but probably it was simply not important enough for the PC gaming market and now 12 is coming, probably just in time...

For everything Microsoft. DirectX won on Windows and it also ships on Microsoft consoles. I can't comment much on 12 and it's not finished yet. It will hardly be displaced on Windows though, especially for games.

- Metal

Metal is Apple's Mantle. On my very personal biased poll from the reactions I've read on my twitter feed, it has not been received with the same enthusiasm as AMD's initiative. Some explained to me it's because Mantle promised to be a multi-vendor API while Metal didn't. Oh Apple, outclassed at marketing by AMD, you don't know how to appeal to engineers, next time say it's designed to be open...

I've also seen many people complaining this is foul play designed only to create vendor lock-in, a mere marketing move. I don't agree, and if you think it's only marketing then you should prove that it's possible to write an equally fast driver in today's OpenGL|ES.
I believe that's not technically possible and I believe OpenGL|ES is plagued by many of the same defects of desktop OpenGL, only much worse as it has no AZDO and it ships on platforms that are very resource constrained, so where performance and efficiency matters even more!
It would have probably been possible to carve fast-paths and patch ES with proprietary extensions that would have been a bit more friendly to the ecosystem (extensions often get incorporated into the standard down the line), but if it reaches the point where most of the rendering would have gone through extensions what's the point, really?

Actually this might be for the best even for the overall ecosystem, as it's a bigger kick-in-the-nuts than everything else could have been, and when many vendors on Android are shipping drivers that are just the -worst- software ever and Khronos shows to be slow to evolve and ridden with politics, a hard kick is what's most needed.
It's very new as we speak and I haven't had an in depth look into it, so I might edit this section later on.

For games. iOS still has the golden share of mobile gaming, with many more exclusives and games shipping "first" on that system than on the competitor, but the gap is not huge. Also, most games are still 2d and not too demanding on the hardware, so for a lot of people a degree of portability will matter more than an order-of-magnitude improvement in drawcall performance.
But, for the games that do care about performance, Metal is just great, iOS is big enough that even if your game is not exclusive, it's very reasonable to think about spending money to implement unique features to make your game nicer on it. 
It's true that Metal won't be available on older Apple hardware but Apple has always succeeded in giving people reasons to update both their software and their hardware, so that's not probably a big concern.

- Conclusions

Learn AZDO, play with Mantle, ship with DirectX.

If you're doing an indie title do use a rendering library or engine (I keep pointing at https://github.com/bkaradzic/bgfx but it's just an example) so you'll still ship with the best API for each platform and with the least amount of headaches. If you really love toying with the graphics API directly then I guess a flavor of OpenGL that is supported across platforms could be nice (3.3 if you care about Intel/Linux right now).

If a market is interesting enough for a given application and the vendors there decide on their own API, as is happening with Metal and happened with DirectX, I'd welcome that.

The problem with many of the APIs we're seeing is not that they divide the market, but that they try to do so in segments that are too small and uninteresting to specifically target. If for example Linux decided on its own 3d API for games I doubt that would be at all interesting...
If AMD shipped Mantle on consoles and PC then it could have been big enough of a segment to target, PC-only is not. If NVidia GameWorks offered a compelling solution on consoles, guess what, it would see a bigger adoption as well, while right now I suspect it will be used only on projects where NVidia is directly involved.

Most projects already have to ship with an abstraction layer of sorts, and many of these are available. In practice the idea of using OpenGL directly to ship products across platforms doesn't exist (except for very small projects and some research code).
It's always better to have to write (or use via third-party libraries) lower-level code on things that we understand than to have to fight with very opaque, wildly different implementations of a supposedly standard API.

In fact I bet that practically no gamedev (or only a very tiny number) knows even the basics of what a driver does and why certain API decisions led to slow CPU performance. Also the number of people not using third-party game engines, especially for indie work, is dwindling.

In theory a single API is better; in practice, today, it isn't, and that's why the emergence of these low-level libraries is not just a marketing plot but actually a reasonable technical solution.

17 May, 2014

Photographic inspiration

If materials, meshes, textures and lighting were "solved", what would be the focus of (realistic) environment visuals?

Architecture, photography, scenography. And the technology to serve these better.

Should we emulate cinema and still photography or does photography in a virtual world mean something different? Is the interactivity a possibility to create techniques that we don't even have in CG movies?

I don't know, but it's wise to know the past while thinking of the future.


When it comes to environments, I love the American cityscapes depicted by the early adopters of color photography. It's a feeling that is hard to describe, estrangement maybe, probably an effect that is amplified on me as I grew up in the super dense Southern Italian cities, where nothing has an end and people build all in the little space available, from the sea up to the Vesuvius.

Who are your favourite photographers?

Trent Parke
Richard Misrach
Gregory Crewdson
Todd Hido
Joel Meyerowitz
William Eggleston
Stephen Shore

26 April, 2014

Smoothen your functions

Do you have an "if", "step" or such? Replace with a saturate(multiply-add(x)).
Do you have a mad-saturate? Replace with a smoothstep.
Do you have a smoothstep? Replace with smootherstep...

Ok, kidding, but sort-of. I actually do often end up replacing steps with ramps (saturate/mad... the fairy dust of shading, I love sprinkling mads in shader code). I remember years ago turning a pretty much by-the-book Crysis 1-style SSAO into a much better SSAO by just "feathering" the hard in/out tests (which is kinda what line sampling SSAO does btw).

If you think about it, it's a bit of "code smell". What shading functions should be discontinuous? True, most lighting has a max or saturate, right? But why? Really we're considering infinitesimal lights. For physically realistic lights we would have an area of emission, and that area would be fractionally shadowed by a surface, so even there, the shadowing function wouldn't just be a step of the dot product. This might not be evident on diffuse, but already when you're trying to use half-angle based specular, care has to be taken when handling the transition to the "nightside".

And of course even when reasonable, any "step" function (well -any- function!) in a shader should be anti-aliased... And of course everybody knows what the convolution of a step with a box (pixel footprint) is... Texturing and Modeling, a Procedural Approach is the canonical text for this, but it's funny, googling around one of the first hits is this documentation page of Renderman on antialiasing, whose slides are horribly aliased. The OpenGL Orange Book also has examples, and I really want to mention this IQ's article on ray differentials even if it doesn't do analytic convolution...
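
To spell out the "everybody knows" part: convolving a step with a box gives a saturated ramp as wide as the box. Here is a minimal Python sketch of that, where filtered_step, brute_force_filtered_step and the width parameter are names I made up for illustration (in a shader the width would typically come from something like the screen-space derivative of x, which is an assumption about your setup, not something this snippet does).

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def step(edge, x):
    return 1.0 if x >= edge else 0.0

def filtered_step(edge, x, width):
    """step(edge, .) averaged over a box of the given width centred on x."""
    if width <= 0.0:
        return step(edge, x)
    # The covered fraction of the box grows linearly as x crosses the edge.
    return saturate((x - edge) / width + 0.5)

def brute_force_filtered_step(edge, x, width, samples=10000):
    """Reference: average many point samples of the step inside the box."""
    total = 0.0
    for i in range(samples):
        offset = ((i + 0.5) / samples - 0.5) * width
        total += step(edge, x + offset)
    return total / samples
```

The brute-force version is only there to check the analytic one; the two should agree up to the sampling error.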

Many times the continuity of derivatives is not that important (visible), that's why we can use saturated ramps (discontinuous in the first derivative) or saturated smoothsteps (discontinuous in the second), with the big exception of manipulating inputs to specular shading. In that case, even second-derivative discontinuities can very clearly show, thus the need of the famous "smootherstep".
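
For reference, this is the ladder of replacements in Python form (a sketch: the function names mirror the usual shader intrinsics, and the edge parameters e0, e1 are my own convention, with e0 different from e1 assumed). The ramp has a kink at both edges, smoothstep has a continuous first derivative, smootherstep also a continuous second derivative.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)

def step(edge, x):
    return 1.0 if x >= edge else 0.0

def ramp(e0, e1, x):
    # saturate(mad(x, scale, offset)): linear from 0 at e0 to 1 at e1.
    scale = 1.0 / (e1 - e0)
    offset = -e0 * scale
    return saturate(x * scale + offset)

def smoothstep(e0, e1, x):
    # Cubic 3t^2 - 2t^3: zero first derivative at both ends.
    t = ramp(e0, e1, x)
    return t * t * (3.0 - 2.0 * t)

def smootherstep(e0, e1, x):
    # Quintic 6t^5 - 15t^4 + 10t^3: zero first and second derivatives at both ends.
    t = ramp(e0, e1, x)
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)
```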

Anyhow. I usually have a bunch of functions around to help with ramps, triangle ramps, smoothsteps and so on, most of them are trivial and can be derived on paper in a second or so. Lately I had to use a few I didn't know before, so I'll be writing them down here.

Yes, all this introduction was useless. :)

- Smooth Min/Max

log(pow(pow(exp(x),s) + pow(exp(y),s),1/s))

This will result in a smooth "min" between x and y for negative values of s (which controls the smoothness of the transition), "max" for positive values.

For s=-1 this results in the "smoothest" min:

log(exp(x+y)/(exp(x)+exp(y)))

If you know that x,y are always positive a simpler formulation can be employed, as we don't need to go through the exponential mapping:

pow(pow(x,s) + pow(y,s),1/s)


Note also that if you need a soft minimum of more than two values, your expressions simplify, e.g. pow(pow(pow(pow(x,s) + pow(y,s),1/s),s)  + pow(z,s),1/s) = pow(pow(x,s) + pow(y,s)  + pow(z,s) ... ,1/s).

Note also the link between softmax and norm-infinity.
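
A quick Python transcription of the formulas above, just to make them easy to poke at (function names are mine). I've folded the pow/exp pairs into exp(s*x), which is the same expression rewritten; note that this naive form will overflow for large s*x, so treat it as a sketch rather than production code.

```python
import math

def smooth_minmax(x, y, s):
    # log(pow(pow(exp(x),s) + pow(exp(y),s), 1/s)), rewritten as
    # log(exp(s*x) + exp(s*y)) / s. Negative s: smooth min, positive s: smooth max.
    return math.log(math.exp(s * x) + math.exp(s * y)) / s

def smoothest_min(x, y):
    # The s = -1 special case.
    return math.log(math.exp(x + y) / (math.exp(x) + math.exp(y)))

def smooth_minmax_pos(values, s):
    # Positive inputs only: pow(pow(x,s) + pow(y,s) + ..., 1/s).
    # For positive s this is the s-norm, which tends to the max (norm-infinity)
    # as s grows.
    return sum(v ** s for v in values) ** (1.0 / s)
```

The larger the magnitude of s, the closer the result hugs the hard min or max; values near zero give the widest blend.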

- A few notes on smoothsteps

Deriving smoothstep and smootherstep is trivial, just create a polynomial of the right degree (cubic or quintic) and impose f(0)=0, f(1)=1 and f'(0)=0, f'(1)=0 (and the same for f'' in case of smootherstep), solve and voilà.
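
Written out, the derivation goes like this (the coefficient names a, b, c, d are just my notation):

```latex
\begin{align*}
\text{Cubic: } f(x) &= a x^3 + b x^2 + c x + d \\
f(0) = 0 &\Rightarrow d = 0, \qquad f'(0) = 0 \Rightarrow c = 0 \\
f(1) = 1 &\Rightarrow a + b = 1, \qquad f'(1) = 0 \Rightarrow 3a + 2b = 0 \\
&\Rightarrow a = -2,\; b = 3, \qquad f(x) = 3x^2 - 2x^3 \\[1ex]
\text{Quintic: } f(x) &= a x^5 + b x^4 + c x^3
  \quad \text{(lower terms vanish since } f(0) = f'(0) = f''(0) = 0\text{)} \\
f(1) = 1 &\Rightarrow a + b + c = 1 \\
f'(1) = 0 &\Rightarrow 5a + 4b + 3c = 0 \\
f''(1) = 0 &\Rightarrow 20a + 12b + 6c = 0 \\
&\Rightarrow a = 6,\; b = -15,\; c = 10, \qquad f(x) = 6x^5 - 15x^4 + 10x^3
\end{align*}
```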


Once you do that, it's equally trivial to start toying around and derive polynomials with other properties. E.g. imposing derivatives only at one extreme:


You can have a "smoothstep" with non-zero derivatives at the extremes:


Or a quartic that shifts the midpoint:


It would seem that the more "properties" you need to have the higher degree polynomial you need to craft. Until you remember that you can do everything piecewise...
Which is basically making small, specialized splines. For example, a piecewise quadratic smoothstep can look like this:


This is helpful also because there are certain tradeoffs based on applications, especially as having continuous derivatives doesn't automatically mean that the result will be nice looking...
You can make functions that impose more and more derivatives (and do you know that smoothsteps can be chained? smoothstep(smoothstep(x))...) but that doesn't mean the derivatives will "behave", as they can vary wildly in the domain and result in visible "wobbling" in shading.


Another thing that you might not have noticed is how close smoothstep is to a (shifted) cosine. I hadn't before a coworker of mine, the all-knowing Paul Edelstein, mentioned it. Probably not too useful, but you never know, in certain situations it might be applicable and cheaper.
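
To get a feel for how close they are, here's a tiny Python check. The exact cosine form, 0.5 - 0.5*cos(pi*x), is my assumption of what "shifted" means here; the two curves agree at 0, 0.5 and 1 and stay within a couple of percent of each other everywhere in between.

```python
import math

def smoothstep01(x):
    return x * x * (3.0 - 2.0 * x)

def cosine_step01(x):
    # Half a cosine period, shifted and scaled to go from 0 to 1 on [0, 1].
    return 0.5 - 0.5 * math.cos(math.pi * x)

# Largest absolute difference over a dense sampling of [0, 1].
max_diff = max(abs(smoothstep01(i / 1000.0) - cosine_step01(i / 1000.0))
               for i in range(1001))
```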


- Sigmoid functions

Another widely useful class of functions is sigmoids, "s-shaped" functions:

Smooth Sigmoid: x/pow((pow(abs(x),s)+1),1/s)
Logistic: 1/(1+exp(-x))

Sigmoids are similar to smoothsteps, but usually reach zero derivatives at infinity instead of at the 0,1 endpoints.


They make nice "replacements" for "step" as they approach nicely their limits as they go to infinity:


But also for saturated ramps, especially the smooth sigmoid as it has f'(0)=1 as we have shown before.
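
The two sigmoids above in Python, for experimenting (names are mine). Note the different ranges: the smooth sigmoid is odd and goes from -1 to 1 with unit slope at the origin, while the logistic goes from 0 to 1 with value 0.5 at the origin, so they need different remapping when standing in for a step or a saturated ramp.

```python
import math

def smooth_sigmoid(x, s):
    # x / (|x|^s + 1)^(1/s): range (-1, 1), f(0) = 0, f'(0) = 1.
    # Larger s gives a sharper knee, closer to clamp(x, -1, 1).
    return x / (abs(x) ** s + 1.0) ** (1.0 / s)

def logistic(x):
    # 1 / (1 + e^-x): range (0, 1), f(0) = 0.5.
    return 1.0 / (1.0 + math.exp(-x))
```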


Another sigmoid is the Gompertz function, which has nice and clear parameters:

asymptote*exp(-displacement*exp(-rate*x))

Beware though, it's not symmetric around its midpoint:
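
One way to see the asymmetry numerically (a sketch, parameter names taken from the formula above): the steepest point of the curve sits at x = log(displacement)/rate, and the value there is asymptote/e, roughly 37% of the asymptote rather than half of it.

```python
import math

def gompertz(x, asymptote, displacement, rate):
    return asymptote * math.exp(-displacement * math.exp(-rate * x))

def gompertz_inflection_x(displacement, rate):
    # Where the slope is largest: displacement * exp(-rate * x) == 1.
    return math.log(displacement) / rate
```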


There are a ton more, but I'd say not as generic. If you look at the various tonemapping curves, most of them are sigmoids, but most of them are in exponential space and not symmetric.
In fact at a given point I made tonemapping curves out of sigmoids, piecewise sigmoids or other weird things glued together :)



- Bias and Gain (thanks to Steve Worley for reminding me of these)

Bias pow(x,-log2(a))
Gain if x < 0.5 then 0.5*bias(2*x, a) else 1-0.5*bias(2-2*x, a)

Schlick's Bias x/((1/a-2)*(1-x)+1)
Schlick's Gain if x < 0.5 then SBias(2*x,a)*0.5 else 1-0.5*SBias(2-2*x,a)

Bias is just a power (-log2(a) only maps 0...1 to the power), and Gain maps one power next to a mirrored copy around the midpoint, the easiest way you can construct a piecewise sigmoid (without imposing conditions on the derivatives and so on).

Schlick's versions were published in Graphics Gems IV, and are not only an optimization of the original Bias/Gain formulas (credited to Perlin's Hypertexture paper), but are symmetric over the diagonal, which is a nifty property (it also means that for parameter a the inverse curve is given by the same formula with 1-a).
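
The four formulas transcribed to Python (function names are mine; x is assumed in 0...1 and a strictly between 0 and 1). Both bias variants map 0.5 to a, and for Schlick's bias you can check the inverse property directly by composing the curve for a with the curve for 1-a.

```python
import math

def bias(x, a):
    # pow(x, -log2(a)): maps 0.5 to a.
    return x ** (-math.log2(a))

def gain(x, a):
    # Two bias curves mirrored around the midpoint.
    if x < 0.5:
        return 0.5 * bias(2.0 * x, a)
    return 1.0 - 0.5 * bias(2.0 - 2.0 * x, a)

def schlick_bias(x, a):
    # Rational approximation: no pow or log needed.
    return x / ((1.0 / a - 2.0) * (1.0 - x) + 1.0)

def schlick_gain(x, a):
    if x < 0.5:
        return 0.5 * schlick_bias(2.0 * x, a)
    return 1.0 - 0.5 * schlick_bias(2.0 - 2.0 * x, a)
```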



- Smooth Abs


Obviously if you have any "smoothstep" you can shift it around zero to create a "smoothsign" and multiply by the original value to get a smoothed absolute. The rational polynomial sigmoid works quite well for that:


SmoothAbsZero d*x*x/sqrt(1+d*d*x*x)

If you don't need to reach zero at x=0 then you can simply add an epsilon to the square root of the square of your input, yielding this


SmoothAbs sqrt(x*x+e)
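
Both variants in Python (names follow the formulas above; d and e are the sharpness and epsilon parameters). The first is x times a "smoothsign" d*x/sqrt(1+d*d*x*x), so it is exactly zero at the origin; the second bottoms out at sqrt(e) instead.

```python
import math

def smooth_abs_zero(x, d):
    # x * smoothsign(x): reaches 0 at x = 0, approaches |x| for large |d*x|.
    return d * x * x / math.sqrt(1.0 + d * d * x * x)

def smooth_abs(x, e):
    # Never reaches zero: the minimum is sqrt(e) at x = 0.
    return math.sqrt(x * x + e)
```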


And that's all I have for now. If you've encountered other nifty functions for modelling and tinkering with procedurals and so on, let me know in the comments!
I'm always looking for nifty functions that can be useful for sculpting mathematical shapes :)

- Bonus example: Soft conditional assignment


Some links:

08 April, 2014

How to make a rendering engine

Today I was chatting on twitter about an engine and some smart guys posted a few links, that I want to record here for posterity. Or really to have a page I can point to every time someone uses a scene graph.

Remember kids, adding pointer indirections in your rendering loops makes kittens sad. More seriously, if in DirectX 11 and lower your rendering code performance is not bound by the GPU driver then probably your code sucks. On a related note, you should find that multithreaded command buffers in DX11 make your code slower, not faster (as they can be used to improve the engine parallelism but they are currently only slower for the driver to use, and your bottleneck should be the driver).


Links below are all about the idea of ditching state machines for rendering, encoding all state for each draw and having fixed strings of bits as the encoding. I won't describe the concept here, just check out these references:

Some notes / FAQ answers. Because every time I write something about these systems people start asking the same things... I think because so many books still talk about "immediate" versus "retained" 3d graphic APIs and the "retained" is usually some kind of scenegraph... Also scenegraphs are soooo OOP and books love OOP.
  • Bits in the keys are usually either indices in arrays of grouped state (e.g. camera/viewport/rendertarget state, texture set, etc...) or direct pointers to the underlying 3d API data structures
    • So we are following pointers anyways, aren't we? Yes of course, but the magic is in the sort, it not only will help minimize state changes but also guarantees that all accesses in the arrays are (as-)linear(-as possible)!
    • Of course if you for example sort strictly over depth (not in depth chunks), then you have to accept to jump between materials at each draw, and the accesses over these might very well be random.
      • If that's the case try to avoid indirections for these and store the relevant data bits directly in the draw structure.
      • Another solution for this example case is to sort the material data in a way that is roughly depth coherent, i.e. all materials in a room are stored near each other. In theory you could also dynamically sort and back-patch the pointers to the material data in the game code, but we're getting too complex now...
    • The same can't be guaranteed for resource pointers (GL, DX...), even if the pointers will be linearly ordered they might be far away in memory, that's unavoidable. On consoles you have control on where resources are allocated even for GPU stuff so you can pack them together, but even more importantly you can directly store the pointers that the GPU needs w/o intermediate CPU data structures
  • You don't need to have a single array of keys and sort it!
    • Use buckets, i.e. some bits of the key index which bucket to use. Bucketing per rendertarget/pass is wise
    • "Buckets", a.k.a. separate lists. In other words don't be shy to have a list per subsystem, nobody says there should be one solution for all the draws in your engine.
      • This is usually a good idea also because draw-emitting jobs can and should be sequenced by pass, e.g. in a deferred renderer maybe we want first a rough depth-prepass, then g-buffer, then shadows... These can be pulled in pass-order from the visibility system
      • Doing the emission per pass means we can kick the GPU as soon as the first pass is done. Actually if we don't care for perfect sorting, and we really care about kicking draws as soon as possible, we can even divide each pass in segments and kick draws as soon as the first segment is done.
      • I shouldn't say it but just in case... These systems allow you to generate draws in parallel, obviously, and also to sort in parallel and to generate GPU commands in parallel, quite easily. Just keep lists per thread, sort per thread, then merge them all (the only sync point), then split in chunks and per thread create GPU command lists (if you have an API where these are fast...)
  • You don't need to use the same encoding for all keys!
    • Some bits can decide what the other bits mean. Typically per rendertarget/pass you need to do very different things, e.g. a shadowmap render pass doesn't need to care about materials but might want to use some more bits as a z-key for depth sorting
    • Similarly, you can and should have specialized decoding loops
  • Not all the bits in the key need to be used for sorting
    • Bits of state that directly map to the GPU and don't incur overhead when set should not be part of the sorting, they will just slow it down.
  • Culling: make the keys be part of the visibility system
    • When a bounding primitive is finally deemed to be visible, it should add all the keys related to drawing its contents
    • At that point you want also to patch in the bits related to the projected depth, for depth sorting
  • Hierarchical transforms
    • Many scenegraphs are used as a transformation hierarchy. It's silly, on most engines a tiny fraction of objects need that, mostly the animation/skinning system for its bones. Bones do express a graph, but it's not enough of a reason to base your -entire- rendering system on it.
  • Group state that is (almost) always set together in the same bits
    • E.G. instead of having separate bits (referring to separate state structures) for viewport, rendertarget, viewworldprojection constant data and so on, merge all that in a single state structure.
  • Won't I need other rendering "commands" in my list? Clears? Buffer copies? Async CPU jobs waits? Compute shaders? Async compute shaders...
    • All of these can be part of "on bind" properties of certain parts of the state. E.G. when the bits pointing to the rendertarget/pass change we look up in that state structure to see if the newly set rendertarget has to be cleared
    • In practice as you should "bucket" your keys into different arrays processed by different decode loops, these decode loops will know what to do (e.g. the shadowmap decode will make sure the CPU skinning jobs are finished before trying to draw and so on)
  • Are there other ways?
    • Yes but this is a very good starting point...
    • Depends on the game. A system like this is good when you don't know what draws you'll have, typically because they come from a visibility system which can't spit them out in the right order and/or because of parallel processing.
    • Games/systems where you can easily generate GPU commands in the right order and you exactly know which state changes are needed, obviously can sidestep all this architecture. E.G. Fifa, being a soccer game, doesn't need to do much visibility and knows exactly how each player is made in terms of materials, thus the code can be written to exactly process things in the right order... Something like this would be reasonable for Frostbite, but you won't use Frostbite for Fifa...
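
To make the key idea a bit more concrete, here is a toy Python sketch of the notes above. Everything specific in it is an assumption for illustration: the 32-bit layout (pass, material, depth, mesh), the field widths, and the command tuples are all made up, and a real engine would do this over flat arrays in native code, with specialized decode loops per bucket.

```python
# Hypothetical 32-bit key layout, most significant bits first:
# [ pass: 4 | material: 8 | depth: 12 | mesh: 8 ]
PASS_SHIFT, MATERIAL_SHIFT, DEPTH_SHIFT, MESH_SHIFT = 28, 20, 8, 0
PASS_MASK, MATERIAL_MASK, DEPTH_MASK, MESH_MASK = 0xF, 0xFF, 0xFFF, 0xFF

def encode_key(pass_id, material, depth, mesh):
    """Pack indices into arrays of grouped state into one sortable integer."""
    return (((pass_id & PASS_MASK) << PASS_SHIFT)
            | ((material & MATERIAL_MASK) << MATERIAL_SHIFT)
            | ((depth & DEPTH_MASK) << DEPTH_SHIFT)
            | ((mesh & MESH_MASK) << MESH_SHIFT))

def quantize_depth(z01):
    """Map a projected depth in 0...1 to the key's depth bits."""
    z01 = min(max(z01, 0.0), 1.0)
    return int(z01 * DEPTH_MASK)

def decode(keys):
    """Sort the keys, then emit state changes only when the bits change."""
    commands = []
    last_pass = None
    last_material = None
    for key in sorted(keys):
        pass_id = (key >> PASS_SHIFT) & PASS_MASK
        material = (key >> MATERIAL_SHIFT) & MATERIAL_MASK
        mesh = (key >> MESH_SHIFT) & MESH_MASK
        if pass_id != last_pass:
            # "On bind" work (clears, waits...) would hang off this change.
            commands.append(("set_pass", pass_id))
            last_pass = pass_id
            last_material = None
        if material != last_material:
            commands.append(("set_material", material))
            last_material = material
        commands.append(("draw", mesh))
    return commands

# Draws arrive in whatever order the visibility system emits them.
keys = [
    encode_key(0, 2, 5, 1),
    encode_key(0, 1, 9, 2),
    encode_key(0, 2, 1, 3),
]
commands = decode(keys)
```

With this layout the sort groups draws by pass, then material, then front-to-back depth, so the two material-2 draws share a single state change. Putting the depth bits above the material bits instead would give strict depth order at the cost of more material switches, which is the tradeoff discussed in the bullets.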