Showing posts with label Graphic rants.

07 August, 2017

Tiled hardware (speculations)

Over at Siggraph I had a discussion with some mobile GPU engineers about the pros and cons of tiled deferred rasterization. I prompted some discussion over Twitter (and privately) as well, and this is how I understand the matter of tiled versus immediate/"forward" hardware rasterization so far...

Thanks for all who participated in said discussions!

Disclaimer! I know (almost) nothing of hardware and electronics, so everything that follows is likely to be wrong. I write this just so that people who really know hardware can laugh at the naivety of mere mortals...



Tile-Based Deferred Rendering.

My understanding of TBDR is that it works by dividing all the incoming dispatches aimed at a given rendertarget into screen-space tiles.

For this to happen, you have to run, at the very least, the part of the vertex shader that computes the output screen position of the triangles, for all dispatches, figure out which tile(s) a given triangle belongs to, and memorize the vertex positions and indices in a per-tile storage.
Note: Considering the number of triangles nowadays in games, the per-tile storage has to be in main memory, and not on an on-chip cache. In fact, as you can't predict up-front how much memory you'll need, you will have to allocate generously and then have some way to generate interrupts in case you end up needing even more memory...
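To make that binning step concrete, here is a toy sketch of assigning triangles to tiles by their screen-space bounding boxes. Everything here (the tile size, the data layout, using a dictionary of bins) is made up for illustration; real hardware would also do exact triangle/tile intersection tests and work with compressed streams:

```python
# Toy model of TBDR triangle binning: each transformed triangle index is
# appended to the bin of every tile its screen-space bounding box touches.
TILE_SIZE = 32  # hypothetical tile dimension in pixels

def bin_triangles(triangles, screen_w, screen_h):
    """triangles: list of ((x0,y0), (x1,y1), (x2,y2)) in pixel coordinates."""
    tiles_x = (screen_w + TILE_SIZE - 1) // TILE_SIZE
    tiles_y = (screen_h + TILE_SIZE - 1) // TILE_SIZE
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Conservative bounding-box overlap; a real binner would reject
        # backfacing triangles here too, before touching memory.
        tx0 = max(0, int(min(xs)) // TILE_SIZE)
        tx1 = min(tiles_x - 1, int(max(xs)) // TILE_SIZE)
        ty0 = max(0, int(min(ys)) // TILE_SIZE)
        ty1 = min(tiles_y - 1, int(max(ys)) // TILE_SIZE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)  # store an index, not full vertex data
    return bins
```

Note how a triangle that straddles tile boundaries gets referenced by multiple bins, which is exactly why the per-tile storage can blow up unpredictably.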

Indices would be rather coherent, so I imagine that they are stored with some sort of compression (probably patented) and also I imagine that you would want to try to already start rejecting invisible triangles as they enter the various tiles (e.g. backface culling).

Then visibility of all triangles per tile can be figured out (by sorting and testing against the z-buffer), the remaining portion of the vertex shader can be executed and pixel shaders can be invoked.
Note: while this separation of vertex shading in two passes makes sense, I am not sure that the current architectures do not just emit all vertex outputs in a per-tile buffer in a single pass instead.

From here on you have the same pipeline as a forward renderer, but with perfect culling (no overdraw, other than, possibly, helper threads in a quad - and we really should have quad-merging rasters everywhere, don't care about ddx/ddy rules!).

Vertex and pixel work overlap does not happen on single dispatches, but across different output buffers, so balancing is different than an immediate-mode renderer.
Fabian Giesen also noted that wave sizes and scheduling might differ, because it can be hard to fill large waves with fragments in a tile, you might have only few pixels that touch a given tile with a given shader/state and more partial waves wasting time (not energy).

Pros.

Let's start with the benefits. Clearly the idea behind all this is to have perfect culling in hardware, avoiding wasting (texture and target) bandwidth on invisible samples. As accessing memory takes a lot of power (moving things around is costly), by culling so aggressively you save energy.

The other benefit is that all your rendertargets can be stored in a small per-tile on-chip memory, which can be made to be extremely fast and low-latency.
This is extremely interesting, because you can see this memory as effectively a scratch buffer for multi-pass rendering techniques, allowing for example to implement deferred shading without feeling too guilty about the bandwidth costs.

Also, as the hardware always splits things in tiles, you have strong guarantees about what areas of the screen a pixel-shader wave could access, thus allowing certain vector (wave-wide) operations to be turned into scalar ones, if things are constant in a given tile (which would be very useful, for example, for "forward+" methods).
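A toy model of what that scalarization buys. Assume (hypothetically) that each lane of a wave is one pixel, and count how many light-list fetches happen; when every lane lands in the same tile, the list is wave-uniform and can be walked once:

```python
# Toy illustration: if every pixel (lane) of a wave lies in the same tile,
# the per-tile light list is uniform across the wave and can be walked
# "scalarly" (fetched once), instead of each lane fetching independently.
def shade_wave(wave_pixels, tile_of, light_lists):
    tiles = {tile_of(p) for p in wave_pixels}
    if len(tiles) == 1:
        lights = light_lists[tiles.pop()]  # one scalar walk for the whole wave
        fetches = len(lights)
    else:
        # divergent wave: each lane walks its own tile's list
        fetches = sum(len(light_lists[tile_of(p)]) for p in wave_pixels)
    return fetches
```

On TBDR hardware the single-tile case is guaranteed by construction; on immediate-mode hardware a wave can straddle tiles and you pay the divergent path (or add explicit scalarization code in the shader).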

As the tile memory is quite fast, programmable blending becomes feasible as well.

Lastly, once the tile memory that holds triangle data is primed, in theory one could execute multiple shaders recycling the same vertex data, allowing further ways to split computation between passes.

Cons.

So why do we still have immediate-mode hardware out there? Well, the (probably wrong) way I see this is that TBDR is really "just" a hardware solution to zero overdraw, so it's amenable to the same trade-offs one always has when thinking about what should be done in hardware and what should be programmable.

You have to dedicate a bunch of hardware, and thus area, for this functionality. Area that could be used for something else, more raw computational units.
Note though that even if immediate renderers do not need the sophistication of tiling and sorting, they still need space for rendertarget compression which is less needed on a deferred hardware.

Immediate-mode rasterizers do not necessarily have to overdraw. If we do a full depth prepass, for example, then the early-z test should cull away all invisible geometry, exactly like TBDR.
We could even predicate the geometry pass after the prepass using the visibility data obtained with it, for example using hardware visibility queries or a compute shader. We could even go down to per-triangle culling granularity!
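A tiny model of that prepass idea, not how any real rasterizer is written, just the logic: after a depth-only pass has filled the z-buffer, the early-z test in the shading pass rejects every hidden fragment, matching TBDR's "perfect culling" at the cost of reading the geometry twice:

```python
# Toy immediate-mode renderer with a depth prepass.
# Fragments are (x, y, z, payload); returns how many fragments get shaded.
def render(fragments, w, h):
    zbuf = [[float("inf")] * w for _ in range(h)]
    for x, y, z, _ in fragments:          # pass 1: depth only
        if z < zbuf[y][x]:
            zbuf[y][x] = z
    shaded = 0
    for x, y, z, payload in fragments:    # pass 2: early-z, then shade
        if z <= zbuf[y][x]:
            shaded += 1                   # only the visible fragment shades
    return shaded
```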

Also, if one looks at the bandwidth needed for the two solutions, it's not clear where the tipping point is. In both cases one has to go through all the vertex data, but in one case we emit triangle data per tile, in the other we write a compressed z-buffer/early-z-buffer.
Clearly as triangles get denser and denser, there is a point where using the z-buffer will result in less bandwidth use!
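A back-of-envelope sketch of that tipping point, with entirely made-up per-triangle and per-pixel costs (the real numbers depend on compression schemes neither vendor publishes):

```python
# Hypothetical bandwidth model: binned triangle data grows with triangle
# count, while the prepass z-buffer cost is roughly fixed per pixel,
# so there is a crossover as scenes get denser.
def tbdr_bytes(n_tris, bytes_per_binned_tri=16, avg_tiles_touched=1.3):
    # assumed: ~16 bytes of position+index data per binned triangle,
    # each triangle touching ~1.3 tiles on average
    return n_tris * bytes_per_binned_tri * avg_tiles_touched

def prepass_bytes(w, h, bytes_per_z=2):
    # assumed: compressed depth averaging ~2 bytes per pixel
    return w * h * bytes_per_z

# With these toy numbers, at 1920x1080 the crossover sits around 200k
# triangles: tbdr_bytes(200_000) ~ 4.16 MB vs prepass_bytes ~ 4.15 MB.
```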

Moreover, as this is a software implementation, we could always decide on different trade-offs, avoiding a full depth pass and just heuristically selecting a few occluders, or reprojecting previous-frame Z, and so on.

Lastly I imagine that there are some trade-offs between area, power and wall-time.
If you care about optimizing for power and are not limited much by the chip area, then building in the chip some smarts to avoid accessing memory looks very interesting.
If you only care about doing things as fast as possible then you might want to dedicate all the area to processing power and even if you waste some bandwidth that might be ok if you are good at latency hiding...
Of course that wasted bandwidth will cost power (and heat), but you might not see the performance implications if you had other work for your compute units to do while waiting for memory.

Conclusions.

I don't quite know enough about this to say anything too intelligent. I guess that as we're seeing tiled hardware in mobile but not at the high end (and vice versa), tiled might excel at saving power but not at pure wall-clock performance versus simpler architectures that use all the area for computational units and latency hiding.

Round-tripping geometry to main RAM seems outrageously wasteful, but if you want perfect culling you have to compare it with a full-z prepass, which reads geometry data twice, and then things start looking a bit more even.

Moreover, even with immediate rendering, it's not that you can really pump a lot of vertex attributes and not suffer; these want to stay on chip (and are sometimes even redistributed in tile-like patterns), so practically you are quite limited before you start stalling your pixel shaders because you're running out of parameter space...

Amplification via tessellation or instancing, though, can save lots of data for an immediate renderer, and the second pass, as noted before, can be quite aggressively culled; in an immediate renderer this allows balancing in software how much one wants to pay for culling quality, so doing the math is not easy at all.

The truth is that for almost any rendering algorithm and rendering hardware, there are ways to reach great utilization, and I doubt that, when fed appropriate workloads, the two architectures would end up very far apart.
Often it's not a matter of what can be done, as there are always ways to make things work, but of how easy it is to achieve a given result.

And in the end it might even be that things are the way they are because of the expertise and legacy designs of the companies involved, rather than objective data. Or that things are hard to change due to myriads of patents, or likely a bit of all these reasons...

But it's interesting to think of how TBDR could change the software side of the equation. Perfect culling and per-tile fast memory would allow some cute tricks, especially in a console where we could have full exposure of the underlying hardware... Could be fun.

What do you think?

Post Scriptum.

Many are mentioning NVidia's tiled solution, and AMD has something similar as well now. I didn't talk about these because they seem to be in the end "just" another way to save rendertarget bandwidth.
I don't know if they even help with culling (I think not for NVidia, while AMD mentions they can do pixel-shading after a whole tile batch has been processed), but certainly they don't allow splitting rendering passes more efficiently via an on-chip scratch, which to me (on the software side of things...) is the most interesting delta of TBDR.

Of course you could argue that tiles-as-a-cache instead of tiles-as-a-scratch might still save enough BW, and latency-hide the rest, so that in practice it allows doing deferred for "free". Hard to say, and in general blending units have always had some degree of caching...

Lastly, with these hybrid rasters, if they clip triangles/waves at tile boundaries (if), one could still in theory get some improvements in e.g. F+ methods, but it's questionable, because the tile sizes used seem too big to allow the light/attribute screen-space structures of a F+ renderer to match the hardware tile size.

External Links.

Apple's "GPU family 4" - notice the "imageblocks" section

23 February, 2017

Tonemapping on HDR displays. ACES to rule ‘em all?

HDR displays are upon us, and I’m sure rendering engineers worldwide are trying to figure out how to best use them. What to do with post-effects? How to do antialiasing? How to deal with particles and UI? What framebuffer formats to use, and so forth.


Well, it appears that in this ocean of new research some standards are emerging, and one solution that seems to be popular is to use the ACES tone-mapping curve (RRT: Reference Rendering Transform) with an appropriate HDR display curve (ODT: Output Device Transform).

To my dismay though I have to say I’m a bit baffled, and perhaps someone will persuade me otherwise in the future, but I don’t see why ACES would be a solid choice. 


First of all, let’s all be persuaded we indeed need to tone-map our HDR data. Why can’t we just apply exposure and send linear HDR to a TV?
At first, that could seem a reasonable choice: the PQ encoding curve we use to send the signal to TVs peaks at 10,000 nits, which is not too bad; it could allow encoding a scene-referred signal and letting the TV do the rest (tone-map according to its characteristics).

This is not what TVs do, though. Leaving the transform from scene values to display would allow for lots of flexibility, but would also give to the display too much responsibility over the final look of the image.
So, the way it works instead is that TVs do have some tone-mapping functionality, but they are quite linear till they reach their peak intensity, where they seem to just have a sharp shoulder.

How sharp that shoulder is can depend, as content can also send along meta-data telling what’s the maximum nits it was authored at: for content that matches the TV, in theory no rolloff is needed at all, as the TV will know the signal will never exceed its abilities (in practice though, said abilities change based on lots of factors due to energy limits).

Some TVs will also expose silly controls, like gamma in HDR: what it seems is that these alter their response curve in the “SDR” range of their output, for now let's ignore all that.
Regardless of these specifics, it's clear that you’re supposed to bring your values from scene-referred to display-referred, and to decide where you want your mid-gray to be, and how to roll highlights from there. You need tone mapping in HDR.

Ok, so let’s backtrack a second. What’s the goal of a tone-mapping curve? I think it depends, but you might have one or more of the following goals:
  1. To compress dynamic range in order to best use the available bits. A form of mu-law encoding.
  2. To provide a baseline for authoring. Arguably that should be a “naturalistic”, perceptual rendition of an HDR scene, but it might even be something closer to the final image.
  3. To achieve a given final look, on a given display.
HDR screens add a fourth possible objective: creating a curve that makes it possible for artists on SDR monitors to easily validate HDR values.
I'd argue that this is a fool's errand, though, so we won't investigate it. A better and simpler way to author HDR values on SDR monitors is by showing out-of-range warnings and allowing to easily see the scene at various exposures, to check that shadows/ambient-diffuse-highlight-emissive are all in the ranges they should be.

How does ACES fit in all this? 

It surely was not designed with compression in mind (unlike for example, the PQ curve), albeit it might somewhat work, the RRT is meant to be quite “wide” (both in dynamic range and gamut), because it’s supposed to then be further compressed by the ODT. 
Compression really depends on what you care about and how many bits you have, so a one-size-fits-all curve is in general probably not going to cut it.
Moreover, the RRT is not meant to be easily invertible, much simpler compression curves can be applied, if the goal is to save bits (e.g. to squish the scene into a range that can then be manipulated with the usual color-grading 3d LUTs we are accustomed to).
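As an example of such a "much simpler" invertible compression curve, here is a Reinhard-style pair — purely an illustration of the shape being talked about, not a recommendation of specific constants:

```python
# A minimal, exactly invertible range-compression pair: squash [0, inf)
# into [0, 1) before grading, then expand back afterwards.
def compress(x):
    return x / (1.0 + x)

def expand(y):
    return y / (1.0 - y)  # exact inverse of compress
```

The point is that if the goal is only to save bits around a 3D grading LUT, the round trip must be lossless in the math, which is exactly what the RRT is not designed for.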

It wasn’t designed to be particularly perceptually linear either, or to preserve colors or brightness: the RRT is modelled after film stock.

So we’re left with the third option, a curve that we can use for final output on a display. Well, that’s arguably one of the most important goals, so if ACES does well there, it would be plenty.

At a glance also, that should be really its strength, thanks to the fact that it couples a first compression curve with a second one, specific to a given display (or rather, a display standard and its EOTF: electro-optical transfer function). But here’s the problem. Is it reasonable to tone-map to given output levels, in this situation?

With the old SDR standards (rec.709 / BT.1886) one could think that there was a standard display and viewing ambient we targeted, and that users and TVs would compensate for specific environments. It would have been a lie, but one could have hand-waved things like that (in practice, I think we never really considered the ambient seriously).

In HDR though this is definitely not true, we know different displays will have different peak nits, and we know that the actual amount of dynamic range will vary from something not too different than SDR, in the worst ambients, to something wide enough that could even cause discomfort if too bright areas are displayed for too long. 
ACES itself has different ODTs based on the intended peak nits of the display (and this also couples with the metadata you can specify together with your content).

All this might work; in practice today we don’t have displays that exceed 1000 nits, so we could use ACES, do an ODT to 1000 nits, send the appropriate metadata if we can, and leave all the eventual other adjustments to the TV and its user-facing settings. Should we, though?
If we know that the dynamic range varies so much, why would we constrain ourselves to a somewhat complex system that was never made with our specific needs in mind? To me it seems quite a cop-out.

Note: for ACES, targeting a fixed range makes a lot of sense, because really once a film is mastered (e.g. onto a Blu-ray) the TM can't change, so all you want to do is make sure what the director saw on the reference screen (that had a given peak nits) matches the output, and that's all left to the metadata+output devices. In games though, we can change TM based on the specific device/ambient... I'm not questioning the virtues of ACES for movies; the RRT even was clearly devised as something that resembles film so that the baseline TM would look like something that movie people are accustomed to.

Tone-mapping as display calibration.

I already wasn't a fan of blindly following film-based curves and looks in SDR, and I don’t see why this would be the best for the future either.
Sure, filmic stocks evolved over many years to look nice, but they are constrained to what is achievable with chemicals on a film…

It is true that these film stocks did define a given visual language we are very accustomed to, but we have much more freedom in the digital world today to exploit.
We can preserve colors much better, we control how much glare we want to add, we can do localized tone-mapping and so on. Not to mention that we got so much latitude with color grading that even if a filmic look is desired, it's probably not worth delegating the responsibility of achieving it to the TM curve!

To me it seems that with HDR displays the main function of the final tone-mapping curve should be to adapt to the variability of end displays and viewing environments, while specific “looks” should be achieved via grading, e.g. with the help of compression curves (like s-log) and grading 3d LUTs.

Wouldn’t it be better, for the final display, to have a curve where it’s easy for the end user to tweak the level at which mid-grays will sit, while independently controlling how much to roll off the highlights based on the capabilities of the TV? Maybe even having two different "toes" for OLED vs LCDs...

I think it would be easier and safer to just modify our current tone-mapping curves to give them control over highlight clipping, while preserving the same look we have in SDR for most of the range.
That might avoid headaches with how much we have to adjust our grading between SDR and HDR targets, while still giving more flexibility when it comes to display/ambient calibration.
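A sketch of what such a curve could look like. Every name and the exact shoulder shape here are hypothetical; the point is only that the SDR range stays untouched while the highlight rolloff is a display-calibration parameter:

```python
import math

# Hypothetical "calibration-first" tonemapper: identity (SDR-identical)
# below a pivot, with an analytic shoulder that rolls off towards a
# chosen display peak. C1-continuous at the pivot.
def tonemap(x, pivot=1.0, peak=4.0):
    if x <= pivot:
        return x                       # untouched "SDR" range
    span = peak - pivot
    # exponential shoulder: slope 1 at the pivot, asymptote at `peak`
    return peak - span * math.exp(-(x - pivot) / span)
```

Swapping `peak` per display (and maybe the shoulder shape per panel type) is exactly the kind of end-user calibration knob argued for above.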

HDR brings some new interesting problems, but so far I don't see ACES solving any of them. To me, the first problem, pragmatically, today, is calibration.
A more interesting but less immediate one is how much HDR changes perception: how to use it not just as a special effect for brighter highlights, but to really be able to create displays that look more "transparent" (as in: looking through a window).

Is there a solid reason behind ACES, or are we adopting it just because it’s popular, it was made by very knowledgeable people, and we follow?

Because that might not be the worst thing, to follow blindly, but in the past doing so led so many times to huge mistakes that we all committed because we didn’t question what we were doing… Speaking of colors and displays, a few years ago we were all rendering using non-linear (gamma transformed) values, weren’t we?

Update!

The ACES committee recognized some of the issues I described here, and is working on a new standard. A document, "ACES Retrospectives and Enhancements", was made, which I find very agreeable.

I'm still not sure what would be the benefit to go towards ACES for us (we don't mix and match shots from different sources, we really don't care about a standard).
FWIW I'm now completely persuaded we should just do a simple, fixed-shape TM step to compress our render output into a space that allows color-grading, perform all artistic decisions in the grade, then do a final TM step to adapt to the TV output.
This step is also described very well in Timothy Lottes' GDC presentation "Advanced Techniques and Optimization of VDR Color Pipelines".

Furthermore, games should settle on a neutral default grading that is used throughout production as a baseline for asset review - and that should not be based on ACES, but on a curve that tries to be as neutral/naturalistic as possible.

12 March, 2016

Beyond photographic realism

Service Note: if you're using motion-blur, please decouple rotational, translational and moving object amounts. And test the effect on a variety of screen sizes (fields of view). In general, motion blur shouldn't be something you notice, that registers consciously.
I've been recently playing Firewatch (really good!), which is annoyingly blurred when panning the view on my projector, and that's not an uncommon "mistake". The Order suffers from it as well, at least, to my eyes/my setup. When in doubt, provide a slider?

Ok, with that out of the way, I wanted to try to expand on the ideas of artistic control and potential of our medium I presented last time, beyond strict physical simulation with some color-grading slapped on top. But first, allow me another small digression...

This winter I was back in my hometown, visiting Naples with my girlfriend and her parents. It was their first time in Europe, so of course we went to explore some of the history and art of the surroundings (a task for which a lifetime won't be enough, but still). 

One of the landmarks we went to visit is the Capodimonte art museum, which hosts a quite vast collection of western paintings, from the middle ages up to the 18th century. 

Giotto. Nativity scene. The beginnings of the use of perspective (see also Cimabue).
I've toured this museum a few times with my father in the past, and we always follow a path which illustrates the progress of western art from sacred, symbolic illustrations, where everything is placed according to a hierarchy of importance, to the beginnings of study of the natural world, of perspective, sceneries, all the way to commissioned portraits, mythological figures and representation of common objects and people.

Renaissance painting by Raphael. Fully developed perspective.
What is incredibly interesting to me in this journey is to note how long it takes to develop techniques that nowadays we take for granted (the use of perspective, of lighting...), and how a single artist can influence generations of painters after.

Caravaggio. The calling of Saint Matthew.
When is the next wave coming, for realistic realtime rendering? When are we going to discover methods beyond the current strict adhesion to somewhat misunderstood, bastardized ideas borrowed from photography and cinematography?

Well, first of all, we ought to discuss why this question even matters, why there should be an expectation; couldn't photography be all there is in terms of realistic depiction? Maybe that's the best that can be done, and artistic expression should be limited to the kind of scene setups that are possible in such a medium.

In my view, there are two very important reasons to consider a language of realistic depiction that transcends physical simulation and physical acquisition devices:

1) Perception - Physical simulation is not enough to create perceived realism when we are constrained to sending very limited stimuli (typically, LDR monitor output, without stereopsis or tracking). Studying physiology is important, but does not help much. A simple replica of what our vision system would do (or be able to perceive) when exposed to a real-world input is not necessarily perceived as real when such visual artefacts happen on a screen, instead of as part of the visual system. We are able to detect such a difference quite easily; we need to trick the brain instead!

A tone-mapped image that aims to reproduce the detail perceived
by the human visual system does not look very realistic.
2) Psychology - Studying perception in conjunction with the limits of our output media could solve the issue of realism, but why does perceptual realism matter in the first place? I'd say it's such a prominent style in games because it's a powerful tool for engagement, immersion. The actual goal is emotional, games are art. We could trick the brain with more powerful tools than what we can achieve by limiting ourselves to strict (perceptual) realism.

In other words, the impression of seeing a realistic scene is in your brain; reproducing only the physics of light transport on a monitor is not enough to make your brain fire in the same way as when it's looking at the real world...

So is this all about art then? Why is this on a rendering technology blog? The truth is, artists are often scientists "in disguise": they discover powerful tools to exploit the human brain, but don't codify them in the language of science... Art and science are not disjointed; we should understand art, serve it.

I've been very lucky to attend a lecture by professor
Margaret Livingstone recently, it's a must-see.

Classical artists understood the brain, if not in a scientific way, in an intuitive one. Painters don't reproduce a scene measuring it, they paint what it -feels- like to see a given scene.

Perceptual tricks are used in all artistic expressions, not only in painting but in architecture or in sculpture. Michelangelo's David has its proportions altered to look pleasing from a top-down viewing angle, enlarging the head and the right hand. And there is an interesting theory according to which Mona Lisa's "enigmatic" smile is a product of different frequencies and eye motions (remember that the retina can detect high-frequency details only in a small area near the center...)

Analyzing art through the lens of scientific enquiry can reveal some of these tricks, which is interesting to researchers as they can tell something about how our brain and visual system work, but it should be even more interesting for us, if we can codify these artistic shortcuts in the language of science, turning talent into tools.

Edward Hopper understood light. And tone mapping!
(painting has a much more limited dynamic range than LCD monitors)
Cinematography has its own tricks. Photography, Set design, all arts develop their own specific languages. And real-time rendering has much more potential, because we control the entire pipeline, from scene to physics simulation to image transfer, and we can alter these dynamically, in reaction to player's inputs.
Moreover, we are an unique blend of science and art, we have talents with very different backgrounds working together on the same product. We should be able to create an incredibly rich language!
The most wonderful thing about working in lighting is the people that you encounter. Scientists and artists; engineers and designers; architects and psychologists; optometrists and ergonomists; are all concerned about how people interact with light. It is a topic that is virtually without boundaries, and it has brought me into contact with an extraordinary variety of people from whom I have gathered so much that I know that I cannot properly acknowledge all of them. - From "Lighting by Design", Christopher Cuttle.
We have full control over the worlds we display, and yet so far we author and control content with tools that simulate what can be done with real cameras, in real sets. And we have full control over the -physics- of the simulations we do, and yet we are very wary of allowing tweaks that break physical laws. Why?


James Turrell's installations play with real-world lights and spaces
I think that a lot of it is a reaction, a very understandable and very reasonable reaction, against the chaos that we had before physically-based rendering techniques. 

We are just now trying to figure everything we need to know about PBR, and trying to educate our artists to this methodology, and that has been without doubt the single most important visual revolution of the last few years in realtime (and even offline) graphics. 
PBR gives us certainty, gives us ways to understand visual problems in quantitative terms, and gives us a language with less degrees of freedom for artists, with parameters that are clearly separated, orthogonal (lights, materials, surfaces...).

This is great: when everything has a given degree of realism by construction, artists don't have to spend time just trying to achieve realism through textures and models; they can actually focus on the higher-level goal of deciding -what- to show, of actual design.

But now, as we learn more of the craft of physically based models, now is the time to learn again how and why to programmatically break it! We have to understand that breaking physics is not a problem per se, the problem is breaking physics for no good reason. 
For example, let's consider adjusting the "intensity" of global illumination, which is something that is not uncommon in rendering engines; it's often a control that artists ask for. The problem is not one of math, or correctness, but of intent.


Lighting (softness/bounces) can be a hint of scale
Why are we breaking energy conservation? Is it because we made some mistakes in the math, and artists are correcting? Is it because artists did create worlds with incorrect parameters for what they are trying to achieve? Or is it because we consciously want to communicate something with that choice, for example, distorting the sense of scale of the scene? The last is a visual language choice; the others are just errors which should be corrected, finding the root cause instead of adding a needless degree of freedom to our system.

Nothing should be off the table if we understand why we want certain hacks; on the contrary, we should start with physical simulations and find what we can play with, and why.
Non-linear geometric distortions? Funky perspective projections (iirc there were racing games in the far past that did play with that)? Color shifts? Bending light rays?

And even more possibilities are open with better displays, with stereopsis, HDR, VR... What if we did sometimes kill, or even invert the stereo projection, for some objects? Or change their color, or shading, across eyes? 


3D printed dress by Iris Van Herpen
All other artistic disciplines, from fashion to architecture, rush at new technologies, new tools, trying to understand how they can be employed, hacked, bent, for new effects. 

We are still in our infancy, it's understandable: realistic realtime rendering is still young (even if we look at -gameplay- in games, which arguably is much more studied and refined than visuals as an art), but it's time to start being more aware of our limits, I'd say, and start experimenting more.

24 January, 2016

Color grading and excuses

I started jotting down some notes for this post a month ago maybe, after watching Bridge of Spies on a plane to New York. An ok movie if you ask me, with very noticeable, heavy-handed color choices and for some reason a heavy barrel distortion in certain scenes.

Heavy barrel distortion, from the Bridge of Spies trailer. Anamorphic lenses?
I'm quite curious to understand the reasoning behind said distortion, what it's meant to convey, but this is not going to be a post criticizing the overuse of grading; I think that's already something many people are beginning to notice and hopefully avoid. Also, I'm not even entirely sure it's really a "problem", it might even be just fatigue.

For decades we didn't have the technology to reproduce colors accurately, so realistic color depiction was the goal to achieve. With digital technology perfect colors are "easy", so we started experimenting with ways to do more, to tweak them and push them to express certain atmospheres/emotions/intentions, but nowadays we get certain schemes that are repeated over and over so mechanically it becomes stale (at least in my opinion). We'll need something different, break the rules, find another evolutionary step to keep pushing the envelope.

Next-NEXT gen? Kinemacolor
What's more interesting to me is of course the perspective of videogame rendering. 

We've been shaping our grading pretty much after the movie pipelines, we like the word "filmic", we strive to reproduce the characteristics and defects of real cameras, lenses, film stocks and so on. 
A surprisingly large number of games, of the most different genres, all run practically identical post-effect pipelines (at least in the broad sense; good implementations are still rare). You'll have your bloom, a "filmic" tone mapping, your color-cube grading, depth of field and motion blur, and maybe vignette and chromatic aberration. Oh, and lens flares, of course... THE question is: why? Why do we do what we do?

Dying Light shows some of the heaviest-handed chromatic aberration in games
One argument that I hear sometimes is that we adopt these devices because they are good, they have so much history and research behind them that we can't ignore. I'm not... too happy with this line of reasoning. 
Sure, I don't doubt that the characteristic curves of modern film emulsions were painstakingly engineered, but still we shouldn't just copy and paste, right? We should know the reasoning that led to these choices and the assumptions made, and check whether they apply to us.
And I can't believe that these chemical processes fully achieved even the ideal goals their engineers had; real-world cameras have to operate under constraints we don't have.
In fact, digital cameras are already quite different from film, and yet if you look at the work of great contemporary photographers, not everybody is rushing to apply film simulation on top of them...

Furthermore, did photography try to emulate paintings? Cross-pollination is -great-, but every medium has its own language, its own technical discoveries. We're really the only ones trying so hard to be emulators; even if you look at CGI animated movies, they seldom employ effects borrowed from real-world cameras. It's mostly videogames that are obsessed with such techniques.

Notice how rarely the post-fx are "in your face" in a typical Pixar movie...
A better reason someone gave me was the following: games are hard enough, artists are comfortable with a given set of tools, the audience is used to a given visual language, so by not reinventing it we get good enough results, good productivity and scenes that are "readable" from the user perspective.

There is some truth behind this, and lots of honesty; it's reasoning that can lead to good results if followed carefully. But it turns out that in a lot of cases, in our industry, we don't even apply this line of thinking. The truth is that more often than not we just copy "ourselves": we copy what someone else did in the industry without too much regard for the details, ending up with a bastard pipeline that doesn't really resemble film or cameras.

When was the last time you saw a movie and noticed chromatic aberration? Heavy-handed flares and "bloom" (ok, other than in the infamous J.J. Abrams Star Trek, but hey, he apologized...)? Is the motion blur noticeable? Even film grain is hardly so "in your face"; in fact I bet that after watching a movie, most of the time, you can't tell whether it was shot on film or digitally.
Lots of the defects we simulate are not considered pleasing or artistic; they are aberrations that camera manufacturers try to get rid of, and they have become quite good at it! Hexagonal-shaped bokeh? Maybe on very cheap lenses...


http://www.cs.ubc.ca/labs/imager/tr/2012/PolynomialOptics/
On the other hand lots of other features that -do- matter are completely ignored. Lots of a lens "character" comes from its point spread function, a lens can have a lower contrast but high resolution or the opposite, field curvature can be interesting, out of focus areas don't have a fixed, uniform shape across the image plane (in general all lens aberrations change across it) and so on. We often even leave the choice of antialiasing filters to the user...

Even on the grading side we are sloppy. Are we really sure that our artists would love to work with a movie grading workflow? And how are movies graded anyway? With a constant, uniform color correction applied over the entire image? Or with the same correction applied per environment? Of course not! The grading is done shot by shot, second by second. It's done with masks and rotoscoping, gradients, non-global filters...

A colorist applying a mask
Lots of these tools are not even hard to replicate, if we wanted to; we could for example use stencils to replicate masks, to grade skin differently from sky and from other parts of the scene.
Other things are harder because we don't have shots (well, other than in cinematic sequences), but we could study how a colorist works, what an artist could want to express, and try to invent tools that allow a better range of adjustment: working in worldspace or clipspace maybe, or looking at material attributes, at lighting, and so on.

Ironically, people (myself included, sometimes) are instinctively "against" more creative techniques that would be simple in games, on the grounds that they are too "unnatural", too far from what we think the real-camera argument justifies; meanwhile we pass on opportunities to recreate effects that would be quite normal in movies, just because they don't fit exactly the same workflow.

Katana, a look development tool.

Scene color control vs post-effect grading.

I think the endgame though is to find our own ways. Why do we grade and push so much on post effects to begin with? I believe the main reason is that it's so easy: it empowers artists with global control over a scene, and allows them to make large changes with minimal effort.

If that's the case though, could we think of different ways to make the life of our artists easier? Why can't we allow the same workflows, the same speed, for operations on source assets? With the added benefit of not needlessly breaking physical laws, thus achieving control in a more believable way...


Neutral image in the middle. On the right: warm/cold via grading; on the left, a similar effect done by editing lights.
Unlike in movies and photography, for us it's trivial to change the colors of all the lights (or even of all the materials). We can manipulate subsets of them, hierarchically, by semantics, locally in specific areas, by painting over the world, by interpolating between different variants, and so on...
Why did we push everything to the last stage of our graphics pipeline? I believe that if photography or movies had the possibility of changing the world so cheaply, if they had the opportunities we have, they would exploit them immediately.

Gregory Crewdson

Many of these changes are "easy", as they won't impact the runtime code: they are just smarter ways of organizing properties. Many pipelines are even pushing towards parametric material libraries and compositing for texture authoring, which would make even bulk material edits possible without breaking physical models.

We need to think and experiment more. 



P.S. 
A possible concern when thinking of light manipulation is that, as the results are more realistic, they might be less appropriate for dynamic changes in game (e.g. transitions between areas). Where grading changes are not perceived as changes in the scene, lighting changes might be, thus potentially creating a more jarring effect.

It might seem I'm very critical of our industry, but there are reasons why we are "behind" other media, I think. Our surface area is huge: engineers and artists have to develop their own tools -while using them-, make sure everything works for the player, make sure everything fits in a console... We're great at these things; it's no surprise, then, that we don't have the same amount of time to spend thinking about game photography. Our core skills are different, the game comes first.

03 June, 2014

Rate my API

Metal, Mantle, OpenGL's AZDO, GL|ES, DirectX 12... Not to mention the "secret" console ones. It's good to be a graphics API these days... And everybody is talking about them.
As I love to be "on trend" now you get my take on all this from hopefully a slightly different perspective.

To be honest, I initially wrote this article as a rant in reaction to the excellent post on modern OpenGL by Aras (Unity 3d), but then after Metal and some twitter chats I became persuaded I should write something a bit more "serious". Or at least, try...

- When is a graphics API sexy?

Various smart people are discussing in nice detail the technical merits of certain API design decisions (e.g. Ryg and Timothy's exchanges on OpenGL: original, reply, re:re: and another one), so I won't add to that right now. I want instead to cast these discussions in a different and, to me, more relevant light. What do we really want from an API (or really any piece of software)?

First and foremost, we will consider adopting a technology if it's useful. It might seem obvious but apparently it's not. How many times have you seen projects that don't really work, yet spend time on aesthetic improvements?

Ease of use, documentation, great design, simplicity. All these attributes are completely irrelevant if the software doesn't do some compelling work. We can learn undocumented stuff, we can write our own tools; we are engineers, and if there is something we need and a road open to obtaining it, we can achieve what we need. Of course we'd prefer not to endure pain, but pain is better than not being able to do what we need at all. Easy is better than hard, but hard is better than impossible.

Of course, once we have something that does what we're interested in, whether we adopt it or not depends on how much we want it divided by how hard it is to get. Cost/benefit, unsurprisingly. To recap, we want an API that is:
  • Working. It is actually implemented somewhere, and the implementation actually works. If it's written on paper but not reliably deployed, we can safely ignore its existence. This is really part of "useful", or of the benefits, but it's important enough that I'd like to remark on it here.
  • Useful. It does something that we need, in a market we're interested in. If I'm a AAA company and you make a great API that enables incredible graphics on a device that sold ten thousand units, that's not useful. If you provide a big speed improvement on a platform that is not performance bound in my products, that's not so useful either. And so on.
  • Easy. Do I need to change my entire engine or workflow to adopt this API? That's the most pressing question. Then comes documentation and support. Then tools. Then, in general, how nice the API design is. APIs usually live in a realm that is well separated from the rest of the software: if your API requires sacrificing a (virtual) goat each time I call it, it's probably still not going to ruin my whole project, it's not going to "spread". If the bad API design does "spread" to the engine or the entire software, then that's changing the workflow, so it goes back to the first, most important attribute.

Now we can go through a few of the graphics APIs on the table these days and see how they fare (in my humble opinion) according to this (obvious but sometimes forgotten) metric.

- OpenGL and AZDO

OpenGL has a long history: once upon a time it was winning the graphics API war, then it started to lose ground, and by the time DirectX9 was around pretty much all games had switched (a good history lesson was posted a while ago on stackoverflow).
That didn't stop the downward spiral, to the point that around the time DirectX11 came out (2008, shipped with Windows 7 in 2009), even multi-platform CG software (Maya, Max and the likes) moved to DirectX as the preferred frontend on Windows.
OpenGL took years to catch up to DirectX11 with a variety of patches (g-truc's reviews are awesome), and even longer to see robust implementations of these concepts. Still today the driver quality and the number of extensions supported vary wildly across vendors and OSes (some examples here). Ironically (and to make things worse), the platform where OpenGL has the best drivers across vendors today is Windows (which doesn't even ship with OpenGL drivers by default, only an ancient OpenGL 1.1 to Dx layer), while OSX, which is the best use-case for OpenGL in many ways, has drivers that tragically lag behind (but at least they are guaranteed to be updated with the OS!).

But, for all the faults it has, today OpenGL is offering something very worth considering, which is what cool people call AZDO (instanced rendering on steroids): a way to reduce draw-call overhead by orders of magnitude, by shifting the responsibility of working with resources away from the CPU (which traditionally generates the commands that bind said resources into the command buffer) and onto the GPU, which in this model follows a few indirections starting from a single pointer to tables of resources in memory.

To a degree AZDO is more a solution "around" OpenGL: rather than fixing OpenGL by creating an API that allows fast multithreaded command buffer generation, it provides a way to draw with minimal API/driver intervention.
In a way it is a feat of engineering genius: instead of waiting for OpenGL to evolve its multithreading model, it found a minimal set of extensions to work around it. On the other hand, this will probably further delay the multithreading changes...

Results seem great. The downside of this approach is that all the other modern competitors (DirectX12, Mantle, Xbox One and PS4 libGNM) both reduce CPU work by offloading state binding to GPU indirection and support fast CPU command buffer generation via multithreading and lower-level concepts, which map to more "conventional" engine pipelines a bit more easily. There is also a question of whether the more indirect approach is always the fastest (i.e. when dealing with draws that generate little GPU work), but that's still up for debate (AZDO is very new and I'm not aware of comparisons pitting it against the other approach).

For AAA games. Today, for most companies, this means consoles first, Windows second, anything else much less important. For these games, having more performance on a platform that is not the primary authoring one and that is often not a performance bottleneck, at the cost of significant engine changes, doesn't seem attractive at all (and with no debug tools, little documentation and so on...), especially considering that DirectX12 is coming: an alternative that promises to be as good but easier, better supported, and that will also target Xbox One, thus covering two of the three target platforms.

A notable exception though are the free-to-play games hugely popular in Asia, which are not only usually Windows exclusive, but for which Windows XP is still very relevant, meaning no DirectX11 and even less DirectX12. For these games I guess OpenGL could be a great option today.
Note also that AZDO is currently not fully supported on Intel hardware (no bindless, multi-draw indirect software emulated), so you'll probably need a fallback renderer as well, as Intel hardware is quite interesting for games at the lower end.

For applications. Most CGI applications are worst-case scenarios for GPU efficiency: they tend to do lots of draws with very little actual work per draw (wireframe drawing, little culling), and in not very optimized ways, too, as they have to work with editable, unoptimized data and often carry legacy code, or code not written with GPU performance in mind.
Also, shipping on multiple platforms is the norm, while working across multiple vendors is less of a concern: NVidia has the golden share among CG studios and Intel is completely out of the picture. Even NVidia/Linux alone is probably a compelling enough target to consider modern OpenGL there, and even more so as Windows would benefit as well.
These things considered, I would expect most applications to move towards modern OpenGL, even if it might take a significant effort to do so.


- Mantle

AMD's Mantle is a clear example of a nice, good, easy API (exaggerated, but interesting praise here) that fails (in my opinion) to be really useful for shipped titles. On the technical level there's nothing to complain about; it seems very reasonable and well done.

For AAA games. Today Mantle works only on Windows with AMD hardware. That's not much, especially when DirectX12 is coming and AZDO is an alternative too. While it's most probably easier to deploy than AZDO (and I bet AMD is willing to help, even if right now there might be no tools and so on), it is also much less useful. Worse even if you consider that, even on AMD hardware, only certain CPU/GPU combinations are CPU limited.
It simply covers too little ground. I hoped at the beginning that AMD would come out sooner and with a PS4 layer as well, thus getting deployed in many projects that were looking for an easier way to target PS4 than figuring out libGNM. It didn't happen, and that I think is the end of it. Some people thought it could become a new cross-vendor standard, but that will -never- happen.
They did score Frostbite's support though, which pretty much means all EA games. But I would be very surprised if they didn't have to pay for that, and I wonder how long it will last (supporting it is still a cost, as supporting any platform is)...

For applications. It's a bit more interesting there, as if you remove consoles from your targets, Windows occupies a bigger share. Also, it's not unreasonable to think that Mantle could be ported to Linux. Unfortunately though, NVidia is more popular than AMD among CG studios, and that pretty much kills it.

For the people. There is something though that needs to be praised a lot: AMD has lots of great, public documentation about the inner workings of its GPUs (Intel is not bad either; NVidia is absolutely terrible, a sad joke) and tools that show the actual GPU shader code (i.e. shader disassembly), which is really great as it allows everybody to talk and share their findings without fearing NDAs.
This creates a positive ecosystem where everybody can work "close to the metal", and Mantle is part of that. Historically, it just happens that the more people are able to hack, the more amazing things get created. See what happened after twenty years of C64 hacking (some examples here).
I expect all graphics researchers to focus on GCN from now on.

- DirectX12

It's hard to criticize DirectX11, especially if you consider that it was presented in 2008 and what the state of the other APIs was at that point. It changed everything: it mapped better to modern GPU concepts, introduced Tessellation and Compute Shaders, looks great and easy, is reasonably well documented and supported, and it's very successful.

Arguably DirectX9 had better tools (VSGD is horrible AND they killed Pix, which was actually working fine), but that's hardly a fault of 11, rather of the loss of interest in PC gaming; nowadays things are getting much better. Consider that only now are we really starting to play with Compute Shaders, for example, because the next-gen consoles arrived, but we have had them for five years! It was so ahead of its time that it needed only rather minor updates in 11.1 and 11.2.

The only big issue with 11 is that Microsoft wanted to make things simpler than they really should be, for no great reason. So 11 shipped with certain "contracts" in its multithreading model that don't seem really useful or needed, but that hugely impacted the performance of multithreaded drivers, to the point where multithreading helps only if your application, and not the driver, is the bottleneck.
If your code is fast enough, multithreaded Dx11 will actually be slower than single-threaded, which is clearly an issue. I suspect it could still be technically possible to carve "fast paths" for applications swearing not to exercise the API in certain ways, but probably it was simply not important enough for the PC gaming market, and now 12 is coming, probably just in time...

For everything Microsoft. DirectX won on Windows and it also ships on Microsoft consoles. I can't comment much on 12, and it's not finished yet. It will hardly be displaced on Windows though, especially for games.

- Metal

Metal is Apple's Mantle. In my very personal, biased poll of the reactions on my twitter feed, it has not been received with the same enthusiasm as AMD's initiative. Some explained to me that it's because Mantle promised to be a multi-vendor API while Metal didn't. Oh Apple, outclassed at marketing by AMD; you don't know how to appeal to engineers, next time say it's designed to be open...

I've also seen many people complaining that this is foul play designed only to create vendor lock-in, a mere marketing move. I don't agree, and if you think it's only marketing then you should prove that it's possible to write an equally fast driver in today's OpenGL|ES.
I believe that's not technically possible, and that OpenGL|ES is plagued by many of the same defects as desktop OpenGL, only much worse, as it has no AZDO and it ships on platforms that are very resource constrained, where performance and efficiency matter even more!
It would probably have been possible to carve fast paths and patch ES with proprietary extensions, which would have been a bit friendlier to the ecosystem (extensions often get incorporated into the standard down the line), but if it reaches the point where most of the rendering goes through extensions, what's the point, really?

Actually, this might be for the best even for the overall ecosystem, as it's a bigger kick in the nuts than anything else could have been; when many vendors on Android are shipping drivers that are just the -worst- software ever, and Khronos shows itself to be slow to evolve and ridden with politics, a hard kick is what's most needed.
It's very new as we speak and I haven't had an in depth look into it, so I might edit this section later on.

For games. iOS still has the golden share of mobile gaming, with many more exclusives and games shipping "first" on that system than on the competitor's, but the gap is not huge. Also, most games are still 2d and not too demanding on the hardware, so for a lot of people a degree of portability will matter more than an order-of-magnitude improvement in drawcall performance.
But for the games that do care about performance Metal is just great; iOS is big enough that even if your game is not exclusive, it's very reasonable to spend money implementing unique features to make your game nicer on it.
It's true that Metal won't be available on older Apple hardware, but Apple has always succeeded in giving people reasons to update both their software and their hardware, so that's probably not a big concern.

- Conclusions

Learn AZDO, play with Mantle, ship with DirectX.

If you're doing an indie title do use a rendering library or engine (I keep pointing at https://github.com/bkaradzic/bgfx but it's just an example) so you'll still ship with the best API for each platform and with the least amount of headaches. If you really love toying with the graphics API directly then I guess a flavor of OpenGL that is supported across platforms could be nice (3.3 if you care about Intel/Linux right now).

If a market is interesting enough for a given application and the vendors there decide on their own API, like it's happening for Metal and happened for DirectX, I'd welcome that.

The problem with many of the APIs we're seeing is not that they divide the market, but that they try to do so in segments that are too small and uninteresting to specifically target. If, for example, Linux decided on its own 3d API for games, I doubt that would be at all interesting...
If AMD had shipped Mantle on consoles and PC, that could have been a big enough segment to target; PC-only is not. If NVidia GameWorks offered a compelling solution on consoles, guess what, it would see bigger adoption as well, while right now I suspect it will be used only on projects where NVidia is directly involved.

Most projects already have to ship with an abstraction layer of sorts, and many of these are available; in practice, the idea of using OpenGL directly to ship products across platforms doesn't exist (except for very small projects and some research code).
It's always better to write (or use via third-party libraries) lower-level code on things that we understand than to fight with very opaque, wildly different implementations of a supposedly standard API.

In fact, I bet that practically no gamedevs (a very tiny number) know even the basics of what a driver does and why certain API decisions lead to slow CPU performance. Also, the number of people not using third-party game engines, especially for indie work, is dwindling.

In theory a single API is better; in practice, today, it isn't, and that's why the emergence of these low-level libraries is not just a marketing plot but actually a reasonable technical solution.