HDR displays are upon us, and I'm sure rendering engineers worldwide are trying to figure out how best to use them. What to do with post-effects? How to do antialiasing? How to deal with particles and UI? What framebuffer formats to use? And so forth.
Well, it appears that in this ocean of new research some standards are emerging, and one solution that seems to be popular is to use the ACES tone-mapping curve (RRT: Reference Rendering Transform) with an appropriate HDR display curve (ODT: Output Device Transform).
- Infamous Second Son uses it.
- Rise of the Tomb Raider (and it's the advice NVidia is giving).
- Krzysztof Narkowicz wrote some very good articles and found good approximations (see the sketch after this list).
- Recently it was even added to Unreal Engine.
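For reference, Narkowicz's per-channel fit of the combined RRT + (sRGB) ODT is small enough to quote; a minimal C++ sketch, applied to linear values after exposure:

```cpp
// Narkowicz's fit of the combined ACES RRT + (sRGB) ODT, applied per channel.
// Input: exposed, linear scene value; output: display value clamped to [0,1].
float aces_fitted(float x)
{
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    float y = (x * (a * x + b)) / (x * (c * x + d) + e);
    return y < 0.0f ? 0.0f : (y > 1.0f ? 1.0f : y); // saturate
}
```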
To my dismay, though, I have to say I'm a bit baffled. Perhaps someone will persuade me otherwise in the future, but I don't see why ACES would be a solid choice.
First of all, let’s all be persuaded we indeed need to tone-map our HDR data. Why can’t we just apply exposure and send linear HDR to a TV?
At first it could seem a reasonable choice: the PQ encoding curve we use to send the signal to TVs peaks at 10,000 nits, which is not too bad; it could allow us to encode a scene-referred signal and let the TV do the rest (tone-map according to its characteristics).
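For context, this is roughly what the PQ (SMPTE ST 2084) encoding does; a minimal C++ sketch mapping absolute luminance in nits to the [0,1] signal:

```cpp
#include <cmath>

// PQ (SMPTE ST 2084) inverse EOTF: absolute luminance in nits -> [0,1] signal.
float pq_encode(float nits)
{
    const float m1 = 2610.0f / 16384.0f;          // 0.1593017578125
    const float m2 = 2523.0f / 4096.0f * 128.0f;  // 78.84375
    const float c1 = 3424.0f / 4096.0f;           // 0.8359375
    const float c2 = 2413.0f / 4096.0f * 32.0f;   // 18.8515625
    const float c3 = 2392.0f / 4096.0f * 32.0f;   // 18.6875
    float y  = nits / 10000.0f;                   // normalize to the 10,000 nits peak
    float yp = std::pow(y, m1);
    return std::pow((c1 + c2 * yp) / (1.0f + c3 * yp), m2);
}
```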
Letting the display do the tone-mapping is not what TVs do, though. Leaving the transform from scene values to display values to the TV would allow for lots of flexibility, but it would also give the display too much responsibility over the final look of the image.
So, the way it works instead is that TVs do have some tone-mapping functionality, but they stay quite linear until they reach their peak intensity, where they seem to just apply a sharp shoulder.
How sharp that shoulder is can vary, as content can also send along metadata declaring the maximum nits it was authored at: for content that matches the TV, in theory no rolloff is needed at all, as the TV knows the signal will never exceed its capabilities (in practice, though, those capabilities change based on lots of factors, due to energy limits).
Some TVs also expose silly controls, like a gamma setting in HDR: it seems these alter the response curve in the "SDR" range of their output. For now, let's ignore all that.
Regardless of these specifics, it's clear that you're supposed to bring your values from scene-referred to display-referred, deciding where you want your mid-gray to sit and how to roll off highlights from there. You need tone mapping in HDR.
Ok, so let’s backtrack a second. What’s the goal of a tone-mapping curve? I think it depends, but you might have one or more of the following goals:
- To compress dynamic range in order to best use the available bits. A form of mu-law encoding.
- To provide a baseline for authoring. Arguably that should be a “naturalistic”, perceptual rendition of an HDR scene, but it might even be something closer to the final image.
- To achieve a given final look, on a given display.
HDR screens add a fourth possible objective: creating a curve that makes it possible for artists on SDR monitors to easily validate HDR values.
I'd argue, though, that this is a fool's errand, so we won't investigate it. A better and simpler way to author HDR values on SDR monitors is to show out-of-range warnings and to allow the scene to be easily viewed at various exposures, to check that shadows/ambient-diffuse-highlight-emissive are all in the ranges they should be.
How does ACES fit in all this?
It surely was not designed with compression in mind (unlike, for example, the PQ curve). Albeit it might somewhat work, the RRT is meant to be quite “wide” (both in dynamic range and gamut), because it’s supposed to then be further compressed by the ODT.
Compression really depends on what you care about and how many bits you have, so a one-size-fits-all curve is probably not going to cut it in general.
Moreover, the RRT is not meant to be easily invertible; much simpler compression curves can be applied if the goal is to save bits (e.g. to squish the scene into a range that can then be manipulated with the usual color-grading 3d LUTs we are accustomed to).
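To make that concrete, even a trivial Reinhard-style pair is exactly invertible and can squish scene values into a LUT-friendly range; a sketch (my example, not something ACES prescribes):

```cpp
// A trivially invertible compression pair (simple Reinhard): maps [0, inf)
// into [0, 1) for storage/grading, and exactly back to scene values after.
float compress(float x)   { return x / (1.0f + x); }
float decompress(float y) { return y / (1.0f - y); }
```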
It wasn’t designed to be particularly perceptually linear either, nor to preserve colors or brightness: the RRT is modelled after film stock.
So we’re left with the third option, a curve that we can use for final output on a display. Well, that’s arguably one of the most important goals, so if ACES does well there, it would be plenty.
At a glance, that should really be its strength, thanks to the fact that it couples a first compression curve with a second one specific to a given display (or rather, a display standard and its EOTF: electro-optical transfer function). But here’s the problem: is it reasonable to tone-map to given output levels, in this situation?
With the old SDR standards (rec.709 / BT.1886) one could think that there was a standard display and viewing ambient we targeted, and that users and TVs would compensate for specific environments. It would have been a lie, but one could have hand-waved things like that (in practice, I think we never really considered the ambient seriously).
In HDR, though, this is definitely not true: we know different displays will have different peak nits, and we know that the actual amount of dynamic range will vary from something not too different from SDR, in the worst ambients, to something wide enough that it could even cause discomfort if too-bright areas are displayed for too long.
ACES itself has different ODTs based on the intended peak nits of the display (and this also couples with the metadata you can specify together with your content).
All this might work: in practice, today we don’t have displays that exceed 1000 nits, so we could use ACES, do an ODT to 1000 nits and, if we can, even send the appropriate metadata, leaving all the other eventual adjustments to the TV and its user-facing settings. Should we, though?
If we know that the dynamic range varies so much, why would we constrain ourselves to a rather complex system that was never made with our specific needs in mind? To me it seems quite a cop-out.
Note: for ACES, targeting a fixed range makes a lot of sense, because once a film is mastered (e.g. onto a Blu-ray) the TM can't change, so all you want is to make sure that what the director saw on the reference screen (which had a given peak nits) matches the output, and that's all left to the metadata and the output devices. In games, though, we can change the TM based on the specific device/ambient... I'm not questioning the virtues of ACES for movies; the RRT was clearly devised as something that resembles film, so that the baseline TM would look like something movie people are accustomed to.
Tone-mapping as display calibration.
I already wasn't a fan of blindly following film-based curves and looks in SDR, and I don't see why they would be the best choice for the future either.
Sure, filmic stocks evolved over many years to look nice, but they are constrained to what is achievable with chemicals on a film…
It is true that these film stocks did define a visual language we are very accustomed to, but today, in the digital world, we have much more freedom to exploit.
We can preserve colors much better, we control how much glare we want to add, we can do localized tone-mapping and so on. Not to mention that we got so much latitude with color grading that even if a filmic look is desired, it's probably not worth delegating the responsibility of achieving it to the TM curve!
To me it seems that with HDR displays the main function of the final tone-mapping curve should be to adapt to the variability of end displays and viewing environments, while specific “looks” should be achieved via grading, e.g. with the help of compression curves (like s-log) and grading 3d LUTs.
Wouldn’t it be better, for the final display, to have a curve where it’s easy for the end user to tweak the level at which mid-grays will sit, while independently controlling how much to roll off the highlights based on the capabilities of the TV? Maybe even having two different "toes" for OLEDs vs LCDs...
I think it would be even easier and safer to just modify our current tone-mapping curves to give them a control over highlight clipping, while preserving the same look we have in SDR for most of the range.
That might avoid headaches with how much we have to adjust our grading between SDR and HDR targets, while still giving more flexibility when it comes to display/ambient calibration.
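To illustrate what such a parameterization could look like, here's a sketch of an extended Reinhard curve on luminance with two independent controls, one anchoring mid-gray and one controlling where highlights roll off; the function and parameter names are mine, purely for illustration:

```cpp
// Sketch of a display-adaptation curve with two independent controls:
//  - mid_gray_out: the display level (0..1) scene mid-gray (0.18) should land near
//  - white_point:  the scene luminance that should just reach the display peak
// Extended Reinhard; names and parameterization are illustrative only.
float display_curve(float scene_lum, float mid_gray_out, float white_point)
{
    // Exposure that places scene mid-gray approximately at the requested level
    // (the shoulder below pulls it down slightly).
    float exposure = mid_gray_out / 0.18f;
    float l = scene_lum   * exposure;
    float w = white_point * exposure;
    // Extended Reinhard: maps w to 1.0 and rolls highlights off smoothly.
    return (l * (1.0f + l / (w * w))) / (1.0f + l);
}
```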
HDR brings some new interesting problems, but so far I don't see ACES solving any of them. To me, the first problem, pragmatically, today, is calibration.
A more interesting but less immediate one is how much HDR changes perception: how to use it not just as a special effect for brighter highlights, but to really be able to create displays that look more "transparent" (as in: looking through a window).
Is there a solid reason behind ACES, or are we adopting it just because it’s popular, it was made by very knowledgeable people, and we follow along?
Following blindly might not be the worst thing, but in the past it led so many times to huge mistakes that we all committed because we didn’t question what we were doing… Speaking of colors and displays: a few years ago, we were all rendering using non-linear (gamma-transformed) values, weren’t we?
Update!
The ACES committee recognized some of the issues I described here, and is working on a new standard. A document, "ACES Retrospectives and Enhancements", was published, which I find very agreeable.
I'm still not sure what the benefit of moving towards ACES would be for us (we don't mix and match shots from different sources; we really don't care about a standard).
FWIW, I'm now completely persuaded we should just do a simple, fixed-shape TM step to compress our render output into a space that allows color grading, perform all artistic decisions in grade, then do a final TM step to adapt to the TV output.
This step is also described very well in Timothy Lottes' GDC presentation "Advanced Techniques and Optimization of VDR Color Pipelines".
Furthermore, games should settle on a neutral default grading that is used throughout production as a baseline for asset review - and that should not be based on ACES, but on a curve that tries to be as neutral/naturalistic as possible.
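A minimal, scalar sketch of that pipeline shape, reusing the simple curves sketched earlier; everything here (the names, the identity grade, the curve choices) is illustrative, not a prescription:

```cpp
#include <algorithm>

// 1) Fixed-shape, invertible compression into a grading-friendly [0,1) range.
float to_grading_space(float scene)  { return scene / (1.0f + scene); }
float from_grading_space(float g)    { return g / (1.0f - g); }

// 2) The artistic "look": in a real pipeline this would be a 3d LUT authored
//    by the artists; here just an identity placeholder.
float apply_grade(float g)           { return g; }

// 3) Final display-adaptive tone map: rolls highlights off towards the actual
//    headroom of the TV (white = scene value that should hit the display peak).
float display_tonemap(float scene, float white)
{
    float out = (scene * (1.0f + scene / (white * white))) / (1.0f + scene);
    return std::min(out, 1.0f);
}

// End to end: render -> grading space -> grade -> adapt to the display.
float pipeline(float scene, float display_white)
{
    float graded = apply_grade(to_grading_space(scene));
    return display_tonemap(from_grading_space(graded), display_white);
}
```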