## 03 November, 2012

### Addendum to Mathematica and Skin Rendering

Update: there was a typo in the shader code, I didn't update the screenshot...

A shader code snippet following my work on Penner's scattering approximation. I'm actually rather sure it can be done better and cheaper, especially if you "clamp" the radius range to a narrower window: currently it goes too broad on one side, trying to fall back to a standard, non-wrapped N dot L for large radii, which is not really useful. But I've already given you the Mathematica code, so you can do better...

```hlsl
float3 PSSFitFunction(float NdL, float r)
{
    // Per-channel (RGB) fit coefficients
    float3 a0 = {0.0605, 0.2076, 0.2243};
    float3 a1 = {0.0903, 0.1687, 0.2436};
    float3 a2 = {-0.0210, -0.0942, -0.1116};
    float3 a3 = {0.6896, 0.6762, 0.6480};
    float3 a4 = {-0.1110, -0.5023, -0.6703};
    float3 a5 = {0.8177, 0.9119, 0.9209};

    float3 t = NdL.xxx * (a0*r + a1) + (a2*r + a3);

    // The original snippet ended here without a return statement; a plausible
    // completion using the remaining coefficients (this is a reconstruction,
    // not necessarily the original expression) would be:
    return saturate(t * t * (a4*r + a5));
}
```

## 02 November, 2012

### How does GPU Compute work?

A presentation I made over the last month, an introduction to GPUs and GPU compute for the "non GPU" programmer. A couple of slides target the videogame industry but otherwise it's pretty general. If you want a good testbed to start playing, this previous post might help.

http://www.scribd.com/doc/111876741/GPU-Compute

(Photos: preparing for the live presentation, and my notes and ideas for it.)

## 08 October, 2012

### Notes on optimizing a deferred renderer

Oh my, I forgot to post things... Space Marine, a game that I'm very fond of having been part of. The second third-person action game by Relic, and the first multi-platform console game from a studio well known for incredible PC RTS titles. It came out maybe a bit short on content (cuts, time...) but it's a technically excellent game and it's plenty of fun too (it's one of the few games I had fun playing multiplayer on the 360).

Its rendering does most of the things a modern deferred renderer should do, quite a few things that only a handful of other titles manage to pull off (e.g. almost zero latency) and a couple of novel things as well (the way it does Oren-Nayar, its "world occlusion", the DOF/MB filter, the hair lighting trickery and some other details).

The people working on it were a-m-a-z-i-n-g. I was "overseeing" rendering performance across the platforms, and I was surprised to see that we managed to more than double rendering performance in the six months before shipping, to a solid thirty frames per second (I would have bet it was not possible when I joined. They proved me wrong).
Most of the work described in the notes was done near the end of the project (shadows, post effects and SSAO were rewritten from scratch, software occlusion was added, SIMD and threading went everywhere; I had a list of more than twenty tasks per platform and I'd say more than 80% of them were done by the end).

This presentation was done a while ago now. It started as something I wrote internally as a post-mortem for other studios to see, then I removed some implementation details and presented it (thanks to Relic's openness when it comes to sharing knowledge) at a very informal meeting of rendering professionals I organize sporadically here in Vancouver. The version I'm uploading was cleaned up even more (well, censored... mostly replacing images with public screenshots of the game) so it could be published online... but then I forgot about it, until today, when someone asked me for this material again.

It's not much of a "presentation"; it's more a set of notes written in PowerPoint, as it was originally meant not to be presented live but just to be read.

## 07 October, 2012

### Supersampling and antialiasing distance fields

Just found this note cleaning up my stuff, thought I might as well post it...

We had some issues with using signed distance fields for font rendering and antialiasing. The idea is conceptually similar to doing a two-dimensional marching squares and then computing the area of the resulting convex polygon.

If you can't (or don't want to) read my horrible note, the "algorithm" samples four taps of the distance field (use ddx/ddy of the UVs for the width) on a (unit) square and then walks the vertices and edges of the square counterclockwise (first a vertex, then an edge, then the next vertex and so on).

The polygon is constructed by taking the coordinates of all the vertices that are inside the distance field and computing an intersection point for all the edges that are between two in-out vertices. The polygon area (which equals the coverage) is computed incrementally using the determinant method. All unrolled in a shader.
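The walk described above can be sketched in C++ (a hypothetical CPU-side port; in the real thing this would be unrolled in a shader, and the sign convention, negative distance meaning "inside", is an assumption):

```cpp
#include <cmath>

// Coverage of a unit texel given four signed-distance taps at its corners,
// walked counterclockwise: (0,0), (1,0), (1,1), (0,1).
// Assumption: d <= 0 means "inside" the shape.
double Coverage(double d00, double d10, double d11, double d01)
{
    const double px[4] = {0, 1, 1, 0};
    const double py[4] = {0, 0, 1, 1};
    const double d[4]  = {d00, d10, d11, d01};

    // Build the clipped polygon: keep inside corners, and add an intersection
    // point on every edge whose endpoints straddle the zero crossing.
    double vx[8], vy[8];
    int n = 0;
    for (int i = 0; i < 4; ++i)
    {
        const int j = (i + 1) % 4;
        if (d[i] <= 0) { vx[n] = px[i]; vy[n] = py[i]; ++n; }
        if ((d[i] <= 0) != (d[j] <= 0))
        {
            const double t = d[i] / (d[i] - d[j]); // linear zero crossing
            vx[n] = px[i] + t * (px[j] - px[i]);
            vy[n] = py[i] + t * (py[j] - py[i]);
            ++n;
        }
    }

    // Shoelace / determinant area of the resulting convex polygon.
    double area2 = 0;
    for (int i = 0; i < n; ++i)
    {
        const int j = (i + 1) % n;
        area2 += vx[i] * vy[j] - vx[j] * vy[i];
    }
    return 0.5 * std::fabs(area2);
}
```

For example, a square entirely inside gives coverage 1, entirely outside gives 0, and a square bisected by the zero isoline gives 0.5, which is exactly the antialiased alpha you want.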

Jason Peng, an incredibly talented UBC student, implemented this at Capcom Vancouver. He tells me it worked :)

P.S. You'll notice that I write counterclockwise and then draw all the arrows clockwise... :) Stupid me.

## 11 September, 2012

### A hunch

Have you noticed how nice realtime raytracing looks? I was just watching this recent video by Sam Lapere, done with Jacco Bikker's famous Brigade renderer. Yes it's noisy, and the lighting is simple, and the models are not complex. But it has a certain quality to it, for sure; it feels natural and well lit.

I have a hunch.

It's not (mostly) because of the accuracy of the visibility solution. Nor is it because of the accuracy (which, at least in that scene, does not even seem to be a factor) of the BRDF and material representation. I think that with our skimpy realtime GPU rasterization hacks we are technically capable of producing materials and occlusions of good enough quality.

I suspect that where games often fall short is in finding the right balance. Raytracing does not need this balancing at all: diffuse, ambient, specular, they all bounce around as a single entity, and light simply is. In the realtime rasterization world this unity does not exist; we have components, and we tune them and shape them. We start with a bit of diffuse, subtract shadows, add ambient, subtract its occlusion, and sprinkle specular on top. Somehow... And sometimes this complex mix is just right; most often, plain wrong.

Is it the artists' fault? Is it too complex to get right? I think not, it should not be. In many cases it's not hard to take references, and meaningful ones: measurements, measurements that split a real-world scene into components; to devise experiments and validate our effects. To know what we are doing...

It's that we rendering engineers often work on technical features and not on their quality. We do "SSAO", add some parameters, and if the artists say they're happy we move on. This is our relationship with the image: a producer-consumer one.

I think this is irresponsible, and we should be a little more responsible than that, and work a bit more closely together. Observe, help, understand. Light is a technical and artistic matter, and we often underestimate how much complexity and technicality there is in things that are not strictly complex routines in code. If you think about it, until a couple of years ago we were still doing all our math on colors wrong, and even though the solution is a simple pow(col, 2.2) in code, we got it spectacularly wrong, and called ourselves engineers. We should really understand way better what we do, both its physics and the perception of the physics.
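As a tiny illustration of the pow(col, 2.2) point, here is what gamma-incorrect math gets wrong; a sketch using the crude pure power-law model (an assumption, the real sRGB curve has a linear toe):

```cpp
#include <cmath>

// Encode/decode with the approximate gamma-2.2 power law.
inline double ToLinear(double c) { return std::pow(c, 2.2); }
inline double ToGamma(double c)  { return std::pow(c, 1.0 / 2.2); }

// Averaging done naively on the gamma-encoded values...
inline double AverageNaive(double a, double b) { return (a + b) * 0.5; }

// ...versus decoding to linear light, averaging there (where physical
// quantities actually add), then re-encoding for display.
inline double AverageLinear(double a, double b)
{
    return ToGamma((ToLinear(a) + ToLinear(b)) * 0.5);
}
```

Averaging black (0.0) and white (1.0) naively gives 0.5, while the linear-space average re-encodes to about 0.73; the naive result displays visibly too dark, which is exactly the kind of error blending, mipmapping and lighting all suffered from.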

## 10 September, 2012

### Follow-up: Why classes.

Originally I planned to start writing something far more interesting, but I spent all Saturday and some of Sunday playing with Mathematica to refine my tone mapping function, so, no time for blog articles. But I still wanted to write down this little follow-up; I had some discussions about my previous article with some friends, and I hope this helps. It surely helps me. You don't really have to read this. You most probably know it already :)

So... Let's assume, reasonably, that we use our language constructs as we need them. We move across abstraction layers when we need to, in order to make our work easier and our code simpler. So we start:

"Pure" functions. Structured programming won many years ago, so this is a no-brainer: we start with functions. Note that I write "pure" here not in the functional sense of purity, as that's already violated by stack variables; we could talk about determinism, but I don't think the formalism matters.

We need state > Functions and pointers to state. This is where we left off last time. We could use globals or local static data as well, at least if we need a single instance of state. Global state has a deserved bad reputation because it can cause coupling, if exposed, and neither plays well with threads. What is worse, though, is that it hinders readability: a call to a routine with static state looks like any other at the call site, but it behaves differently. For these reasons we usually avoid static state and pass it explicitly instead, trading verbosity for readability.
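A minimal sketch of that trade-off (the names here are made up for illustration):

```cpp
// Hidden static state: at the call site this looks like any other function,
// but successive calls return different values, and threads collide on it.
inline int NextIdHidden()
{
    static int counter = 0;
    return ++counter;
}

// Explicit state: more verbose, but the dependency is visible where it is
// called, each caller can own an independent instance, and thread safety
// becomes the caller's (solvable) problem rather than a hidden hazard.
struct IdState { int counter = 0; };
inline int NextId(IdState& s) { return ++s.counter; }
```

Two separate IdState instances count independently, something the static-state version simply cannot express.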

We need many instances of state, with complex lifetimes > Destructors and classes. Here, this is when we _need_ classes. The inheritance and OOP things are better served, and it should not be news, by pure virtual interfaces, so we won't discuss them here (nor later; we'll leave the rest out and stop at this level).

Having methods in a class, public and private visibility, const attributes: all these are not much more than typographic conventions, and they are not a very compelling reason to use classes. A function taking a "this" pointer and a method call are not dissimilar in terms of expressive power; there are some aesthetic differences between the two, but functionally they are the same. Methods do not offer more power, or safety, or convenience.

What we really gain from classes is lifetime control of their instances: constructors, destructors, copy constructors. We can now safely create instances on the stack, in local scopes, have collections and so on.
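A quick sketch of what that lifetime control buys us (Buffer is a hypothetical resource wrapper; the liveCount static exists only to make the destructor's work observable):

```cpp
// A resource whose cleanup is tied to scope: safe on the stack, safe in
// collections, with no explicit teardown call to forget.
class Buffer
{
public:
    explicit Buffer(int size) : data(new int[size]), size(size) { ++liveCount; }
    ~Buffer() { delete[] data; --liveCount; }   // teardown runs at scope exit

    Buffer(const Buffer&) = delete;             // no accidental clones
    Buffer& operator=(const Buffer&) = delete;

    int Size() const { return size; }

    static int liveCount;   // instrumentation for the example only
private:
    int* data;
    int  size;
};
int Buffer::liveCount = 0;
```

Create one in a local scope and the destructor fires at the closing brace; the same mechanism is what makes containers of such objects safe.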

The price we pay for this, in C++, is that even though classes are structures, we can't forward-declare their members, nor do we have a way of creating interfaces without paying the price of virtual calls, so in order to get all the advantages of destructors the entire class has to be made visible. Moreover, C++ has no way of telling the size of a class if it doesn't see the full declaration, so even if we had a way of creating interfaces*, we would still need to disclose all the details of our class internals.

This is where all the evil lies. And to be clear, it's not because we're "shy" about the implementation internals, not wanting other programmers to see them or similar aesthetic considerations. It's because it creates coupling and dependencies: everyone that sees the class declaration has to also know about the declarations of all the member types and so on, recursively, until the linker dies compiling templates.

Now, I know, we can pay a little price and do pimpl. Or we can cheat and use pure virtual interfaces but make certain compile arrangements in our "release" version so the compiler knows each virtual has only one implementation and resolves all calls statically. Yes, it's true, and here is where the previous article starts, if you wish.
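For completeness, the "little price" of pimpl looks roughly like this (a minimal sketch; Renderer is a hypothetical class, and in a real project the Impl section would live in the .cpp file, out of clients' sight):

```cpp
#include <memory>

// Header-visible part: clients see only a pointer, so the member types
// stay entirely out of the public declaration.
class Renderer
{
public:
    Renderer();
    ~Renderer();                 // defined where Impl is complete
    int FrameCount() const;
    void RenderFrame();
private:
    struct Impl;                 // forward declaration only
    std::unique_ptr<Impl> impl;  // the price: one heap allocation and
                                 // one extra indirection per access
};

// Implementation side, hidden from clients.
struct Renderer::Impl { int frames = 0; };
Renderer::Renderer() : impl(new Impl) {}
Renderer::~Renderer() = default;
int Renderer::FrameCount() const { return impl->frames; }
void Renderer::RenderFrame() { ++impl->frames; }
```

Note the destructor must be defined after Impl is complete, or unique_ptr cannot delete it; that detail is the classic pimpl gotcha.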

The beauty of multiparadigm languages is that they offer you an arsenal of tools to express your computation, and of course, funny exercises and Turing tarpits aside, some map better to certain problems than others. Now, what does "map better" mean? It might seem trivial, but it's the reason people argue over these things. So right before starting, let me say again what I think is the most important quality metric: malleability. If your field or your experience calls for different metrics, fine, you can stop here.

Quickly now! Malleability = Simplicity / Coupling, roughly. Simplicity = Words * Ease_Of_Reading * Ease_Of_Writing. Some clarifying examples, it's often easy to create constructs that are compact but very foreign (think, most abuses of operator overloads), or that are readable but very hard to write or change (most abuses of templates fit this).

*Note: For the hacker, if you wanted to scare and confuse your coworkers, the debugger and the tools, you could achieve non-virtual interfaces in C++. Just declare your class with no members other than the public interface, then in the new operator allocate more space than the size of the class and use it to store a structure with all your internals. This fails of course for classes on the stack or as members of other structures; it's a "valid" hack only if we disallow such uses...

## 02 September, 2012

### Doing some homework. C-Style and pain.

So, lately I've been doing some coding at home for a couple of projects, which is not as usual as I would like it to be. One of these involved creating a little testbed for some DX9 rendering, pretty standard stuff; I would normally use one of the things I have in C#, but this time I had reasons to use C++, so I opened Visual Studio and created an empty application with the wizard. Around two in the night I had most of what I wanted and I closed the case.

What usually happens when I code is that I really don't like the coding itself, even more so if I'm not working in a particularly expressive framework. Writing code is a chore, but after I start a project I come to think about it a lot while I do all the other things, and that is where most of the improvements happen, in my head (usually while I walk back home from work).

This time was no different. What I found worth blogging, though, is how I happened to structure the code itself. I sometimes used classes, sometimes "C-style" objects (functions, passing as the first parameter the "state", what would have been the this pointer in C++), and I even wrote a few templates.

I didn't code this with any particular stylistic goal or experiment in mind, the only difference between this and work is that I was not constrained by a preexisting framework, and hundreds of thousands of line of code around my changes.

So, what guides these decisions? Of course, while I code I work out of experience and a certain sense of aesthetics. Now, my aesthetic is mostly being lazy, mixed with a sense that what I do has to be readable by someone else. For some reason, one that has been with me since I was a kid building Lego, I want to create things that last, so even in my laziness I think I'm not too sloppy.

What I think happens is like speaking a language fluently: you don't reason in terms of rules. These do exist of course, but they become an intuition in your mind, and indeed these rules are not arbitrary; they are there to codify hundreds of years of practice and evolution, guided by the very same logic that builds up in your brain after practice.

This evolution goes from rules (education) to practice, to new rules, and in a similar way I started thinking about my code and trying to understand if some rules can be derived from practice. Now, don't expect anything earth shattering, with all the discussions about OO and against it, I think there is nothing new to be said really. As for all my posts, I'm just writing down some thoughts.

So, where did I use C style and where did I use objects? Well, in this very small sample, it turns out all the graphics API I had to write was C style.

Think something like gContext = Initialize() somewhere in the main application, then most calls require gContext, and there is of course an explicit teardown method. The type for this context is not visible from the application, it's just a forward declaration of a struct and everything happens passing pointers.
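The pattern sketched above, in C++ (GfxContext and the function names are hypothetical; in real code the "header" and "implementation" halves would be separate files, and the application would only ever see the first):

```cpp
// --- "header": all the application ever includes ---
struct GfxContext;                            // opaque, never defined here
GfxContext* GfxInitialize();
void        GfxDrawCall(GfxContext* ctx);
int         GfxDrawCallCount(const GfxContext* ctx);
void        GfxTeardown(GfxContext* ctx);

// --- "implementation": internals invisible to the application ---
struct GfxContext { int drawCalls = 0; };

GfxContext* GfxInitialize()                  { return new GfxContext(); }
void GfxDrawCall(GfxContext* ctx)            { ++ctx->drawCalls; }
int  GfxDrawCallCount(const GfxContext* ctx) { return ctx->drawCalls; }
void GfxTeardown(GfxContext* ctx)            { delete ctx; }
```

The application holds a GfxContext* it cannot look inside, passes it explicitly to whatever it wants to grant access, and calls the explicit teardown at the end; changing the struct's internals never forces a client recompile.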

Nothing really fancy, right? What would this achieve, other than some nostalgia for pre-C++ days? Well, indeed it does not achieve anything; what I find interesting is how many constructs we invented to do the very same thing, wrapped into a C++ class. Let's see...

First of all, this entity I was coding, in this specific case the API for the graphics subsystem, has very few instances. Maybe one, or one per thread or process... I've already blogged about the uselessness of design patterns and about the king of useless patterns... you know what I'm talking about, the singleton. Once upon a time, a lead programmer told me that in games singletons were not really about a single instance, but about control over the creation and destruction times of certain global subsystems. Yes. Just like a pointer, with new and delete. It might sound crazy, but if you look around you'll find many "smart" singletons with explicit create and destroy calls. Here, done that!

Now, a second thing that happens with singletons, one that experts will tell you to avoid, is that once you include the singleton class with all the nice methods you want to call, you also have direct access to the singleton instance. So everybody can call your subsystem directly, everywhere, and you won't see said subsystem being "passed around"; an innocent-looking function can call all the evil in its implementation and you'll never know. So, you learn to practice "dependency injection". Here, C-style by default does not encourage that: you can include the API but you'll still need a pointer to the context, and this pointer can be made inaccessible from anywhere but the application that created it, so the application can pass it around explicitly, only to the functions it wants. Done that, too!

Third? Well, what about pimpl, facades and such? Yes, decoupling implementation from interface, hiding implementation details. C++ does not provide any convenient way of doing so. You might be as minimal as possible, ban private methods (starting with private statics, which have no reason to exist) and use implementation-side functions for everything. You can include in your class only the minimal API needed for the object, and you should, but no matter how you slice it, you can't hide private member variables. This is bad not only "aesthetically", because you lose the sense of hiding the internals; it hurts more concretely in terms of dependencies, and of how much you have to include just to let another file use a given API. I won't delve into the details of the template usage, but the C style allowed me to keep all templates implementation-side, which is a great thing. I even used a bit of STL :)

More? Well, most coding guidelines and best practices teach you to always declare your class's copy constructor, and it turns out that most times, if you follow this practice, you will end up declaring it private with no implementation. You don't want subsystems to be cloned around, and here, this is done "automatically" too by this new invention. What about "mixins", or splitting your subsystem implementation across multiple files? What about all the methods which need more than one "context", in which class should they live? Here, you have functions; this new invention avoids all these questions.

So what? Am I dissing C++ again? Or OO in general? Well, not really it turns out, not this time, not as my main point... It's just that it's incredible to observe how often we complicate needlessly. I do understand why this happens, because I was once "seduced" by this too. "Advanced" techniques, cutting-edge big words and hype. Especially when you have more "knowledge" than reasoning and experience. I mean, out of university, OO is the thing, right? Maybe even design patterns...

And we lose sight, so easily, of what really matters: being lazy, using concepts only when they actually happen to save us something, lowering our code complexity. Going back to said project, I can have a look at where I did use classes or structures with methods. Every time the object lifetime was more complex I started using constructors and especially destructors; I remember in one implementation needing to type the call to the teardown of a given private structure a few times, and moving that code into a destructor. I wrote templates for some simple data structures just to not have that code intermixed with other concepts in my implementation. I would have used classes also if I needed virtual interfaces and multiple implementations of a number of methods, operator overloading can be useful when it's not confusing, and so on and on.

Really, the key is to be lazy. Don't reach for structures because they are fancy, or because you think you _might_ need them in some eventual future; predictions almost never work (that's why we all work with "agile" and similar methods). Certainly don't use things because of their hype; try to understand. It also helps if you consider coding a form of pain, and maybe if you hate your language a little, I think*. Also, you might consider reading this from Nick Porcino on Futurist Programming.

I love Blogsy. And the one week of Vancouver summer we get.

*Note: Many of the times one sees crazy things, wild operator overloads, templates that require hours to understand, compile or debug and so on, it's because someone loved the language itself, and the possibility of doing such things, more than they loved their own time or other people's time.

P.S. What bit of STL did I use? std::sort, because it's great, really. Until you need to iterate over your own weird data structures, which might happen because, at least in my line of work, STL implements only some of the least frequently needed ones (before someone asks - fixed_vector and hashmap, and yes, I know about C++11; caches, pools and lists made of linked chunks. One, two, and maybe three are worth a look, and avoid Boost). Yes, you can implement your own random access iterator, it's not a titanic job. Still, in this specific case, it would have easily taken more time and code than the data structures themselves took. Clearly STL was not made by a lazy person :)

## 26 August, 2012

### OT: My photoretouch routine

Another "offtopic" post. I'm writing this really as a reminder for my spouse, as she wants to learn Photoshop a bit better, but I think it could be useful, as many rendering people do seem to dabble with photography (and it's surely a good thing to do so).

Now, two disclaimers
1. This photo was taken for fun after a fashion show; it wasn't a photoshoot nor anything intended to be published. I've chosen it to serve as an example of retouching: it's not the best one I've ever done and, more importantly, it's not the most pleasing shot of the model either. Also, the retouching work is partial at best; much more time could have been spent on it. I stopped when I reached a point that illustrates well enough what I do, not a "final" point for the photo itself.
2. I'm not a professional retoucher and I certainly didn't spend much time researching this, so I'm not as confident about my practice as I am when I talk about rendering. Most of this is purely out of experience and does not even derive from reading books and tutorials from experienced photographers, which is strange when I think about it, as I tend to be uncomfortable if I don't know "everything" before starting a given thing. I was very surprised to see that this routine I "evolved" is not too far off from what pro retoucher Pratik Naik sketches here; even if my results are not in the same league, the theory should not be too far off :)
Prerequisites
1. Photoshop of course. Unfortunately you'll need CS; even if day to day you'll probably use 5% of CS capabilities, the ones that are really useful are not the ones you'll find in Elements, which is aimed at people who don't want to spend a lot of time with their photos. I also use Lightroom; now, all the raw global adjustments you can do in Lightroom are the same ones you get from Camera Raw when you open a photo in Photoshop. The benefit of Lightroom is not in the adjustments, but in the workflow. With digital cameras, shooting hundreds of photos at each shoot, the basic adjustments and selection of good candidates are really time consuming. That's also why I have Photosmith on the iPad, which can import Lightroom collections.
2. I tend to work in layers and all the work is non-destructive (because of my computer science background, I guess; I still have a way of caring about technical things which do not really matter). This creates quite big Photoshop files (I also always use the 16bpp ProPhotoRGB colour space), so you'll need a computer with lots of RAM; mine is an iMac 27'' that I bought for "cheap" refurbished with 4gb of RAM and then immediately upgraded to 16.
3. A tablet is a must. It took me a while to get used to one, I started years ago (my first tablet connected over a serial cable... my first Photoshop was 2.5), and cheap tablets might dissuade you (e.g. I find too-soft nibs to be really frustrating), but trust me, get a Wacom and live happily. You don't have to spend lots of money; I have an old Intuos 3 A5 and an even older Graphire 3 A5.
4. A calibrated monitor. I don't use the shiny iMac monitor, as shiny monitors are useless :| so I have next to it another 27'', from the "high-end" Dell series. It's very decent and great quality for the money.
Tablet setup

The fundamental keys I always use are the modifiers (shift, control, command, alt), which access different options for any given tool. The right mouse button is mapped as the main button on the pen; it accesses the brush options while drawing, and you'll change these every second. The other frequently accessed key is 'x', which switches between background and foreground color when painting. While retouching we will always be painting layer masks, so having these colors set to black and white and alternating between them with the x key makes this task very efficient.

As I have two monitors, and as you have to map only one to your tablet (be sure to avoid any stretching in the mapping setup), I reserve the second pen button for the tablet's "switch monitor" operation. That's because I keep all the Photoshop dialogs on the second monitor, so this button avoids having to reach for the mouse every time I want to go to the second monitor (which I still do from time to time).

Photoshop setup and learning prerequisites

As I have lots of space, especially on the iMac monitor which is higher resolution than my second "main" one I tend to have all the windows open. If you have less space, what you want to have always visible are the layers, the history and maybe the navigator, other than the main tool bar.

The things you absolutely need to know for this kind of retouch work are the following; you can use the Photoshop manual, experiment a bit, or go on YouTube and follow some basic tutorials for these, but they are fundamental.
1. Layers: layer masks and adjustment layers. Blending options are useful to know too.
2. Adjustment curves at least. Levels are important too but out of lightroom I already export an image with the right levels and white balance. All the other image adjustments are also important but curves are the king.
3. Selection tools: at least the rectangle and the polygon lasso. I don't need too precise selections because I copy and paste rough patches and then precisely mask them out using layer masks.
4. Image and selection transforms (edit - free transform)
5. How to copy and paste from a layer and how to copy and paste from all layers (copy merged)
6. Drawing tools: the brush and its options (especially opacity, and how it can be controlled with the numeric row of keys), the clone tool (how to use it, and the "sample: current and below" option), the healing brush (how it works, and the "sample all layers" option; note that you don't get a "current and below" option here, for some reason)
7. Brush options (right mouse button)
8. Zoom tool, navigator and shortcuts for image zoom and move.
9. The liquify image filter.
I don't customize Photoshop much, which probably I should. I don't even use all the shortcuts I could; this is similar to what I do when programming, I don't focus too much on the speed of performing the actions, and I will (probably) never learn vi. What that means is that you don't need to do much to replicate my setup :)

The most important thing is how I set the tablet to work with the various tools, and what I do is extremely simple: I let the tablet pressure drive the paint opacity only. You do this by going in the brush window and selecting "transfer" and under "opacity jitter" select "control: pen pressure". You also might want to disable any other pen controls you might have in the brush (i.e. no "shape dynamics") and save this as a custom brush preset (the brush options are different for each tool so you might need to do this a few times or select the same preset for all the drawing tools you use).

The reason for this is simple: I want to always precisely control the size of the painting area, so I enter that using the brush options on the right mouse button (size and hardness), while for the intensity of the paint the tablet works great, together with the master opacity setting you can change via the keyboard numeric row (which sets the maximum opacity you will get at full pressure).

First step: General image composition

As a general rule, I layer the various corrections from the most fundamental up to the most "artistic"; this way you build your work upon a reasonable base image. The corrections at the bottom of the stack, done first, are the ones that won't change regardless of the final image style; the ones at the top are the most "flexible" and prone to change.

This rule I immediately break with the very first layer, the background image. Now, if you're working for a client you might not want to do this, but I work for myself, and I find it reasonable to have all the general qualities roughly in place before retouching, as this provides a better reference of what to retouch and what is most important.

That's why my base image out of Lightroom is already a "close to final" interpretation. What I often do is adjust it to what I like in Lr and select the shot using "final" settings and crops, then back off a bit if they are too extreme (burn blacks or whites too much, have weird coloring and so on), so I have a more flexible image in Photoshop (one that still has all the detail and tones, so I can push them locally later in a more controlled way). You could start with a very neutral image instead and decide your final art direction last, after all the skin work, and probably that is a good idea. I don't do that.

Similarly, I do all the major liquify changes first, and if I'm compositing multiple images, replacing the face or the hair from one into another and so on, I do it here. You might want to liquify last if you're working for a client, as they tend to change their mind often, or just do a base liquify here and the more extreme "artistic" changes later. Liquify is always a destructive operation, so if you do it last, copy merged and paste the entire picture into another top-level layer and liquify that copy; if you need to change it, you can delete said layer and do it again.

Second step: Major skin defects

(Photo: the image after this step.)
Now here is where I "cheat" a bit by taking some shortcuts. Retouching skin can be done in a hundred ways, going from quick and plastic to lots of pain and incredibly natural results. I'm sure most professional retouchers do their skin work all by painting adjustment layers at the skin pore detail level.

This might or might not be practical; if your time is more limited, there are more "automated" tools which save time at the expense of quality. You might find many tutorials on the net on how to use things like "smart blur" or "frequency separation" via the highpass filter and such. I don't generally do these filter-based things (even if you should know at least the latter), but I still don't paint everything as finely as it could be done.

So what I do as a first skin step is to create a layer or two which will see the use of three techniques:
1. The healing brush.
2. The cloning brush.
3. Copy and paste pieces and then mask them out.
Step one is the easiest. Just try to keep the size as small as possible, locally removing defects without destroying the texture detail or creating weird textures.

The cloning brush comes into play when the healing brush can't do its job, and for bigger areas. It's tricky to use because you have to clone from a source with a similar overall skin tone and skin transitions, and in doing so you're almost surely cloning a texture pattern different from the area you're working on, so I don't use it much at all on the skin.
In this photo it was used a bit on the hair, for some spots where the skin was showing through, and I use it a lot on the background and to remove stray hairs. That's because the cloning tool there does a better job than a straight airbrush with no texture (there is still some texture in most backgrounds, even plain-color ones, and the difference will often show in print).

Instead of cloning, when I have to patch bigger skin regions I most often prefer the third technique, copy and paste. You get better control this way because you can rotate, align (free transform) and liquify the skin patch to match where it needs to go (even scale it a bit, though not too much, as that messes with the texture). You can also apply other corrections on the layer (curves, local dodge and burn using the tools...) to find the best match, then control with the layer mask where the skin is applied and how it fades.
It can take a few such patches to cover a region with colors that make sense. In this picture you can see the technique at work on the neck, again in the hair (the hair is probably the worst part of this particular retouch, though), and to cover a nipple that was showing next to the arm. In this last case I pasted some hair, and the mask for the arm was done quickly with the polygonal lasso, whose edges were then smoothed using select/modify/smooth and a tiny bit of feather.

Third: Dodge and burn for skin!

This is where you will spend a chunk of your time and it's the actual skin detail fixing. The idea is that in order to remove skin wrinkles and other subtle defects, we can lighten the shadows caused by such wrinkles or darken the bad specular highlights that these surfaces create.

In order to do that, we set up two adjustment layers, one that lightens the image (dodge) and one that darkens it (burn). Both are made with curves, plus a layer mask which is immediately inverted (so by default the layers do nothing; we then paint into the masks where we need them).
There are various YouTube videos which can help in understanding this, i.e. this one or this. It might seem really hard to pull off and incredibly tedious, but it works really well and, most importantly, it fixes only what's needed, following the natural skin detail, so in the end the skin does not look "fake", just "better", which is what we want.
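
To make the mechanics concrete, here is a small CPU-side sketch of what the dodge/burn pair of curve layers computes on a single luminosity value. The curve control points and the piecewise-linear model are my assumptions for illustration; Photoshop uses its own spline and works per channel.

```python
def apply_curve(value, curve):
    """Piecewise-linear curve lookup: curve is a sorted list of (in, out) points."""
    points = sorted(curve)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return value

def dodge_and_burn(value, dodge_mask, burn_mask):
    """Blend the original with the curved versions by the painted mask amounts.
    Masks start fully black (0.0), i.e. both layers do nothing by default."""
    dodged = apply_curve(value, [(0, 0), (128, 160), (255, 255)])  # gentle lift
    burned = apply_curve(value, [(0, 0), (128, 100), (255, 255)])  # gentle drop
    out = value * (1 - dodge_mask) + dodged * dodge_mask
    out = out * (1 - burn_mask) + burned * burn_mask
    return out
```

With both masks at zero nothing changes; paint white into one mask and that curve takes over locally, which is exactly why the technique follows the natural skin detail instead of smearing it.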

This work takes time, patience, and a good eye. Skin defects appear at different scales: there are very fine wrinkles, and there are discolorations and other marks.
I often switch between a very zoomed-in view with a 2/3-pixel, hardness-zero brush and a more zoomed-out view with a broader 5/7-pixel brush (you can even keep these two scales organized in separate layers). Sometimes I also keep a "notes" layer where I mark the spots I still need to work on; when you zoom in it's easy to lose the "whole picture" perspective and spend too much time on areas that are not that bad, so zooming out, squinting, or even temporarily creating a blurred copy of the image helps.
Also, before starting such work I create an "exaggeration" layer: a curve that pushes the contrast of the skin (how much depends on the picture) plus a conversion to black and white, to make defects easier to see during the process.
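
The "exaggeration" layer boils down to something like this sketch: convert to luminosity and steepen contrast around a pivot so small tonal defects pop out. The Rec.601 weights and the pivot/strength values are my assumptions, not anything the real curve uses.

```python
def exaggerate(r, g, b, pivot=128.0, strength=2.5):
    """Black-and-white defect-spotting view: push contrast around mid-gray."""
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # B&W conversion (Rec.601 weights)
    pushed = pivot + (luma - pivot) * strength  # steepen the curve around the pivot
    return max(0.0, min(255.0, pushed))         # clip to the 8-bit range
```

A subtle 120-vs-136 tonal difference becomes roughly 108-vs-148, which is much easier to spot while dodging and burning.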

So. Why curves? And how exactly do you make the two curves? Not everybody uses the same technique: I've seen people use a single 50% gray layer set to "overlay" blending to do both dodge and burn (I find that harder), and among the people who do use curves, how exactly do they shape them?
Well, I use my eye. I would like to be more scientific about it, and I did make some JavaScript helpers, but I haven't found the perfect recipe yet. The idea, to me, is that you want the two curves to be "gentle", never rising or falling too fast, and you want to push them far enough to give you range for changes, but not so far that they start clipping the whites or blacks into a single, untextured shade.

Unfortunately, whatever nonlinear change you apply in the curves will change the hue and saturation of the areas you paint. This is a problem, and it's why you'd better deal with colors after dodging and burning.
Wait a minute! Can't we set the blending mode of the dodge and burn layers to "luminosity" only? Yes we can, but it won't help much, as skin itself changes hue based on its luminosity: if you keep the original hue and lighten an area, it won't have the "right" lighter-skin hue it should have. Are there ways around this? I think so, but I have to experiment more. In practice it often doesn't matter much, as the changes you're making here are not huge.

This coloring problem is also why it's wise not to export out of Lightroom a photo with split toning or other wild coloring, as these will come out wrong while dodging and will be hard to fix.

Fourth: Color corrections

Honestly, I'm not great at this; it's probably the weakest part of my entire process. Still: corrective layers. A saturation one (hue/saturation) for areas where you dodged too much and lost some color. Then, for large areas of wrongness (i.e. a patch that is too red and so on), I make rough selections with the polygonal lasso, feather the selection, and use ad hoc corrections.

Recently I started using a gradient map, set to hue blending, with colors sampled from the skin, to quickly fix areas that go wrong. The idea is that you could create a layer (or set of layers) which reconstructs the right colors from the black and white (luminosity) alone, a colorization. A first step toward this is a gradient map which associates colors with different luminosities. On its own this creates a very uniform, weird-looking skin, but applied through a mask only here and there it's decent.
I even wrote a script that samples all the colors from a selection of the skin and creates some "buckets" of color based on the various luminosities. If you're curious, this is a description of the technique; I'm not great at it so I won't write more.
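
The bucket idea can be sketched like this: average the sampled skin colors per luminosity bucket, then colorize any pixel by looking up the bucket of its luminosity. The bucket count, the Rec.601 weights and the function names are my assumptions; my actual script is Photoshop JavaScript, not this.

```python
def build_buckets(samples, n_buckets=8):
    """samples: list of (r, g, b) skin pixels. Returns per-bucket average colors
    (None for buckets with no samples)."""
    sums = [[0.0, 0.0, 0.0, 0] for _ in range(n_buckets)]
    for r, g, b in samples:
        luma = 0.299 * r + 0.587 * g + 0.114 * b
        i = min(n_buckets - 1, int(luma / 256.0 * n_buckets))
        sums[i][0] += r; sums[i][1] += g; sums[i][2] += b; sums[i][3] += 1
    return [(s[0] / s[3], s[1] / s[3], s[2] / s[3]) if s[3] else None
            for s in sums]

def colorize(luma, buckets):
    """Map a luminosity back to the sampled skin color of its bucket."""
    i = min(len(buckets) - 1, int(luma / 256.0 * len(buckets)))
    return buckets[i]
```

This is exactly why the result looks too uniform when applied everywhere: every pixel of a given brightness gets the same hue, so it only works dosed through a mask.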

After your skin is good enough you can start having fun, caring about shine, volume, altering the makeup and so on. In this picture I did very little: there are two extra dodge and burn curve layers, which I used a tiny bit on the hair, eyes, eyebrows and shoulders, and I used a hue/saturation adjustment to bump up the makeup around the eyes a little.

Forums like RetouchPro and ModelMayhem often have some interesting inspiration.

The crazy coloring is done with two layers set to screen, filled with gradients (a linear one for the blue and a radial one for the red, then refined with some wide brushing and a wide Gaussian blur). The gray ramp was then moved to something more interesting using separate R/G/B curves in a curves adjustment, plus some hue and saturation.

I have an action I recorded for printing/exporting: if I have to share the image I flatten the layers, apply a curve that moves blacks to around 5 and whites to around 250, then do a detail unsharp mask and a tiny bit of a wider unsharp mask, and finally convert everything to sRGB and 8 bits.
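
As a sketch of what that action does numerically: the levels curve keeps tones off the paper's clipped extremes, and the unsharp mask adds back a fraction of the difference between the image and a blurred copy. The linear remap and the toy 1-D blur are my simplifications of the real Photoshop operations.

```python
def print_levels(value, black=5.0, white=250.0):
    """Remap 0..255 so pure black lands at ~5 and pure white at ~250."""
    return black + (value / 255.0) * (white - black)

def unsharp_1d(pixels, radius_weight=0.25, amount=0.6):
    """Toy unsharp mask on a 1-D row: out = original + amount*(original - blur)."""
    out = []
    for i, p in enumerate(pixels):
        left = pixels[max(i - 1, 0)]
        right = pixels[min(i + 1, len(pixels) - 1)]
        blur = p * (1 - 2 * radius_weight) + (left + right) * radius_weight
        out.append(p + amount * (p - blur))
    return out
```

Note that flat areas pass through unchanged, which is why sharpening after the levels curve doesn't add grain to plain backgrounds.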

## 22 August, 2012

### Wanna play with OpenCL?

OpenCL Studio is a nice wrapper of OpenCL and Lua packed in an ad-hoc IDE. It’s a nice way to explore OpenCL, and it has some video tutorials, but you might want to delve into it a bit faster than that.

Here’s a quick guide through its particles.3Dm example (OpenCLStudio 2.0). It won’t teach you anything about OpenCL, but it will guide you through a few things in OpenCLStudio, and past that you should feel at home if you know the basics of OpenGL and OpenCL (and Lua).

In fact (and that’s true for most my posts), I’m writing this as notes to self while I explore OpenCLStudio for the first time…

1) The Project tree and the main Scene window
Here you can create, view and edit (see the properties tab on the bottom of the application) all the main resources in the project (OpenGL, OpenCL and some OpenCLStudio specific things like its GUI and the Script Processor). Each resource is reflected and available to the scripting system (Lua).

Also notice, at the top of the project tree, two buttons: one to run the application, the other to reset it. The reset button executes all the non-per-frame scripts, so if you modify one of those you’ll need to reset to see the changes; all other scripts and shader code are automatically hot-swapped after a successful compile.

Project tree elements can be added only when the project is in its reset state. In this state you also get access to some other editing abilities which don’t work after you run the application: for example, if you create some OpenGL “geometry”, it is editable with manipulators in the scene while the project is reset, but the manipulators won’t appear otherwise.

The scene window is pretty self explanatory, you can see here your 3d world (through the camera you have in the project tree) and all the GUI elements if you have them.

Notice that if you have the project running and you click on an OpenCL source in the tree (i.e. clParticles), in its properties you’ll get the timing of its various kernels. Nifty!

2) Script walkthrough

The tab named “script” shows the contents of one of the “script processors” (see again the project tree). The way this works is a bit unusual: scripts are bound to objects in the scene tree (to callbacks of said objects, to be precise) and, for some reason still unclear to me, each script processor can host only 32 objects, arranged in a 4x8 grid of slots in the processor.

You can create an arbitrary number of processors though and the position of a given object in a processor grid does not seem to mean anything, so in practice, if you want to script an object behavior, you just drag it into an empty slot of a processor and write some code for a given callback.

Most scripts associated with resources exist to initialize them. Start by looking at bfForce, for example, and its “OnDeposit” (creation) callback, and how it initializes its memory to zeroes. The “event” instance carries the information about the object the callback is attached to; its properties change depending on the object, and you can read about them in help/inbuilt/classes.

Notice that all the windows containing code have a small red downward-arrow button: that’s the compile button. Kernels and shaders are hot-swapped after a compile; Lua scripts are swapped as well, but if they don’t execute every frame you won’t notice until you reset.

All the methods have decent documentation that pops up automatically while autocompleting (or you can find it by going to the Help tab, Inbuilt, and selecting the Modules tab at the bottom… OpenCLStudio seems to have some nice features, but they are often slightly rough…).

Next, have a look at bfParams OnDeposit, which is a bit more complicated. It prepares the small parameter buffer, and you can notice how the event is not read-only: here it’s sizing the buffer based on struct.size(“System”), so the size you declare in the object properties in the project tree won’t really matter (try changing it, reset the project, and see).

Ok, now it’s time to peek at the Global window in our Script tab. Always visible, this gets executed before everything else and can thus declare support structures for the entire script processor. Here you can see it creates the struct named “System” with some fields (struct is the way OpenCLStudio maps raw data arrays to C-like structures), followed by some global variables. Now it should be clear what bfParams OnDeposit does: it takes the OpenCL buffer of the bfParams object, sizes it, assigns it an “interpretation” saying that it corresponds to the “System” structure (which is used to pass the application parameters to the OpenCL kernels), and then maps it, declaring that the buffer will be writable from the application (CPU).
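
The struct-interpretation idea maps directly onto Python's struct module, if that helps make it concrete: a raw byte buffer is sized and then read and written through a C-like record layout so CPU writes line up with what the kernel reads. The field names and format below are my guesses for illustration, not the ones in particles.3Dm.

```python
import struct

# Hypothetical "System" layout: a particle count plus a float3 of gravity.
# "<" = little-endian, standard sizes, no padding.
SYSTEM_FMT = "<ifff"
size = struct.calcsize(SYSTEM_FMT)

buf = bytearray(size)  # the equivalent of sizing the CL buffer
struct.pack_into(SYSTEM_FMT, buf, 0, 1024, 0.0, -9.8, 0.0)  # CPU-side write

count, gx, gy, gz = struct.unpack_from(SYSTEM_FMT, buf, 0)  # kernel-side view
```

The point is the same as in OpenCLStudio: the buffer is just bytes, and the "interpretation" is what makes both sides agree on where each field lives.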

You can peek around all the OnDeposit/OnRemove members; when you’re satisfied, go to the meat of the application, the per-frame execution contained in OpenCL OnTime. This does everything and, while it lives in the OpenCL object, it doesn’t access the “event”, so it could sit in any OnTime: move it into the OpenGL object, for example, and you’ll notice it still works.

This main application code is very straightforward and should not surprise you at all.

3) GPU Code and how the particle system works
The code tab is where the GPU code lives. If you select an OpenCL module or an OpenGL shader in the project tree, the code tab will display its code.

So, how is this particles.3Dm example implemented? Well, looking around, starting from the OpenCL OnTime script, you’ll notice how things are arranged.

Particles are split into an OpenGL representation made of three streams: Position, Color and Velocity. Position and Velocity are updated by OpenCL kernels (clEnqueueAcquireGLObjects…), while Color is set in a Lua OnDeposit method and never changed. In this example the OpenGL shaders don’t use the velocity stream, so it needn’t even have been declared as an OpenGL buffer.

The integration is done with a simple Euler step (the clIntegrate kernel followed by clClipBox), then particles get sorted via a spatial hash (the clHash/radix sort/reorder kernels) into the bfSortedVel and Pos arrays.
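
The spatial-hash step can be sketched on the CPU like this, assuming a uniform grid: each particle's cell coordinates are folded into one hash, and particles are sorted by it so each cell's particles end up contiguous. The grid dimensions and cell size here are made up, not the example's values.

```python
CELL = 2.0   # cell edge length (assumed)
GRID = 16    # cells per axis (assumed)

def hash_position(x, y, z):
    """Fold 3-D cell coordinates into one linear cell index."""
    cx, cy, cz = int(x / CELL), int(y / CELL), int(z / CELL)
    return (cz * GRID + cy) * GRID + cx

def sort_by_hash(positions):
    """Returns (sorted hashes, particle order), like the radix-sort output."""
    hashes = [hash_position(*p) for p in positions]
    order = sorted(range(len(positions)), key=lambda i: hashes[i])
    return [hashes[i] for i in order], order
```

After this, the reorder kernel's job is just to gather positions and velocities into that order, producing the sorted arrays the later kernels consume.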

Notice how the radix sort does not require all the clSetKernelArg calls and so on: it’s part of the libCl open-source library (by the same authors as OpenCLStudio) and has been wrapped in a way that takes care of all the binding outside Lua.

After sorting, particles are assigned to buckets, each represented by a start and an end index into the sorted particle array (bfCellStart and End). This is done by the clBounds kernel which, for each particle, compares its hash with the hash of the next particle (remember, they are sorted by now); when a difference is found, it knows a particle block has ended and writes cellEnd and cellStart. This kernel is interesting because it uses thread-local space to avoid fetching the hashes twice from global memory (which on a modern card is not actually faster, but still...). Notice how the local space is assigned in the Lua script: clSetKernelArg reserves space for one unsigned int per thread plus another int (see how the work is grouped into 256-thread units in clEnqueueNDRangeKernel).
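
Here is what clBounds computes, sketched sequentially on the CPU: walk the sorted hashes and record where each cell's run of particles starts and ends. The exclusive-end convention is my assumption about the kernel's indexing.

```python
def cell_bounds(sorted_hashes, n_cells):
    """Per cell, the [start, end) range of its particles in the sorted array."""
    start = [0] * n_cells
    end = [0] * n_cells
    for i, h in enumerate(sorted_hashes):
        if i == 0 or sorted_hashes[i - 1] != h:
            start[h] = i       # a new run of this cell's particles begins
        if i == len(sorted_hashes) - 1 or sorted_hashes[i + 1] != h:
            end[h] = i + 1     # the run ended; end is one past the last
    return start, end
```

The GPU version does the same comparison per thread, which is why sharing each hash through local memory saves the second global fetch of the neighbor's value.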

All this bucketing is obviously done to speed up particle/particle collisions, handled by clCollideParticles in a pretty unsurprising way; notice how in the global Lua script CELLSIZE is twice the particle size. After that, analytic collisions with the scene’s big immovable spheres are done; this again is fairly trivial and also takes care of getting the data back into the unsorted rendering arrays.
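
The payoff of the buckets is that each particle only scans its neighboring cells instead of every other particle. A 1-D sketch under my own conventions (the kernel does the same over a 3-D, 27-cell neighborhood):

```python
def colliding_pairs_1d(sorted_pos, start, end, cell_size):
    """Find particle pairs closer than one particle diameter (= cell_size / 2,
    since the cell is twice the particle size), scanning only adjacent cells."""
    pairs = []
    for i, x in enumerate(sorted_pos):
        cell = int(x / cell_size)
        for c in range(max(cell - 1, 0), min(cell + 2, len(start))):
            for j in range(start[c], end[c]):
                if j > i and abs(sorted_pos[j] - x) < cell_size / 2.0:
                    pairs.append((i, j))
    return pairs
```

With CELLSIZE twice the particle size, any pair of touching particles is guaranteed to sit in the same or adjacent cells, so nothing is missed by the narrow scan.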

As you can see, this example, I think for the sake of clarity, uses many more passes (kernels) than it could. particles2.3Dm is not better, it just adds more eye candy, so it’s a nice starting point for your own experiments. Have fun!

## 18 August, 2012

### Next-gen: Quality vs Quantity

Pixar's Luxo JR, 1986.

Is this rendered "offline"? Would we mistake this frame for something our realtime engines are capable of? We can go any number of years into the past and look at offline outputs for movies and shorts, up to the very beginnings of computer graphics, and still recognize a gap between videogames and movie productions. How come?

It's a matter of quality. Siggraph ended last week and Johan Andersson delivered, yet again, a clear, precise and technically superb presentation on the key challenges of realtime rendering. And yet again, at number one, he identified "cinematic quality" as the priority.

Now of course, at a technical level this implies research on antialiasing solutions, spatially and temporally (have you ever noticed how some games look incredible just by bumping up the sampling rate on PC?). His presentation is great, so if you haven't read it yet, stop reading right now and fetch it first (and be sure to also follow the links and references he included).

But I would argue there is more to it than our technical limitations: it is an attitude and a way of working. When creating a realtime rendering experience we are always constrained by our hardware and human resources. We have to make compromises all the time, and one fundamental compromise is between quantity and quality. Should we add or refine?

This is indeed a very fundamental tradeoff; we don't encounter it only in rendering but across all of our production, and in activities beyond the videogame industry as well. Isn't it the same compromise that leads to code rot and technical debt in our software? I would dare to make a comparison between Apple and other companies, but that's probably not the best example and I could end up having to remove too many comments... Leica versus Canon?

Ok, going back on track: are we doing our best job at this compromise? When we add a visual feature to our rendering, how conscious are we of where it falls on the quality-versus-quantity line? Is quality even really considered a target, or do we still reason in terms of features for the "back of the box"?

Of course, if you ask that question, everybody will indeed say that quality is their focus. Quality sounds good as a word; it's the same with "workflows" and "tools" and "iteration", everybody wants these things, especially if they come for free. We have a "great" team, so quality is a given, right?
Have you ever finished a task for a rendering feature, seen that it works in game, and considered it "done"? Or it gets tested by an artist, it doesn't crash, it's understood: done? When does quality ever enter this process? I won't rant on, otherwise I'll spend half of the post on similar things, like companies that value their tools by assigning the most junior programmers to the task, and such horrors that really deserve their own space another time.

Do we really know what matters anyway? It is easier for everyone to go for quantity: add another feature, it looks like progress, it's easy to track on the schedule, it's easy to show in a trailer, caption it and tell the world we spent money on this new thing, regardless of how it feels. "Next-generation HDR and bloom", rings a bell? Crank the parameters to a million and show your work...
Great rendering technology is not supposed to be seen; without trying to be romantic, it should be "felt". The moment you're capable of identifying rendering techniques in an image, rather than layers of visual elements (i.e. volume, texture, silhouette, shadows), we have already failed somehow.

If you can't do it well, don't do it at all. I think about Toy Story and how Pixar chose not to make a story with organic creatures, believing they could not yet deliver that level of complexity. And then of Crysis 2, a great game with stable graphics (one of the first things I noticed, almost defect-free), that I bought and finished.
I make this example often, really, because it's such a popular game that most rendering engineers have played, and it does something peculiar: it fades your LODs pretty aggressively, especially, it seems, on small detail objects, like rocks on the ground which dissolve into nothing after a few meters. And to me that's surprising: in such a well-crafted game, why does it have to show me that rock and then fade it? I can see it, I can see the trick, and I don't want to. I could live without the rock, I would not complain about it not being there, but now I can see it dissolving, I can see the render tech.

The truth is, from my experience at least, we are still far from understanding what makes great realtime visuals. Not only how to manage them, how to produce them and how to technically solve these problems, but also exactly what matters and what doesn't. So you see (and I'm making this up, I'm not thinking of one specific title right now) games with yellow skin highlights and a glowing cyan sky that spent time on a dubious water simulation, at a scale where it won't ever read as right anyway.
And then, when it's all done, and badly, we layer on top a thick pass of bloom (possibly temporally unstable) and vignetting and tinting and lens flares that would make Instagram blush, and we call it done.

In part it's a lack of education: we're a naive, young industry, only now transitioning into a mature craft with fierce competition, and we often just lack the right skills in a team. In part, it's truly a dark art with a lot of unknowns, far from being a science.
Do I really need that? Could I bake it instead? Which approximation of this is the most reasonable? If you're lucky, you work closely with artists, great artists and art directors, with the right workflows, and somehow magic happens. And some companies do seem to deliver this magic consistently. But as an industry? We're far. Both in practice and in theory, as I wrote some time ago.

There are people saying that rendering is less relevant, or even "solved" (which is crazy, as it's far from solved even offline, even in theory); that behavior and motion are the next big challenges (and undeniably they are becoming crucial, and if you have some time read these three, in increasing order of complexity: motion graphs, motion fields and their successor). But we are still far not only from photorealism but from a proper understanding and application of the techniques we have, and from a proper understanding of how to manage the creative process and achieve predictable levels of quality. We probably have more techniques and technical knowledge than experience and "visual taste".

I truly suspect that if we were to start an engine from the basics, forward rendering and mostly baked, a sprinkle of stable CSM shadows and a directional light, and stretch that to quality, we would already have enough tasks to figure out: inputs (textures? what resolution, what filtering, compressed how? normals? what are those? specular? occlusions?), materials (specular and prefiltering? Fresnel, how? with Blinn? diffuse, how? what matters, what doesn't work...), outputs (antialiasing, HDR, tonemapping*, exposure... there are books written on tone mapping alone) and what matters where (perception, A/B testing, biometrics? eye tracking? LODs, how? etc.). Enough to keep a rendering team busy for a couple of game cycles, write some research papers in the process, and create the best visual experience of any generation...

-

*Note: it took way too long, and a few articles in books, to get this whole gamma/degamma thing to a "mainstream" level. Let's not wait another ten years to get the idea that HDR needs to be PROPERLY tonemapped to make any sense. And by HDR I don't mean an HDR framebuffer in post: even if you write to an LDR 8-bit target, you can still tonemap in the shader, right? Right? Ok. If you don't know what I'm talking about, go read and USE this.
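
To make the point concrete, here is the shape of the shader math, sketched scalar-per-channel in Python: run the linear HDR value through a tonemap operator and a gamma encode before quantizing, versus the naive clamp. Reinhard is used as a stand-in operator, and the 1/2.2 power is the usual cheap sRGB approximation; the note doesn't mandate either.

```python
def tonemap_to_ldr(hdr_linear):
    """Proper-ish path: compress HDR range, gamma-encode, then quantize."""
    mapped = hdr_linear / (1.0 + hdr_linear)   # Reinhard: [0, inf) -> [0, 1)
    srgb = mapped ** (1.0 / 2.2)               # cheap gamma approximation
    return min(255, int(srgb * 255.0 + 0.5))   # quantize to 8 bits

def clamp_to_ldr(hdr_linear):
    """Naive path: everything above 1.0 clips to flat white."""
    return min(255, int(min(hdr_linear, 1.0) ** (1.0 / 2.2) * 255.0 + 0.5))
```

A value of 4.0 in linear light clips to flat 255 on the naive path but keeps usable highlight separation through the tonemap, which is the whole argument: the operator belongs in the shader even when the target is 8-bit.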

This is probably not news, and I don't often post links or summaries of conferences, as many people do a better job at that than I would. But since I posted a rough sketch of a half-tested idea a while ago about caching cascaded shadowmaps, I got some inquiries about it. So why didn't I write something decent? Let me digress :)

At that time our team at work was trying to optimize a game's GPU performance, and among the various things, shadows were a problem. We were already experimenting with some forms of caching for static objects, and we had this idea (inspired by watching Crysis 2 in action... if you look closely you can see this) of rendering our far cascades every other frame (five in total; in a frame we would update the first one and two of the remaining four).

This worked only so-so, due to dynamic casters being able to walk "into" their own shadows, so we tried to cache the static objects and render only the dynamic objects every frame (for the two cached cascades). This turned out not to be a win at the size of our shadowmaps: we were spending basically half of the shadow generation time in bandwidth/resolve, so the caching didn't really buy anything.
That consideration killed our incentive to go further with other caching schemes, and left me wondering whether, on this generation of consoles, such a scheme could really turn out to be faster.

Well, wonder no more! Luckily, decent ideas tend to be discovered many times, and recently Mike Day published excellent work (presented by Mike Acton) on something he implemented that is closely related to the rough sketch I posted on the blog. His work is very detailed and provides all the information needed to implement the technique, so go read the paper if you haven't already.

He does the caching by reprojecting the old information, both in UV and in depth, before splatting in the dynamic occluders. At the time, already concerned about the bandwidth, I was instead speculating about using the stencil to tag the z-near and z-far used to render a given region (that would have worked only on the 360 and PS3, where with some trickery you can access the stencil information while sampling the depth shadowmap, not on DX9), and about other hacks which are probably not worth their complexity, as they still end up in the same worst-case scenario.
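
For a rough feel of the reprojection step, here is a sketch under a strong simplifying assumption of mine: that successive cascade crops are related by a 2-D scale/offset and a linear depth range, so a cached texel can be mapped into the new crop's UV and its stored depth rescaled to the new near/far before the dynamic casters are splatted on top. All the frame-to-frame parameters here are made up; Day's paper has the real transforms.

```python
def reproject(uv, depth, old_crop, new_crop, old_range, new_range):
    """crop = (scale_x, scale_y, offset_x, offset_y) mapping UV to a shared
    light-space plane; range = (near, far) of the stored linear depth."""
    # cached UV -> shared light-space plane
    wx = uv[0] * old_crop[0] + old_crop[2]
    wy = uv[1] * old_crop[1] + old_crop[3]
    # shared plane -> new crop's UV
    new_uv = ((wx - new_crop[2]) / new_crop[0],
              (wy - new_crop[3]) / new_crop[1])
    # rescale stored depth from the old near/far to the new one
    world_z = old_range[0] + depth * (old_range[1] - old_range[0])
    new_depth = (world_z - new_range[0]) / (new_range[1] - new_range[0])
    return new_uv, new_depth
```

Texels that land outside [0, 1] in the new UV are the ones with no cached data, and have to be re-rendered from the static casters.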

P.S. You might have noticed the little "posted with Blogsy" logo at the bottom of this article, this is the first time I use my iPad for the blog and I have to say, I'm pleased. This little app lets you write the post with all the formatting features (which I don't use) and integrates a browser and image functionalities so you don't have to fight with the broken (absent) multitasking of iOS.
And here, let me go full hipster with a photo taken with HDR Camera on Android, then abused on Instagram and uploaded to the blog via its Picasa account... It will burn your eyes :)