18 November, 2017

"Coder" color palettes for data visualization

Too often when programmers want to visualize data (which they should do often!), we simply resort to so called "coder-colors", encoding values directly into RGB channels (e.g. R = data1, G = data2 ...) without much consideration.

This is unfortunate, because it can both significantly distort the data, rendering it in a way that is not perceptually linear and biasing certain data columns to look more important than others (e.g. the blue channel is much less bright than the green one), and make the visualization less clear, as we leverage only one color characteristic (brightness) to map the data.

The idea here is to build easy to use palette approximations for data visualization that can be coded as C/Java/Shader/etc... functions and replace "coder colors" with minimal effort.

Features we're looking for:

  • Perceptual linearity 
    • The palette steps should be equal in JND units
    • We could prove this by projecting the palette into a color space made for appearance modeling (e.g. CIELAB) and looking at the gradient there (see the sketch after the first palette below). 
  • Good range 
    • We want to use not just brightness, but color variations as well.
    • We could even follow curved paths in a perceptually linear color space; we are not restricted to straight lines.
    • The objective is to be able to clearly distinguish >10 steps.
  • Intuitive for the task at hand, legible
    • E.g. sequential data (0...1) versus diverging (-1...1) or categorical data.
  • Colorblind aware
    • The encoding should primarily rely on brightness variation; color variation should be used only to try to increase the range/contrast, and should use colorblind-safe colors.
Now, before I dump some code, I have to disclaim that although I tried to follow the principles listed above, I am not absolutely confident in the end results... Color appearance modelling is quite hard in practice: it depends on the viewing environment and the overall image being displayed, and there are many different color spaces that can be used.


The following palettes were done mostly by using CIELAB ramps and/or looking at well-known color combinations used in data visualization. 
The code below is GLSL, but I purposely avoided GLSL vectors so it's trivial to copy and paste into C/Java/whatever else...

One-dimensional data.

vec3 ColorFn1D (float x)
{
x = clamp (x, 0.0, 1.0);
float r = -0.121 + 0.893 * x + 0.276 * sin (1.94 - 5.69 * x);
float g = 0.07 + 0.947 * x;
float b = 0.107 + (1.5 - 1.22 * x) * x;
return vec3 (r, g, b);
}

This palette is similar to R's "Viridis", even if it wasn't derived from the same data. You can notice the sine in one of the channels: it's not unusual for these palettes to be well approximated by sine waves, because the most straightforward way to derive a brightness-hue-saturation perceptual color space is to take cylindrical transforms of color spaces that are oriented so one axis represents brightness and the other two are color components (e.g. that's how CIELAB works with its related cylindrical transforms like CIELCH and HSLUV).

Palette, example use and sRGB plot

Note how the palette avoids stretching to pure black. This is wise both because the bottom range of sRGB is not great in terms of perceptual uniformity, and because lots of output devices don't do particularly well when dealing with blacks.
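
As a sanity check, and to show how trivially the function above ports to plain C, the following sketch evaluates the palette at evenly spaced steps, converts each sample to CIELAB and prints the delta-E between consecutive steps: if the ramp is roughly perceptually uniform, the deltas should come out roughly constant. This is only a crude check (CIE76 delta-E, D65 white point, ignoring the viewing environment), not the process used to derive these palettes.

#include <math.h>
#include <stdio.h>

/* C port of ColorFn1D above. */
void ColorFn1D_C (float x, double rgb[3])
{
    x = x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
    rgb[0] = -0.121 + 0.893 * x + 0.276 * sin (1.94 - 5.69 * x);
    rgb[1] = 0.07 + 0.947 * x;
    rgb[2] = 0.107 + (1.5 - 1.22 * x) * x;
}

/* sRGB (0..1) -> CIELAB, D65 white point. */
double SrgbToLinear (double c) { return c <= 0.04045 ? c / 12.92 : pow ((c + 0.055) / 1.055, 2.4); }
double LabF (double t) { return t > 0.008856 ? cbrt (t) : 7.787 * t + 16.0 / 116.0; }
void SrgbToLab (const double rgb[3], double lab[3])
{
    double r = SrgbToLinear (rgb[0]), g = SrgbToLinear (rgb[1]), b = SrgbToLinear (rgb[2]);
    double fx = LabF ((0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047);
    double fy = LabF ( 0.2126 * r + 0.7152 * g + 0.0722 * b);
    double fz = LabF ((0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883);
    lab[0] = 116.0 * fy - 16.0; lab[1] = 500.0 * (fx - fy); lab[2] = 200.0 * (fy - fz);
}

int main (void)
{
    const int steps = 16;
    double prev[3] = { 0, 0, 0 };
    for (int i = 0; i < steps; ++i)
    {
        double rgb[3], lab[3];
        ColorFn1D_C ((float) i / (float) (steps - 1), rgb);
        SrgbToLab (rgb, lab);
        if (i > 0)
            printf ("step %2d: dE = %f\n", i,
                sqrt ((lab[0] - prev[0]) * (lab[0] - prev[0]) +
                      (lab[1] - prev[1]) * (lab[1] - prev[1]) +
                      (lab[2] - prev[2]) * (lab[2] - prev[2])));
        prev[0] = lab[0]; prev[1] = lab[1]; prev[2] = lab[2];
    }
    return 0;
}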

One-dimensional data, diverging.

vec3 ColorFn1Ddiv (float y)
{
y = clamp (y, -1.0, 1.0);
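// Two alternative fits follow; only the #else branch below is active.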
#if 0
float r = 0.569 + (0.396 + 0.834 * y) * sin (2.15 + 0.93 * y);
float g = 0.911 + (-0.06 - 0.863 * y) * sin (0.181 + 1.3 * y);
float b = 0.939 + (-0.309 - 0.705 * y) * sin (0.125 + 2.18 * y);
#else
float r = 0.484 + (0.432 - 0.104 * y) * sin(1.29 + 2.53*y);
float g = 0.334 + (0.585 + 0.00332 * y) * sin(1.82 + 1.95*y);
float b = 0.517 + (0.406 - 0.0348 * y) * sin(1.23 + 2.49*y);
#endif
return vec3 (r, g, b);
}

Palette, example use and sRGB plot

One-dimensional data, two categories.

Essentially, one dimensional data + a flag. It chooses between two palettes that are designed to be similar in brightness but always quite easy to distinguish, at any brightness level.

vec3 ColorFn1DtwoC (float x, int c)
{
x = clamp (x, 0.0, 1.0);
float r, g, b;
if (c == 0)
{
r = max (0.0, -0.724 + (2.52 - 0.865*x)*x);
g = 0.315 + 0.589*x;
b = x > 0.464 ? (0.302*x + 0.641) : (1.27*x + 0.191);
}
else
{
r = 0.539 + (1.39 - 0.965 * x) * x;
g = max (0.0, -0.5 + (2.31 - 0.878*x)*x);
b = 0.142 + 0.539*x*x*x;
}
return vec3 (r, g, b);
}

Two examples, varying the category at different spatial frequencies
and the two palettes in isolation.

These palettes can't go too dark or too bright, because otherwise it would no longer be easy to distinguish the two colors.
The following is a (very experimental) version which supports up to five different categories:

vec3 ColorFn1DfiveC (float x, int c)
{
x = clamp (x, 0.0, 1.0);
float r, g, b;
switch (c)
{
case 1 :
r = 0.22 + 0.71*x; g = 0.036 + 0.95*x; b = 0.5 + 0.49*x;
break;

case 2 :
g = 0.1 + 0.8*x;
r = 0.48 + x * (1.7 + (-1.8 + 0.56 * x) * x);
b = x * (-0.21 + x);
break;

case 3 :
g = 0.33 + 0.69*x; b = 0.059 + 0.78*x;
r = x * (-0.21 + (2.6 - 1.5 * x) * x);
break;

case 4 :
g = 0.22 + 0.75*x;
r = 0.033 + x * (-0.35 + (2.7 - 1.5 * x) * x);
b = 0.45 + (0.97 - 0.46 * x) * x;
break;

default :
r = g = b = 0.025 + 0.96*x;
}
return vec3 (r, g, b);
}

Two dimensions

Making a palette to map two dimensional data to color is not easy; it really depends on what we're going to use it for. 

The following code implements a variant on the straightforward mapping of the two data channels to red and green, designed to be more perceptually linear.

vec3 ColorFn2D (float x, float y)
{
x = clamp (x, 0.0, 1.0);
y = clamp (y, 0.0, 1.0);

// Optional: gamma remapping step
x = x < 0.0433 ? 1.37 * x : x * (0.194 * x + 0.773) + 0.0254;
y = y < 0.0433 ? 1.37 * y : y * (0.194 * y + 0.773) + 0.0254;

float r = x;
float g = 0.6 * y;
float b = 0.0;

return vec3 (r, g, b);
}

Two-channel mapping and example use contrasted with naive
red-green direct mapping (rightmost image)

As an example of a similar palette designed with a different goal, the following was made to highlight areas where the two data sources intersect, by shifting towards white (with the mapping done via the red and blue channels, primarily, instead of red and green).
Beware of how this one is used, because it could easily be misinterpreted as a conventional red-blue channel mapping, as we're so accustomed to these kinds of direct mappings.

vec3 ColorFn2D (float x, float y)
{
x = clamp (x, 0.0, 1.0);
y = clamp (y, 0.0, 1.0);

float r = x;
float g = 0.5*(x + 0.6)*y;
float b = y;

return vec3 (r, g, b);
}

Another two-channel mapping and example use contrasted 
with naive red-blue direct mapping (rightmost image)

Lastly, a (very experimental) code snippet for two-dimensional data where one dimension is divergent:


vec3 ColorFn2Ddiv (float x, float div)
{
x = clamp (x, 0.0, 1.0);
div = clamp (div, -1.0, 1.0);
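// Two variants follow; only the #else branch below is active.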

#if 0
div = div * 0.5 + 0.5;
float r1 = (0.0812 + (0.479 + 0.267) * x) * div;
float g1 = (0.216 + 0.407 * x) * div;
float b1 = (0.323 + 0.679 * x) * div;

div = 1.0 - div;
float r2 = (0.0399 + (0.391 + 0.196) * x) * div;
float g2 = (0.232 + 0.422 * x) * div;
float b2 = (0.0910 + (0.137 - 0.213) * x) * div;
    
return vec3(r1, g1, b1) + vec3(r2, g2, b2);
#else
float r = 0.651 + (-0.427 - 0.138*div) * sin(0.689 + 1.95*div);
float g = 0.713 + 0.107*div - 0.0565*div*div;
float b = 0.849 - 0.13*div - 0.233*div*div;
    
return vec3 (r, g, b) * (x * 0.7 + 0.3);
#endif
}

DataLog & TableLog


What:
  • A simple system to serialize lists of numbers. 

Why: 

  • Programmers should use visualization as an everyday tool when developing algorithms. 
    • Most times, for non-trivial code, if you just look at the final results via some aggregate statistics, you end up missing important details that could lead to better solutions. 
    • Visualize often and early. Visualize the dynamic behaviour of your code!
  • What I used to do for the most part is to printf() values from C code in a simple CSV format, or directly as Mathematica arrays.
    • Mathematica is great for visualization and often with a one-liner expression I can process and display the data I emitted. Often I even copy the Mathematica code to do so as a comment in the C source.
    • Sometimes I peek directly in the process memory...
  • This hack’n’slash approach is fine, but it starts to be very inconvenient when you need to dump a lot of data and/or if the data is generated by multiple threads or in different stages in the program.
    • Importing the data can be very slow as well!
  • Thus, I finally decided I needed a better serialization code...

Features:

  • Schema-less. Serializes arrays of numbers. Supports nested arrays, no need to know the array dimensions up-front. Can represent any structure.
  • Compact. Stores numbers, internally, in the smallest type that can contain them (from 8-bit integers to double-precision floating point). Always decodes as double, transparently (see the sketch after this list).
  • Sample import code for Processing.
  • Can also serialize to CSV, Mathematica arrays and UBJSON (which Mathematica 11.x can import directly)
  • Multi-thread safe.
    • Automatically sorts and optionally collates together data streams coming from different threads.
  • Not too slow. Usable. I would probably rewrite it from scratch now that I understand what I can do better - but the current implementation is good enough that I don't care, and the interface is ok.
  • Absolutely NOT meant to be used as a "real" serialization format, everything is meant to be easy to drop in an existing codebase, zero dependencies, and get some data out quickly, to then be removed...
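
To make the "smallest type" idea concrete, here is a rough sketch of the kind of per-value tagging such a format can use (an illustration only, not DataLog's actual code): each value is stored with the narrowest encoding that round-trips losslessly, and everything decodes back to double.

#include <math.h>
#include <stdint.h>

/* Hypothetical per-value encoding tags (illustration only, not DataLog's real format). */
typedef enum { ENC_I8, ENC_I16, ENC_I32, ENC_F32, ENC_F64 } ValueEncoding;

ValueEncoding PickEncoding (double v)
{
    if (trunc (v) == v && v >= INT32_MIN && v <= INT32_MAX)
    {
        int32_t i = (int32_t) v;
        if (i >= INT8_MIN  && i <= INT8_MAX)  return ENC_I8;
        if (i >= INT16_MIN && i <= INT16_MAX) return ENC_I16;
        return ENC_I32;
    }
    /* Use a 32-bit float only if the value survives the round trip. */
    if ((double) (float) v == v) return ENC_F32;
    return ENC_F64;
}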

Bonus: "TableLog" (included in the same source)
  • A system for statistical aggregation, for when you really have lots of data...
  • ...or the problem is simple enough that you know what statistics to extract from the C code!
  • Represents a data table (rows, columns).
    • Each row should be an independent "item" or experiment.
    • Each column is a quantity to be measured of the given item.
    • Multiple samples (data values) can be "pushed" to given rows/columns.
    • Columns automatically compute statistics over samples.
    • Each column can aggregate a different number of samples.
    • Each column can be configured to compute different statistics: average, minimum, maximum, histograms of different sizes (a minimal accumulator sketch follows this list).
  • Multithread-safe.
    • Multiple threads can write to different rows...
    • ...or the same row can be "opened" globally across threads.
    • Columns can be added incrementally (but will appear in all rows).
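
As a rough illustration of the per-column streaming aggregation described above (again a sketch, not TableLog's actual interface), a column can be just an accumulator that updates its statistics as samples are pushed:

#include <float.h>

/* Hypothetical per-column accumulator (illustration only, not TableLog's real interface). */
typedef struct
{
    double sum, min, max;
    long count;
} ColumnStats;

void ColumnInit (ColumnStats* c)
{
    c->sum = 0.0; c->min = DBL_MAX; c->max = -DBL_MAX; c->count = 0;
}

void ColumnPush (ColumnStats* c, double sample)
{
    c->sum += sample;
    if (sample < c->min) c->min = sample;
    if (sample > c->max) c->max = sample;
    c->count++;
}

double ColumnAverage (const ColumnStats* c)
{
    return c->count ? c->sum / (double) c->count : 0.0;
}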
--- Grab them here! ---

DataLog: C code - computing & exporting data


DataLog: Processing code - importing & visualizing data


TableLog: C code
TableLog: Data imported in Excel


More visualization examples...

22 October, 2017

The curse of success. Why nothing great is ever "good".

- The wedding.

A few weeks ago I flew back to Italy. My best friend was getting married, and I had to be his best man. 

The Amalfi coast is great for a wedding.

One night a few days before the wedding we were spending some time together when suddenly he starts getting phone calls from work. Some of the intranet infrastructure in this huge industrial oil and gas site he works in had started failing, and even though he is not an IT expert, he happens to be the chief of maintenance for the entire site, and that also entails making sure their server room has everything it needs to keep functioning.

A few months back he had commissioned a partial remodel of the room to improve the airflow, as they had started experiencing some problems with the cooling efficiency of the air conditioners in the room. Fresh off that experience, a fear immediately dawns on his face: the servers are overheating, and that's why they started experiencing loss of functionality, starting with email.

He sends a technician into the room and his fears are confirmed: none of the four AC units are working, and it's more than fifty degrees Celsius in the room. Two of them are completely off; the other two have their control panels on, but the pumps are not working. To add insult to injury, they apparently didn't receive any notification because the email server was the first thing to fail. 
After instructing the technician to open all windows in the room, it’s decided that he has to go on site to follow the situation. And as I didn’t have much better to do, I followed...

What came after was a night of speculations, experiments, and deductions that you might see in an episode of House M.D., but applied to heavy industrial machinery. Quite interesting to see from the perspective of a software engineer; debugging problems in code is not quite as exciting...

In the end, the problem turned out to be that one of the phases of a three-phase supply was missing, and eventually the culprit was found: one cable in the power line had gone completely bust, possibly due to some slow decay process that had been going on for years, maybe triggered by a slight load imbalance, until an electric arc sparked between two contacts and immediately fried the system.

Fried connectors.

The two units that appeared to still be powered on had their controls wired to the two working phases of the three-phase line, but even for these the pumps would not work, because they require all three phases to be present.

4 am, we're back in the car going home, and I was asking questions. Why did things go the way they did? What could have been done to prevent downtime of an apparently critical piece of IT? Why was that piece of IT even that critical, when apparently there was a mirror unit at another location? What exactly is that server room doing? It seems obvious there would be better ways to handle all that...

And then it dawned on me - this huge industrial site has a ton of moving parts; at any given time there are many maintenance projects going on, and even just monitoring them all is a challenge. Nobody knows everything about everything. Nothing is perfect, lots of things are not even good; in some ways it seems to be barely getting by, in others it looks fairly sci-fi... You keep the machine going, you pick your battles. Certain things will rot, some stuff will be old and obsolete and wasteful, some will be state of the art.

Which happens to be exactly how we make videogames. And software in general! I've never been in a game company where there weren't parts of the technology that were bad. Where people didn't have anything to complain about. 
Sometimes, or often even, we complain only because we easily accommodate to a given baseline, anything good becomes just the way things are, and anything bad stands out. 
But oftentimes we have areas where things are just objectively terrible: old, primitive, cumbersome, slow, wasteful, rotten. And the more successful the game, the more used the engine, the bigger and better the end results, the more we risk letting some parts fall behind.

- The best products are not made with the "best" tools in the "best" ways.

It's easy to understand how this "curse of success" takes place. Production is a monster that devours everything. Ed Catmull and Amy Wallace describe this in chapter seven of the excellent "Creativity Inc." talking of production pressures as the "hungry beast". When you're successful you can't stop, you can't break things and rebuild the world, there's less space for architecture and "proper" engineering.

People want what you're making, so you'll have to make more of it, add features, make things bigger and better; quickly all your resources are drained trying to chase that dragon. On the other hand, the alternative is worse: technology that is perfectly planned, perfectly executed, and perfectly useless.

Engineers and computer scientists are often ill-equipped to deal with this reality. We learn about mathematical truths, hard science, all our education deals with rigorous theoretical foundations, in an almost idealized, Platonic sense of beauty. In this world, there is always a perfect solution to a problem, demonstrably so, and the goal of the engineer is to achieve it.
The trivialities of dealing with people, teams, and products are left to human resources or marketing.

Of course, that's completely wrong as there are only two kinds of technology: the kind that serves people, and the useless kind. But this doesn't mean there is no concept of quality in technology either! Not at all! But, we'll have to redefine the concept of "great technology" and "proper" engineering. Not about numbers, features, and algorithms, but about happiness and people: problems solved, results delivered, needs addressed...

The gorgeous gothic church of San Lorenzo Maggiore, a patchwork of styles

Great technology then seems not to be defined by how perfect and sparkly clean it is on the inside (even if sometimes that can be a means to the goal) but by a few things that make it unique, and lots of hard work to keep everything in working order. 
If you are very aggressive in prioritizing the end product, inevitably how it's done, the internals, will suffer. 

But the alternative is clearly wrong, isn't it? If you prioritize technical concerns over end-user features, you're making something beautiful maybe, but useless. It's gradient diffusion: the farther you are from the output, the more your gradient vanishes.
The product and its needs are the ones that drive the gradient of change the most. The tools used to make the product are one step farther away: they still need to adapt and be in good shape in order to accommodate the product's needs, but the gradient is smaller. And so on: the tools made to make the tools for the product have an even smaller gradient, until it vanishes and we deal with technology that we don't even care to write or ever change, it's just there for us (e.g. our workstation's OS, Visual Studio's internals, what Mathematica is doing when I'm graphing something, how Outlook works, etc...)

The thing that I happen to say most often nowadays when discussing clever ideas is "nice, but what problem is it solving, in practice, for people, today?".

The role of the engineer should then mostly be to understand what your product needs, what makes a difference, and how to engineer solutions to keep things going, how to prioritize your efforts when there are thousands of things that could be done, very little time to do them, and huge teams of people using your tools and technologies that can't stop making forward progress...
That also explains why in practice we never found a fixed recipe for any complex system: there are always a lot of different ways to reach success; engineering is not about trying to find -the- best solution, but about managing the ride towards a good one.

- Unreasonable solutions to pragmatic goals.

All this though does not mean that we should be myopic, and just create things by getting precise measures of their effects, and thus optimize for the change that yields the biggest improvement. You can do well, certainly, by aggressively iterating upon your technology, polishing it, climbing the nearest hill. There is value to that, but to be truly excellent one has also to make space for doing crazy things, spending time chasing far-fetched ideas, down avenues that seem at first wasteful.

I myself work in a group that does a lot of research, thus lives by taking risks (if you're not scared, you're not doing research, by definition you're just implementing a solution). And most if not all of the research we do is very hard to correlate to either sales or user engagement. When we are lucky, we can prove some of our stuff saved some time (thus money) for production.

And at one point, one might even need to go explore areas of diminishing returns...

It's what in optimization is called the "exploration versus exploitation" tradeoff: sometimes we have to trust that in order to achieve success we don't have to explicitly seek it, we have to stop explicitly looking at these measures. But that does not mean that the end goal stops being very pragmatic!
What it means is that sometimes (sometimes) to be truly the best one has to consciously dedicate time to play around, to do things because they are interesting even if we can't prove they will lead anywhere. Know how to tame the production beast.

In practice, it's a tricky balance and a lot of exploration is something that not many teams can realistically achieve (while surviving). Great engineering is also about understanding these tradeoffs and navigating them -consciously-.

07 August, 2017

Tiled hardware (speculations)

Over at Siggraph I had a discussion with some mobile GPU engineers about the pros and cons of tiled deferred rasterization. I prompted some discussion over Twitter (and privately) as well, and this is how I understand the matter of tiled versus immediate/"forward" hardware rasterization so far...

Thanks to all who participated in said discussions!

Disclaimer! I know (almost) nothing of hardware and electronics, so everything that follows is likely to be wrong. I write this just so that people who really know hardware can laugh at the naivety of mere mortals...



Tile-Based Deferred Rendering.

My understanding of TBDR is that it works by dividing all the incoming dispatches aimed at a given rendertarget into tiles.

For this to happen, you'll at the very least have to invoke the part of the vertex shader that computes the output screen position of the triangles, for all dispatches, figure out which tile(s) a given triangle belongs to, and store the vertex positions and indices in per-tile storage.
Note: Considering the number of triangles nowadays in games, the per-tile storage has to be in main memory, and not on an on-chip cache. In fact, as you can't predict up-front how much memory you'll need, you will have to allocate generously and then have some way to generate interrupts in case you end up needing even more memory...
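
As a rough sketch of what that binning step might look like (my guess at the logic, not any actual GPU implementation), each transformed triangle's screen-space bounding box is intersected with the tile grid, and the triangle's index is appended to every tile it touches:

#include <math.h>

#define TILE_SIZE 32 /* hypothetical tile size, in pixels */

typedef struct { float x, y; } Vec2;

/* Append triangle 'triIndex' to the per-tile list of every tile its
   screen-space bounding box touches (the lists live in main memory). */
void BinTriangle (Vec2 v0, Vec2 v1, Vec2 v2, int screenW, int screenH,
                  void (*appendToTile) (int tileX, int tileY, int triIndex), int triIndex)
{
    float minX = fminf (v0.x, fminf (v1.x, v2.x)), maxX = fmaxf (v0.x, fmaxf (v1.x, v2.x));
    float minY = fminf (v0.y, fminf (v1.y, v2.y)), maxY = fmaxf (v0.y, fmaxf (v1.y, v2.y));

    if (maxX < 0.0f || maxY < 0.0f || minX > (float) (screenW - 1) || minY > (float) (screenH - 1))
        return; /* fully off-screen */

    int tx0 = (int) fmaxf (0.0f, minX) / TILE_SIZE;
    int ty0 = (int) fmaxf (0.0f, minY) / TILE_SIZE;
    int tx1 = (int) fminf ((float) (screenW - 1), maxX) / TILE_SIZE;
    int ty1 = (int) fminf ((float) (screenH - 1), maxY) / TILE_SIZE;

    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx)
            appendToTile (tx, ty, triIndex);
}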

Indices would be rather coherent, so I imagine that they are stored with some sort of compression (probably patented) and also I imagine that you would want to try to already start rejecting invisible triangles as they enter the various tiles (e.g. backface culling).

Then visibility of all triangles per tile can be figured out (by sorting and testing against the z-buffer), the remaining portion of the vertex shader can be executed and pixel shaders can be invoked.
Note: while this separation of vertex shading in two passes makes sense, I am not sure that the current architectures do not just emit all vertex outputs in a per-tile buffer in a single pass instead.

From here on you have the same pipeline as a forward renderer, but with perfect culling (no overdraw, other than, possibly, helper threads in a quad - and we really should have quad-merging rasters everywhere, don't care about ddx/ddy rules!).

Vertex and pixel work overlap does not happen on single dispatches, but across different output buffers, so balancing is different than an immediate-mode renderer.
Fabian Giesen also noted that wave sizes and scheduling might differ, because it can be hard to fill large waves with fragments in a tile: you might have only a few pixels that touch a given tile with a given shader/state, and more partial waves wasting time (not energy).

Pros.

Let's start with the benefits. Clearly the idea behind all this is to have perfect culling in hardware, avoiding wasting (texture and rendertarget) bandwidth on invisible samples. As accessing memory takes a lot of power (moving things around is costly), by culling so aggressively you save energy.

The other benefit is that all your rendertargets can be stored in a small per-tile on-chip memory, which can be made to be extremely fast and low-latency.
This is extremely interesting, because you can see this memory as effectively a scratch buffer for multi-pass rendering techniques, allowing for example to implement deferred shading without feeling too guilty about the bandwidth costs.

Also, as the hardware always splits things in tiles, you have strong guarantees about what areas of the screen a pixel-shader wave could access, allowing certain (wave-wide) vector operations to be turned into scalar ones if things are constant in a given tile (which would be very useful, for example, for "forward+" methods).

As the tile memory is quite fast, programmable blending becomes feasible as well.

Lastly, once the tile memory that holds triangle data is primed, in theory one could execute multiple shaders recycling the same vertex data, allowing further ways to split computation between passes.

Cons.

So why do we still have immediate-mode hardware out there? Well, the (probably wrong) way I see this is that TBDR is really "just" a hardware solution to zero overdraw, so it's amenable to the same trade-offs one always has when thinking of what should be done in hardware and what should be programmable.

You have to dedicate a bunch of hardware, and thus area, for this functionality. Area that could be used for something else, more raw computational units.
Note though that even if immediate renderers do not need the sophistication of tiling and sorting, they still need space for rendertarget compression, which is less needed on deferred hardware.

Immediate-mode rasterizers do not necessarily have to overdraw. If we do a full depth prepass, for example, then the early-z test should cull away all invisible geometry exactly like TBDR does.
We could even predicate the geometry pass after the prepass using the visibility data obtained with it, for example using hardware visibility queries or a compute shader. We could even go down to per-triangle culling granularity!

Also, if one looks at the bandwidth needed for the two solutions, it's not clear where the tipping point is. In both cases one has to go through all the vertex data, but in one case we emit triangle data per tile, in the other we write a compressed z-buffer/early-z-buffer.
The per-tile triangle data grows with the triangle count, while the z-buffer cost is roughly fixed by the resolution, so clearly as triangles get denser and denser there is a point where using the z-buffer will result in less bandwidth use!

Moreover, as this is a software implementation, we could always decide on different trade-offs, avoiding a full depth pass and just heuristically selecting a few occluders, or reprojecting the previous frame's Z, and so on.

Lastly I imagine that there are some trade-offs between area, power and wall-time.
If you care about optimizing for power and are not limited much by the chip area, then building in the chip some smarts to avoid accessing memory looks very interesting.
If you only care about doing things as fast as possible then you might want to dedicate all the area to processing power and even if you waste some bandwidth that might be ok if you are good at latency hiding...
Of course that wasted bandwidth will cost power (and heat), but you might not see the performance implications if you had other work for your compute units to do while waiting for memory.

Conclusions.

I don't quite know enough about this to say anything too intelligent. I guess that as we're seeing tiled hardware in mobile but not on the high end (and immediate-mode hardware on the high end but not in mobile), tiled might excel at saving power but not at pure wall-clock performance versus simpler architectures that use all the area for computational units and latency hiding.

Round-tripping geometry to main RAM seems to be outrageously wasteful, but if you want perfect culling you have to compare with a full-z prepass which reads geometry data twice, and things start looking a bit more even. 

Moreover, even with immediate rendering, it's not that you can really pump a lot of vertex attributes and not suffer: these want to stay on chip (and are sometimes even redistributed in tile-like patterns), so in practice you are quite limited before you start stalling your pixel shaders because you're running out of parameter space...

Amplification via tessellation or instancing, though, can save lots of data for an immediate renderer, and the second pass, as noted before, can be quite aggressively culled, and in an immediate renderer lets you balance in software how much you want to pay for culling quality, so doing the math is not easy at all.

The truth is that for almost any rendering algorithm and rendering hardware there are ways to reach great utilization, and I doubt that, if one looked closely, the two architectures would be very far apart when fed appropriate workloads. 
Often it's not a matter of what can be done, as there are always ways to make things work, but of how easy it is to achieve a given result.

And in the end it might even be that things are the way they are because of the expertise and legacy designs of the companies involved, rather than objective data. Or that things are hard to change due to a myriad of patents, or, likely, a bit of all these reasons...

But it's interesting to think of how TBDR could change the software side of the equation. Perfect culling and per-tile fast memory would allow some cute tricks, especially in a console where we could have full exposure of the underlying hardware... Could be fun.

What do you think?

Post Scriptum.

Many are mentioning NVidia's tiled solution, and AMD has something similar as well now. I didn't talk about these because they seem to be in the end "just" another way to save rendertarget bandwidth.
I don't know if they even help with culling (I think not for NVidia, while AMD mentions they can do pixel shading after a whole tile batch has been processed), but certainly they don't allow splitting rendering passes more efficiently via an on-chip scratch, which to me (on the software side of things...) is the most interesting delta of TBDR. 

Of course you could argue that tiles-as-a-cache instead of tiles-as-a-scratch might still save enough BW, and latency-hide the rest, that in practice it allows to do deferred for "free". Hard to say, and in general blending units always had some degree of caching...

Lastly, with these hybrid rasters, if they clip triangles/waves at tile boundaries (if), one could still in theory get some improvements in e.g. F+ methods, but it's questionable, because the tile sizes used seem too big to allow the light/attribute screenspace structures of an F+ renderer to match the hardware tile size.

External Links.

Apple's "GPU family 4" - notice the "imageblocks" section

21 July, 2017

A programmer's sightseeing tour: Machine Learning and Deep Neural Networks (slides)

Made some slides for a lunch'n'learn following the content of my previous blog posts on machine learning and deep neural networks.


Unfortunately, I had to censor practically all of the last part, which was talking about some applications we haven't yet published, or that might have details we didn't publish (and yes, I could have just taken them out, but it was fun to paint black splotches instead).

Also, when I do these things I tend to spice things up by trying new tools and styles (often the same for one-man programming projects). This time I made the slides (in one day) using Paper on an iPad Pro; it was quite fun!

Enjoy!

20 May, 2017

The technical interview.

I like interviewing people. I like going to interviews. This is how I do it.

1 - Have your objectives clear.

There is no consensus on what an interview is for. Companies work in different ways, teams have different cultures, we have to accept this, there is no "one true way" of making software, of managing creative teams and of building them.

Some companies hire very slowly, very senior staff. Others hire more junior positions more often. Some companies are highly technical, others care more about creativity and the ability for everyone to contribute to the product design. Some prefer hyper-specialized experts, others want to have only generalists and so on.

You could craft an interview process that is aimed at having absolutely zero false positives, probing your candidates until you know them more than they know themselves, and accept a high rate of false negatives (and thus piss off some candidates) because the company is anyhow very slow in creating new openings.
Or you could argue that a strength of the company is to give a chance to candidates that would otherwise be overlooked, and thus "scout" gems.

Regardless, this means that you have to understand what your objectives are, what the company is looking for, and what specifically your role is in the overall interview process. 
And that has to be tailored to the specific opening: the goals in an interview for a junior role are obviously not the same as the goals you have when interviewing a senior one.

Personally, I tend to focus on the technical part of the interview process: that's both my strongest suit, and I often interview on behalf of other teams to provide an extra opinion. So, I don't focus on cultural fit and other aspects of team dynamics.

Having your objectives clear won't just inform what to ask and how, but also how to evaluate the resulting conversations. Ideally one should always have an interview rubric, a chart that clearly defines what it is that we are looking for and what traits correspond to what levels of demonstrated competence.

It's hard to make good standardized interviews as the easiest way to standardize a test is to dumb it down to something that has very rigid, predefined answers, but that doesn't mean that we should not strive to engineer and structure the interview in a way that makes it as standard, repeatable and unbiased as possible.

2 - What (usually) I care about.

Almost all my technical interviews have three objectives:
  • Making sure that the candidate fulfills the basic requirements of the position.
  • Understanding the qualities of the candidate: strengths and weaknesses. What kind of engineer I'm dealing with.
  • Not being boring.
This is true for basically any opening, what changes are the specific things I look for.

For a junior role, my objective is usually to hire someone who, once we account for the time spent by the senior staff on mentoring, still manages to create positive value, and doesn't end up being a net loss for the team.

In this case, I tend not to care much about previous experience. If you have something on your resume that looks interesting it might be a good starting point for discussion, but for the most part, the "requirement" part is aimed at making sure the candidate knows enough of the basics to be able to learn the specifics of the job effectively.

What university were you at (if any), what year, what scores, what projects you did: none of these things matters (much), because I want to directly assess the candidate's abilities.

When it comes to the qualities a junior should demonstrate, I look for two characteristics. First: their ability to unlearn, to be flexible. 
Some junior engineers tend (and I certainly did back in my time) to misjudge their expertise: coming from higher education, you get the impression that you actually know things, not that you were merely given the starting points to really learn. That disconnect is dangerous, because inordinate amounts of time can be spent trying to change someone's mindset.

The second characteristic is, of course, being eager and passionate about the field; I'm trying to find someone who really wants to be there, wants to learn, experiment, grow.

For a senior candidate, I still will (almost) always have a "due diligence" part, unless it would truly be ridiculous because I do truly -know- already that the candidate fulfills it. 
I find this to be important because in practice I've seen how just looking at the resume and discussing past experience does not paint the whole picture. I think I can say that with good certainty. In particular, I've found that:
  • Some people are better than others at communicating their achievements and selling their work. Some people might be brilliant but "shy" while others might be almost useless but managed to stay among the right people and absorb enough to present a compelling case.
  • With senior candidates, it's hard to understand how "hands on" one still is. Some people can talk in great detail about stuff they didn't directly build.
  • Some people are more afraid of delving into details and breaking confidentiality than others.
  • Companies have wildly different expectations in given roles. In some, even seniors do not daily concern themselves with "lower level" programming, in others, even juniors are able to e.g. write GPU assembly.
When it comes to the qualities of a senior engineer, things are much more varied than with juniors. My aim is not necessarily to fit a given stereotype, but to gain a clear understanding, to then see if the candidate can fit somewhere.
What are the main interests? Research? Management? Workflows? Low-level coding? The tools and infrastructure, or the end-product itself?

Lastly, there is the third point: "not being boring", and for me, it's one of the most interesting aspects. To me, that means two things:
  • Respect the candidate's time.
  • Be aware that an interview is always a bi-directional communication. The interviewer is under exam too!
This is entirely an outcome that is guided by the interviewer's behavior. Nonetheless, it's a fundamental objective of the interview process: you have to demonstrate that you, and your company, are smart and thoughtful.

You have to assume that any good candidate will have many opportunities for employment. Why would they choose you, if you wasted their time, made the process annoying, and didn't demonstrate that you put effort into its design? What would that telegraph about the quality of the other engineers that were hired through it?

You have to be constantly aware of what you're doing, and what you're projecting.

3 - The technical questions.

I won't delve into the details of what I ask, both because I don't want to, and because I don't think it would be of much value, since the questions are catered to the specific roles I interview for. But I do want to talk about the process of finding good technical questions, and what I consider to be the main qualities of good questions.

First of all, always, ALWAYS use your own questions and make them relevant to the job and position. This is the only principle I dare say is universal; regardless of your company and objectives, it should be adhered to.

There is nothing worse than being asked the same lazy, idiotic question taken from a list of the "ten most common questions in programming interviews" off the internet.

It's an absolute sin. 
It both signals that you care so little about the process that you just recycled some textbook bullshit, and it's ineffective, because some people will just know the answer and be able to recite it with zero thought.
That is a total waste of time; it's not even good for "due diligence", because people who just prepared and learned the answer to that specific question are not necessarily really knowledgeable in the field you're trying to assess.

The "trick" I use to find such questions is to just be aware during my job. You will face time to time interesting, general, small problems that are good candidates to become interview questions. Just be aware not to bias things too much to reflect your specific area of expertise.

Second, always ask questions that are simple to answer but hard to master. The best question is a conversation starter, something that allows many follow-ups in many different directions.

This serves two purposes. First, it allows people to be more "at ease", relaxed, as they don't immediately face an impossibly convoluted problem. Second, it allows you not to waste time: once a problem setting is established, asking a couple of follow-ups is faster than asking two separate questions that need to be set up from scratch.

Absolutely terrible are questions that are either too easy, or too hard and left there. On one extreme you'll get no information and communicate that your interview is not aimed at selecting good candidates; on the other you will frustrate the candidate, who might start closing up and missing the answers even when you scale back the difficulty.

Also, absolutely avoid trick questions: problems that seem like they should have some very smart solution but really do not. Some candidates will be stuck thinking there ought to be more to the question, and won't dare to just try the "obvious" answer.

Third, avoid tailoring too much to your knowledge and expectations. There are undoubtedly things that are to be known for a given role, that are necessary to be productive and that are to be expected from someone that actually did certain things in the past. But there are also lots of areas where we have holes and others are experts.

Unfortunately, we can't really ask questions about things we don't know, so we have to choose among our areas of expertise. But we should avoid specializing too much, thinking that what we know is universal.

A good trick to sidestep this problem (other than knowing everything) is to have enough options that it is possible to discuss and find an area where both you and your candidate have experience. Instead of having a single, fixed checklist, have a good pool of options so you can steer things around.
This also allows you to make the interview more interactive, like a discussion where questions seem a natural product of a chat about the candidate's experience, instead of being a rigid quiz or checklist.

Lastly, make sure that each question has a purpose. Optimize for signal to noise. There are lots of questions that could be asked and little time to ask them. A good question leaves you reasonably informed that a given area has been covered. You might want to ask one or two more, but you should not need tens of questions just to probe a single area.

This can also be achieved, again, by making sure that your questions are steerable, so if you're satisfied e.g. with the insight you gained about the candidate's mastery of low-level programming and optimization, you could steer the next question towards more algorithmic concerns instead.

Conclusions.

I do not think there is a single good way of performing interviews. Some people will say that technical questions are altogether bad and to be avoided, some will have strong feelings about the presence of whiteboards in the room or computers, of leaving candidates alone solving problems or being always there chatting.

I have no particular feeling about any of these subjects. I do ask technical questions because of the issues discussed above, and for the record, I will draw things on a whiteboard and occasionally have people collaborating on it, if it's the easier way to explain a concept or a solution.

I do not use computers because I do not ask anything that requires real coding. The only exception is for internships: due to the large number of candidates, I pre-screen with an at-home test. That's always real coding; candidates are tasked with making some changes to a small, and hopefully fun and interesting, codebase (which I either make ad hoc from scratch or extract from suitable real-world programs).

I do not think any of this matters that much. What really matters is to take time, study, and design a process deliberately. In the real world, the worst interviews I've seen were never bad because of choosing one tool or another, but because no real thinking was behind them.