26 February, 2019

C++, it’s not you. It’s me.

How I learned not to worry and tolerate C++

If you follow the twitter-verse (ok, and you happen to be in the same small circle of grumpy gamedevs that forms my bubble) you might have noticed lately a rise of rage and sarcasm against C++ and the direction it's taking.

I don't want to post all the relevant bits, but the crux of the issue, for the lucky among you who don't do social media, is the growing disconnect between people working on the big, complex, performance-sensitive and often monolithic and legacy-ridden codebases that we find in game development, and the ideas of “modernity” held by the C++ standard community.

Our use-case is perhaps peculiar. We maintain large codebases, with large teams, but we never did great at modularization. This is our fault, and I’m not sure why it happened. Maybe it’s a combination of factors, including certain platforms and compilers not working well with dynamic libraries, or performance concerns. But most likely, it’s also the product of a creative environment where experimentation is a necessity, planning is inherently hard and architecture work is often simply neglected due to other production pressures.

The (AAA) gamedev use-case

Whatever the historical reasons, the reality is that we live in an environment where, more often than not:
  • We don’t care about the STL, nor its performance improvements. We developed our own bespoke containers, both because we need very ad-hoc algorithms tuned to specific problem sizes, and because of design issues of the STL itself which made it impossible to use (e.g. its historical reluctance to play well with things like memory alignment).
  • We do care about all forms of “performance”. Not only the final, all-optimizations-enabled code generation, but also performance in debug builds, and compiler performance. Iteration times.
  • We care about being able to reason about code. Simplicity, not counted as lines of code, but as the ability to clearly understand what a line of code does. We would often prefer verbose code, even code that some would find arcane, to unpredictable code, where understanding the relation between what we wrote and what happens requires a lot of context, of “global” information.

Given this scenario, it should be clear to see how most of the C++ additions since C++11 have gone in the “wrong” direction. They are typically not adding any expressive power for our use-cases, but instead, bring lots of complexity to an already almost impossible-to-master language. Complexity that also “trickles” down to tooling, compilers, compile times, debuggers and so on.

It’s hard to overstate how badly this chasm is growing, with some directions being truly infuriating, but let me just bring a concrete example. Take the modernization of STL containers, say r-value references, initializer lists and all the ecosystem around that. Clearly, a huge feature for C++, allowing even significant savings for certain uses of the STL. And what you would call “a zero cost abstraction” - if you don’t use it, you don’t see it. Everyone’s happy, right?

Quite the contrary! This is the prime example of something that is entirely useless for people who already know about the cost of constructors, temporaries and the like, and who designed their code thinking of how to lay out and transform bits instead of higher-level concepts sketched in a UML diagram.
Useless but dangerous, as these concepts increased the complexity of the language exponentially, to the point that very few can really claim to understand all their nuances. Yet they are still hard to entirely avoid in projects made by lots of people, where you do not necessarily even control all the code due to external libraries.

And when all this humongous effort is taken on one side, features that would truly help performance sensitive large “system” programming, are still completely ignored. C++ still doesn’t have a “restrict” keyword. Strict aliasing is incredibly troublesome and even getting worse with certain proposals. Threads and memory alignment were implemented first (and arguably better) in the good old C.
We still don’t have vector types, not to mention fancier features like the ability of “transposing” the memory layout of arrays of structures (into SOAs or AOSOA and so on). We don’t have anything to help tooling, or compile times, like standardized reflection, serialization or proper modules. And we keep adding metaprogramming (templates) features before tackling complexity and usability issues (e.g. concepts, which are now scheduled to be in C++20).
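
To make the “restrict” point concrete, here is a minimal C sketch (the function name `scale_add` is made up for illustration). In C99 and later, `restrict` is a promise to the compiler that the two pointers never alias, which allows it to keep values in registers and vectorize the loop more freely. Standard C++ has no equivalent; in practice people rely on vendor extensions such as `__restrict`.

```c
#include <stddef.h>

/* dst[i] += k * src[i] for n elements.
   `restrict` tells the compiler that dst and src do not overlap, so a
   store to dst[i] cannot change any src[j]. Calling this with
   overlapping ranges is undefined behavior: the promise is ours to keep. */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Without the qualifiers the compiler has to assume that writing `dst[i]` might modify `src`, and either reload from memory or emit a runtime overlap check before vectorizing.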

What’s C++ anyway?

I think part of the disconnect, and even anger in the community, is due to a misinterpretation of what C++ is and always has been.
There is this myth that C++ is a low-level, high-performance/system programming language. It’s false, and it has always been false.

Clearly, a language that didn’t bother until recently to implement threads is not a language for high-performance anything. There is nothing in C++ that concerns itself with system programming either; that is the C part of it, really. And so on and so forth, the more you look the more evident that truth is. Even the ancient Fortran can say it’s more concerned with performance than C++. C++ is not ISPC, nor Cuda. It doesn’t come with algorithms and data structures for low-latency, constrained-memory use-cases, nor does it care about large-scale data cases. Even python can be seen as a better language for high-performance computation, due to its ability to quickly implement embedded-DSLs. Nowadays, a lot of high-performance code leverages specialized compilers that are embedded in relatively high-level languages via reflection. None of that is possible in C++.

C++ is a “zero cost abstraction” language. That part is true. But “zero cost” is not about performance. In fact, the guarantee that something won’t cost you anything when it’s not used doesn’t really mean that it will be fast when you do use it! What it gives, instead, is peace-of-mind. Bjarne said, do you like C? And its ecosystem? Great! Use this one, I guarantee I won’t screw with C, and if you like anything else I added, you get to use it.

And don’t get me wrong. This is genius. And a fundamental lesson that so many other languages still today don’t understand. Making a nice language is the easy part. If you’re even just a tourist of computer languages, and have been exposed to concepts from veterans like Lisp and ML and so on, it’s not that hard to come up with a perfectly pleasant little language. And in fact, we have so, so many of them today. The hard part is selling such a language! And if you don’t have a community with a strong need that nobody but you serves, that means you have to persuade people to hop over from whatever they’re currently using. That is almost impossible.

Bjarne understood that the ecosystem is what matters most and C++ succeeded by being a drop-in replacement for C. Unfortunately now C++ is so complex, that creating a new language that seamlessly can work and replace it is a tall order, but it would also be an incredibly powerful tool for adoption.

Why things changed and what can we do about it?

Bjarne got marketing right, he probably understood people more than languages, which is not an insult, on the contrary! People are all that matters.

But if this is the premise, it wouldn’t even be surprising that C++ is drifting away from systems programming, from what C programmers care about. Nowadays, these use-cases are a minority. 
It is not unreasonable, from that point of view, to now try to appeal to the cool kids writing web stuff. People who might be using python, or java, or go and so on. Programmers who are accustomed to working fast, gluing together frameworks and libraries more than they write any bespoke algorithm and data structure. In fact, when you think about it, adding OO to C was just what was trendiest at the time. It was a hype and marketing based decision, not really a smart one, as we now maybe see more clearly as the OO hype dissipates.

But I don’t even think it’s necessarily a conscious design decision. This language is huge today. Its community is huge, and it decided to go the way of design by committee. Should you ever do that? In my circles, the answer is definitely no: design by committee is the death of technology. But let’s present a more positive way of looking at it.

We know that democracy has a cost, a huge cost. It’s definitely not the most efficient way of governing, nor does it produce the most brilliant decisions. It can be, in fact, incredibly dumb. This is not news: even in the Roman Republic there were provisions for senators to elect a dictator for a temporary period when strong leadership was deemed necessary. Of course, the risk is for a dictator to become a tyrant, as Romans learned. So, democracies trade efficiency for risk aversion, variance for mean.

And that is what C++ is today. It’s an old language, even if it wants to play cool, it fundamentally is in maintenance mode, listening to a lot of people with a lot of ideas and going in the direction of the majority, not of strong design decisions. Sometimes people argue that the solution would be to have more representation of certain use cases in the committee. 
Perhaps a bit would help, but as much as I like politics, I really don’t care about the language kind. If I could vote on something, I would vote to remove people from the committee, make it a smaller one, not a noisier and bigger one, even if the noise was to argue “in my favor”. I simply don’t think you can have a lot of compromise in technology without ending up with something mediocre.

So? So we live with C++. Not because we like it, but because we need the ecosystem. The compilers, the IDEs, the language extensions, the low-level intrinsics, the legacy code. We restrict our usage to mostly C, and grab whatever rare new feature happens to integrate decently in our workflows. Maybe one day some new language will come, we already use bits and pieces of other ones when needed.

And maybe someday someone will learn the real lesson Bjarne had to teach, and really kill it! There is actually no reason, for example, that a language could not compile its own syntax in its files, but also allow including C++ headers. Yes, you’d have to suffer and pay the pain of integrating a C++ compiler in whatever you come up with, and yes, your language would most probably just be something that expands to C++. Exactly how C++ started on top of C. But it’s unsexy, boring work, so most people will write yet another C-ish thing with a bit of ML in it on top of LLVM and call it a day...

Addendum: "Alex" asked an interesting question in the comments - why don't we make our own. I tried to answer that as well if you have a look below! 👇 

18 November, 2017

"Coder" color palettes for data visualization

Too often when programmers want to visualize data (which they should do often!), we simply resort to so called "coder-colors", encoding values directly into RGB channels (e.g. R = data1, G = data2 ...) without much consideration.

This is unfortunate, because it can both significantly distort the data, rendering it in a non perceptually linear fashion and biasing certain data columns to appear more important than others (e.g. the blue channel is much less bright than the green one), and make the visualization less clear, as we leverage only one color characteristic (brightness) to map the data.
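
To put a rough number on the "blue is much less bright than green" remark, here is a small C sketch using the standard Rec. 709 / sRGB luminance weights. The function name is mine, and the inputs are assumed to be linear-light values in [0,1].

```c
/* Relative luminance (CIE Y) of a linear-light RGB triple, using the
   Rec. 709 / sRGB primaries. Green carries roughly ten times the
   weight of blue. */
double relative_luminance(double r, double g, double b)
{
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}
```

With a direct "coder colors" mapping, a value of 1.0 written to the green channel therefore reads about ten times brighter than the same value written to the blue channel.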

The idea here is to build easy to use palette approximations for data visualization that can be coded as C/Java/Shader/etc... functions and replace "coder colors" with minimal effort.

Features we're looking for:

  • Perceptual linearity 
    • The palette steps should be equal in JND units
    • We could prove this by projecting the palette in color space made for appearance modeling (e.g. CIELAB) and looking at the gradient there. 
  • Good range 
    • We want to use not just brightness, but color variations as well.
    • We could even follow curved paths in a perceptually linear color space, we are not restricted to straight lines.
    • The objective is to be able to clearly distinguish >10 steps.
  • Intuitive for the task at hand, legible
    • E.g. sequential data (0...1) versus diverging or categorical data (-1...1).
  • Colorblind aware
    • The encoding should primarily rely on brightness variation; color variation should be used only to try to increase the range/contrast, and using colorblind-safe colors.
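
As a sketch of the "project into CIELAB and look at the gradient" idea from the list above, the following C code converts sRGB to CIELAB (D65 white point) and reports the smallest and largest distance between consecutive palette samples. All names are mine, the distance is the plain Euclidean one (Delta E 1976), and roots are computed with a small Newton iteration only so that the snippet has no dependencies. Treat it as a rough check under those assumptions, not as a full appearance model.

```c
typedef struct { double L, a, b; } Lab;
typedef struct { double min_step, max_step; } StepRange;
typedef void (*PaletteFn)(double x, double rgb[3]);

/* n-th root of v >= 0 by Newton's method (no math library needed). */
static double nth_root(double v, int n)
{
    if (v <= 0.0) return 0.0;
    double y = v > 1.0 ? v : 1.0;              /* start at or above the root */
    for (int i = 0; i < 64; ++i) {
        double p = 1.0;
        for (int k = 1; k < n; ++k) p *= y;    /* p = y^(n-1) */
        y = ((n - 1) * y + v / p) / n;
    }
    return y;
}

static double clamp01(double v) { return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v); }

/* sRGB-encoded channel in [0,1] to linear light. */
static double srgb_to_linear(double c)
{
    if (c <= 0.04045) return c / 12.92;
    double t = (c + 0.055) / 1.055;
    double r5 = nth_root(t, 5);
    return t * t * r5 * r5;                    /* t^2.4 = t^2 * t^(2/5) */
}

/* The CIELAB companding function. */
static double lab_f(double t)
{
    const double d = 6.0 / 29.0;
    return t > d * d * d ? nth_root(t, 3) : t / (3.0 * d * d) + 4.0 / 29.0;
}

/* sRGB (clamped to [0,1]) to CIELAB, via XYZ with a D65 white point. */
Lab srgb_to_lab(const double rgb[3])
{
    double r = srgb_to_linear(clamp01(rgb[0]));
    double g = srgb_to_linear(clamp01(rgb[1]));
    double b = srgb_to_linear(clamp01(rgb[2]));
    double fx = lab_f((0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047);
    double fy = lab_f( 0.2126729 * r + 0.7151522 * g + 0.0721750 * b);
    double fz = lab_f((0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883);
    Lab out = { 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz) };
    return out;
}

/* Smallest and largest CIELAB distance between consecutive samples when
   x walks [0,1] in `steps` equal increments. Equal values would mean
   perfectly even steps in this metric. */
StepRange lab_step_range(PaletteFn palette, int steps)
{
    StepRange range = { 1e30, 0.0 };
    double rgb[3];
    palette(0.0, rgb);
    Lab prev = srgb_to_lab(rgb);
    for (int i = 1; i <= steps; ++i) {
        palette((double)i / steps, rgb);
        Lab cur = srgb_to_lab(rgb);
        double dL = cur.L - prev.L, da = cur.a - prev.a, db = cur.b - prev.b;
        double dist = nth_root(dL * dL + da * da + db * db, 2);
        if (dist < range.min_step) range.min_step = dist;
        if (dist > range.max_step) range.max_step = dist;
        prev = cur;
    }
    return range;
}

/* Plain gray ramp, included only as a trivial palette to try the check on. */
void gray_ramp(double x, double rgb[3]) { rgb[0] = rgb[1] = rgb[2] = x; }
```

Calling `lab_step_range(gray_ramp, 10)` gives the spread of step sizes for a gray ramp; to test one of the palettes in this post, wrap it in a function with the same `PaletteFn` signature.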
Now, before I dump some code, I have to disclaim that although I tried to follow the principles listed above, I don't claim I am absolutely confident in the end results... Color appearance modelling is quite hard in practice: it depends on the viewing environment and the overall image being displayed, and there are many different color spaces that can be used.


The following palettes were done mostly by using CIELAB ramps and/or looking at well-known color combinations used in data visualization. 
The code below is GLSL, but I purposely avoided GLSL vector operations so it's trivial to copy and paste into C/Java/whatever else...

One-dimensional data.

vec3 ColorFn1D (float x)
{
    x = clamp (x, 0.0, 1.0);
    float r = -0.121 + 0.893 * x + 0.276 * sin (1.94 - 5.69 * x);
    float g = 0.07 + 0.947 * x;
    float b = 0.107 + (1.5 - 1.22 * x) * x;
    return vec3 (r, g, b);
}

This palette is similar to R's "Viridis", even if it wasn't derived from the same data. Note the sine in one of the channels: it's not unusual for these palettes to be well approximated using sine waves, because the most straightforward way to derive a brightness-hue-saturation perceptual color space is to take a cylindrical transform of a color space that is rotated so one axis represents brightness and the other two are color components (e.g. that's how CIELAB relates to cylindrical transforms like CIELCH and HSLuv).

Palette, example use and sRGB plot

Note how the palette avoids stretching to pure black. This is wise both because the bottom range of sRGB is not great in terms of perceptual uniformity, and because lots of output devices won't do particularly great when dealing with blacks.

One-dimensional data, diverging.

vec3 ColorFn1Ddiv (float y)
{
    y = clamp (y, -1.0, 1.0);
#if 0
    float r = 0.569 + (0.396 + 0.834 * y) * sin (2.15 + 0.93 * y);
    float g = 0.911 + (-0.06 - 0.863 * y) * sin (0.181 + 1.3 * y);
    float b = 0.939 + (-0.309 - 0.705 * y) * sin (0.125 + 2.18 * y);
#else
    float r = 0.484 + (0.432 - 0.104 * y) * sin(1.29 + 2.53*y);
    float g = 0.334 + (0.585 + 0.00332 * y) * sin(1.82 + 1.95*y);
    float b = 0.517 + (0.406 - 0.0348 * y) * sin(1.23 + 2.49*y);
#endif
    return vec3 (r, g, b);
}

Palette, example use and sRGB plot

One-dimensional data, two categories.

Essentially, one-dimensional data + a flag. It chooses between two palettes that are designed to be similar in brightness but always quite easy to distinguish, at any brightness level.

vec3 ColorFn1DtwoC (float x, int c)
{
    x = clamp (x, 0.0, 1.0);
    float r, g, b;
    if (c == 0)
    {
        r = max (0.0, -0.724 + (2.52 - 0.865*x)*x);
        g = 0.315 + 0.589*x;
        b = x > 0.464 ? (0.302*x + 0.641) : (1.27*x + 0.191);
    }
    else
    {
        r = 0.539 + (1.39 - 0.965 * x) * x;
        g = max (0.0, -0.5 + (2.31 - 0.878*x)*x);
        b = 0.142 + 0.539*x*x*x;
    }
    return vec3 (r, g, b);
}

Two examples, varying the category at different spatial frequencies
and the two palettes in isolation.

These palettes can't go too dark or too bright, because otherwise it won't be easy to distinguish colors anymore.
The following is a (very experimental) version which supports up to five different categories:

vec3 ColorFn1DfiveC (float x, int c)
{
    x = clamp (x, 0.0, 1.0);
    float r, g, b;
    switch (c)
    {
    case 1 :
        r = 0.22 + 0.71*x; g = 0.036 + 0.95*x; b = 0.5 + 0.49*x;
        break;

    case 2 :
        g = 0.1 + 0.8*x;
        r = 0.48 + x * (1.7 + (-1.8 + 0.56 * x) * x);
        b = x * (-0.21 + x);
        break;

    case 3 :
        g = 0.33 + 0.69*x; b = 0.059 + 0.78*x;
        r = x * (-0.21 + (2.6 - 1.5 * x) * x);
        break;

    case 4 :
        g = 0.22 + 0.75*x;
        r = 0.033 + x * (-0.35 + (2.7 - 1.5 * x) * x);
        b = 0.45 + (0.97 - 0.46 * x) * x;
        break;

    default :
        r = g = b = 0.025 + 0.96*x;
    }
    return vec3 (r, g, b);
}

Two dimensions

Making a palette to map two-dimensional data to color is not easy; it really depends on what we're going to use it for.

The following code implements a variant on the straightforward mapping of the two data channels to red and green, designed to be more perceptually linear.

vec3 ColorFn2D (float x, float y)
{
    x = clamp (x, 0.0, 1.0);
    y = clamp (y, 0.0, 1.0);

    // Optional: gamma remapping step
    x = x < 0.0433 ? 1.37 * x : x * (0.194 * x + 0.773) + 0.0254;
    y = y < 0.0433 ? 1.37 * y : y * (0.194 * y + 0.773) + 0.0254;

    float r = x;
    float g = 0.6 * y;
    float b = 0.0;

    return vec3 (r, g, b);
}
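
Since these snippets are meant to be pasted into other languages, here is what a plain C port of the function above might look like. The `Rgb` struct and the `clamp01f`, `remap` and `ColorFn2D_c` names are stand-ins I made up, because C has neither `vec3` nor a built-in `clamp`; the constants are copied unchanged from the GLSL.

```c
typedef struct { float r, g, b; } Rgb;   /* hypothetical stand-in for vec3 */

static float clamp01f(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* The "optional" remapping step of the GLSL version, factored out. */
static float remap(float v)
{
    return v < 0.0433f ? 1.37f * v : v * (0.194f * v + 0.773f) + 0.0254f;
}

/* C port of ColorFn2D: x drives red, y drives a dimmed green. */
Rgb ColorFn2D_c(float x, float y)
{
    x = remap(clamp01f(x));
    y = remap(clamp01f(y));
    Rgb out = { x, 0.6f * y, 0.0f };
    return out;
}
```

The other palettes port the same way; the ones using `sin` additionally need `sinf` from `<math.h>`.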

Two-channel mapping and example use contrasted with naive
red-green direct mapping (rightmost image)

As an example of a similar palette designed with a different goal, the following was made to highlight areas where the two data sources intersect, by shifting towards white (with the mapping done via the red and blue channels, primarily, instead of red and green).
Beware of how this one is used, because it could be easily misinterpreted for a conventional red-blue channel mapping as we're so accustomed to these kinds of direct mappings.

vec3 ColorFn2D (float x, float y)
{
    x = clamp (x, 0.0, 1.0);
    y = clamp (y, 0.0, 1.0);

    float r = x;
    float g = 0.5*(x + 0.6)*y;
    float b = y;

    return vec3 (r, g, b);
}

Another two-channel mapping and example use contrasted 
with naive red-blue direct mapping (rightmost image)

Lastly, a (very experimental) code snippet for two-dimensional data where one dimension is divergent:


vec3 ColorFn2Ddiv (float x, float div)
{
    x = clamp (x, 0.0, 1.0);
    div = clamp (div, -1.0, 1.0);

#if 0
    div = div * 0.5 + 0.5;
    float r1 = (0.0812 + (0.479 + 0.267) * x) * div;
    float g1 = (0.216 + 0.407 * x) * div;
    float b1 = (0.323 + 0.679 * x) * div;

    div = 1.0 - div;
    float r2 = (0.0399 + (0.391 + 0.196) * x) * div;
    float g2 = (0.232 + 0.422 * x) * div;
    float b2 = (0.0910 + (0.137 - 0.213) * x) * div;

    return vec3(r1, g1, b1) + vec3(r2, g2, b2);
#else
    float r = 0.651 + (-0.427 - 0.138*div) * sin(0.689 + 1.95*div);
    float g = 0.713 + 0.107*div - 0.0565*div*div;
    float b = 0.849 - 0.13*div - 0.233*div*div;

    return vec3 (r, g, b) * (x * 0.7 + 0.3);
#endif
}

DataLog & TableLog


What:
  • A simple system to serialize lists of numbers. 

Why: 

  • Programmers should use visualization as an everyday tool when developing algorithms. 
    • Most times if you just look at the final results via some aggregate statistics, for non trivial code, you end up missing important details that could lead to better solutions. 
    • Visualize often and early. Visualize the dynamic behaviour of your code!
  • What I used to do for the most part is to printf() values from C code in a simple CSV format, or directly as Mathematica arrays.
    • Mathematica is great for visualization and often with a one-liner expression I can process and display the data I emitted. Often I even copy the Mathematica code to do so as a comment in the C source.
    • Sometimes I peek directly in the process memory...
  • This hack’n’slash approach is fine, but it starts to be very inconvenient when you need to dump a lot of data and/or if the data is generated by multiple threads or in different stages in the program.
    • Importing the data can be very slow as well!
  • Thus, I finally decided I needed a better serialization code...

Features:

  • Schema-less. Serializes arrays of numbers. Supports nested arrays, no need to know the array dimensions up-front. Can represent any structure.
  • Compact. Stores numbers, internally, in the smallest type that can contain them (from 8-bit integers to double-precision floating point). Decodes always as double, transparently.
  • Sample import code for Processing.
  • Can also serialize to CSV, Mathematica arrays and UBJSON (which Mathematica 11.x can import directly)
  • Multi-thread safe.
    • Automatically sorts and optionally collates together data streams coming from different threads.
  • Not too slow. Usable. I would probably rewrite it from scratch now that I understand what I can do better - but the current implementation is good enough that I don't care, and the interface is ok.
  • Absolutely NOT meant to be used as a "real" serialization format, everything is meant to be easy to drop in an existing codebase, zero dependencies, and get some data out quickly, to then be removed...
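
To illustrate the "smallest type that can contain them" idea from the feature list, here is one way such a choice could be made in C. This is a sketch of the general technique, not DataLog's actual code: the enum, the function names and the exact set of candidate types are my assumptions.

```c
#include <stdint.h>
#include <float.h>

typedef enum { NUM_I8, NUM_I16, NUM_I32, NUM_F32, NUM_F64 } NumType;

/* Pick the narrowest encoding that gives back the same value when
   decoded as a double. Simplification: -0.0 is treated as integer 0,
   so the sign of zero is lost. NaN and infinities fall through to F64. */
NumType smallest_type(double v)
{
    /* Whole number that fits in 32 bits? (the range check keeps the
       cast to int32_t well defined) */
    if (v >= INT32_MIN && v <= INT32_MAX && (double)(int32_t)v == v) {
        if (v >= INT8_MIN  && v <= INT8_MAX)  return NUM_I8;
        if (v >= INT16_MIN && v <= INT16_MAX) return NUM_I16;
        return NUM_I32;
    }
    /* Survives a round trip through single precision? */
    if (v >= -FLT_MAX && v <= FLT_MAX && (double)(float)v == v)
        return NUM_F32;
    return NUM_F64;
}

/* Bytes needed to store a value of the given type. */
int encoded_size(NumType t)
{
    static const int sizes[] = { 1, 2, 4, 4, 8 };
    return sizes[t];
}
```

A writer built on this would emit a small type tag followed by `encoded_size` bytes, and the reader would always widen the value back to `double`, matching the "decodes always as double" behaviour described above.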

Bonus: "TableLog" (included in the same source)
  • A system for statistical aggregation, for when you really have lots of data...
  • ...or the problem is simple enough that you know what statistics to extract from the C code!
  • Represents a data table (rows, columns).
    • Each row should be an independent "item" or experiment.
    • Each column is a quantity to be measured of the given item.
    • Multiple samples (data values) can be "pushed" to given rows/columns.
    • Columns automatically compute statistics over samples.
    • Each column can aggregate a different number of samples.
    • Each column can be configured to compute different statistics: average, minimum, maximum, histograms of different sizes.
  • Multithread-safe.
    • Multiple threads can write to different rows...
    • ...or the same row can be "opened" globally across threads.
    • Columns can be added incrementally (but will appear in all rows).
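
As a rough picture of what "columns automatically compute statistics over samples" can mean, here is a minimal single-threaded accumulator in C. The `ColumnStats` type and the function names are hypothetical and are not TableLog's interface; histograms and the thread-safety described above are left out.

```c
/* Running statistics for one column of one row (hypothetical type). */
typedef struct {
    long   count;
    double sum, min, max;
} ColumnStats;

void column_init(ColumnStats *c)
{
    c->count = 0;
    c->sum = c->min = c->max = 0.0;
}

/* Fold one more sample into the running statistics. */
void column_push(ColumnStats *c, double sample)
{
    if (c->count == 0 || sample < c->min) c->min = sample;
    if (c->count == 0 || sample > c->max) c->max = sample;
    c->sum += sample;
    c->count++;
}

/* Average of the samples pushed so far, or 0 if there are none. */
double column_average(const ColumnStats *c)
{
    return c->count > 0 ? c->sum / (double)c->count : 0.0;
}
```

Because only the running count, sum, minimum and maximum are stored, memory use stays constant no matter how many samples are pushed, which is the point of aggregating instead of logging every value.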
--- Grab them here! ---

DataLog: C code - computing & exporting data


DataLog: Processing code - importing & visualizing data


TableLog: C code
TableLog: Data imported in Excel


More visualization examples...

22 October, 2017

The curse of success. Why nothing great is ever "good".

- The wedding.

A few weeks ago I flew back to Italy. My best friend was getting married, and I had to be his best man. 

The Amalfi coast is great for a wedding.

One night a few days before the wedding we were spending some time together when suddenly he starts getting phone calls from work. Some of the intranet infrastructure in this huge industrial oil and gas site he works in started failing, and even if he is not an IT expert, he happens to be the chief of maintenance for the entire site, and that entails also making sure their server room has everything it needs to keep functioning.

A few months back he commissioned a partial remodel of the room to improve the airflow, as they had started experiencing some problems with the cooling efficiency of the air conditioners in the room. Fresh from that experience, a fear immediately dawns on his face: the servers are overheating and that’s why they started losing functionality, starting with email.

He sends a technician into the room and his fears are confirmed: none of the four AC units are working, and it is more than fifty degrees in the room. Two of them are completely off, the other two have their control panels on but the pumps are not working. To add insult to injury, they didn't receive any notification, apparently because the email server was the first to fail.
After instructing the technician to open all windows in the room, it’s decided that he has to go on site to follow the situation. And as I didn’t have much better to do, I followed...

What came after was a night of speculations, experiments, and deductions that you might see in an episode of House M.D., but applied to heavy industrial machinery. Quite interesting to see from the perspective of a software engineer, debugging problems in code is not quite as exciting...

In the end, the problem turned out to be that one of the phases of a tri-phase outlet was missing, and the culprit was found: one cable in the power line had gone completely bust, possibly due to some slow decay process that had been going on for years, maybe triggered by some slight load imbalance, until an electric arc sparked between two contacts and immediately fried the system.

Fried connectors.

The two units that appeared still to be powered on had their controls wired to the two working phases of the tri-phase, but even for these the pumps would not work because they require all three phases to be present.

4 am, we're back in the car going home and I am asking questions. Why did things go the way they did? What could have been done to prevent downtime of an apparently critical piece of IT? Why was that piece of IT even that critical, when apparently there was a mirror unit at another location? What exactly is that server room doing? It seems that obviously there would be better ways to handle all that...

And then it dawned on me - this huge industrial site has a ton of moving parts, at any given time there are many ongoing maintenance projects going on, even just monitoring them all is a challenge. Nobody knows everything about everything. Nothing is perfect, lots of things are not even good, in some ways it seems to be barely getting by, in others, it looks fairly sci-fi... You keep the machine going, you pick your battles. Certain things will rot, some stuff will be old and obsolete and wasteful, some other will be state of the art.

Which happens to be exactly how we make videogames. And software in general! I've never been in a game company where there weren't parts of the technology that were bad. Where people didn't have anything to complain about. 
Sometimes, or often even, we complain only because we easily accommodate to a given baseline, anything good becomes just the way things are, and anything bad stands out. 
But oftentimes we have areas where things are just objectively terrible, old, primitive, cumbersome, slow, wasteful, rotten. And the more successful the game, the more used the engine, the bigger and better the end results, the more we risk letting some parts fall behind.

- The best products are not made with the "best" tools in the "best" ways.

It's easy to understand how this "curse of success" takes place. Production is a monster that devours everything. Ed Catmull and Amy Wallace describe this in chapter seven of the excellent "Creativity Inc." talking of production pressures as the "hungry beast". When you're successful you can't stop, you can't break things and rebuild the world, there's less space for architecture and "proper" engineering.

People want what you're making, so you'll have to make more of it, add features, make things bigger and better; quickly all your resources are drained trying to chase that dragon. On the other hand, the alternative is worse: technology that is perfectly planned, perfectly executed, and perfectly useless.

Engineers and computer scientists are often ill-equipped to deal with this reality. We learn about mathematical truths, hard science, all our education deals with rigorous theoretical foundations, in an almost idealized, Platonic sense of beauty. In this world, there is always a perfect solution to a problem, demonstrably so, and the goal of the engineer is to achieve it.
The trivialities of dealing with people, team, and products are left to human resources or marketing.

Of course, that's completely wrong as there are only two kinds of technology: the kind that serves people, and the useless kind. But this doesn't mean there is no concept of quality in technology either! Not at all! But, we'll have to redefine the concept of "great technology" and "proper" engineering. Not about numbers, features, and algorithms, but about happiness and people: problems solved, results delivered, needs addressed...

The gorgeous gothic church of San Lorenzo Maggiore, a patchwork of styles

Great technology then seems not to be defined by how perfect and sparkly clean it is on the inside (even if sometimes that can be a means to that end) but by a few things that make it unique, and lots of hard work to keep everything in working order.
If you are very aggressive in prioritizing the end product, inevitably how it's done, the internals, will suffer. 

But the alternative is clearly wrong, isn't it? If you prioritize technical concerns over end-user features, you're making something beautiful maybe, but useless. It's gradient diffusion: the farther you are from the output, the more your gradient vanishes.
The product and its needs are the ones that drive the gradient of change the most. The tools that are used to make the product are one step farther: they still need to adapt and be in good shape in order to accommodate the product needs, but the gradient is smaller. And so on: the tools that are made to make the tools for the products have an even smaller gradient, until it vanishes and we deal with technology that we don't even care to write or ever change, it's just there for us (e.g. our workstation's OS, Visual Studio internals, what Mathematica is doing when I'm graphing something, how Outlook works etc...)

The thing that I happen to say most often nowadays when discussing clever ideas is "nice, but what problem is it solving, in practice, for people, today?".

The role of the engineer should then mostly be to understand what your product needs, what makes a difference, and how to engineer solutions to keep things going, how to prioritize your efforts when there are thousands of things that could be done, very little time to do them, and huge teams of people using your tools and technologies that can't stop making forward progress...
That also explains why in practice we never found a fixed recipe for any complex system: there always are a lot of different ways to reach success: engineering is not about trying to find -the- best solution, but managing the ride towards a good one.

- Unreasonable solutions to pragmatic goals.

All this though does not mean that we should be myopic, and just create things by getting precise measures of their effects, and thus optimize for the change that yields the biggest improvement. You can do well, certainly, by aggressively iterating upon your technology, polishing it, climbing the nearest hill. There is value to that, but to be truly excellent one has also to make space for doing crazy things, spending time chasing far-fetched ideas, down avenues that seem at first wasteful.

I myself work in a group that does a lot of research, thus lives by taking risks (if you're not scared, you're not doing research, by definition you're just implementing a solution). And most if not all of the research we do is very hard to correlate to either sales or user engagement. When we are lucky, we can prove some of our stuff saved some time (thus money) for production.

And at one point, one might even need to go explore areas of diminishing returns...

It's what in optimization is called the "exploration versus exploitation" tradeoff: sometimes we have to trust the fact that in order to achieve success we don't have to explicitly seek it, we have to stop explicitly looking at these measures. But that does not mean that the end goal stops being very pragmatic!
What it means is that sometimes (sometimes) to be truly the best one has to consciously dedicate time to play around, to do things because they are interesting even if we can't prove they will lead anywhere. Know how to tame the production beast.

In practice, it's a tricky balance and a lot of exploration is something that not many teams can realistically achieve (while surviving). Great engineering is also about understanding these tradeoffs and navigating them -consciously-.