
08 June, 2008

C++ is premature optimization

UPDATE: further evidence here

When I started programming... well, I was a child, I had a Commodore 64 and my programs were usually around thirty lines of code. But when I started programming more seriously, C++ was not used at all for graphics. The PC demoscene was still using Pascal (in Borland's
Turbo dialect) and assembly; C (usually with the Watcom compiler) was not the de facto standard yet. Actually, I remember that I had to persuade the other coders in my demogroup to use DJGPP (a DOS port of the gcc compiler) instead of Pascal (which I did not know anyway; I was using only assembly, C and PowerBasic).

Those were times when you could still count how many cycles a routine took, how it was going to be executed by the two pipes (U and V, strikingly similar to what we now do with shader code, a nice coincidence) of the Pentium processor, where you would expect a cache miss or a branch misprediction (without using a profiler, just by looking at a given loop). And you could make a reasonable estimate of all that even when coding in C: you could "see" the underlying machine code.

So I know why people love C++. I know why I love it too (when I love it :D).
It's not the best language in the world, and we probably all know that. But it gives us power. Not the kind of power you feel when you solve a problem in two lines of code, which is something the scripting guys can enjoy (even if they are programming in something as horrible as Perl). Not the power you have when you're able to express an algorithm in an incredibly neat and elegant way (SML? OCaml? Haskell?). Nor the kind you feel when you manage to extend your language to give it new and powerful programming idioms (Lisp? TCL?). Nothing like that, no.
It's the kind of power you get from being in control.

When you code in C++ you can easily see the equivalent C code: you know that virtual functions are equivalent to a pointer to a struct of function pointers, that templates are glorified macros, and so on. And nowadays C is our cross-platform version of assembly; there's no point in using assembly itself anymore (check the id Software Quake and Doom sources to see how little asm was used even back then!), as we're not better than compilers. With our modern, incredibly complex CPUs (until recently all the extra transistors available for new CPUs went into new and fancy ways of decoding and scheduling instructions, not into more raw computing power) we can't count cycles anymore; we can only give hints to the compiler, design for cache coherency and try to avoid branches.
Even hardcore PC demoscene coders do not use assembly for speed anymore; they use it for size (it's easy to predict the size of each instruction, so it is still possible to be in control of that). Isn't it the same with C++? How many of us could claim to make the optimal choice of which functions to inline? Of where to place mutex locks? I guess that Java's HotSpot can (or LLVM runtime optimizations, etc.). Or if it does not yet, it COULD.
The only exception is SIMD code, our compilers are not very good with that yet...

And I don't really want to write down here all the problems, design errors, limitations and such of C++. I wanted to, but I realized that I'm not the best person to ask about that.
There are plenty of books on how bad C++ is, and they are called C++ Coding Standards, Effective C++, Exceptional C++, even Design Patterns. All books full of solutions to C++ problems: they mostly talk about how to do some very simple things in C++ the correct way, or what to avoid doing when coding in C++ (most of the time with no exceptions, or with very rare ones). Of course, in a good language simple things should naturally be done the correct way, things that should not be done should not be possible at all, by default things should work in the most commonly used way, and so on.

We also have tools to check that we are not doing things in any bad way (and almost every project I've worked on at least treated warnings as errors).
We don't really code in C++: we try the best we can to restrict and extend C++ in such a way that it's somewhat suitable for the development of our huge projects. C++ alone is not. Before we can start coding in it we have to ban certain (many!) practices, write libraries (memory allocation, serialization, attribute systems - reflection - wrappers for compiler-specific extensions like alignment, etc.) and also write some tools. The most striking example are the "bulk" build tools, which create projects that do not compile our individual C++ files but use the preprocessor to concatenate them into larger ones, just to make the huge link times caused by the C++ linking model manageable. And all this only to make C++ usable, to overcome most of its problems, not to make it perfect. That stuff enables us to start coding our project; it does not give us the expressive power that we have in other languages.
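As an illustration of the "bulk" build trick: a generated bulk file might look like the sketch below (all the file names are invented; the point is that the preprocessor, not the linker, does the combining).

```cpp
// bulk_render_0.cpp -- generated by a hypothetical bulk-build tool.
// The project compiles this one translation unit instead of the
// individual .cpp files below, trading per-file compilation for far
// fewer object files and much shorter link times.
#include "render/mesh.cpp"
#include "render/material.cpp"
#include "render/shadowmap.cpp"
#include "render/postprocess.cpp"
```

The cost is that symbols from the concatenated files now share one translation unit, so internal names must not clash; avoiding such clashes is one of the practices that has to be enforced for this to work.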

BUT I don't want to make my point against C++ based on this. There are many wrong things, and there's plenty of evidence even if we don't look at other languages for "cool" features to have. Enough of that (I promise!).

Most people who still prefer C++ as a language (as opposed to practical considerations on why we still have to use it in our projects, which is kind of a different matter) are not ignorant of those considerations (some are, but many are not). They simply like to be in control, no matter what. They need performance, more than anything else. Or they think they do. And they argue that being in control makes our code perform better. And here they are wrong.

We moved from assembly to C and then to C++ for a reason. Our projects were growing and we were no longer able to control them in such low-level languages. The same thing is happening now. We should seriously look for alternatives to C++, not only to be able to do our work while retaining our mental sanity, but also because we care about performance. We are no longer able to write good code and at the same time dodge all the C++ shortcomings. Code reviews mostly care about catching coding-standard infringements, that is, they try to keep us on the path of our constrained/extended version of C++. They don't deal much with design, and even less with performance.

Performance does not mean bothering to replace i++ with ++i (not that you shouldn't do that, as it costs you nothing), but first and foremost algorithms, then data-access design (cache coherency) and parallelism (multithreading). If we always took such care about those issues, we could then profile and locate the few functions that need to be optimized (SIMD, branchless stuff, etc.). But most of the time we find it hard even to get something to work. We are surrounded by bugs. We are unproductive; even compiling takes too much time. And so we do bad design. I've seen a complex engine that was too slow not because a single function or a few functions were slow, but because everything was too slow (mostly due to data and code cache misses, in other words, due to the high-level design, or the lack of "mature" optimizations). We laugh if some new language (with a new and not so optimized compiler) reaches only 80% of the performance of optimized C++. But we generally don't write optimized C++. We are way under 80% of an optimized implementation, in most of our code.

We are too concerned about the small details, the language quirks, its lack of expressiveness, etc. We are using assembly. We are optimizing prematurely. We are in control of something we can't control anymore. And it truly is the root of all evil. I would encourage you to try a few languages other than C++. See how much more productive they are. See how they let you think about your problem and not about your code. How nice it is to have a language that has types and objects, and happens to know about them too (reflection). C# and F# are a good start (but they are only the _start_).

P.S. It's not that C++ as a language should be considered "faster" than many others. C++ compilers surely are incredibly good. But, for example, as a language C# has most of the features that enable us to write fast code in C++. The only things missing are, in my opinion, some stricter guarantees about memory locality, and C++ const correctness (well, SIMD intrinsics, memory alignment, etc. are supported by C++ compilers, but they are not part of the C++ standard). In return you get some other nice features that C++ lacks, like a moving garbage collector (which can move memory around to optimize locality automatically), runtime optimizations (which the current Microsoft .NET compiler does not use as far as I know, but which are well possible given the language design, and which are only barely mimicked by profile-guided optimizers for C/C++), and type safety (which avoids some of the aliasing issues that make C++ code difficult to optimize).

---
Notes: I like these two passages from here:

...We tend to take this evil for granted, similar to how we take it for granted that people get killed every day by drunk drivers. Fatal accidents are just a fact of life, and you need to look both ways before you cross the street. Such is the life of a C++ or Java programmer...

...I'm also confident that regardless of which language you favored before reading this article, you still favor it now. Even if it's C++. It's hard to re-open a closed mind...

That site is actually incredibly cool; other good reads from there are this and this.

13 comments:

Anonymous said...

"We laugh if some new language (with a new, and not so optimized compiler) reachs only 80% of the performance of optimized C++. But we generally don't write optimized C++. We are way under the 80% of an optimized implementation, in most of our code."

Yeah, but the thing is, you're not writing optimized C#/Lisp/Haskell either. And the really bad thing is that it's way easier to destroy performance by thinking in C#. I fear that in practice the gap is larger than 80%.

The new-wave languages teach you to look at the machine and the library as a black box. As a result, people who start with them don't understand anything about performance. Ask them to make a piece of code faster and they'll look for algorithmic solutions. Sure, Big-O optimizations come first, but many things follow. If you don't understand the machine and the standard library, you're stuck. Even if you do understand the machine, the new stuff makes it harder to see what's going on and to catch real performance problems that plague many applications, like memory allocation and useless copying.

C# is meant to be used in desktop applications where the bottleneck is the disk, network or DB server. Desktop applications are idle most of the time. They wait for user input, do some tiny processing, display the result and then go idle again. Even if the processing involves doing something interesting with the local CPU, 500 extra milliseconds will not matter, as the user will barely feel them. Optimizing this would be a major waste of time, regardless if the problem is algorithmic, memory handling or a slow VM trying to turn alien concepts into real machine code. On the other hand, wasting 500 milliseconds in a game is not going to go unnoticed.

Yes, premature optimization is evil. However, late optimization doesn't exist. I see all sorts of success stories about somebody writing a tic-tac-toe game sub-optimally and then optimizing it after getting the important stuff to work. The moral of these stories is always that you should leave optimization to the end. Cute, but terribly wrong. This doesn't scale. There's no such thing as optimizing a large project at the end by fixing some spikes in the profile. You need to keep performance in mind throughout the project, otherwise it ends up just like you were saying above: nothing in particular is slow, but the whole thing is crawling. Sure, spending one month at the beginning of the project writing an ultra-optimized math library is a waste of time, since it is small and self-contained, so it can be done later if it actually proves to be a bottleneck. However, littering the code with containers that allocate once per element can't be optimized later. Bad ownership design which leads to useless memory copying can only be "optimized" at the end by rewriting large parts of the code, and there's no time for that, ever.

I don't think the new stuff is free of bad practices, especially bad performance practices. It's just that it's not used in high-performance code, so there was no need to come up with a list of things you shouldn't do. Some of the new languages have a prettier syntax than C++, so it's a bit harder to write horrible code, but there are style guides for them too. And, in the end, bugs are equally easy to program in all of them, if you are trying to accomplish the same thing.

Unknown said...

Again an interesting article that we could argue about endlessly.
As a fanatical C++ supporter I would recommend you watch the lecture that Stroustrup gave at the University of Waterloo, or read his book on the design and evolution of C++.
Sure, C# is very convenient, but for me it has a very narrow application domain. Saying that C++ is difficult or complex doesn't really scare me away from that language. The lack of some "power" features in C#, on the other hand, really pisses me off.

DEADC0DE said...

I want to make it clear (hopefully)

In the end, what I'm arguing is that writing huge projects in C++ is such a waste of time (as it was to write them in C or assembly years ago) that we can't manage it anymore.

And being so expensive is not only a problem for our productivity, but in the end also for performance, because we no longer have the time to think about it.

We care, but we don't have the time to care.

In every project I've seen things go slow not because programmers are ignorant, but because the "to do" list of possible "optimizations" that should be done and algorithms that could be tried always remains a to do list, never a done one.

So let's first try to regain control over our project complexity. Let's try to be productive again, even in a slightly less performant language (even if, as I pointed out, most of the C++ alternatives are not slower; maybe they have slightly slower compilers right now, but as languages there is no reason they could not be as fast as C++).

The time we gain could be spent on optimization again. And in the end we might well outperform our slow-code-in-a-fast-language implementations.

To Mihnea: I know that "mature" optimizations are fundamental, and that optimization should not be done last. That's why I'm suggesting: let's be more productive, so that we can focus on performance early on and make our designs performance-conscious. As of now, we are using a fast language, so our functions should be reasonably fast, and where they are not, we can use a profiler and optimize them. But our overall designs are often wrong, so having each function be "fast" is not useful; the underlying idea is wrong, not the implementation.

This productivity vs performance tradeoff is not something so heretic to be proposed. Tim Sweeney himself said the same in his POPL talk (http://www.cs.princeton.edu/~dpw/popl/06/Tim-POPL.ppt)

<< When updating 10,000 objects at 60 FPS, everything is Performance-sensitive. But: Productivity is just as important. Will gladly sacrifice 10% of our performance for 10% higher productivity. We never use assembly language >>

Anonymous said...

But why would C# be more productive than C++ in games? I understand it being more productive when you have to do Windows GUI stuff. I understand it being more productive when all you do is piece together libraries, like you do with components in Delphi. But which language features are useful for games? Are we wasting that much productivity with "manual" memory management? How would using C# result in better design?

I agree that bad design happens, and that it's the worst kind of performance problem, because the only way out after painting yourself into a corner is a massive rewrite. I just don't see how C# would make a difference there, since bad design usually means bad people, not bad tools.

I can see how the exotic stuff (i.e. Haskell, CAML and the likes, not Java and C#) could make a difference, because the code is shorter and more concise. Being shorter, it's easier to modify it and it's easier to understand the system as a whole. However, this has perils of its own, like in the classic Haskell qsort example:

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

Nice, clean, easy to understand, horribly slow. To do it properly you actually need more code than in C. Granted, this is not the norm, but it's something to look out for if these things go large-scale.

DEADC0DE said...

"...since bad design usually means bad people, not bad tools..."

That's really, really wrong; at least, that's my experience. That is again what I was trying to explain, and it's probably worth another post to make my opinion on it clear.

Bad design usually means that (smart) people had no time to refactor.

Incredibly skilled programmers happen to do incredibly bad designs, not because they were bad ideas from the start, but because things change, and they did not have the opportunity to change the design as well.

Things change, we gather information while we code, requirements change, you know; that's why we don't do monolithic designs but iterate on code, refactor it, generalize only as needed, etc.

Iteration is the key to good code. And to fast code as well.

Tools DO matter A LOT in making iterations faster and, when it comes to code, in making refactoring faster.

Language design itself can influence how easily we can refactor. Language design can influence how much time each iteration takes. Language design can also influence how easy it is to build tools for making those tasks easier (refactoring tools, testing tools, etc.)

C# is better than C++ because it has much smaller iteration times, for ALL those reasons.

DEADC0DE said...

Ah anyway I was pointing at F# and C# as _starting points_ of our search, not as the final solution.

So don't take my post as a C# vs C++ thing. I want to be clear about that: THIS POST IS NOT ABOUT C#.

That said, if you want to know my opinion on it, I love C#, it's the only language with a commercial success that has almost everything I always wanted to have (almost, let's say, 70-80% of the things) and I think it's our best bet at the moment for a possible successor to C++.

It does fix all the C++ design errors (it's a long list; some examples: proper generics, well-defined parameter evaluation order, proper enums, more type safety, good memory management, better warnings, no #includes, decent compile times, etc.).

It has some nice syntactic sugar here and there, really well designed and efficient.

It even has some really good language features, like proper first-class functions and closures, anonymous methods, type inference, the whole LINQ thing... It's a LOT more powerful than Java, for example, even if it packs everything into a nice and simple C-like syntax, so it can seem less powerful than languages that look more exotic to C programmers.

It has proper reflection. And that alone is incredibly powerful. It lets you easily do a lot of things at runtime (serialization, inspection of all the objects on the heap, dynamic code loading, code generation, etc.).

It makes it easy to write tools: refactoring, code coverage, testing, static analysis, and tons of other stuff. Just have a look at .NET Reflector and all its plugins!
I think that the whole "live coding" thing (I posted a blog article about that recently) could be done in pure C# in not too many lines of code...

To me, it's really close to being "C++ done right".

DEADC0DE said...

NOTE: Of course many of those features have a cost, but isn't that obvious? If we need those features anyway, chances are that trying to bolt them onto a language that does not natively support them is going to be worse. And that's true even for C++: we know that RTTI is not so fast, but we know that we pay its cost only when we use it, and that it's still better than most home-made RTTI systems; if we need it (and we do) then it's (generally) better to use the one our C++ compiler provides. Also, compilers will get better and better. This happened for C++, and it will happen for C# and for the other languages as well. It's the same story, really; we are only at the start of a new cycle.

A nice article about the costs of the CLR is this one: http://blogs.msdn.com/ricom/archive/2005/02/22/378335.aspx

DEADC0DE said...

NOTE: Of course you will find a lot of (usually bad) benchmarks of Java/C#/C++ around that will confirm that, in the end, VM-based stuff can be fast, even faster than statically compiled stuff.

e.g. http://www.idiom.com/~zilla/Computer/javaCbenchmark.html

This stuff is so booooooooooring, this is not the point, at all, anyway.

Anonymous said...

"Bad design usually means that (smart) people had no time to refactor."

I cannot agree. That implies that every design is bad at the beginning, and can only be improved until it becomes "good" through repeated iterations. While it's true that a design needs to be reviewed multiple times during development, it's also true that skilled software designers are able to make it resilient to changes, or at least to allow them to be made in a reasonably easy way.

No, the problem is that too often "smart people" are really smart at implementing, or at researching a new lighting model, etc., but they are not good designers. Design is an activity greatly underestimated by some developers, but it actually requires a lot of experience to be done right.

In the end, if your company has the right people doing the right jobs, then using C++ is less painful than the way you are depicting it.

Anyway, there are areas where C++ is not the most appropriate language to use. Tools are a good example, and I don't see a reason not to use C# or the likes to build them. A videogame, however, is still out of range for such a young language, since there are (still) things that require the extra power C++ grants... and that is not changing too soon either, because as the platforms (PCs, consoles) get better, we developers will always be demanding more power for more awesome effects and stuff, so we will still be struggling to win back those damn milliseconds.

This, at least, in the near future. I think there will be a turning point when the language overhead won't be significant anymore, but I also think that time has not come yet.

So, until then, let's improve our design skills ;) you will see how much more beneficial that is than having the language cover for your shortcomings.

DEADC0DE said...

Marcus, your reply is interesting and I think that many other programmers agree with you, so I want to reply to each of your observations:

First of all, "Bad design usually means that (smart) people had no time to refactor" means to me that designs usually _start_ good.

Or, to put it more correctly: even when your design starts good, over time your code will rot, and having no time to refactor is the main reason this rotten code remains in production.

Code evolves. You start with a good design for a given task. Then people ask you to add more stuff, and it more or less fits into your design, so you keep adding features. Those features are unexpected; no matter how generic your design was, they are unexpected because you did not know about them when you designed the system in the first place (that's why I think you should keep your design simple, not trying to overgeneralize it too much, because most of the time that will only bloat it; it's better to have code that's easy to extend and modify than code that tries to anticipate every possible future request). When a critical mass of those new features has been added, your design will eventually turn from optimal to no longer optimal, and work is needed to evolve it. Quite often, this is not done.

Second point: I don't think that languages other than C++ require less design, or are easier to design for, so good designers will always be needed. They will be needed even more if you're not using C++: languages that offer more advanced constructs require better designers in order to use those constructs. C++ is quite simple in its concepts, and quite arcane in the implementation of those simple concepts!

Smarter languages are needed to make our iteration times shorter, they are not magic spells that will turn an ignorant programmer into a great designer.

Last but not least, the whole point of the article is that I believe that with the right tools we can be less concerned about some details that we should not care about anyway, be more productive all around, and so be more focused on making our code run faster, first and foremost through better design. So I am as concerned as you are about milliseconds (of course I am! I'm a "hardcore" graphics programmer, not a web designer), but I do think that a jump in the quality of our code will come when we move from looking at the assembly to looking at the design (threads, memory locality, all that stuff that can't be patched into a program by just modifying a few functions at the end of the project).

Anonymous said...

Angelo, you asked me for a comment. There were so many points and so many directions in your post that it is hard for me to provide meaningful comments with the little time I have available for blogging-related activities. That said, to comment just on the C++ bit, my stance is that C++ is a complete and utter abomination in so many ways, and I would throw it away in a heartbeat if there existed a viable, more modern, complete "language package." Sadly, AFAIK, there is none. There are so many tools, libraries, middleware solutions, infrastructure needs, etc. that plainly require C++ (in one way or another) that switching to a language that actually makes sense is not remotely viable from a business perspective (at least not in the game industry).

I very much enjoyed your statement that "There are plenty of books on how bad C++ is [...]." I think you very succinctly captured a large portion of what is wrong with C++ in a single sentence.

The reason I haven't (yet, at least) ranted about how shit C++ is on my blog is because it would be a bit like concluding that you shouldn't breathe the air because it is so badly polluted. Unfortunately, no matter how much we'd like to avoid the pollution, the reality is that we will have to breathe the C++ air for many many years to come! (Unless some large corporate backer "pulls a Java," except with a language that isn't shit, unlike Java/C++.)

Sorry, all the commentary I have time for; take it or leave it!

BugoTheCat said...

Nice article. Sometimes I see C/C++ people not caring about optimizing their code, thinking that there is no need because the compiler optimizes, or because our computers will be too fast in a few years. And on the other side there are people who can do impressive stuff even in Flash (and some things perform differently in Flash than in C, and the curious mind, or the performance freak, will try to figure that out on any language or hardware he is going to code for).

Also, both premature and algorithmic optimizations are essential. Personally I find algorithmic optimizations more interesting, and sometimes the system-level optimizations are trivial, except if you are generating clever unrolled assembly speedcode for a C64 or something :)

Panther said...

Objective-C all the way! Yeah!