Writing Quick Code in C++, Quickly


Contemporary computer architectures make it possible for slow code to work reasonably well. They also make it difficult to write really fast code that exploits the CPU amenities to their fullest. And the smart money is on fast code—we’re running out of cool things to do with slow code, and the battle will be on doing really interesting and challenging things at the envelope of what the computing fabric endures.

So how to write quick code, quickly? Turns out it’s quite difficult because today’s complex architectures defy simple rules to be applied everywhere. It is not uncommon that innocuous high-level artifacts have a surprisingly high impact on the bottom line of an application’s run time (and power consumed).

This talk is an attempt to set forth a few pieces of tactical advice for writing quick code in C++. Applying these is not guaranteed to produce optimal code, but is likely to put it reasonably within the ballpark. 

These tips are based on practical experience but also motivated by the inner workings of modern CPUs.











    The Discussion

    • User profile image

      I hope this talk has some commonality with "Writing Fast Code I, II" from C++ and Beyond 2012, because videos of those talks were not published here before. :)

    • User profile image

      I and II

    • User profile image

      You can find similar video... 3 optimization tips for c++

    • User profile image
      Christian Semmler

      Once again, great talk and insight!


      sizeof... actually has a couple more nice use cases, I believe. For example, if you want to generate a type string (to pass through a C interface), you may do it like this:

      template<typename... Types>
      struct TypeString {
      static constexpr char value[sizeof...(Types) + 1] = {
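The snippet above is cut off in the source, so what follows is not a reconstruction of it. It is only a minimal sketch of how `sizeof...` can size such an array, assuming a hypothetical `TypeCode` trait that maps each type to one character (the trait and its letter codes are invented for illustration):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical trait: one character code per supported type.
template <typename T> struct TypeCode;
template <> struct TypeCode<int>    { static constexpr char value = 'i'; };
template <> struct TypeCode<double> { static constexpr char value = 'd'; };
template <> struct TypeCode<char>   { static constexpr char value = 'c'; };

template <typename... Types>
struct TypeString {
    // sizeof...(Types) counts the pack; the extra slot holds the terminating NUL.
    static constexpr char value[sizeof...(Types) + 1] = {
        TypeCode<Types>::value..., '\0'
    };
};

// Out-of-class definition, needed before C++17 when the array is odr-used
// (for example, when it is passed to a C function as a const char*).
template <typename... Types>
constexpr char TypeString<Types...>::value[sizeof...(Types) + 1];
```

With these assumed codes, `TypeString<int, double, char>::value` is a NUL-terminated array of four characters that can be handed to a C interface as a `const char*`.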

    • User profile image
      Elron A Yellin

      You can have the best of both worlds-- pretty api and an option to avoid extra allocations--

      string nextLine(istream&, string&& = string());

      // calling once, or when allocation cost doesn't matter
      auto line = nextLine(strm);

      // calling in a loop
      string line;
      for (...)
          line = nextLine(strm, std::move(line));
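The declaration above can be fleshed out into something compilable. This is a sketch only, assuming the body simply wraps `std::getline`; the comment does not show the actual implementation, and `exampleUsage` is a hypothetical helper added to exercise both call styles:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Reads one line into the supplied buffer and returns it by value.
// Passing the previous line back in lets getline reuse its capacity.
std::string nextLine(std::istream& in, std::string&& buf = std::string()) {
    std::getline(in, buf);   // clears buf, then appends the next line
    return std::move(buf);   // buf is a named reference, so move explicitly
}

// Hypothetical usage covering both call styles from the comment.
void exampleUsage() {
    std::istringstream strm("first\nsecond\nthird\n");

    // One-off call: the default temporary serves as the buffer.
    auto first = nextLine(strm);
    assert(first == "first");

    // Loop: the same string is moved in and assigned back on each iteration.
    std::string line;
    int count = 0;
    while (strm.peek() != std::char_traits<char>::eof()) {
        line = nextLine(strm, std::move(line));
        ++count;
    }
    assert(count == 2 && line == "third");
}
```

The assignment in the loop is not a self-move: the function returns a fresh string constructed from the buffer, and that temporary is then move-assigned into `line`.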

    • User profile image
      Alessandro Stamatto

      Nice solution Elron,

      Wouldn't a pass by traditional reference (istream&, string&) be a tiny bit faster than a move in this case? Or is it the same thing?

    • User profile image
      Elron A Yellin

      The default arg ("string()") is a temporary, i.e., an rvalue, so the parameter can't be a string&.

      Alternatively, we could just write a wrapper/overload, but that's a lot of typing and repetition just to save two moves (for the case where we've already decided we don't care about efficiency):

      void nextLine(istream&, string&);

      // convenience overload that forwards to the in-place version
      string nextLine(istream& strm) {
          string str;
          nextLine(strm, str);
          return str;
      }

    • User profile image

      Isn't slide 27 supposed to have vtbl[tag][0] and vtbl[tag][1] instead of vtbl[0][tag] and vtbl[1][tag]?

    • User profile image
      Colin McEwan

      Oh dear god.

      From about 16:00 onwards I wanted to kill things.

      Constructing complex C++ template code to hack around the *lack of particular peephole optimisations* in the particular compiler he happens to be using.


    • User profile image

      @Andrea: I think that code is OK but vtbl declaration should be different

      static FP vtbl[totalMethods][totalClasses];

      Multi-dimensional arrays are difficult ;)
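To make the index order concrete, here is a small sketch of a tag-indexed function table declared as `vtbl[totalMethods][totalClasses]`. The slide's code is not reproduced in this thread, so the `Shape` type, the two tags, and the four functions are hypothetical stand-ins:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class tags and method slots; the last enumerator counts them.
enum Tag : std::size_t { Square, Rect, totalClasses };
enum Method : std::size_t { Area, Perimeter, totalMethods };

struct Shape {
    Tag tag;   // selects the column of the table
    int size;  // side length; a Rect is assumed to be twice as wide as tall
};

using FP = int (*)(const Shape&);

int squareArea(const Shape& s)      { return s.size * s.size; }
int rectArea(const Shape& s)        { return 2 * s.size * s.size; }
int squarePerimeter(const Shape& s) { return 4 * s.size; }
int rectPerimeter(const Shape& s)   { return 6 * s.size; }

// First index is the method, second index is the class tag.
static const FP vtbl[totalMethods][totalClasses] = {
    /* Area      */ { squareArea,      rectArea },
    /* Perimeter */ { squarePerimeter, rectPerimeter },
};

// Dispatch: pick the row by method, the column by the object's tag.
int call(Method m, const Shape& s) {
    assert(m < totalMethods && s.tag < totalClasses);
    return vtbl[m][s.tag](s);
}
```

With this declaration, `vtbl[0][tag]` and `vtbl[1][tag]` are the valid forms; `vtbl[tag][0]` would only match a table declared as `[totalClasses][totalMethods]`.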

    • User profile image

      I agree Colin, he could at least have tried LLVM ;-)

    • User profile image

      @Colin McEwan: I think the point of that exercise was not to do peephole optimization but to provide portable semantics and missing features (like whole-storage assignment). I would not do anything like that myself, but for someone who cares about a 0.1% performance increase AND is forced to use bit-fields, this may be an option.

    • User profile image

      I have used that pattern as well, and noticed that in the case of

      auto line = nextLine(strm);

      it can actually be *faster* than the trivial "string nextLine(istream&)" version as some of the exception handling / stack unwinding code can be moved out of the inner function by the compiler.
