Writing Quick Code in C++, Quickly



Contemporary computer architectures make it possible for slow code to work reasonably well. They also make it difficult to write really fast code that exploits the CPU amenities to their fullest. And the smart money is on fast code—we’re running out of cool things to do with slow code, and the battle will be on doing really interesting and challenging things at the envelope of what the computing fabric endures.

So how to write quick code, quickly? Turns out it’s quite difficult because today’s complex architectures defy simple rules to be applied everywhere. It is not uncommon that innocuous high-level artifacts have a surprisingly high impact on the bottom line of an application’s run time (and power consumed).

This talk is an attempt to set forth a few pieces of tactical advice for writing quick code in C++. Applying these is not guaranteed to produce optimal code, but is likely to put it reasonably within the ballpark. 

These tips are based on practical experience but also motivated by the inner workings of modern CPUs.












The Discussion

  • User profile image

    I hope this talk has something in common with "Writing Fast Code I, II" from C++ and Beyond 2012, because videos of those talks have not been published here before. :-)

  • User profile image

    I and II

  • User profile image

    You can find a similar video... 3 optimization tips for C++

  • User profile image
    Christian Semmler

    Once again, great talk and insight!


    sizeof... actually has a couple more nice use cases, I believe. For example, if you want to generate a type string (to pass through a C interface), you may do it like this:

    template<typename... Types>
    struct TypeString {
    static constexpr char value[sizeof...(Types) + 1] = {
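The snippet above is cut off in the middle of the initializer. As a separate illustration of the idea, and not a reconstruction of the commenter's code, here is a minimal sketch of a type string whose length comes from `sizeof...`. The `TypeCode` trait, its one-letter codes, and `typeStringExample` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical trait: maps a type to a one-character code (an assumption
// made for this sketch, not something taken from the talk or the comment).
template <typename T> struct TypeCode;
template <> struct TypeCode<int>    { static constexpr char value = 'i'; };
template <> struct TypeCode<double> { static constexpr char value = 'd'; };
template <> struct TypeCode<char>   { static constexpr char value = 'c'; };

template <typename... Types>
struct TypeString {
    // sizeof...(Types) is the number of types in the pack; +1 for the NUL.
    static constexpr char value[sizeof...(Types) + 1] = {
        TypeCode<Types>::value..., '\0'
    };
};

// Out-of-class definition, needed before C++17 when `value` is odr-used.
template <typename... Types>
constexpr char TypeString<Types...>::value[sizeof...(Types) + 1];

// Small usage check: three types give three codes plus the terminator.
void typeStringExample() {
    static_assert(sizeof(TypeString<int, double, char>::value) == 4,
                  "three codes plus the terminating NUL");
    assert(std::strcmp(TypeString<int, double, char>::value, "idc") == 0);
}
```

The array can be passed directly to a C function expecting a `const char*`, since it is NUL-terminated and has static storage.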

  • User profile image
    Elron A Yellin

    You can have the best of both worlds, a pretty API and an option to avoid extra allocations:

    string nextLine(istream&, string&& = string());

    // calling once, or when allocation cost doesn't matter
    auto line = nextLine(strm);

    // calling in a loop
    string line;
    for (...)
        line = nextLine(strm, std::move(line));
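A minimal, self-contained sketch of how that signature might be implemented, assuming `std::getline` does the actual reading. The helper `readAllLines` and the check function `nextLineExample` are hypothetical names added for illustration. Whether the buffer's capacity actually survives the round trip through the moves depends on the standard library implementation.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads one line into `buf`, reusing whatever storage the caller handed in.
std::string nextLine(std::istream& in, std::string&& buf = std::string()) {
    std::getline(in, buf);   // getline erases buf, then fills it
    return std::move(buf);   // buf is a named reference, so move explicitly
}

// Hypothetical helper showing the loop form: `line` is moved in and the
// result is moved back, so its storage can be reused between iterations.
std::vector<std::string> readAllLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (in.peek() != std::char_traits<char>::eof()) {
        line = nextLine(in, std::move(line));
        lines.push_back(line);
    }
    return lines;
}

// Small usage check for the one-shot form.
void nextLineExample() {
    std::istringstream strm("hello\nworld\n");
    auto first = nextLine(strm);   // default argument supplies the buffer
    assert(first == "hello");
    auto second = nextLine(strm);
    assert(second == "world");
}
```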

  • User profile image
    Alessandro Stamatto

    Nice solution Elron,

    Wouldn't a pass by traditional reference (istream&, string&) be a tiny bit faster than a move in this case? Or is it the same thing?

  • User profile image
    Elron A Yellin

    The default arg ("string()") is a temporary, i.e., an rvalue, so the parameter can't be a string&.

    Alternatively, we could just write a wrapper/overload, but that's a lot of typing and repetition just to save two moves (for the case where we've already decided we don't care about efficiency):
    void nextLine(istream&, string&);
    string nextLine(istream& strm) {
        string str;
        nextLine(strm, str);
        return str;
    }
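The binding rule behind that answer can be checked at compile time. This is a small sketch using `std::is_convertible`, which tests implicit conversion, the same kind of initialization that parameter passing uses. The constant names are hypothetical.

```cpp
#include <string>
#include <type_traits>

// A temporary (rvalue) string cannot bind to a non-const lvalue reference,
// which is why `string& = string()` is not a valid default argument.
constexpr bool tempBindsToLvalueRef =
    std::is_convertible<std::string, std::string&>::value;

// The same temporary binds fine to an rvalue reference or a const reference.
constexpr bool tempBindsToRvalueRef =
    std::is_convertible<std::string, std::string&&>::value;
constexpr bool tempBindsToConstRef =
    std::is_convertible<std::string, const std::string&>::value;

// A named string (lvalue) does not bind to string&&; the caller needs
// std::move, as in the loop form of the call.
constexpr bool lvalueBindsToRvalueRef =
    std::is_convertible<std::string&, std::string&&>::value;

static_assert(!tempBindsToLvalueRef, "temporary cannot bind to string&");
static_assert(tempBindsToRvalueRef, "temporary binds to string&&");
static_assert(tempBindsToConstRef, "temporary binds to const string&");
static_assert(!lvalueBindsToRvalueRef, "lvalue needs std::move for string&&");
```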

  • User profile image

    Isn't slide 27 supposed to have vtbl[tag][0] and vtbl[tag][1] instead of vtbl[0][tag] and vtbl[1][tag]?

  • User profile image
    Colin McEwan

    Oh dear god.

    From about 16:00 onwards I wanted to kill things.

    Constructing complex C++ template code to hack around the *lack of particular peephole optimisations* in the particular compiler he happens to be using.


  • User profile image

    @Andrea: I think that code is OK, but the vtbl declaration should be different:

    static FP vtbl[totalMethods][totalClasses];

    Multi-dimensional arrays are difficult ;-)
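To make the index order concrete, here is a minimal sketch of a table declared as `vtbl[totalMethods][totalClasses]`, so that the method index comes first and the class tag second. The function signature `FP`, the six sample functions, and `call` are assumptions made for this example. The slide's actual types are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>

using FP = int (*)(int);   // hypothetical method signature

constexpr std::size_t totalMethods = 2;
constexpr std::size_t totalClasses = 3;

// Hypothetical implementations: method 0 scales, method 1 shifts.
int scaleA(int x) { return x * 2; }
int scaleB(int x) { return x * 3; }
int scaleC(int x) { return x * 4; }
int shiftA(int x) { return x + 1; }
int shiftB(int x) { return x + 2; }
int shiftC(int x) { return x + 3; }

// One row per method; within a row, one entry per class tag.
static FP vtbl[totalMethods][totalClasses] = {
    { scaleA, scaleB, scaleC },
    { shiftA, shiftB, shiftC },
};

static_assert(sizeof(vtbl[0]) == totalClasses * sizeof(FP),
              "each row holds one method for every class");

// Dispatch: first index selects the method, second selects the class.
int call(std::size_t method, std::size_t tag, int x) {
    assert(method < totalMethods && tag < totalClasses);
    return vtbl[method][tag](x);
}
```

With this layout, `vtbl[0][tag]` and `vtbl[1][tag]` are both in bounds for every tag, whereas `vtbl[tag][0]` would run off the first dimension as soon as `tag` reaches `totalMethods`.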

  • User profile image

    I agree Colin, he could at least have tried LLVM ;-)

  • User profile image

    @Colin McEwan: I think the point of that exercise was not to do peephole optimization but to provide portable semantics and missing features (like whole-storage assignment). I would not do anything like that myself, but for someone who cares about a 0.1% performance increase AND is forced to use bit-fields, this may be an option.
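As a rough illustration of what "portable semantics" and "whole-storage assignment" can mean here, the sketch below packs fields into a single 32-bit word using explicit masks and shifts. It is not the code from the talk. `Field`, `Kind`, `Count`, and `fieldExample` are hypothetical names, and the 32-bit word size is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a field of `Width` bits starting at bit `Offset`
// of a 32-bit word. The layout is fixed by the template arguments rather
// than left to the compiler, as it is with built-in bit-fields.
template <unsigned Offset, unsigned Width>
struct Field {
    static_assert(Width >= 1 && Offset + Width <= 32,
                  "field must fit in a 32-bit word");
    static constexpr std::uint32_t mask =
        (0xFFFFFFFFu >> (32 - Width)) << Offset;

    static std::uint32_t get(std::uint32_t word) {
        return (word & mask) >> Offset;
    }
    // Returns `word` with this field replaced by the low bits of `v`.
    static std::uint32_t set(std::uint32_t word, std::uint32_t v) {
        return (word & ~mask) | ((v << Offset) & mask);
    }
};

using Kind  = Field<0, 3>;    // bits 0..2
using Count = Field<3, 13>;   // bits 3..15

void fieldExample() {
    std::uint32_t word = 0;
    word = Kind::set(word, 5);
    word = Count::set(word, 100);
    assert(Kind::get(word) == 5);
    assert(Count::get(word) == 100);

    // Whole-storage assignment: every field is copied with one integer copy.
    std::uint32_t copy = word;
    assert(copy == 805u);   // 5 | (100 << 3)
}
```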

  • User profile image

    I have used that pattern as well, and noticed that in the case of

    auto line = nextLine(strm);

    it can actually be *faster* than the trivial "string nextLine(istream&)" version, as some of the exception handling / stack unwinding code can be moved out of the inner function by the compiler.
