C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 1 of 2


Herb Sutter presents atomic<> Weapons, 1 of 2. This was filmed at C++ and Beyond 2012. As the title suggests, this is a two-part series (given the depth of treatment and complexity of the subject matter).

Part 1 -> Optimizations, races, and the memory model; acquire and release ordering; mutexes vs. atomics vs. fences

Download the slides.


This session in one word: Deep.

It's a session that includes topics I've publicly said for years are Stuff You Shouldn't Need To Know and I Just Won't Teach, but it's becoming achingly clear that people do need to know about them. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers are just so addicted to full performance that they'll reach for the big red levers with the flashing warning lights. Since we can't keep people from pulling the big red levers, we'd better document the A to Z of what the levers actually do, so that people don't SCRAM unless they really, really, really meant to.

Topics Covered:

  • The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We'll include clear answers to several FAQs: "how do the compiler and hardware cooperate to remember how to respect these rules?", "what is a race condition?", and the ageless one-hand-clapping question "how is a race condition like a debugger?"
  • The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I'll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
  • The unspeakables: I'll grudgingly and reluctantly talk about the Thing I Said I'd Never Teach That Programmers Should Never Need To Know: relaxed atomics. Don't use them! If you can avoid it. But here's what you need to know, even though it would be nice if you didn't need to know it.
  • The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64, with POWER and Itanium thrown in for good measure. I'll cover how and why the answers were different last year than they are this year, and how they will likely be different again a few years from now. We'll also look at how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.
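To make the point about barriers concrete, here is a minimal C++11 sketch (our own illustration, not code from the slides; all variable and function names are hypothetical) of the same message-passing handoff written two ways: with release/acquire ordering attached to the specific flag store and load, and with standalone `std::atomic_thread_fence` calls around relaxed operations. Both versions are correct under the C++11 memory model. The difference is that each standalone fence constrains every surrounding memory operation of the relevant kind rather than the one flag access it is meant to protect, which is the sense in which it is not "associated with a specific load or store".

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names; a sketch, not code from the talk.

// Style 1: the ordering is attached to one specific store and one specific load.
int data_a = 0;
std::atomic<bool> flag_a{false};

void writer_tied() {
    data_a = 42;                                    // ordinary, non-atomic write
    flag_a.store(true, std::memory_order_release);  // release is part of this store
}

int reader_tied() {
    while (!flag_a.load(std::memory_order_acquire)) {}  // acquire is part of this load
    int v = data_a;
    assert(v == 42);  // cannot fire: the acquire load read the release store's value
    return v;
}

// Style 2: standalone fences around relaxed operations on the flag.
int data_b = 0;
std::atomic<bool> flag_b{false};

void writer_fenced() {
    data_b = 42;
    // Keeps every earlier read/write ahead of ALL later stores, not just flag_b.
    std::atomic_thread_fence(std::memory_order_release);
    flag_b.store(true, std::memory_order_relaxed);
}

int reader_fenced() {
    while (!flag_b.load(std::memory_order_relaxed)) {}
    // Keeps ALL earlier loads, not just flag_b, ahead of every later read/write.
    std::atomic_thread_fence(std::memory_order_acquire);
    int v = data_b;
    assert(v == 42);  // cannot fire: fence-to-fence synchronization via flag_b
    return v;
}

// Runs one writer/reader pair on fresh threads and returns what the reader saw.
int run_pair(void (*writer)(), int (*reader)()) {
    int seen = 0;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    return seen;
}
```

Calling `run_pair(writer_tied, reader_tied)` or `run_pair(writer_fenced, reader_fenced)` returns 42 in both cases; the first style simply states the intent on the exact operations involved.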

Part 2 -> Restrictions on compilers and hardware (incl. common bugs); code generation and performance on x86/x64, IA64, POWER, ARM, and more; relaxed atomics; volatile
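Relaxed atomics come up in both parts. As a hedged sketch of the kind of case where they are commonly considered safe, the counter below only needs each increment to be atomic: no thread reads it to decide whether other data is ready, and the final value is read only after every worker has been joined. The helper name `count_events` is hypothetical. By contrast, using a `volatile long` with `++` instead of `std::atomic<long>` would be a data race (undefined behavior) in C++11, because `volatile` provides neither atomicity nor inter-thread ordering.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: counts events from several threads with relaxed atomics.
long count_events(int num_threads, int events_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, events_per_thread] {
            for (int i = 0; i < events_per_thread; ++i) {
                // The read-modify-write is atomic, so no increment is lost,
                // but it imposes no ordering on any other memory accesses.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() makes each worker's increments happen-before the load below.
    for (auto& w : workers) w.join();
    return counter.load(std::memory_order_relaxed);
}
```

The relaxed ordering is acceptable here only because the thread joins, not the counter itself, provide all the synchronization the program needs.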




    The Discussion


      Oooh, very pleased to see this given a treatment across multiple hardware platforms. I do hope MS keeps up the ARM investments once the Haswell low-power x86 wave breaks.


      Thanks for the talks. I appreciate all the help I can get trying to understand memory ordering :)

      Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times.

      As an example, consider a single thread that uses a store-release to publish some data, while multiple other threads use a load-acquire to read that data. This is permitted, right? That is, can one release be "viewed" by many acquires, or does it need to be strictly one-to-one?

      Also, where would people suggest I go to ask questions and learn more about using atomics (and get others to help verify my reasoning)?



      If I were you, I would listen to Herb and not try to be a hero. :) If you need lock-free data structures, Boost 1.53 added the Boost.Lockfree library, and if you want to learn more, the 1024cores blog is a nice, though no longer updated, source of information.


      GREAT TALK!!!

      Rein Halbersma

      Great talk! There is a lot of academic research being done on multicore memory models. Here are three links: a paper with the mathematical foundations, a graduate course, and an online compiler that displays some of the allowed reorderings.



      @bcosta Interesting question. Where do you think it could be of use?


      Great talk. I'm sure all that information will come in handy as most languages and hardware are adopting this model.


      @bcosta wrote: "Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times. As an example, a single thread that uses a store/release to publish some data where multiple other threads that use a load/acquire to read that data. This is permitted right? I.e. One release can be "viewed" by many acquires or does it need to be strictly one-to-one?"

      That's fine; it's just many loads pairing with the same store. By "they have to come in pairs" we mean that you get the ordering guarantees only when a specific load-acquire sees a specific store-release; the loading thread is then guaranteed to see everything else the storing thread did before the store-release. That's one "pair." The same store-release can be observed by multiple load-acquires, pairing with each one.
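      Herb's answer can be sketched in C++11 as follows (a minimal illustration; `Mailbox`, `produce`, `consume`, and `run_consumers` are hypothetical names): one producer thread performs a single store-release, and each of several consumer threads forms its own "pair" with that store by spinning on a load-acquire until it reads `true`.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical type: plain data plus an atomic "published" flag.
struct Mailbox {
    std::string message;                 // ordinary, non-atomic data
    std::atomic<bool> published{false};
};

void produce(Mailbox& box) {
    box.message = "hello";                                 // 1. write the data
    box.published.store(true, std::memory_order_release);  // 2. one store-release
}

std::string consume(const Mailbox& box) {
    // This thread's load-acquire pairs with the store-release once it reads
    // true; after that, everything produce() wrote before the release is visible.
    while (!box.published.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.message;  // many concurrent readers, no writer: no data race
}

// Starts n consumers and one producer; returns what each consumer saw.
std::vector<std::string> run_consumers(int n) {
    Mailbox box;
    std::vector<std::string> results(n);
    std::vector<std::thread> consumers;
    for (int i = 0; i < n; ++i) {
        consumers.emplace_back([&box, &results, i] { results[i] = consume(box); });
    }
    std::thread producer(produce, std::ref(box));
    producer.join();
    for (auto& c : consumers) c.join();
    return results;
}
```

      Each consumer's guarantee comes from its own acquire load reading the value written by the single release store; the consumers never need to coordinate with one another.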


      @herbsutter: Thanks for the clarification, Herb. I thought that would be the case.


      @ajasmin: One scenario I can think of is a very simple wait-free "bounded" single-producer, multi-consumer read-copy-update object (one that doesn't handle overflows). I am sure there are many others.



      I appreciate that so many interesting and good videos are uploaded, and in different formats too, but...
      940 MB for a 1h 2m MP4 video (720x408) of a "static" presentation?
      Dude, I can shrink this down to 400 MB with no quality loss, maybe even less.

      Please try to work on your encoding. You can do better than this.


      @zack: Thanks, Zack. There is also an MP4 version that's 437 MB. Of course, there's a decrease in quality relative to the 940 MB version, but it looks good to me (slides are clearly legible and Herb looks real).


      Very nice! When you went from slide 36 (bottom of page 18 in the PDF stack) to slide 37 and remarked to the audience how hard it is to get atomics-based algorithms right, I thought you could easily have used the same simple "total store order" example from slide 36 to demonstrate that exact point, by comparing if(y==1 && x==0) with if(x==0 && y==1) for thread 4.
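      The comment's point can be sketched in C++11. We assume (our reconstruction from the comment, not the slide itself) that thread 1 stores `x = 1`, thread 2 stores `y = 1`, thread 3 concludes "x first" if it sees `x == 1` and then `y == 0`, and thread 4 concludes "y first" if it sees `y == 1` and then `x == 0`. With default `memory_order_seq_cst` atomics there is a single total order over all these operations, so both conclusions can never be reached in the same run. Swapping the operands in thread 4 to `x == 0 && y == 1` changes which load happens first, and then both conclusions become possible even with sequentially consistent atomics. The function names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// One trial of the four-thread example. Returns true only if thread 3
// concluded "x was written first" AND thread 4 concluded "y was written first".
bool observers_disagree() {
    std::atomic<int> x{0}, y{0};
    bool x_first = false, y_first = false;  // each written by one thread, read after join

    std::thread t1([&] { x.store(1); });  // default ordering: memory_order_seq_cst
    std::thread t2([&] { y.store(1); });
    std::thread t3([&] {
        // Loads x, then y.
        if (x.load() == 1 && y.load() == 0) x_first = true;
    });
    std::thread t4([&] {
        // Loads y, then x: the mirror image of thread 3. Writing this as
        // (x.load() == 0 && y.load() == 1) would load x first, and then both
        // flags could be set even with seq_cst atomics.
        if (y.load() == 1 && x.load() == 0) y_first = true;
    });
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return x_first && y_first;
}

// Repeats the trial; with seq_cst atomics the result is guaranteed to be 0.
int count_disagreements(int trials) {
    int n = 0;
    for (int i = 0; i < trials; ++i) {
        if (observers_disagree()) ++n;
    }
    return n;
}
```

      A finite number of trials can never prove the guarantee, of course; the model is what rules the disagreement out, and a one-token change in evaluation order is enough to lose it.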
