C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 1 of 2

1 hour, 21 minutes, 12 seconds



Herb Sutter presents atomic<> Weapons, 1 of 2. This was filmed at C++ and Beyond 2012. As the title suggests, this is a two-part series (given the depth of treatment and complexity of the subject matter).

Part 1 -> Optimizations, races, and the memory model; acquire and release ordering; mutexes vs. atomics vs. fences

Download the slides.


This session in one word: Deep.

It's a session that includes topics I've publicly said for years are Stuff You Shouldn't Need To Know and I Just Won't Teach, but it's becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they'll reach for the big red levers with the flashing warning lights. Since we can't keep people from pulling the big red levers, we'd better document the A to Z of what the levers actually do, so that people don't SCRAM unless they really, really, really meant to.

Topics Covered:

  • The facts: The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct. We'll include clear answers to several FAQs: "how do the compiler and hardware cooperate to remember how to respect these rules?", "what is a race condition?", and the ageless one-hand-clapping question "how is a race condition like a debugger?"
  • The tools: The deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers. I'll try to convince you why standalone memory barriers are bad, and why barriers should always be associated with a specific load or store.
  • The unspeakables: I'll grudgingly and reluctantly talk about the Thing I Said I'd Never Teach That Programmers Should Never Need To Know: relaxed atomics. Don't use them! If you can avoid it. But here's what you need to know, even though it would be nice if you didn't need to know it.
  • The rapidly-changing hardware reality: How locks and atomics map to hardware instructions on ARM and x86/x64, and throw in POWER and Itanium for good measure – and I'll cover how and why the answers are actually different last year and this year, and how they will likely be different again a few years from now. We'll cover how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers.

Part 2 -> Restrictions on compilers and hardware (incl. common bugs); code generation and performance on x86/x64, IA64, POWER, ARM, and more; relaxed atomics; volatile


Follow the discussion

  • Oooh, very pleased to see this given a multi hardware platform treatment. I do hope MS keeps up the ARM investments once the Haswell low-power x86 wave breaks. 

  • Thanks for the talks. I appreciate all the help I can get trying to understand memory ordering.

    Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times.

    As an example, consider a single thread that uses a store/release to publish some data, and multiple other threads that use a load/acquire to read that data. This is permitted, right? I.e., can one release be "viewed" by many acquires, or does it need to be strictly one-to-one?

    Also, where would people suggest is a good place to ask questions and learn more about using atomics (and ask if others can help verify my reasoning)?


  • Ivan

    If I were you I would listen to Herb and not try to be a hero. :) If you need lock-free code, Boost 1.53 added some lock-free components you can use, and if you want to learn more, the 1024cores blog is a nice source of information, though it is no longer updated.

  • felix9 the cat that walked by itself

    GREAT TALK !!!

  • Rein Halbersma

    Great talk! There is a lot of academic research being done on multicore memory models. Here are 3 links to a paper with the mathematical foundations, a graduate course and an online compiler displaying some of the reorderings that are allowed


  • @bcosta Interesting question. Where do you think it could be of use?


    Great talk. I'm sure all that information will come in handy as most languages and hardware are adopting this model.

  • @bcosta wrote: "Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times. As an example, consider a single thread that uses a store/release to publish some data, and multiple other threads that use a load/acquire to read that data. This is permitted, right? I.e., can one release be "viewed" by many acquires, or does it need to be strictly one-to-one?"

    That's fine, it's just many loads pairing with the same store... by "they have to come in pairs" we mean you get the ordering guarantees only when a specific load-acquire sees a specific store-release, and is guaranteed to see everything else the storing thread did before the store-release. That's one "pair." The same store-release could be observed by multiple load-acquires, pairing with each one.

  • @herbsutter: Thanks for the clarification, Herb. I thought that would be the case.


    @ajasmin: One scenario I can think of is a very simple wait-free "bounded" single-producer, multi-consumer read-copy-update object (that doesn't handle overflows). I am sure there are many others.


  • zack

    I appreciate that so many interesting and good videos are uploaded,
    also in different formats, but.....
    940MB for 1h2m MP4 video (720x408) for a "static" presentation?
    Dude, I can shrink this down to 400MB with no quality loss, maybe even less.

    Please try to work on your encoding. You can do better than this.

  • Charles Welcome Change

    @zack: Thanks, Zack. There is also an MP4 version that's 437 MB. Of course, there's a decrease in quality relative to the 940 MB version, but it looks good to me (slides are clearly legible and Herb looks real).

  • Very nice! When you went from slide 36 (bottom of page 18 in the pdf stack) to slide 37 and made a note to the audience about how hard it is to get atomics-based algorithms correct, I thought you could have easily used the same simple example from slide 36 for "total store order" to demonstrate this exact point by comparing if(y==1 && x==0) with if(x==0 && y==1) for thread 4.
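
The exchange above between @bcosta and Herb Sutter, about one store-release being observed by many load-acquires, can be sketched in C++11 roughly as follows. This is only a minimal illustration, not code from the talk or the slides: the names payload, ready, producer, consumer, and run_demo are all hypothetical, and the busy-wait loop is there just to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// All names below are hypothetical, chosen for illustration only.
int payload = 0;                  // ordinary (non-atomic) data being published
std::atomic<bool> ready{false};   // publication flag

void producer() {
    payload = 42;                                  // write the data first
    ready.store(true, std::memory_order_release);  // then publish it (store-release)
}

int consumer() {
    // Spin until this thread's load-acquire observes the store-release.
    // Each reader that sees 'true' forms its own pair with the one store.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The acquire/release pair orders the write to payload before this read.
    return payload;
}

// Starts reader_count readers and one writer; returns how many readers saw 42.
int run_demo(int reader_count) {
    std::vector<int> seen(reader_count, -1);
    std::vector<std::thread> readers;
    for (int i = 0; i < reader_count; ++i) {
        // Each reader writes only its own slot, so the slots do not race.
        readers.emplace_back([&seen, i] { seen[i] = consumer(); });
    }
    std::thread writer(producer);
    writer.join();
    for (auto& t : readers) t.join();

    int ok = 0;
    for (int v : seen) {
        if (v == 42) ++ok;
    }
    return ok;
}
```

Calling run_demo(4) from main should return 4: every reader whose load-acquire sees the flag set is also guaranteed to see the earlier write to payload, which is the "pairing" Herb describes, repeated once per reader.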


