
Herb Sutter presents atomic<> Weapons, 1 of 2. This was filmed at C++ and Beyond 2012. As the title suggests, this is a two-part series (given the depth of treatment and complexity of the subject matter).
Part 1 -> Optimizations, races, and the memory model; acquire and release ordering; mutexes vs. atomics vs. fences
Abstract:
This session in one word: Deep.
It's a session that includes topics I've publicly said for years are Stuff You Shouldn't Need To Know and I Just Won't Teach, but it's becoming achingly clear that people do need to know about it. Achingly, heartbreakingly clear, because some hardware incents you to pull out the big guns to achieve top performance, and C++ programmers just are so addicted to full performance that they'll reach for the big red levers with the flashing warning lights. Since we can't keep people from pulling the big red levers, we'd better document the A to Z of what the levers actually do, so that people don't SCRAM unless they really, really, really meant to.
Topics Covered:
Oooh, very pleased to see this given a multi hardware platform treatment. I do hope MS keeps up the ARM investments once the Haswell low-power x86 wave breaks.
Thanks for the talks. I appreciate all the help I can get trying to understand memory ordering.
Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times.
As an example, take a single thread that uses a store/release to publish some data, and multiple other threads that use a load/acquire to read that data. This is permitted, right? I.e., can one release be "viewed" by many acquires, or does it need to be strictly one-to-one?
Also, where would people suggest is a good place to ask questions and learn more about using atomics (and ask if others can help verify my reasoning)?
Thanks,
Brendon.
@bcosta
If I were you, I would listen to Herb and not try to be a hero. :) If you need lock-free stuff, Boost 1.53 added some lock-free data structures you can use, and if you want to learn more, the 1024cores blog is a nice source of info, though it is no longer updated.
GREAT TALK !!!
Great talk! There is a lot of academic research being done on multicore memory models. Here are 3 links: a paper with the mathematical foundations, a graduate course, and an online compiler displaying some of the reorderings that are allowed:
http://www.cl.cam.ac.uk/~pes20/cpp/popl085ap-sewell.pdf
http://www.cl.cam.ac.uk/teaching/1213/R204/materials.html
http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/
@bcosta Interesting question. Where do you think it can be of use?
Great talk. I'm sure all that information will come in handy as most languages and hardware are adopting this model.
@bcosta wrote: "Is it strictly necessary that acquire/release "come in pairs"? I have heard this a few times. As an example, take a single thread that uses a store/release to publish some data, and multiple other threads that use a load/acquire to read that data. This is permitted, right? I.e., can one release be "viewed" by many acquires, or does it need to be strictly one-to-one?"
That's fine, it's just many loads pairing with the same store... by "they have to come in pairs" we mean you get the ordering guarantees only when a specific load-acquire sees a specific store-release, and is guaranteed to see everything else the storing thread did before the store-release. That's one "pair." The same store-release could be observed by multiple load-acquires, pairing with each one.
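The one-release, many-acquires case described above can be sketched in C++11 as follows. This is only an illustration under stated assumptions, not code from the talk: `Channel`, `payload`, `ready`, and `publish_and_read` are hypothetical names. One writer does a plain write followed by a store-release, and each reader spins on a load-acquire; every reader that sees the flag is then guaranteed to see the payload too.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical names for illustration only.
struct Channel {
    int payload = 0;                 // plain (non-atomic) data being published
    std::atomic<bool> ready{false};  // flag carrying the release/acquire edge
};

// Starts `readers` reader threads and one writer thread. Returns how many
// readers observed the published payload after seeing the flag.
int publish_and_read(int readers, int value) {
    Channel ch;
    std::atomic<int> correct{0};

    std::vector<std::thread> pool;
    for (int i = 0; i < readers; ++i) {
        pool.emplace_back([&] {
            // Load-acquire: spin until this load sees the store-release.
            while (!ch.ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            // This read happens after the writer's plain store to payload.
            if (ch.payload == value) {
                correct.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    std::thread writer([&] {
        ch.payload = value;                               // ordinary write
        ch.ready.store(true, std::memory_order_release);  // publish it
    });

    writer.join();
    for (auto& t : pool) t.join();
    return correct.load(std::memory_order_relaxed);
}
```

Calling `publish_and_read(4, 42)` should return 4: the same store-release pairs separately with each of the four load-acquires.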
@herbsutter: Thanks for the clarification, Herb. I thought that would be the case.
@ajasmin: One scenario I can think of is a very simple wait-free "bounded" Single Producer Multi Consumer read-copy-update object (that doesn't handle overflows). I am sure there are many others.
I appreciate that so many interesting and good videos are uploaded,
also in different formats, but.....
940MB for 1h2m MP4 video (720x408) for a "static" presentation?
Dude, I can shrink this down to 400MB with no quality loss, maybe even less.
Please try to work on your encoding. You can do better than this.
@zack: Thanks, Zack. There is also an MP4 version that's 437 MB. Of course, there's a decrease in quality relative to the 940 MB version, but it looks good to me (slides are clearly legible and Herb looks real).
C
Very nice! When you went from slide 36 (bottom of page 18 in the pdf stack) to slide 37 and made a note to the audience about how hard it is to get atomics-based algorithms correct, I thought you could have easily used the same simple example from slide 36 for "total store order" to demonstrate this exact point by comparing if(y==1 && x==0) with if(x==0 && y==1) for thread 4.
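A rough sketch of the comparison suggested in the comment above, assuming the slide 36 example is the usual four-thread test (two threads each write one variable, two threads each read both); the slide itself is not reproduced here, and the function names are made up for illustration. With the default sequentially consistent atomics, thread 3 and thread 4 can never both succeed when thread 4 reads `y` before `x`. Swapping the operands of `&&` changes the order of the two loads, so both conditions can then legitimately be true in the same run.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the four-thread test. Returns true only if both reader
// conditions held in the same run. Names are hypothetical.
bool readers_disagree(bool swap_thread4_reads) {
    std::atomic<int> x{0}, y{0};
    bool t3_hit = false, t4_hit = false;

    std::thread t1([&] { x.store(1); });  // seq_cst by default
    std::thread t2([&] { y.store(1); });
    // Thread 3 reads x first, then y (&& evaluates left to right).
    std::thread t3([&] { t3_hit = (x.load() == 1 && y.load() == 0); });
    std::thread t4([&] {
        if (swap_thread4_reads) {
            // Reads x first, then y: may succeed together with thread 3.
            t4_hit = (x.load() == 0 && y.load() == 1);
        } else {
            // Reads y first, then x: cannot succeed together with thread 3.
            t4_hit = (y.load() == 1 && x.load() == 0);
        }
    });

    t1.join(); t2.join(); t3.join(); t4.join();
    return t3_hit && t4_hit;
}

// Repeats the test and counts runs in which both readers succeeded.
int count_disagreements(int runs, bool swap_thread4_reads = false) {
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        if (readers_disagree(swap_thread4_reads)) ++hits;
    }
    return hits;
}
```

`count_disagreements(200)` is always 0, because a single total order of the sequentially consistent operations cannot place `x = 1` both before and after `y = 1`. With `count_disagreements(200, true)` a nonzero count is allowed (though a given run may not show it), which is the subtle kind of mistake the comment is pointing at.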