Curt Nichols
blog: Code Never Written.
He was a Working Stiff located in the Channel9 Witness Protection Program--until they dropped the ball and leaked his true identity and location...
Formerly http://channel9.msdn.com/Niners/amotif
C&B 2011 Panel: Herb Sutter, Andrei Alexandrescu and Scott Meyers - Concurrency and Parallelism
Jan 04, 2012 at 7:27 PM
Good talk, thanks for making it available.
Meet the Industrial Design Team
Apr 16, 2009 at 7:39 AM
Roz Ho: Reflections On Leadership and Believing in Yourself
Nov 25, 2008 at 8:00 PM
Joe Duffy, Huseyin Yildiz, Daan Leijen, Stephen Toub - Parallel Extensions: Inside the Task Parallel
Feb 20, 2008 at 7:45 AM
Nothing is going to be able to stop the developer from writing bad code or failing to test and profile code.
That said, developers need tools that help profile and illustrate an application's behavior, particularly for data parallelism. We do have some access to these things today--we can pretty easily look at numbers for context switches, cache line loads, etc.--but it has a pretty ad hoc feel to it. It would be great to have tools that target profiling parallel code and, you know, dumb it down for the rest of us.
Untested synchronization code will always be a risk, whether it's used in a simple multi-threaded program or one that attempts parallelism. (Yes, I said "attempts.")
Scalability is going to be another lurking demon. Does your program scale? It's not just a question of machine resources, or of whether your algorithms and locking code scale.
Hardware architecture also affects whether your application scales. E.g., a 2-socket, 4-core computer will behave differently than a 1-socket, 4-core computer. Your application may work better with one or the other (particularly depending on its cache usage patterns).
More than that, you must test on the target hardware. I could tell you how software I've written scales on 4- and 8-core computers of differing architectures, but I can't and won't promise anything about its performance on a 16-core box, because I've never had access to one. For all I know the algorithms or locking could bring it to its knees. For that matter, I/O could become the bottleneck. I won't know unless I can profile it on that particular hardware.
This last bit is, I think, the most likely outcome of your "naive programmer" scenario above. The developer codes it up on a 4-core box and it runs pretty well and scales nicely from 1 to 4 cores. Move it to a 16-core box and it runs half as fast. Seriously. Stay in this business a few years and you will see this happen. A lot.
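To make that kind of regression visible, here is a minimal sketch of how measured run times could be turned into speedup and efficiency numbers. The function names and the timings are hypothetical, invented for illustration; they are not measurements from any real system.

```python
def scaling_report(timings):
    """Summarize how a workload scales.

    timings maps worker (core) count -> wall-clock seconds for the SAME
    workload. Returns (workers, speedup, efficiency) tuples sorted by
    worker count, relative to the smallest count measured.
    """
    if not timings:
        raise ValueError("need at least one measurement")
    counts = sorted(timings)
    base_n = counts[0]
    base_t = timings[base_n]
    report = []
    for n in counts:
        speedup = base_t / timings[n]        # > 1 means faster than baseline
        efficiency = speedup / (n / base_n)  # 1.0 would be perfect scaling
        report.append((n, speedup, efficiency))
    return report


def regressions(timings):
    """Worker counts that ran slower than the next-smaller count measured."""
    counts = sorted(timings)
    return [b for a, b in zip(counts, counts[1:]) if timings[b] > timings[a]]


if __name__ == "__main__":
    # Hypothetical numbers only: scales nicely from 1 to 4 cores, then runs
    # half as fast on a 16-core box.
    hypothetical = {1: 8.0, 2: 4.0, 4: 2.0, 16: 4.0}
    for n, speedup, eff in scaling_report(hypothetical):
        print(f"{n:>2} workers: speedup {speedup:.2f}x, efficiency {eff:.3f}")
    print("slower than the previous step:", regressions(hypothetical))
```

The arithmetic is the easy part. The timings still have to come from runs on the actual target hardware.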
Perhaps this is just another argument for configurable core usage.
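One way to read "configurable core usage" is to let whoever deploys the program cap the number of workers instead of always taking every core. A minimal sketch, assuming a hypothetical environment variable `APP_MAX_WORKERS` and a hypothetical helper `resolve_worker_count` (neither comes from any real library):

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_worker_count(requested=None, env_var="APP_MAX_WORKERS"):
    """Pick how many workers to use.

    Order of precedence: explicit argument, then the (hypothetical)
    environment variable, then every CPU the OS reports. The result is
    clamped to the range [1, available CPUs].
    """
    available = os.cpu_count() or 1
    if requested is None:
        raw = os.environ.get(env_var)
        requested = int(raw) if raw else available
    return max(1, min(requested, available))


def run_all(work, items, workers=None):
    """Run work(item) for every item using a configurable number of workers."""
    with ThreadPoolExecutor(max_workers=resolve_worker_count(workers)) as pool:
        return list(pool.map(work, items))


if __name__ == "__main__":
    print("workers:", resolve_worker_count())
    print(run_all(lambda x: x * x, range(5), workers=2))
```

For CPU-bound work in CPython, `ProcessPoolExecutor` accepts the same `max_workers` parameter. The point is only that the count becomes a setting you can dial back on the 16-core box, not a constant baked in on the 4-core one.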
Looking forward to watching the video, looks like it should be good.