Good talk, thanks for making it available.
Jan 04, 2012 at 7:27PM
Joe Duffy, Huseyin Yildiz, Daan Leijen, Stephen Toub - Parallel Extensions: Inside the Task Parallel
Feb 20, 2008 at 7:45AM
John Melville, MD wrote:
1. Developer is on a 1-2 core box, or a many core box with enough other junk (VS, virus scanner, outlook, IE, whatever else) so that unbeknownst to the programmer, there is no real parallelism.
Nothing is going to be able to stop the developer from writing bad code or failing to test and profile code.
That said, developers need tools that help profile and illustrate an application's behavior, particularly for data parallelism. We do have some access to these things today--we can pretty easily look at numbers for context switches, cache line loads, etc.--but it has a pretty ad hoc feel to it. It would be great to have tools that target profiling parallel code and, you know, dumb it down for the rest of us. Intel's Thread Profiler does a great job of visualizing core usage, but to my recollection it only works on native code.
John Melville, MD wrote:
3. Code gets loaded on the mega-behemoth 16-core production machine that's not running anything else.
4. Code runs in parallel for the first time in production. Or, even worse, it's shrink-wrap software, and next year when bigger machines come out, more parallelism is exposed and the software starts failing randomly.
Untested synchronization code will always be a risk, whether it's used in a simple multi-threaded program or one that attempts parallelism. (Yes, I said "attempts.")
Scalability is going to be another lurking demon. Does your program scale? It's not just a question of machine resources, or of whether your algorithms and locking code scale.
Hardware architecture also affects whether your application scales. E.g., a 2-socket, 4-core computer will behave differently than a 1-socket, 4-core computer. Your application may work better with one or the other (particularly depending on its cache usage patterns).
More than that, you must test on the target hardware. I could tell you how software I've written scales on 4- and 8-core computers of differing architectures, but I can't and won't promise anything about its performance on a 16-core box, because I've never had access to one. For all I know the algorithms or locking could bring it to its knees. For that matter, I/O could become the bottleneck. I won't know unless I can profile it on that particular hardware.
This last bit is, I think, the most likely outcome of your "naive programmer" scenario above. The developer codes it up on a 4-core box and it runs pretty well and scales nicely from 1 to 4 cores. Move it to a 16-core box and it runs half as fast. Seriously. Stay in this business a few years and you will see this happen. A lot.
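To make the 4-core versus 16-core comparison concrete, here is a minimal Python sketch (not TPL code) of the kind of sweep I mean: time the same workload at several worker counts on the target box, then compute speedup and flag any count that runs slower than a smaller one. The names `time_run` and `scaling_report` are made up for illustration, and the process pool is just one assumed way to get real parallelism in Python.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def time_run(job, chunks, workers, make_pool=ProcessPoolExecutor):
    """Wall-clock seconds to map `job` over `chunks` with `workers` workers.

    Assumption: with the default process pool, `job` must be a picklable
    top-level function, and on spawn-based platforms the caller needs an
    `if __name__ == "__main__":` guard.
    """
    start = time.perf_counter()
    with make_pool(max_workers=workers) as pool:
        list(pool.map(job, chunks))  # force every chunk to finish
    return time.perf_counter() - start


def scaling_report(timings):
    """Turn {worker_count: seconds} into rows sorted by worker count.

    Each row is (workers, speedup, efficiency, regressed), where speedup and
    efficiency are relative to the smallest worker count measured, and
    `regressed` is True when the run was slower than the fastest run seen
    at any smaller worker count.
    """
    counts = sorted(timings)
    base = counts[0]
    best_so_far = timings[base]
    rows = []
    for n in counts:
        speedup = timings[base] / timings[n]
        efficiency = speedup * base / n
        regressed = timings[n] > best_so_far
        best_so_far = min(best_so_far, timings[n])
        rows.append((n, speedup, efficiency, regressed))
    return rows
```

Feed `scaling_report` timings collected on the actual production hardware; a flagged row is exactly the "more cores, slower program" case described above.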
Perhaps this is just another argument for configurable core usage.
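As a rough sketch of what configurable core usage could look like (Python rather than .NET, and the `APP_MAX_WORKERS` setting name is purely hypothetical), the worker count is read from configuration and clamped to what the machine offers, so someone can cap it on a box where more cores hurt without rebuilding anything:

```python
import os
from concurrent.futures import ThreadPoolExecutor

ENV_VAR = "APP_MAX_WORKERS"  # hypothetical setting name, not a real convention


def resolve_worker_count(env=None, available=None):
    """Pick a worker count from configuration, clamped to [1, available].

    Falls back to every available core when the setting is absent or is
    not an integer.
    """
    env = os.environ if env is None else env
    available = available or os.cpu_count() or 1
    raw = env.get(ENV_VAR)
    if raw is None:
        return available
    try:
        requested = int(raw)
    except ValueError:
        return available
    return max(1, min(requested, available))


def run_all(tasks, env=None, available=None):
    """Run zero-argument callables on a pool sized by the configuration."""
    workers = resolve_worker_count(env, available)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `tasks` in the results
        return list(pool.map(lambda task: task(), tasks))
```

With something like this in place, the 16-core machine from the scenario above could be held at 4 workers until the code has actually been profiled there.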
Looking forward to watching the video, looks like it should be good.