Posted By: Charles | Feb 19th @ 11:00 AM
Joe Duffy, Huseyin Yildiz, Daan Leijen, Stephen Toub and I gathered in a conference room in building 122 to dig into the Task Parallel Library infrastructure. You heard about the Parallel Computing Platform a few months ago in an interview with Anders Hejlsberg and Joe Duffy. We didn't go too deep in that talk; it was an introduction to the Parallel Computing Platform.

Here, we take a dive down into the technical rabbit hole with Daan, Joe, Stephen and Huseyin.

Daan is an MSR researcher whose work has been instrumental in bringing TPL and Parallel Extensions to life. Of course, Joe is the guy who invented PLINQ (he wrote the original Think Week paper that impressed Bill) and is a lead developer on the Parallel Computing Platform team. Stephen is the Program Manager (and is the one driving and scheduling many of the interviews you will see covering Parallel Computing Platform here on C9 - Thanks, Stephen!) and Huseyin is a developer who recently joined the group and is already making a big impact.

Most of the time here is spent on the whiteboard with Daan. Make some time for this conversation. There's an awful lot to learn here.

Enjoy!

evildictaitor
How could you use the adjective "indescribable" truthfully?
Oooh. The last few minutes are ever so interesting. I'm not so sure that compilers are as far off as you think when you say that automatic parallelism is far away. For languages like C and C++ it's going to be a very long time before any useful autoparallelization happens, but for languages like Haskell, F# and C# quite a lot can be done (and is being done) to automatically parallelize code, and there are some interesting theorems as to the upper bound of automatic parallelisation that compilers can do, and it's not far off the theoretical optimum (and equal for safe, managed code).
Charles,

You brought up a great point/question that has been on my mind for a while.

Why not add some keyword or attribute to the language itself to further enhance parallelism?

I can't remember his name but he replied that they are looking into this level of language integration.  I'd love to see a video on that topic.

evildictaitor
How could you use the adjective "indescribable" truthfully?
MetaGunny wrote:
Charles,

You brought up a great point/question that has been on my mind for a while.

Why not add some keyword or attribute to the language itself to further enhance parallelism?

I can't remember his name but he replied that they are looking into this level of language integration.  I'd love to see a video on that topic.



Noooo! Leave C# alone! One of the reasons C# is nice and easy to learn is that its core keyword set is quite compact. If you start adding lots of keywords here, there and everywhere, the language gets out of hand quickly.

On the other hand, there's no reason why a compiler shouldn't be able to take the same keyword "for" to mean both the normal synchronous meaning when the for-body is expensive, and the Parallel.For() when the body can be easily split into asynchronous tasks, and have this as an option inside the build menu.
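To make the comparison concrete, here is a minimal sketch of the same loop written both ways. It assumes the Parallel.For(int, int, Action<int>) overload as it later shipped in .NET 4, which may differ from the preview build discussed in the video; the class and method names are made up for illustration. The rewrite is only safe here because each iteration writes to its own array slot and shares no other state, which is exactly the property a compiler would have to prove before doing it automatically.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration: LoopSketch and its methods are not part of any library.
static class LoopSketch
{
    // Ordinary "for": iterations run one after another on the calling thread.
    public static double[] SquareRootsSequential(int n)
    {
        var results = new double[n];
        for (int i = 0; i < n; i++)
        {
            results[i] = Math.Sqrt(i);
        }
        return results;
    }

    // Same body handed to Parallel.For: iterations may run on several threads
    // and in any order. Each iteration touches only results[i], so no locking
    // is needed.
    public static double[] SquareRootsParallel(int n)
    {
        var results = new double[n];
        Parallel.For(0, n, i =>
        {
            results[i] = Math.Sqrt(i);
        });
        return results;
    }
}
```

Both methods should return identical arrays; a loop body that accumulated into a shared variable would not survive the same mechanical rewrite.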
John Melville-- MD
Equality Through Technology

Has anyone considered a "debug" switch to run with n queues, regardless of how many processors are available?  I see a bad scenario coming.

1. Developer is on a 1-2 core box, or a many core box with enough other junk (VS, virus scanner, outlook, IE, whatever else) so that unbeknownst to the programmer, there is no real parallelism.

2.  Dev signs off on the code that has never been tested running truly parallel.

3. Code gets loaded on the mega-behemoth 16 core production machine that's not running anything else.

4. Code runs in parallel for the first time in production.  Or even worse, it's shrink-wrap software, and next year when bigger machines come out, more parallelism is exposed and the software starts failing randomly.

Most developers would consider #4 to be a very bad thing, but I find it inevitable.  My dev box is usually running 3-6+ apps when I'm developing, and therefore during most of my testing.  If I can't force the code to be parallel in testing, the interleaved paths might get very little coverage, and it will be very hard to know this is happening.

Just curious if this has been thought about.

Ion Todirel
ban...kai
I'm using the Silverlight player and there is no sound for this video. What's up, Charles?
Curt Nichols
No Silver Bullet
John Melville, MD wrote:


1. Developer is on a 1-2 core box, or a many core box with enough other junk (VS, virus scanner, outlook, IE, whatever else) so that unbeknownst to the programmer, there is no real parallelism.



Nothing is going to be able to stop the developer from writing bad code or failing to test and profile code.

That said, developers need tools that help profile and illustrate an application's behavior, particularly for data parallelism. We do have some access to these things today--we can pretty easily look at numbers for context switches, cache line loads, etc.--but it has a pretty ad hoc feel to it. It would be great to have tools that target profiling parallel code and, you know, dumb it down for the rest of us. Intel's Thread Profiler does a great job of visualizing core usage, but to my recollection it only works on native code.

John Melville, MD wrote:


3. Code gets loaded on the mega-behemoth 16 core production machine that's not running anything else.

4. Code runs in parallel for the first time in production.  Or even worse, it's shrink-wrap software, and next year when bigger machines come out, more parallelism is exposed and the software starts failing randomly.



Untested synchronization code will always be a risk, whether it's used in a simple multi-threaded program or one that attempts parallelism. (Yes, I said "attempts.")

Scalability is going to be another lurking demon. Does your program scale? It's not just a question of machine resources, or of whether your algorithms and locking code scale.

Hardware architecture also affects whether your application scales. E.g., a 2-socket, 4-core computer will behave differently than a 1-socket, 4-core computer. Your application may work better with one or the other (particularly depending on its cache usage patterns).

More than that, you must test on the target hardware. I could tell you how software I've written scales on 4- and 8-core computers of differing architectures, but I can't and won't promise anything about its performance on a 16-core box, because I've never had access to one. For all I know the algorithms or locking could bring it to its knees. For that matter, I/O could become the bottleneck. I won't know unless I can profile it on that particular hardware.

This last bit is, I think, the most likely outcome of your "naive programmer" scenario above. The developer codes it up on a 4-core box and it runs pretty well and scales nicely from 1 to 4 cores. Move it to a 16-core box and it runs half as fast. Seriously. Stay in this business a few years and you will see this happen. A lot.

Perhaps this is just another argument for configurable core usage.
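One way to get configurable core usage, at least with the API as it later shipped in .NET 4, is ParallelOptions.MaxDegreeOfParallelism. The sketch below is an assumption-laden illustration rather than anything shown in the video: DegreeSketch and SumOfSquares are invented names, and the setting only caps how many iterations run concurrently. It does not force extra threads onto a busy or small machine, so it helps with testing at lower degrees more than with simulating a bigger box.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration: DegreeSketch is not part of any library.
static class DegreeSketch
{
    // Sums i*i for i in [0, n) while allowing at most maxDegree
    // iterations to execute at the same time.
    public static long SumOfSquares(int n, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        long total = 0;

        Parallel.For(0, n, options, i =>
        {
            // total is shared across iterations, so the update must be atomic.
            Interlocked.Add(ref total, (long)i * i);
        });

        return total;
    }
}
```

Running the same call with maxDegree set to 1, 2, 4 and 16 should give the same sum each time; a result that changes with the degree points at a synchronization bug, and timing each run gives a first rough look at how the code scales.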

Looking forward to watching the video, looks like it should be good.
littleguru
allein, allein,... allein, allein!
Brainstorming here:

Wouldn't this

try
{
    Parallel.For(0, n, () => ... throws exceptions);
}
catch(exception.Contains(typeof(FooException)) ||
   exception.Contains(typeof(BarException)))
{
    // handle the exception
}

come in handy for C#? Why am I asking this? I see a lot of people starting to handle exceptions like this:

try
{
    // some parallel code that throws exceptions.
}
catch(AggregateException)
{
    // do something
}

where they would also handle OutOfMemoryException and such - or are such critical exceptions never aggregated?
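For comparison, here is a minimal sketch of handling only selected exception types without new syntax. It assumes the AggregateException.Handle method as it later shipped in .NET 4, which may not match the preview bits discussed here; AggregateSketch and RunAndCountHandled are invented names, and FooException and BarException are declared locally just to mirror the example above. Handle calls the predicate once per inner exception, and anything the predicate returns false for is rethrown in a new AggregateException, so exception types you did not name are not swallowed.

```csharp
using System;
using System.Threading.Tasks;

// Declared here only so the sketch is self-contained.
class FooException : Exception { }
class BarException : Exception { }

// Hypothetical illustration: AggregateSketch is not part of any library.
static class AggregateSketch
{
    // Runs a loop whose body always throws, then returns how many
    // inner exceptions were recognised and handled.
    public static int RunAndCountHandled(int n)
    {
        int handled = 0;
        try
        {
            Parallel.For(0, n, i =>
            {
                if (i % 2 == 0) throw new FooException();
                throw new BarException();
            });
        }
        catch (AggregateException ae)
        {
            // Handle runs on the catching thread, one inner exception at a time.
            ae.Handle(inner =>
            {
                if (inner is FooException || inner is BarException)
                {
                    handled++;
                    return true;   // mark as handled
                }
                return false;      // anything else is rethrown by Handle
            });
        }
        return handled;
    }
}
```

The exact count varies from run to run, because the loop stops scheduling new iterations once a failure is observed, but it is at least one whenever n is positive.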

To #4:

I understand you can have many apps running on your dev box, but as long as processor core utilization is not at 100%, you should be able to successfully schedule your tasks on that core. It is similar to a load-balancing technique or time-sliced utilization. So your parallel code will experience a parallel run-time environment before production. It could be slower, though, but so what?