Seva, you are correct, my apologies. The problem with using the PF dev team's stats, though, is that while they may know that cranking up the other threads takes, say, 2,000 cycles per thread (I've really no idea; it could be 20,000), you'd still have to test your own task/code to see how much work it does in comparison.
If the profiler could try both P.For and a normal for loop and flag the places where you're actually degrading performance, that'd be excellent. That said, I thought P.For used some kind of ramp-up so that it ran synchronously for small loads, in which case what we saw in the video shouldn't happen. P.For is so easy to use, but just as easy to use in the wrong place.
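Until a profiler does that for you, the comparison can be done by hand. Here's a rough sketch in C#, not a proper benchmark: `Work`, the class name and the loop sizes are all made up for illustration, and single-run Stopwatch timings are noisy, so treat the numbers as a hint only.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoopTiming
{
    // Hypothetical per-item work; swap in the real loop body.
    public static double Work(int i)
    {
        return Math.Sqrt(i) * 1.0001;
    }

    // Fills results with a plain for loop and returns the elapsed time.
    public static TimeSpan TimeSequential(double[] results)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < results.Length; i++)
        {
            results[i] = Work(i);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Fills results with Parallel.For and returns the elapsed time.
    public static TimeSpan TimeParallel(double[] results)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Parallel.For(0, results.Length, i => { results[i] = Work(i); });
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // Warm-up, so JIT compilation and thread-pool start-up
        // are not charged to the first measurement.
        TimeSequential(new double[1000]);
        TimeParallel(new double[1000]);

        // Assumed sizes: a tiny loop, a medium one and a large one.
        foreach (int n in new int[] { 12, 1000, 1000000 })
        {
            TimeSpan seq = TimeSequential(new double[n]);
            TimeSpan par = TimeParallel(new double[n]);
            Console.WriteLine("n={0}: for {1} ticks, Parallel.For {2} ticks",
                n, seq.Ticks, par.Ticks);
        }
    }
}
```

Whatever the numbers turn out to be on a given machine, the point is to compare the two versions with your real loop body rather than rely on a per-thread start-up figure alone.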
My problem is that most of the code I write is, I think, too small to be parallelised: arrays of 12 items, for example. By the time I've finished, I've got a huge chain of little bits of work that add up to a large amount of work. So I start looking for pieces of logic that can run independently of each other, kick off one branch of execution on another thread, and re-join/wait for it further down. The trouble is that in the future I won't be able to find enough simultaneous work to spread across all the cores!
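The kick-off and re-join pattern described above might look something like this in C#. It's only a sketch: `SumOfSquares` and `SumOfCubes` are hypothetical stand-ins for two independent pieces of logic, and with work this small the hand-off to another thread will almost certainly cost more than it saves.

```csharp
using System;
using System.Threading.Tasks;

public static class ForkJoinSketch
{
    // Hypothetical independent piece of work number one.
    public static int SumOfSquares(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++) total += i * i;
        return total;
    }

    // Hypothetical independent piece of work number two.
    public static int SumOfCubes(int n)
    {
        int total = 0;
        for (int i = 1; i <= n; i++) total += i * i * i;
        return total;
    }

    public static int Run()
    {
        // Fork: start one branch on a thread-pool thread.
        Task<int> branch = Task.Run(() => SumOfSquares(12));

        // Meanwhile, carry on with the other branch on the current thread.
        int local = SumOfCubes(12);

        // Join: Result blocks until the forked branch has finished.
        return local + branch.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

This only ever keeps two threads busy, which is exactly the limit complained about: each extra core needs another genuinely independent branch to run.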
Daniel - thanks, that'd be cool.