Greeting, Im glad to hear that you had some new parallel sorting algoritms but its still not my favourite sort of all time If you really want to sort 2 million pixels, I'd highly suggest "ShearSort". It maintains data locality fairly well and would be insanely perfect for GPU sorting too since core communication happens predictivaly and at once for all little buckets, Way better than hugely recursive functions like a Quicksort even though IntroSort is fairly good for anything less than 10,000 element to sort. As soon at the amount needing to get sorted increases its till breaks down... call stacks grow too damn large with even those highly recursive things unless you have tail recursion in the PPL implementation being used.
Thats a great example in The highly buggy Java which shows a ShrearSort advantage with only 10000 elements The more elements... the faster in comparison ShearSort is. It's plenty fun to mess with the parameters since you can use many algorithms listed below the sort to see how each work by tweaking the URL.
I keep my own ShearSort around for fun because it's my personal favourite, but its sort of tricky too implement. My wish is to have a highly available ShearSort which ANYTONE can benefit from given the header file from ConcrtExtras.
Also he did a good job, he seems slightly nervous but we're all developers here, more camera time should help him out on the whiteboard since hes quite a good member of the team.
Thanks Charles for asking and trying to pry open their design decisions for things but honestly... I miss Rick Malloy... he was my Parallel C++ hero on C9.