I've seen a couple of talks by Herb Sutter over the last year in which he talks about how awesome lambdas are (and he's right, they are!). One thing he generally mentions, almost offhandedly, is that the std::for_each algorithm is able to "partially unroll" the loop.

It would be interesting to shed some light on how the standard library is able to do tricks like this, where it seems to produce code that is as fast as or faster than its longhand counterpart.
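To make the comparison concrete, here is a minimal sketch of the two forms I have in mind. The function names (sum_longhand, sum_for_each, check_equivalent) are made up for illustration and are not from the talks. I'm assuming that any unrolling comes down to the particular compiler, standard library implementation, and optimization flags, so this only shows the two spellings side by side, not what gets generated for them.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Longhand version: an explicit iterator loop.
long long sum_longhand(const std::vector<int>& values) {
    long long total = 0;
    for (auto it = values.begin(); it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// Algorithm version: std::for_each with a lambda that captures
// the running total by reference.
long long sum_for_each(const std::vector<int>& values) {
    long long total = 0;
    std::for_each(values.begin(), values.end(),
                  [&total](int v) { total += v; });
    return total;
}

// Hypothetical helper: asserts that both spellings compute the same result.
void check_equivalent(const std::vector<int>& values) {
    assert(sum_longhand(values) == sum_for_each(values));
}
```

Both functions compute the same sum. The lambda's body is visible to the compiler at the call site, so it can typically be inlined into the algorithm's loop, but whether that loop is then unrolled (by the library or by the optimizer) is exactly the part I'd like explained.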