@Dexter: The code in my post was just an example of compiler optimization, showing how a good template class should work.

Both your code and Burkholder's code have a lot of overhead.
Is there a way to minimize the overhead, as in my example?