Interesting topic.
However, the code examples above seem to generate a lot of overhead for the small tasks they are meant for, for example when used as a scoped_array.
Ideally the optimizer would be able to inline everything,
so that the compiler does the heavy lifting for us.

Take this small example:

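#include <iostream>

// Small enough for the optimizer to inline at every call site.
inline void normal_function(int v) {
    std::cout << "normal_function: " << v << std::endl;
}

// Stores a callable and invokes it when the sop object goes out of scope.
// Because Function is a template parameter (not a type-erased wrapper),
// the compiler knows exactly which code runs and can inline it.
template <typename Function>
class sop {
public:
    explicit sop(Function fun) : f_(fun) {}
    ~sop() { f_(); }
private:
    Function f_;
};

void t_sop() {
    auto l = []() { normal_function(123); };
    sop<decltype(l)> s(l);  // l() runs when s is destroyed at the end of t_sop
}

int main() {
    t_sop();
    return 0;
}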


Generated code in a Win32 release build:

00291000  mov         eax,dword ptr [__imp_std::endl (292044h)]  
00291005  mov         ecx,dword ptr [__imp_std::cout (292068h)]  
0029100B  push        eax  
0029100C  push        7Bh  
0029100E  push        offset string "normal_function: " (292114h)  
00291013  push        ecx  
00291014  call        std::operator<<<std::char_traits<char> > (2910F0h)  
00291019  add         esp,8  
0029101C  mov         ecx,eax  
0029101E  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (292050h)]  
00291024  mov         ecx,eax  
00291026  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (29204Ch)]  

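As far as I can tell, the sop object, its destructor and the lambda are inlined away completely: only the stream calls remain, with 7Bh being the argument 123.
Shouldn't it be possible to achieve the same with the power of C++0x, but without the big overhead?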
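Along the same lines, here is a minimal sketch of the scoped_array use case mentioned above, reusing the sop template with copying disabled so the cleanup cannot run twice. `sum_squares` and `arrays_freed` are hypothetical names added only for illustration; the counter simply makes the cleanup observable.

```cpp
#include <cstddef>

// Same idea as sop above: the cleanup's concrete type is a template
// parameter, so the compiler can see (and potentially inline) the call.
template <typename Function>
class sop {
public:
    explicit sop(Function fun) : f_(fun) {}
    ~sop() { f_(); }
    sop(const sop&) = delete;             // non-copyable: the action runs exactly once
    sop& operator=(const sop&) = delete;
private:
    Function f_;
};

// Hypothetical counter, only here to make the cleanup observable.
int arrays_freed = 0;

// Hypothetical example function standing in for a scoped_array user.
int sum_squares(std::size_t n) {
    int* data = new int[n];  // raw array that would otherwise go into a scoped_array
    auto cleanup = [data]() { delete[] data; ++arrays_freed; };
    sop<decltype(cleanup)> guard(cleanup);  // delete[] runs when guard leaves scope

    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>(i * i);
        total += data[i];
    }
    return total;  // guard's destructor frees the array after the return value is computed
}
```

For example, `sum_squares(4)` returns 14 and frees the array on the way out. Since the lambda's type is a template argument, the optimizer sees the `delete[]` call directly, much as it inlined normal_function in the listing above; whether it actually does so depends on the compiler and optimization settings.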