Interesting topic.
But the code examples above generate a lot of overhead for the small tasks this seems to be meant for, for example using it as a scoped_array.
It would be best if the optimizer could inline the whole thing.
Let the compiler do the heavy work for us.

Like this little code example:

#include <iostream>
using namespace std;

inline void normal_function(int v) {
    cout << "normal_function: " << v << endl;
}

template <typename Function>
class sop {
    Function f_;
public:
    sop(Function fun) : f_(fun) {}
    ~sop() {
        f_();
    }
};

void t_sop()
{
    auto l = [](){
        normal_function(123);
    };
    sop<decltype(l)> s(l);  // s's destructor calls l when t_sop returns
}

int main()
{
    t_sop();
    return 0;
}
 

 

Generated code in a Win32 release build (the lambda, the sop destructor and normal_function are all inlined at the call site):

    t_sop();
00291000  mov         eax,dword ptr [__imp_std::endl (292044h)]  
00291005  mov         ecx,dword ptr [__imp_std::cout (292068h)]  
0029100B  push        eax  
0029100C  push        7Bh  
0029100E  push        offset string "normal_function: " (292114h)  
00291013  push        ecx  
00291014  call        std::operator<<<std::char_traits<char> > (2910F0h)  
00291019  add         esp,8  
0029101C  mov         ecx,eax  
0029101E  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (292050h)]  
00291024  mov         ecx,eax  
00291026  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (29204Ch)]  


Shouldn't it be possible to achieve this with the power of C++0x, but without the big overhead?
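
To make the scoped_array case concrete, here is a minimal sketch of the same idea with a small factory function, so the caller does not have to spell out decltype. The names scope_guard, make_scope_guard and demo_scoped_buffer are made up for this post and do not come from the article. It assumes a compiler with lambdas, rvalue references and deleted functions. Since the guard is still templated on the callable's type, the optimizer should have the same chance to inline it as in the listing above, but I have not looked at the generated code for this variant.

```cpp
#include <cstddef>
#include <utility>

// Same idea as sop above: the callable's type is a template parameter,
// so the destructor call is a direct call the optimizer can see through.
// (scope_guard is a hypothetical name used only for this sketch.)
template <typename Function>
class scope_guard {
    Function f_;
    bool active_;
public:
    explicit scope_guard(Function f) : f_(std::move(f)), active_(true) {}
    scope_guard(scope_guard&& other)
        : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;   // the moved-from guard must not fire
    }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
    ~scope_guard() { if (active_) f_(); }
};

// Factory: deduces the lambda's type so the caller can just write auto.
template <typename Function>
scope_guard<Function> make_scope_guard(Function f) {
    return scope_guard<Function>(std::move(f));
}

// scoped_array-like use: the buffer is released when the guard leaves scope.
// Returns how many times the cleanup ran. Assumes n >= 1.
int demo_scoped_buffer(std::size_t n) {
    int cleanups = 0;
    {
        int* buf = new int[n];
        auto guard = make_scope_guard([&cleanups, buf]() {
            delete[] buf;
            ++cleanups;
        });
        buf[0] = 42;   // use the buffer while the guard is alive
    }
    return cleanups;
}
```

The active_ flag is only there so that a moved-from temporary does not run the cleanup a second time if the compiler does not elide the move out of the factory.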