Best would be if you could make the optimizer inline , etc..

I don't understand... that's exactly what the compiler did. The code in main is the code from normal_function, 3 calls to the stream insertion operator and nothing else.