The exception that gets thrown when running the C++ app outside of the debugger is 0x40010006, which is DBG_PRINTEXCEPTION_C.

Hmm, I have the feeling that Windows won't always like this stack switching business. NT's structured exception handling relies on the stack and if you switch the stack... I'm not quite sure what can happen.

I'm wondering whether the thread switching function is really saving enough registers.

It should be enough unless you use SSE. A normal kernel that relies on interrupts to switch threads has no choice but to save all registers, because it doesn't know which ones are in use. But what you're doing here is more like cooperative multithreading and, more importantly, it's all done in C. The C/C++ compiler expects a function to preserve those 4 registers (ebp, ebx, edi, esi); all other general-purpose registers can be modified by the function as needed. Also, the FPU stack is supposed to be empty when a function is called, so it doesn't contain anything that needs saving.
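To make the register argument concrete, here is a rough sketch. It does not touch real registers: `Cpu`, `SavedContext` and `switch_context` are hypothetical names, and the "CPU" is just a struct standing in for the 32-bit x86 register file. It only models which values a cooperative switch has to carry from one thread to the next, assuming the four callee-saved registers named above plus the stack pointer.

```cpp
#include <cassert>
#include <cstdint>

// Simulated 32-bit x86 register file (hypothetical model, not real hardware access).
struct Cpu {
    std::uint32_t eax, ecx, edx;       // scratch: a callee may clobber these
    std::uint32_t ebx, esi, edi, ebp;  // callee-saved: must survive a call
    std::uint32_t esp;                 // stack pointer, one per thread
};

// Everything a cooperative switch needs to remember for a suspended thread.
struct SavedContext {
    std::uint32_t ebx, esi, edi, ebp, esp;
};

// Store the running thread's callee-saved state in `from`, then load `to`.
void switch_context(Cpu& cpu, SavedContext& from, const SavedContext& to) {
    // A zero stack pointer means the target context was never set up.
    assert(to.esp != 0 && "target context has no stack");

    from = {cpu.ebx, cpu.esi, cpu.edi, cpu.ebp, cpu.esp};

    cpu.ebx = to.ebx;
    cpu.esi = to.esi;
    cpu.edi = to.edi;
    cpu.ebp = to.ebp;
    cpu.esp = to.esp;
    // eax, ecx and edx are deliberately left alone: the compiler already
    // assumes they are garbage after any function call, including this one.
}
```

Switching away and back restores ebx, esi, edi, ebp and esp, while whatever the other thread left in eax, ecx or edx stays there. That is fine for a switch made through an ordinary function call, and it is exactly what would break if the switch happened at an arbitrary instruction, as with an interrupt.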