I don't think with the above code there should be an issue with CPU registers since I am explicitly casting what is on the "virtual stack" to an instance of the actual object each time.

That doesn't matter too much. If you have code like the following:

((foo*)*bar)->DoSomething();
((foo*)*bar)->DoSomething();

the compiler can (and usually will) load *bar once, perform the cast once, and keep the resulting pointer in a register. However, this still works fine if you're allocating GC memory between those 2 lines. The compiler can't reuse the value from the previous cast because it has no way to prove that *bar wasn't changed by the memory allocation function.
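To make that concrete, here is a minimal sketch of the situation. Everything in it besides foo, bar and DoSomething is made up for illustration: GcAlloc stands in for your allocation function, and from_space/to_space stand in for wherever the collector keeps objects. In a real build the allocator would live in another translation unit, which is what stops the compiler from assuming *bar is unchanged.

```cpp
#include <cassert>
#include <cstddef>

struct foo {
    int calls = 0;
    void DoSomething() { ++calls; }
};

// Hypothetical GC state: two object locations and one "virtual stack" slot.
static foo    from_space;
static foo    to_space;
static void*  slot = &from_space;
static void** bar  = &slot;

// Hypothetical allocator that happens to trigger a moving collection:
// it copies the object and updates the slot to point at the new address.
void* GcAlloc(std::size_t /*size*/) {
    to_space = from_space;
    slot = &to_space;
    return nullptr;  // the allocation result is irrelevant for this sketch
}

int RunDemo() {
    // Reset the state so the function can be called more than once.
    from_space = foo{};
    to_space = foo{};
    slot = &from_space;

    ((foo*)*bar)->DoSomething();  // object is still in from_space
    GcAlloc(16);                  // object is moved, *bar is updated
    ((foo*)*bar)->DoSomething();  // *bar is loaded again, so this hits to_space

    assert(from_space.calls == 1);  // the old copy saw only the first call
    return to_space.calls;          // the moved object saw both calls
}
```

If the second call went through a pointer cached before GcAlloc, it would land on the stale copy in from_space instead.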

Unfortunately, this compiler optimization means that you have no way of doing normal preemptive multithreading; you have to use SwitchContext. I'm not sure what that will do to performance, but I see no other way of doing this given the constraints.

I'm not exactly sure about the details of context switching and register saving etc, but it will basically need to come down to saving all of the registers into some storage (preferably in the VirtualThread class),

Pretty much, yes. One possibility is to save the registers on the current thread's stack and then switch the stack. It's also possible that you won't need to save the registers if you use the SwitchContext approach, because the compiler will probably save the "used" registers itself when it calls the function.
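If you just want to see the shape of a cooperative switch before writing your own SwitchContext, here is a rough sketch using the POSIX ucontext calls (getcontext, makecontext, swapcontext). This is a stand-in, not your SwitchContext: it assumes a POSIX system such as Linux with glibc, swapcontext saves the registers into a ucontext_t rather than on the thread's stack, and these calls were removed from POSIX.1-2008, so treat it as a demonstration only. Worker, RunSwitchDemo and trace are names I made up.

```cpp
#include <ucontext.h>
#include <cassert>
#include <vector>

static ucontext_t main_ctx;    // saved state of the "scheduler" side
static ucontext_t worker_ctx;  // saved state of the virtual thread
static std::vector<int> trace; // records the order in which code ran

static void Worker() {
    trace.push_back(1);
    // Explicit yield: save our registers and stack pointer, resume main.
    swapcontext(&worker_ctx, &main_ctx);
    trace.push_back(3);
    // Returning from here resumes the context stored in uc_link.
}

std::vector<int> RunSwitchDemo() {
    static char stack[64 * 1024];  // separate stack for the virtual thread
    trace.clear();

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = sizeof stack;
    worker_ctx.uc_link = &main_ctx;  // where to go when Worker returns
    makecontext(&worker_ctx, Worker, 0);

    swapcontext(&main_ctx, &worker_ctx);  // run Worker until it yields
    trace.push_back(2);
    swapcontext(&main_ctx, &worker_ctx);  // resume Worker until it returns
    trace.push_back(4);
    return trace;
}
```

Because every switch happens at an explicit call, the compiler already treats it like any other function call and won't keep values it cares about in registers that the call may clobber. That is the property preemptive switching would not give you.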