Well, in this case it might be something along these lines:
[Edit: I changed the code since there were some errors with the stack pointers]
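int SomeMethod(VirtualThread* thread)
{
    VirtualObject** stackStart = thread->StackPointer;  // saved so it can be restored on exit
    int retVal = 0;
    *thread->StackPointer = new foo();                  // push a new local onto the virtual stack
    VirtualObject** bar = thread->StackPointer++;       // 'bar' is the stack slot, not the object itself
    // The following lines of code would be changed to the code below it
    // if (someValue)
    //     return 123;
    if (someValue)
        retVal = 123;
    thread->StackPointer = stackStart;                  // pop this method's locals
    return retVal;
}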
Or something along those lines. I do think there would need to be at least one level of indirection for local and static ref types.
Edit: I don't think with the above code there should be an issue with CPU registers since I am explicitly casting what is on the "virtual stack" to an instance of the actual object each time. The object can move between those calls. This is especially true if I use the context switching mechanism I proposed above.
Edit Edit: About the preemptive multithreading: if you think about it, even if you use a timer or interrupt, you still need to know which virtual thread to switch to. I'm not exactly sure about the details of context switching and register saving, but it basically comes down to saving all of the registers into some storage (preferably in the VirtualThread class), then restoring the state from another VirtualThread and continuing execution where that thread left off.
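To make the save/restore idea a bit more concrete, here is a rough simulation. It does not touch real CPU registers (that part would need platform-specific assembly); the Registers struct, the Scheduler type, the Saved field and the round-robin policy are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file. A real port would save the actual CPU
// registers from the timer interrupt handler instead.
struct Registers {
    std::uint32_t pc = 0;
    std::uint32_t sp = 0;
    std::uint32_t general[4] = {};
};

// Only the part of VirtualThread needed for this sketch.
struct VirtualThread {
    Registers Saved;  // state captured when the thread was switched out
};

struct Scheduler {
    std::vector<VirtualThread*> threads;
    std::size_t current = 0;

    // Assumed to be called on each timer tick: store the live state in the
    // running thread, pick the next thread round-robin, load its state.
    void Switch(Registers& cpu) {
        assert(!threads.empty());
        threads[current]->Saved = cpu;
        current = (current + 1) % threads.size();
        cpu = threads[current]->Saved;
    }
};
```

Round-robin is just the simplest way to answer "which virtual thread do I switch to"; any other policy would slot into the same place.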
So as far as the GC is concerned, it now just needs to enumerate every VirtualThread's stack from StackBase up to StackPointer, and for each non-null value it finds there, call VirtualObject::Mark() on it (in addition to calling VirtualObject::Mark() on each non-null static object). That tells you exactly which objects are still being used as "locals" within the executing functions.
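Something like the following is what I have in mind for the root scan. It is only a sketch: the fixed-size Stack array, the Push() helper, the MarkRoots() function and the list of statics are assumptions for illustration, and a real Mark() would also have to follow the object's own reference fields.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the types discussed above.
struct VirtualObject {
    bool marked = false;
    void Mark() { marked = true; }  // a real Mark() would also trace fields
};

struct VirtualThread {
    static constexpr std::size_t kStackSlots = 64;  // assumed fixed size
    VirtualObject* Stack[kStackSlots] = {};
    VirtualObject** StackBase = Stack;
    VirtualObject** StackPointer = Stack;           // first free slot

    VirtualThread() = default;
    VirtualThread(const VirtualThread&) = delete;   // pointers refer into Stack

    void Push(VirtualObject* obj) {
        assert(StackPointer < StackBase + kStackSlots);
        *StackPointer++ = obj;
    }
};

// Marks everything reachable directly from thread stacks and statics.
// Returns how many non-null roots were visited.
std::size_t MarkRoots(const std::vector<VirtualThread*>& threads,
                      const std::vector<VirtualObject*>& statics) {
    std::size_t visited = 0;
    for (VirtualThread* t : threads) {
        for (VirtualObject** slot = t->StackBase; slot < t->StackPointer; ++slot) {
            if (*slot != nullptr) {
                (*slot)->Mark();
                ++visited;
            }
        }
    }
    for (VirtualObject* s : statics) {
        if (s != nullptr) {
            s->Mark();
            ++visited;
        }
    }
    return visited;
}
```

Anything that is not marked after this pass (and after tracing from the marked objects) is not referenced from any local or static and can be collected.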
Edit Edit Edit: I just realized that the object "this" points to can also move while one of its methods is executing, which means the "this" pointer will need to be passed on the stack as well, and all "this" references will need to go through that stack slot.
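Roughly, an instance method would then look like the sketch below. The Counter class, the RelocateCounter() helper (standing in for a compacting GC that moves the object and patches the stack slot) and the convention that the caller pushes "this" just below StackPointer are all made up for illustration.

```cpp
#include <cassert>

struct VirtualObject {
    virtual ~VirtualObject() = default;
};

// Hypothetical managed class with one field.
struct Counter : VirtualObject {
    int value = 0;
};

struct VirtualThread {
    VirtualObject* Stack[16] = {};
    VirtualObject** StackPointer = Stack;  // first free slot
};

// Stand-in for a compacting GC: copy the object somewhere else, free the
// old copy, and update the stack slot that referenced it.
void RelocateCounter(VirtualObject** slot) {
    Counter* old = static_cast<Counter*>(*slot);
    Counter* moved = new Counter(*old);
    delete old;
    *slot = moved;
}

// Instance method compiled as a free function. The caller is assumed to
// have pushed 'this' as the top stack entry before the call.
int Counter_IncrementTwice(VirtualThread* thread) {
    VirtualObject** thisSlot = thread->StackPointer - 1;

    // Re-read the slot on every access; never cache the raw pointer.
    static_cast<Counter*>(*thisSlot)->value++;
    RelocateCounter(thisSlot);  // the object moves in the middle of the method
    static_cast<Counter*>(*thisSlot)->value++;

    return static_cast<Counter*>(*thisSlot)->value;
}
```

If the method had cached the raw Counter* in a C++ local before the relocation, the second increment would have written to freed memory, which is exactly the problem the extra indirection avoids.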
Obviously this is getting more complex but I still think it would be way faster than the current performance we see with the Micro Framework.