Tech Off Thread

77 posts

IL to C++ compiler - Need advice on implementing context switching

  • BitFlipper

    EDIT: Never mind, I found the problem. It turns out that std::wcout uses quite a bit of stack space. I had made the native stack too small, and once I increased its size to 16K or larger, the problem went away. I initially kept the stack small because I thought the few calls I was making didn't use that much space. What is weird is that I'm sure I increased the stack size to something like 512K at some point, but I also found another bug in a vector class I was using, so maybe the memory corruption I saw at that time was really caused by the vector class.
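    Rather than guessing how much stack a call like std::wcout needs, one option is to "paint" the stack block with a known byte pattern before the thread runs, then count how much of it was overwritten. A minimal portable sketch, assuming the stack grows downward from the top of the block as on x86; PaintStack, StackHighWaterMark and the 0xCD pattern are hypothetical choices, and the thread's actual usage is only simulated by writing into the top of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Byte written over the whole stack block before the thread runs
// (an arbitrary choice; any value the thread rarely writes will do).
constexpr std::uint8_t kStackPaint = 0xCD;

// Fill the entire stack block with the paint pattern.
void PaintStack(std::uint8_t* block, std::size_t size) {
    std::memset(block, kStackPaint, size);
}

// The stack grows downward from block + size, so the lowest byte that no longer
// holds the pattern marks the deepest point the thread reached. Scan upward from
// block[0] and report how many bytes from that point to the top were used.
std::size_t StackHighWaterMark(const std::uint8_t* block, std::size_t size) {
    std::size_t untouched = 0;
    while (untouched < size && block[untouched] == kStackPaint) {
        ++untouched;
    }
    return size - untouched;
}

// Simulation only: pretend a thread wrote `used` bytes at the top of its stack.
std::size_t SimulateUsage(std::size_t size, std::size_t used) {
    assert(used <= size);
    std::vector<std::uint8_t> block(size);
    PaintStack(block.data(), size);
    if (used > 0) {
        std::memset(block.data() + (size - used), 0x00, used);
    }
    return StackHighWaterMark(block.data(), size);
}
```

    In the real runtime you would paint NativeStack's block when the thread is created and check the high-water mark after it has run for a while. If the mark gets close to the allocated size, the stack is too small; a stray value equal to the pattern at the deepest point can make the estimate slightly low, so keep some margin.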

     

    OK, I was finally able to get back to the heap corruption issue. In order to really test context switching I needed to implement threading, which required me to implement delegates (including MulticastDelegate while I was at it), and also TimeSpan and DateTime. Quite a tangent.

    Anyway, so I was testing the StartMainThread call as shown a few messages above. I tried many different things, including giving NativeStack its own block of memory, but that block of memory is always corrupted. I carefully looked at the end of the block of memory (where the new thread will push and pop values), and indeed it pushes them within the boundary of the allocated block of memory (going downwards as expected).

    The corruption always shows up at the size of the allocated block plus 0x40, that is, 0x40 bytes past its end. It is as if the stack switching code is doing something with the end of the block. I even tried allocating a huge block of memory and pointing NativeStack at the middle of it, and it still ends up corrupting the end of the block. It is as if it knows where the boundaries of the allocated block are and then mucks around 0x40 bytes past the end.

    I confirmed that other code that allocates different blocks of memory isn't corrupting them by allocating large dummy blocks in between those allocation calls, so that the blocks in use are not next to each other. It always comes back to the NativeStack block as the one being corrupted.

    Any ideas what could be going on here?
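    The dummy-block experiment can be made more systematic by surrounding an allocation with guard bytes and checking them afterwards, which tells you which side of the block was overwritten and by how many bytes. A minimal sketch, assuming you control how the stack block is allocated; GuardedBlock, the 128-byte guard size and the 0xFD fill byte are illustrative choices, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size and fill byte of the guard bands (illustrative; 128 bytes is enough
// to catch a write 0x40 bytes past the end of the block).
constexpr std::size_t kGuardSize = 128;
constexpr std::uint8_t kGuardByte = 0xFD;

// A block laid out as [guard][usable memory][guard].
class GuardedBlock {
public:
    explicit GuardedBlock(std::size_t size)
        : size_(size), storage_(size + 2 * kGuardSize, kGuardByte) {}

    std::uint8_t* data() { return storage_.data() + kGuardSize; }
    std::size_t size() const { return size_; }

    // Overwritten guard bytes below the block (a downward stack overflow lands here).
    std::size_t CorruptedBelow() const { return CountCorrupted(0); }
    // Overwritten guard bytes above the block (writes past the top land here).
    std::size_t CorruptedAbove() const { return CountCorrupted(kGuardSize + size_); }

private:
    std::size_t CountCorrupted(std::size_t start) const {
        std::size_t n = 0;
        for (std::size_t i = start; i < start + kGuardSize; ++i) {
            if (storage_[i] != kGuardByte) ++n;
        }
        return n;
    }

    std::size_t size_;
    std::vector<std::uint8_t> storage_;
};
```

    Because the stack grows downward, a stack that is simply too small shows up in CorruptedBelow, while something writing past the top of the block, like the 0x40-byte overrun described here, shows up in CorruptedAbove.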

  • Dexter

    Hmm, I guess this will be "fun" to debug. Couple more questions:

    1. You mentioned that it happens after using cout, what happens if you don't use it?
    2. How many threads are you starting and from where? All from main thread? Or each thread starts another thread?
  • BitFlipper

    Dexter wrote:

    Hmm, I guess this will be "fun" to debug. Couple more questions:

    1. You mentioned that it happens after using cout, what happens if you don't use it?
    2. How many threads are you starting and from where? All from main thread? Or each thread starts another thread?

    I found the problem (see edited message above). I really haven't started any additional threads at this point. I was just using the StartMainThread call to switch the stack of the main thread. It was the main thread (with its new stack) that was causing the corrupted memory due to the stack not being large enough.

    Previously, without using wcout, there was no memory corruption, which makes sense now that it is clear what the problem was.

    I must say, working with C++ is really painful compared to C# (I used to do a lot of it until about 8 years ago, when I started programming mainly in C#). Part of the problem is that the compiler tends to give you really obscure error messages when something goes wrong. Leave out a brace or semicolon somewhere, and it tells you that some class in some unrelated file is undefined. Often you have to dig through dozens and dozens of error messages to figure out what the real problem is.

  • Dexter

    He he, I was wondering about the stack size but I didn't expect cout to need that much. I'm glad it's working now.

  • BitFlipper

    OK, next problem... The following doesn't work:

    __declspec(naked) void __fastcall SwitchContext(void* pOldStack, void* pNewStack)
    {
        // __fastcall passes pOldStack in ECX and pNewStack in EDX
        __asm
        {
            // Save registers
            push ebp
            push ebx
            push esi
            push edi

            // Save old stack
            mov [ecx], esp    // store to pOldStack

            // Load new stack
            mov esp, [edx]    // load from pNewStack

            // Restore registers (reverse order of the pushes)
            pop edi
            pop esi
            pop ebx
            pop ebp
            ret               // return to the address now on top of the new stack
        }
    }

    My assumption here is that pOldStack and pNewStack should not be the same as NativeStack for each thread, correct? NativeStack is really a constant value pointing to the top of the block of memory that was allocated for that thread's stack.

    So instead, I need to have an additional void* in each thread whose address will be passed in to the SwitchContext call, correct? The question is, if this is the very first context switch, what should be stored at the pNewStack value? I assume this should be at some negative offset into that thread's NativeStack. Most likely I would need to calculate that value somewhere in the following call that prepares a thread for execution:

    // Set up the thread so that it is ready to run
    void NativePlatform::InitializeThreadContext(RuntimeThread* pThread, VirtualObject** pData)
    {    
        void* stack = pThread->NativeStack;
        void* startAddress = (void*)pThread->GetStartAddress();
    
        __asm
        {        
            mov ecx, esp    // Save current stack pointer
            mov esp, stack
    
            mov eax, startAddress
            push eax        // Push the start function address as the return address
    
            // Save registers
            xor eax, eax    // Let's always start a thread with zeroed registers
            push eax
            push eax
            push eax
            push eax
    
            mov stack, esp
            mov esp, ecx    // Restore the stack pointer
        }
    }

    BTW, thanks for all of your help so far, it is greatly appreciated.

  • Dexter

    So instead, I need to have an additional void* in each thread whose address will be passed in to the SwitchContext call, correct?

    Yes, this value contains the current stack pointer for the thread. When you switch away from a thread its current stack pointer gets saved and the stack pointer of the new thread is loaded.

    The question is, if this is the very first context switch, what should be stored at the pNewStack value?

    That's the role of InitializeContext: it sets up the initial stack so that it looks like the one SwitchContext expects. That's why the start address gets pushed on the stack: SwitchContext will return to whatever is stored on the stack, which is your start address. This might sound strange, since the start function is not called but returned to, but it works fine as long as the stack is set up correctly. That's also the reason for those 4 push eax instructions: they compensate for the fact that SwitchContext pops edi, esi, ebx and ebp.
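    To see why "returning" into the start function works, it helps to model the 32-bit stack as an array of 4-byte slots and replay what InitializeContext pushes and what SwitchContext pops. This is a simulation only, assuming a downward-growing stack; SimStack, Registers, InitializeFrame, ResumeFrom and the token value standing in for the start address are made up for illustration, and no real registers are touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A downward-growing stack of 32-bit slots; `sp` is a slot index (like ESP / 4).
struct SimStack {
    std::vector<std::uint32_t> slots;
    std::size_t sp;  // index of the current top-of-stack slot
    explicit SimStack(std::size_t n) : slots(n, 0), sp(n) {}  // sp == n: empty stack
    void push(std::uint32_t v) { slots[--sp] = v; }
    std::uint32_t pop() { return slots[sp++]; }
};

struct Registers { std::uint32_t edi, esi, ebx, ebp, eip; };

// Mirrors InitializeContext: the start address, then four placeholder zeros.
// Returns the value to store as the thread's saved stack pointer.
std::size_t InitializeFrame(SimStack& s, std::uint32_t startAddress) {
    s.push(startAddress);                   // consumed by `ret`
    for (int i = 0; i < 4; ++i) s.push(0);  // consumed as edi, esi, ebx, ebp
    return s.sp;
}

// Mirrors the second half of SwitchContext: load the saved sp, pop 4, ret.
Registers ResumeFrom(SimStack& s, std::size_t savedSp) {
    s.sp = savedSp;     // mov esp, [edx]
    Registers r{};
    r.edi = s.pop();
    r.esi = s.pop();
    r.ebx = s.pop();
    r.ebp = s.pop();
    r.eip = s.pop();    // ret: execution continues at whatever was on top
    return r;
}
```

    The saved stack pointer ends up five slots below the top of the block, and after the four pops the `ret` lands on the start address, which is exactly the "negative offset into NativeStack" asked about above.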

    I assume this should be at some negative offset into that thread's NativeStack

    The InitializeContext already computes the correct value but it looks like it doesn't store it correctly. Let's fix things a bit:

    void NativePlatform::InitializeThreadContext(RuntimeThread* pThread, VirtualObject** pData) {
        void* stack = pThread->NativeStack;
        void* startAddress = (void*)pThread->GetStartAddress();

        __asm {
            // Save current stack
            mov ecx, esp

            // Switch to pThread's stack
            mov esp, stack

            // Push arguments for the start function
            // It looks like your start function expects pThread and pData so:
            mov eax, pData
            push eax      
            mov eax, pThread
            push eax

            // Push the start function address, SwitchContext will return to this address
            mov eax, startAddress
            push eax

            // Push 4 values to compensate for pop edi, esi, ebx, ebp in SwitchContext
            xor eax, eax
            push eax
            push eax
            push eax
            push eax

            // At this point the stack is similar to what SwitchContext expects, save the stack pointer
            mov stack, esp

            // Switch back to this thread's stack
            mov esp, ecx
        }
        // save the thread's current stack pointer
        pThread->CurrentStackPointer = stack;
    }

    I added some comments to make it clearer and also added arguments for your start function (my previous example didn't pass any arguments to the start function).

    After InitializeContext runs you should be able to switch to the new thread by doing SwitchContext(&currentThread->CurrentStackPointer, &newThread->CurrentStackPointer).

    Now there's another fun thing to solve. What happens when the thread exits?

  • BitFlipper

    I've been tweaking the asm code for quite some time today with some luck. I can get it to call into the correct function (startAddress), but I just can't get ESP and EBP to have the correct values and hence the stack is messed up.

    I tried your code, and in that case it also successfully calls into startAddress, but EBP is null and so the stack is also messed up. I haven't yet been able to verify whether ESP is correct in this case.

    That is kinda where I've been stuck, since I'm not 100% sure where to get the correct value to store for EBP. EBP is the last value it pops before executing the ret.

  • BitFlipper

    Below is what I have so far. Note I need 4 versions, depending on whether it is a static or instance call, and whether it passes the extra data or not. Below is the static version with no data. As I said it successfully calls into startThread, but the stack is screwed up.

    void NativePlatform::InitializeThreadContext(RuntimeThread* pThread, VirtualObject** pData)
    {    
        void* stack = pThread->NativeStack;
        void* callObj = pThread->GetCallObject();
        void* startAddress = (void*)pThread->GetStartAddress();
    
        if (!callObj && !pData)
        {
            // Static call with no data
            __asm
            {        
                mov ecx, esp
                mov esi, startAddress
                mov eax, pThread
                mov esp, stack
    
                push eax
                push esi
                push esp
    
                // Save registers
                xor eax, eax             
                push eax
                push eax 
                push eax
    
                mov edi, esp
                mov esp, ecx
    
                mov stack, edi         
            }
    
            pThread->CurrentStackPointer = stack;
        }
        else if (!callObj && pData)
        {
            // ...
        }
    }

  • BitFlipper

    I noticed that when I call directly into the destination function without using this context switching code, the difference between ESP and EBP is 216. I would have thought it would be much smaller, like 4 or 8. That really confuses me, because I was trying to see what the stack is supposed to look like on a good call, but this makes no sense to me.

    About your question of how to handle the case when the thread exits: I was thinking about that too, and I think in that case it should trigger another context switch to another thread, and the context switch code just never switches back to it after that. I could also push the necessary values onto the stack so that it ultimately "returns" into a function that causes a context switch, and hence switches to another thread.
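    The "return into a function that switches away" idea is essentially what the System V ucontext API does with uc_link: when a context's entry function returns, execution continues in the linked context. The sketch below is a portable analogue for Linux/glibc, not the asm approach used in this thread (getcontext/makecontext/swapcontext were dropped from POSIX.1-2008 but glibc still provides them); RunTwoThreads, the worker functions and the trace strings are hypothetical and exist only to show the exit path.

```cpp
#include <cassert>
#include <string>
#include <ucontext.h>
#include <vector>

namespace {
ucontext_t g_scheduler;            // resumed whenever a thread's entry function returns
ucontext_t g_threads[2];
std::vector<std::string> g_trace;

void Worker0() {
    g_trace.push_back("t0 start");
    swapcontext(&g_threads[0], &g_threads[1]);  // yield to thread 1
    g_trace.push_back("t0 end");
}  // returning here continues in g_scheduler via uc_link

void Worker1() {
    g_trace.push_back("t1 start");
    swapcontext(&g_threads[1], &g_threads[0]);  // yield back to thread 0
    g_trace.push_back("t1 end");
}
}  // namespace

std::vector<std::string> RunTwoThreads() {
    alignas(16) static char stacks[2][64 * 1024];  // generous, given how much wcout needs
    void (*entries[2])() = {Worker0, Worker1};
    g_trace.clear();
    for (int i = 0; i < 2; ++i) {
        getcontext(&g_threads[i]);
        g_threads[i].uc_stack.ss_sp = stacks[i];
        g_threads[i].uc_stack.ss_size = sizeof stacks[i];
        g_threads[i].uc_link = &g_scheduler;  // where execution goes when the entry returns
        makecontext(&g_threads[i], entries[i], 0);
    }
    swapcontext(&g_scheduler, &g_threads[0]);  // comes back here when t0 exits
    g_trace.push_back("scheduler: t0 exited");
    swapcontext(&g_scheduler, &g_threads[1]);  // resume t1, which then finishes
    g_trace.push_back("scheduler: t1 exited");
    return g_trace;
}
```

    An exited thread is never switched back to because the scheduler simply stops selecting it, which matches the first option described above.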

  • Dexter

    but EBP is null and so the stack is also messed up

    Well, it's null because that's how InitializeContext initializes it. But why do you care about the contents of EBP? The start function doesn't expect a value in this register.

    Below is what I have so far

    I see you're pushing ESP, why? Again, this will be popped as EBP, but you don't need EBP to have any particular value.

  • BitFlipper

    @Dexter:

    OK I see what you are saying about EBP, although if I'm eventually going to put an additional return address on the stack to serve as a thread exit handler, I should probably have something meaningful in EBP.

    But for now I'm not going to worry about EBP, since the called function doesn't care about it, and will instead try to figure out what else could be wrong here. As I said previously, once the called function is entered and I manually subtract 4 from EBP (after it was assigned from ESP in the prologue), the called function can see its parameters properly. That might be a good clue as to where the problem could be.

  • Dexter

    Stupid me, I'm talking about the way threads exit and I forgot that the return address of the start function needs to be pushed onto the stack too. Even if the start function never returns because it does a SwitchContext, you still need to push it, because otherwise the start function can't find its arguments.

    Between push eax and push esi you need to add another push for the return address. This can be 0 if you choose never to return from the start function, or it can be the address of an "exit function" that calls SwitchContext to get away from the thread.
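    The off-by-4 symptom follows from the x86 calling convention: on entry to a cdecl or stdcall function, [esp] must hold the return address and the first argument sits at [esp+4]. The simulation below (4-byte slots, downward-growing stack; SimulateFirstSwitch, EntryView and the token values are made up for illustration) builds the initial frame, replays SwitchContext's four pops and its ret, and reports what the start function sees with and without the extra return-address slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// What the start function reads on entry.
struct EntryView {
    std::uint32_t returnAddress;  // [esp]
    std::uint32_t arg1;           // [esp+4]
    std::uint32_t arg2;           // [esp+8]
};

EntryView SimulateFirstSwitch(std::uint32_t pThread, std::uint32_t pData,
                              std::uint32_t startAddress, bool pushReturnSlot,
                              std::uint32_t exitFunction = 0) {
    std::vector<std::uint32_t> slots(16, 0xDEADBEEF);  // 0xDEADBEEF marks unrelated memory
    std::size_t sp = slots.size() - 4;  // leave a few unrelated slots above the frame
    auto push = [&](std::uint32_t v) { slots[--sp] = v; };

    push(pData);                             // arguments, pushed right to left
    push(pThread);
    if (pushReturnSlot) push(exitFunction);  // the start function's own return address
    push(startAddress);                      // consumed by SwitchContext's ret
    for (int i = 0; i < 4; ++i) push(0);     // consumed as edi, esi, ebx, ebp

    sp += 4;  // pop edi, esi, ebx, ebp
    sp += 1;  // ret: pops startAddress and jumps into the start function
    return EntryView{slots[sp], slots[sp + 1], slots[sp + 2]};
}
```

    Without the extra slot, the start function treats pThread as its return address and reads every argument one slot too high, which is why subtracting 4 from EBP appeared to fix things.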

  • Dexter

    Note I need 4 versions, depending on whether it is a static or instance call, and whether it passes the extra data or not

    That sounds a bit excessive. Normally there is a single thread start function, and it isn't the function you pass to the Thread constructor through a delegate. This start function is a special function that takes care of initializing the thread and then calls the delegate.

    Also, the static/instance distinction is the job of the delegate, the thread start function should not be sensitive to this.

  • BitFlipper

    @Dexter:

    The four versions correspond to ThreadStart and ParameterizedThreadStart, where each of those can call a static or an instance method. While the Thread constructor doesn't care about static vs. instance, internally I need to handle those two cases. In this particular case, it determines how many parameters are pushed onto the stack: for instance methods, an additional "this" parameter is pushed. In the generated C++ code, the four functions look like this:

    Void ThreadTestClass::ThreadTestStaticNoData(RuntimeThread* pThread) {}
    
    Void ThreadTestClass::ThreadTestStaticData(RuntimeThread* pThread, VirtualObject** data) {}
    
    Void ThreadTestClass::ThreadTestInstanceNoData(RuntimeThread* pThread, ThreadTestClass** pThis) {}
    
    Void ThreadTestClass::ThreadTestInstanceData(RuntimeThread* pThread, ThreadTestClass** pThis, VirtualObject** data) {}

    Those are the functions that the first context switch needs to call into, and each has a different parameter signature.

    EDIT: You have a good point about the single start function, I didn't think about that. So this single start function is the one that cares about the four different versions, and calls the correct one. In addition, it could do the cleanup after the start function returns. At that point it would set the Exit flag on pThread, and then call the scheduler which would switch away to another thread.
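    A single start function that owns both the dispatch and the cleanup might look roughly like this. It is a portable sketch, not the thread's generated code: the std::variant of entry points, the pThis/pData/exited fields and the schedule callback are assumptions, and VirtualObject, ThreadTestClass and RuntimeThread are reduced to minimal stand-ins (with plain void instead of Void) so the example compiles on its own.

```cpp
#include <cassert>
#include <functional>
#include <variant>

struct VirtualObject { int value = 0; };
struct ThreadTestClass : VirtualObject { int calls = 0; };
struct RuntimeThread;

// The four signatures from the generated code above.
using StaticNoDataFn   = void (*)(RuntimeThread*);
using StaticDataFn     = void (*)(RuntimeThread*, VirtualObject**);
using InstanceNoDataFn = void (*)(RuntimeThread*, ThreadTestClass**);
using InstanceDataFn   = void (*)(RuntimeThread*, ThreadTestClass**, VirtualObject**);

struct RuntimeThread {
    std::variant<StaticNoDataFn, StaticDataFn, InstanceNoDataFn, InstanceDataFn> entry;
    ThreadTestClass** pThis = nullptr;  // used only by instance entries
    VirtualObject** pData = nullptr;    // used only by ParameterizedThreadStart entries
    bool exited = false;
    std::function<void(RuntimeThread*)> schedule;  // hypothetical: switches to another thread
};

// Every new thread's initial stack "returns" into this one function.
void ThreadStartTrampoline(RuntimeThread* t) {
    if (auto f0 = std::get_if<StaticNoDataFn>(&t->entry)) {
        (*f0)(t);
    } else if (auto f1 = std::get_if<StaticDataFn>(&t->entry)) {
        (*f1)(t, t->pData);
    } else if (auto f2 = std::get_if<InstanceNoDataFn>(&t->entry)) {
        (*f2)(t, t->pThis);
    } else if (auto f3 = std::get_if<InstanceDataFn>(&t->entry)) {
        (*f3)(t, t->pThis, t->pData);
    }

    t->exited = true;                  // the user's start function has returned
    if (t->schedule) t->schedule(t);   // in the real runtime this switch never comes back
}
```

    With this in place, InitializeThreadContext only ever needs to build one frame shape (pThread as the single argument), and the static/instance and data/no-data differences become ordinary C++ calls.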

  • Dexter

    OK, but did you try the fix for the start function? Does it work?

  • BitFlipper

    @Dexter:

    I just rewrote the code to use the single start function, but I have not yet fixed the off-by-4 bug. I have to leave for work right now so I'll have to wait to do that tonight maybe.

  • BitFlipper

    This morning on my train commute I was able to try out your suggestion about adding the extra push for the return address, and it worked! I'm also now calling into the special start function, which then calls one of the four function types (the actual specified start function). Once that function returns, I set the Exited flag on the thread and call into the scheduler to switch to a different thread. This all started working correctly, so I'm quite happy about that. Thanks for your help with this.

    Next up is to implement Sleep and Join. Sleep should be easy: all I need to do is set a future tick based on the specified millisecond value, then call into the scheduler to switch to a different thread. The scheduler will then enumerate all threads and see which ones are ready to run, skipping those whose wake tick is still in the future. BTW, I'm using a counter on each thread that is incremented by the amount of its priority. The scheduler then picks the thread with the highest counter value, and when a thread is run, its counter is set back to zero.

    Join should also be easy in that it stores a pointer (or thread ID) to the thread it should "join". The scheduler will check whether the other thread is still in a running state and then keep skipping the joining thread until the other thread's Exited flag is set.

    If no thread is ready to run, my thinking was that I would need to execute the x86 HLT instruction. However, reading up on it indicates that HLT is a privileged instruction that can only be executed in kernel mode. I guess for x86 right now I can just call Sleep(0). If this eventually runs on a microcontroller, I will just call whatever its equivalent of the halt instruction is.
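    The scheduling rules described above (priority-weighted counters, a wake tick for Sleep, a join target for Join, and an idle fallback when nothing is runnable) fit in a few lines. A sketch under stated assumptions: every ready thread's counter is bumped by its priority on each scheduling pass, ticks are plain integers, ties go to the earlier thread, and SchedThread, IsReady and PickNext are hypothetical names. The actual context switch and the idle action (Sleep(0) on Windows, a halt equivalent on a microcontroller) are left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SchedThread {
    int id;
    int priority;                             // added to `counter` each pass while ready
    int counter = 0;
    std::uint64_t wakeTick = 0;               // Sleep: not runnable before this tick
    const SchedThread* joinTarget = nullptr;  // Join: wait until this thread has exited
    bool exited = false;
};

bool IsReady(const SchedThread& t, std::uint64_t now) {
    if (t.exited) return false;
    if (t.wakeTick > now) return false;                       // still sleeping
    if (t.joinTarget && !t.joinTarget->exited) return false;  // still joining
    return true;
}

// Returns the thread to run next, or nullptr if nothing is ready (caller idles).
SchedThread* PickNext(std::vector<SchedThread>& threads, std::uint64_t now) {
    SchedThread* best = nullptr;
    for (auto& t : threads) {
        if (!IsReady(t, now)) continue;
        t.counter += t.priority;
        if (!best || t.counter > best->counter) best = &t;
    }
    if (best) best->counter = 0;  // the chosen thread starts accumulating again
    return best;
}
```

    With priorities 1 and 3, the higher-priority thread wins most passes, while the lower-priority one still gets picked once its counter catches up, so it is never starved.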

  • AdamSpeight2008

    What about a hashtable keyed on the tick, so you don't have to enumerate every thread?

    <Tick, LinkedList<Thread>> 
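    A hash table keyed on the exact tick only helps if the scheduler looks up precisely the tick a thread asked for, and ticks can be skipped between scheduling passes. An ordered container avoids that, because all due sleepers then form a prefix of it. The sketch below swaps in std::multimap for the suggested hashtable of linked lists; SleepQueue and the int thread IDs are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

class SleepQueue {
public:
    // Register a sleeping thread that should wake at `wakeTick`.
    void Add(std::uint64_t wakeTick, int threadId) { byTick_.emplace(wakeTick, threadId); }

    // Remove and return every thread whose wake tick is <= now, earliest first.
    // Threads with the same tick come out in the order they were added.
    std::vector<int> PopDue(std::uint64_t now) {
        std::vector<int> due;
        auto end = byTick_.upper_bound(now);  // first entry strictly after `now`
        for (auto it = byTick_.begin(); it != end; ++it) due.push_back(it->second);
        byTick_.erase(byTick_.begin(), end);
        return due;
    }

    bool Empty() const { return byTick_.empty(); }

    // Earliest pending wake tick (requires !Empty()); tells an idle loop how long it may wait.
    std::uint64_t NextWake() const { return byTick_.begin()->first; }

private:
    std::multimap<std::uint64_t, int> byTick_;
};
```

    Insertion is O(log n), and each pass only touches the threads that are actually due instead of scanning every sleeping thread.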

     
