Tech Off Thread

77 posts

IL to C++ compiler - Need advice on implementing context switching

  • BitFlipper

    OK, I was able to fix the problem with the temp variables, although I'd need some thorough testing to verify that it works in all cases. Basically, whenever an argument, local or static variable gets assigned, I need to check whether that same variable is currently on the stack. If so, I need to get a new temp variable, assign it the value of the variable, and replace the previous variable with the temp variable everywhere it shows up on the stack. Below is what the fixed function looks like:

    Void VirtualString::Ctor_VirtualString(VirtualThread* pThread, VirtualString** pThis, Char* pStr)
    {
        // Ref type locals
        VirtualString** temp0 = (VirtualString**)&pThread->StackPointer[1];
    
        memset(&pThread->StackPointer[1], 0, sizeof(void*) * 1);
        pThread->StackPointer += 1;
    
        // Value type locals
        Bool local0 = false;
        Char* temp1 = null;
        Char* temp2 = null;
    
    IL_0000: // ldarg.0
    IL_0001: // call Void .ctor()
        (*pThis)->VirtualObject::Ctor_VirtualObject(pThread, (VirtualObject**)pThis);
    
    IL_0006: // nop
    IL_0007: // nop
    IL_0008: // ldarg.0
    IL_0009: // ldarg.1
    IL_000a: // stfld Char* m_staticBuffer
        (*pThis)->m_staticBuffer = pStr;
    
    IL_000f: // ldarg.0
    IL_0010: // ldc.i4.0
    IL_0011: // stfld UInt16 m_length
        (*pThis)->m_length = 0;
    
    IL_0016: // ldarg.0
    IL_0017: // ldfld Char* m_staticBuffer
    IL_001c: // ldc.i4.0
    IL_001d: // conv.u
    IL_001e: // ceq
    IL_0020: // ldc.i4.0
    IL_0021: // ceq
    IL_0023: // stloc.0
        local0 = (((*pThis)->m_staticBuffer == (Int32)(UInt32)(0)) == 0);
    
    IL_0024: // ldloc.0
    IL_0025: // brtrue.s 2
        if (local0 != 0)
            goto IL_0029;
    
    IL_0027: // br.s 36
        goto IL_004d;
    
    IL_0029: // br.s 15
        goto IL_003a;
    
    IL_002b: // ldarg.0
    IL_002c: // dup
        (*temp0) = *(pThis);
    
    IL_002d: // ldfld UInt16 m_length
    IL_0032: // ldc.i4.1
    IL_0033: // add
    IL_0034: // conv.u2
    IL_0035: // stfld UInt16 m_length
        (*pThis)->m_length = (Int32)(UInt16)(((Int32)(*temp0)->m_length + 1));
    
    IL_003a: // ldarg.1
    IL_003b: // dup
        temp1 = pStr;
    
    IL_003c: // ldc.i4.2
    IL_003d: // conv.i
    IL_003e: // add
    IL_003f: // starg.s Char* pStr
        temp2 = pStr;
        pStr = (Char*)(((Int32)temp1 + (Int32)2));
    
    IL_0041: // ldind.u2
    IL_0042: // ldc.i4.0
    IL_0043: // ceq
    IL_0045: // ldc.i4.0
    IL_0046: // ceq
    IL_0048: // stloc.0
        local0 = ((((UInt16)*(UInt16*)temp2) == 0) == 0);
    
    IL_0049: // ldloc.0
    IL_004a: // brtrue.s -33
        if (local0 != 0)
            goto IL_002b;
    
    IL_004c: // nop
    IL_004d: // ret
        goto Exit;
    
    Exit:
        pThread->StackPointer -= 1;
    }

     

    Note that at line IL_003f, a temp local is used to store the value of pStr before it is assigned a different value. At line IL_0048, that temp local is used instead of the now-changed pStr. This fixes the off-by-one bug I was seeing previously.

  • BitFlipper

    On a different but related topic...

    I was wondering how to handle debugging an application written for a microcontroller while using this .Net -> C++ compiler. Previously I was thinking that I would try to figure out what the .Net Micro Framework does to enable debugging, but that is a big task since I'll probably need to create a VS debugging plugin etc. It would also be difficult (although not impossible) to reliably get the state of all variables while in the debugger.

    So I thought of a completely different way to enable debugging using this .Net -> C++ compiler. Basically, the idea is to not even do the .Net -> C++ compilation at all during debugging. Instead, there will be a "core" (or kernel) component running on the microcontroller that contains all of the "internal" calls that you can make to, say, set or get the state of an IO pin. I already implemented "internal" calls in my compiler (which is not the same as PInvoke of course). So basically I can create a class in C# called InOutPin, which allows you to get or set the state of a specific pin. Inside this class, I have a method called, say, "GetPinState". I add a special attribute, like this:

    [VirtualCallAttribute(IsInternalCall = true, FunctionName = "IOPins::GetState")]
    bool GetPinState(int pinIndex)
    {
        // Actual C# code that will execute while running in "debug" mode in VS.
        // This code will not be compiled to C++, instead it will call the specified C++ function directly
    }

    So what happens is that if you run the application inside VS, it will simply treat GetPinState like any other method and execute the code it contains. On the other hand, when that method gets compiled to C++ code, that attribute will tell the compiler that this is actually an internal call and will call the specified function mentioned in the FunctionName parameter. All of this is already implemented and works (I did a bunch of string functions, like creating, concatenating etc).

    So my idea is to take advantage of that in the sense that I can add code into GetPinState (the C# version), that will make the same call into the microcontroller via a serial or USB port. So basically, when you start running the application in VS, the library you use will contain code to connect to the microcontroller which is running the kernel. That kernel will know when the debugger has connected and listen for any calls. Once the connection is made, if you then call GetPinState, the call will be remoted to the microcontroller which will then call the corresponding C++ function to get the pin's state.
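    As a sketch of how the kernel side of this might look, here is a minimal dispatch routine. The wire format (a 1-byte function id plus a 4-byte argument) and all names here (FunctionId, DispatchRequest, the IOPins stand-ins) are hypothetical illustrations, not from the actual project:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical wire format: [1-byte function id][4-byte int argument],
// reply: [4-byte int result]. On real hardware the bytes would arrive
// over the serial/USB link; here we just decode a buffer.
enum FunctionId : uint8_t { FN_GET_PIN_STATE = 1, FN_SET_PIN_STATE = 2 };

// Stand-ins for the real internal calls the kernel already exposes.
static int32_t IOPins_GetState(int32_t pinIndex) { return pinIndex % 2; }
static int32_t IOPins_SetState(int32_t pinIndex) { (void)pinIndex; return 0; }

// Decode one request buffer and produce a reply value.
int32_t DispatchRequest(const uint8_t* request)
{
    uint8_t functionId = request[0];
    int32_t arg;
    memcpy(&arg, request + 1, sizeof(arg)); // avoid unaligned access

    switch (functionId)
    {
    case FN_GET_PIN_STATE: return IOPins_GetState(arg);
    case FN_SET_PIN_STATE: return IOPins_SetState(arg);
    default:               return -1; // unknown function id
    }
}
```

    The VS-side GetPinState body would then serialize its arguments into this same format and block until the reply arrives.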

    I don't foresee any real reasons why such an approach won't work, but there are some pros and cons that I can think of:

    Pros:

    • You can use VS to debug, including using advanced features like edit and continue. As far as VS is concerned, it's just another full .Net application.
    • Compilation and debugging should be fast, just like any other .Net application (at least for the parts that aren't calling into the microcontroller). You don't need to wait for your code to be compiled and downloaded to the microcontroller first like you do now.

    Cons:

    • Remoted calls to the microcontroller might be slow (Netduino supposedly supports up to 3M baud). Not sure how slow this will be, but you probably won't be toggling pins at MHz speeds (the fastest speed you can toggle a pin in .Net MF on a Netduino is about 9KHz, compared to about 4-5MHz with native code on the same microcontroller).
    • You won't be debugging the compiled C++ code, you will be debugging the pre-compiled C# code. Because of this, you will be relying on the assumption that the .Net -> C++ compiler won't introduce any bugs into your code, which is unlikely. This may or may not be a problem.
    • You won't know that the code you are writing isn't going to work on the microcontroller until you actually try to compile it with the special Release build. You might be using a class for which there isn't a version available, and you might be happily using WCF, XNA or WPF but only realize later on that it can't be compiled to C++.
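    To put a rough number on the first con, here is a back-of-envelope throughput estimate. It assumes 8N1 framing (10 bits per byte on the wire) and a hypothetical 16 bytes per remoted call, and ignores serial/USB latency, which in practice may dominate, so real rates would be lower:

```cpp
// Back-of-envelope estimate of remoted-call throughput at 3M baud.
constexpr long baudRate       = 3000000;
constexpr long bitsPerByte    = 10;                        // 8 data + start + stop
constexpr long bytesPerSecond = baudRate / bitsPerByte;    // 300000 bytes/s
constexpr long bytesPerCall   = 8 + 8;                     // assumed request + reply
constexpr long callsPerSecond = bytesPerSecond / bytesPerCall; // ~18750 calls/s
```

    Even under these generous assumptions, that ceiling of roughly 19K calls per second is in the same ballpark as the 9KHz pin-toggle rate mentioned above.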

    Anyway, to me this solution sounds much easier to implement, and unless I'm missing something big, I don't see why it wouldn't work.

  • Dexter

    If so, I need to get a new temp variable and assign it the value of the variable, and also replace the previous variable with the temp variable everywhere it shows up on the stack.

    Mmm, seems like a cool solution to me.

  • evildictaitor

    The C++ compiler will not be able to optimize my ref types away (and it shouldn't), because I store them on the virtual stack. Creating large numbers of temp ref types is not ideal because that will give more work to the GC, in addition to requiring a larger virtual stack (memory is quite limited on a microcontroller).

    Then don't put them on a virtual stack. Why would adding more temporaries hurt the GC? If the C/C++ compiler detects that two temporary registers hold the same value then it aliases them and you end up with fewer local variables. The C/C++ compiler also aggressively reuses registers and stack positions for variables that die before other ones become live in release builds (this is why inspecting variables in the debugger doesn't always work on release builds).

  • BitFlipper

    evildictaitor wrote

    *snip*

    Then don't put them on a virtual stack. Why would adding more temporaries hurt the GC? If the C/C++ compiler detects that two temporary registers hold the same value then it aliases them and you end up with fewer local variables. The C/C++ compiler also aggressively reuses registers and stack positions for variables that die before other ones become live in release builds (this is why inspecting variables in the debugger doesn't always work on release builds).

    That won't work, because the way the GC works is that it must be able to reach any object that is currently in use. If I create a new object but don't put a reference to it on the virtual stack, and the GC does a Mark/Sweep operation, that object's Mark function will never be called (via enumerating the objects on the virtual stack), and hence its Mark flag will never be set. Then, during the Sweep phase, the object will be collected.

    I don't believe the C++ compiler can or should optimize the ref types. Value types are fine, that can't cause any side-effects.
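    The reachability argument above can be sketched in a few lines. This is an illustrative mark/sweep toy (made-up names, not the project's actual GC) showing why any ref not reachable from the virtual stack gets collected:

```cpp
#include <vector>

// Toy mark/sweep: the root set is the slots of the "virtual stack".
struct Obj { bool marked = false; };

struct Heap
{
    std::vector<Obj*>  objects;       // everything ever allocated
    std::vector<Obj**> virtualStack;  // root set: stack slots holding refs

    void Collect()
    {
        // Mark phase: only objects referenced from a stack slot get marked.
        for (Obj** slot : virtualStack)
            if (*slot) (*slot)->marked = true;

        // Sweep phase: anything unmarked (i.e. unrooted) is collected.
        std::vector<Obj*> live;
        for (Obj* o : objects)
        {
            if (o->marked) { o->marked = false; live.push_back(o); }
            else delete o;
        }
        objects.swap(live);
    }
};
```

    An object created but never stored into a stack slot is invisible to the mark phase, which is exactly the scenario described above.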

  • BitFlipper

    OK things are going quite well (except for the problem mentioned below). I've implemented GC functionality up to but not including compacting. Basically you can now do this:

    for (var idx = 0; idx < 10000000; idx++)
    {
        var obj = new SomeClass();
    }

    The GC will dispose of the objects that are no longer rooted. It also makes a Finalize call into each object but right now it isn't yet hooked up to the class's destructor. That should be easy.

    I was implementing boxing/unboxing, but this exposed a flaw in my current compiler (boxing/unboxing itself works quite well). The problem is that code can branch somewhere into the middle of the "groupings" of instructions that ultimately become one or more lines of C++ code. While testing the following method:

     

    static void Main(string[] args)
    {
         var result = BoxTest(false);
    }


    private static int BoxTest(object obj)
    {
        if (obj is bool)
            return (bool)obj ? 1 : 2;
                
        if (obj == null)
            return 3;
        else
            return 4;
    }
    

     

    I found that my compiled C++ version was not doing the right thing. Here is the generated C++ code: 

    Void Usr_Program::Usr_Main(VirtualThread* pThread, InternalArray<VirtualString*>** args)
    {
        // Ref type locals
        VirtualObject** temp0 = (VirtualObject**)&pThread->StackPointer[1];
    
        memset(&pThread->StackPointer[1], 0, sizeof(void*) * 1);
        pThread->StackPointer += 1;
    
        // Value type locals
        Int32 local0 = 0;
        Int32 valueRetVal0 = 0;
    
    IL_0000: // nop
    IL_0001: // ldc.i4.0
    IL_0002: // box System.Boolean
        pThread->AllocObject(sizeof(VirtualBox<Bool>), (VirtualObject**)temp0);
        new (*temp0) VirtualBox<Bool>(0x0005, 0);
    
    IL_0007: // call Int32 BoxTest(System.Object)
        valueRetVal0 = Usr_Program::Usr_BoxTest(pThread, temp0);
    
    IL_000c: // stloc.0
        local0 = valueRetVal0;
    
    IL_000d: // ret
        goto Exit;
    
    Exit:
        pThread->StackPointer -= 1;
    }

     

    Int32 Usr_Program::Usr_BoxTest(VirtualThread* pThread, VirtualObject** obj)
    {
        // Value type locals
        Int32 local0 = 0;
        Bool local1 = false;
    
    IL_0000: // nop
    IL_0001: // ldarg.0
    IL_0002: // isinst System.Boolean
    IL_0007: // ldnull
    IL_0008: // cgt.un
    IL_000a: // ldc.i4.0
    IL_000b: // ceq
    IL_000d: // stloc.1
        local1 = ((((obj && (*obj)->IsInstanceOf(0x0005)) ? obj : null) > null)) == (0);
    
    IL_000e: // ldloc.1
    IL_000f: // brtrue.s 15
        if (local1 != 0)
            goto IL_0020;
    
    IL_0011: // ldarg.0
    IL_0012: // unbox.any System.Boolean
    IL_0017: // brtrue.s 3
        if ((((VirtualBox<Bool>*)(*obj))->Value) != 0)
            goto IL_001c;
    
    IL_0019: // ldc.i4.2
    IL_001a: // br.s 1
        goto IL_001d;
    
    IL_001c: // ldc.i4.1
    IL_001d: // stloc.0
        local0 = 1;
    
    IL_001e: // br.s 19
        goto IL_0033;
    
    IL_0020: // ldarg.0
    IL_0021: // ldnull
    IL_0022: // ceq
    IL_0024: // ldc.i4.0
    IL_0025: // ceq
    IL_0027: // stloc.1
        local1 = ((obj) == (null)) == (0);
    
    IL_0028: // ldloc.1
    IL_0029: // brtrue.s 4
        if (local1 != 0)
            goto IL_002f;
    
    IL_002b: // ldc.i4.3
    IL_002c: // stloc.0
        local0 = 3;
    
    IL_002d: // br.s 4
        goto IL_0033;
    
    IL_002f: // ldc.i4.4
    IL_0030: // stloc.0
        local0 = 4;
    
    IL_0031: // br.s 0
        goto IL_0033;
    
    IL_0033: // ldloc.0
    IL_0034: // ret
        return local0;
    
    }

    What I found was that the unbox test at line IL_0017 was not being executed at all. In fact, the C++ compiler didn't even bother to compile that part of the code (it's completely missing from the disassembly). Closer inspection shows that, as far as the test on line IL_0017 is concerned, it will always end up going to line IL_001d, so the compiler doesn't even bother.

    Obviously this is the wrong end result, because in the C# version of the method, the check for true/false has a real effect on the return value of the method.

    I'm trying to think of ways to change the compiler to produce correct code, but so far I haven't thought of a good solution yet. This is obviously a solvable problem because all IL -> native code compilers have solved this problem. There shouldn't be any reason why I can't create functional C++ code as opposed to functional native code (if anything it should be easier).

    Anyway, one possible solution I've been thinking about is to take advantage of the fact that I've already done a pre-pass at this point and I know exactly which instructions are branch targets. I can use this information to break the instructions into smaller groups. The problem with this is that it is not always possible (or obvious) to break instructions into smaller groups. For instance, in this case I need to break the instructions between IL_001c and IL_001d (each one is a branch target). But how do I write IL_001c in C++ code?

    I'm just curious, what method is used to compile IL to native code? I'm using an evaluation-stack-based approach that accumulates expressions until an instruction comes along that does some real work (storing, comparing, branching etc.) based on the evaluation stack contents, and only then outputs the code to perform that operation. Is there a different approach? At least the evaluation stack approach seems logical to me, since it follows the IL's own evaluation-stack model very closely.

  • Dexter

    Closer inspection shows that as far as the test on line IL_0017 goes, it will always end up going to line IL_001d, hence it doesn't even bother.

    The test is not the problem, the problem is that the ldc 2 at IL_0019 is nowhere to be found in the generated C++ code. My guess is that the reason for this bug is the following:

    I'm using an evaluation stack based approach that accumulates expressions until an instruction comes along that does some real work, like storing

    The value (expression) that gets stored here was defined in two different places. Somehow you lost track of one of those definitions and ended up always writing 1. IL_001D can be reached with 2 different stack states but the generated code accounts for only one.

    I'm just curious, what method is used to compile IL to native code?

    I don't know exactly what the CLR does, but what I'd do is convert the IL to some intermediate representation where there's no stack and where all instruction operands are explicit instead of implicit. Such explicit representations are far easier to reason about.
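    A minimal sketch of such a lowering, with made-up names: each push mints a fresh named temp, so every operand becomes explicit and a value defined in two places shows up as two visible definitions instead of an implicit stack slot:

```cpp
#include <string>
#include <vector>

// Lower a stream of stack ops into three-address statements.
struct Lowerer
{
    std::vector<std::string> stack;  // names of values, not the values
    std::vector<std::string> out;    // emitted three-address code
    int nextTemp = 0;

    std::string Push()
    {
        std::string t = "t" + std::to_string(nextTemp++);
        stack.push_back(t);
        return t;
    }
    std::string Pop() { std::string t = stack.back(); stack.pop_back(); return t; }

    void LdcI4(int v) { out.push_back(Push() + " = " + std::to_string(v)); }
    void Add()
    {
        std::string b = Pop(), a = Pop();
        out.push_back(Push() + " = " + a + " + " + b);
    }
    void Stloc(int n) { out.push_back("local" + std::to_string(n) + " = " + Pop()); }
};
```

    Lowering `ldc.i4.1; ldc.i4.2; add; stloc.0` this way yields `t0 = 1`, `t1 = 2`, `t2 = t0 + t1`, `local0 = t2`, with no implicit stack left to lose track of.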

    You may say that you aren't writing a compiler but getting familiar with some compiler techniques would be a good idea. Reading the code generation chapter(s) from the "Dragon Book" (http://dragonbook.stanford.edu/) is advisable.

  • BitFlipper

    @Dexter:

    Dexter wrote

    *snip*

    The test is not the problem, the problem is that the ldc 2 at IL_0019 is nowhere to be found in the generated C++ code. My guess is that the reason for this bug is the following:

    *snip*

    The value (expression) that gets stored here was defined in two different places. Somehow you lost track of one of those definitions and ended up always writing 1. IL_001D can be reached with 2 different stack states but the generated code accounts for only one.

     

    Yes that is what I was trying to say, and I understand the problem in this case. The IL sees the two instructions as two distinct locations to jump to, while in the currently compiled C++ code, it translates to just one location, hence the C++ compiler rightly sees the test as superfluous and removes it.

    Interestingly, if you run the debug version of ILSpy, in addition to the "IL" and "C#" decompile options, it has an additional 53 (yes, I counted them) decompile options for the various stages of transforms and optimizations. Here is what the method looks like when selecting the decompile option "ILAst (after InlineVariables)":

    [sorry once again the C9 forum insists on putting everything on one line, so using plaintext]
    --------------------------

    arg_1D_0
    var_0_1D : int32

        brtrue(IL_20, ceq(cgt.un(isinst([mscorlib]System.Boolean, ldloc(obj)), ldnull()), ldc.i4(0)))
        brtrue(IL_1C, unbox.any([mscorlib]System.Boolean, ldloc(obj)))

        arg_1D_0 = ldc.i4(2)
        br(IL_1D)

    IL_1C:
        arg_1D_0 = ldc.i4(1)

    IL_1D:
        stloc(var_0_1D, arg_1D_0)
            br(IL_33)

    IL_20:
        brtrue(IL_2F, ceq(ceq(ldloc(obj), ldnull()), ldc.i4(0)))
            stloc(var_0_1D, ldc.i4(3))
            br(IL_33)

    IL_2F:
        stloc(var_0_1D, ldc.i4(4))

    IL_33:
        ret(ldloc(var_0_1D))

    --------------------------

    So it used a temp variable called arg_1D_0 to store the result of both of the expressions ldc.i4(2) and ldc.i4(1); the variable was associated with the instruction at IL_1D. This is similar to what I was trying by assigning the expressions to temp variables. All I have to do now is figure out when to use the same temp variable as opposed to grabbing a new one each time.
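    One possible policy for that decision (an assumption on my part, not necessarily how ILSpy does it) is to key each temp by the pair of branch-target offset and stack-slot index, so every path that reaches IL_001D with one value pending writes the same temp, mirroring arg_1D_0. Names here are illustrative:

```cpp
#include <map>
#include <string>
#include <utility>

// Hand out temp-variable names keyed by (merge-point offset, stack slot),
// so all paths into the same merge point share one temp.
struct TempAllocator
{
    std::map<std::pair<int, int>, std::string> temps;
    int nextId = 0;

    // Return the temp for this merge point, creating it on first use.
    std::string TempFor(int targetOffset, int slotIndex)
    {
        auto key = std::make_pair(targetOffset, slotIndex);
        auto it = temps.find(key);
        if (it == temps.end())
            it = temps.emplace(key, "temp" + std::to_string(nextId++)).first;
        return it->second;
    }
};
```

    With this policy the path through ldc.i4(2) and the path through ldc.i4(1) both ask for the temp of (0x1D, slot 0) and get the same name, while a different merge point gets a fresh one.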

  • BitFlipper

    Hmm, did some benchmarks and got some interesting results. I have the following C# method:

    private static int PerformanceTest3()
    {
        var result = 0;
    
        for (var idx1 = 0; idx1 < 30000; idx1++)
        {
            var strResult = "";
    
            for (var idx2 = 0; idx2 < 1000; idx2++)
            {
                var intVal = 0;
                strResult = idx2.ToString();
                Int32.TryParse(strResult, out intVal);
                result += intVal;
            }
        }
    
        return result;
    }

    The release build of this .Net method takes about 7 seconds to complete on my system. When I compile it using my .Net -> C++ compiler, the debug build takes about 45 seconds to complete, but the release build takes only 3 seconds - more than twice as fast as the .Net version.

    Initially I thought some optimization must have resulted in the code not doing the right thing, but I confirmed that even with the C++ version that runs for just 3 seconds, the return value is correct (2100098112 in this case). There is no way the C++ compiler can optimize away the ToString and TryParse calls.

    I am checking whether strResult is null in my Int32::TryParse call, and off the top of my head I can't see what else I should be checking. Also, I don't yet have a try/catch mechanism in place but I already have an idea how that will work and it should not add a lot of overhead. 

    My guess is that the .Net version is probably getting bogged down while dealing with culture related conversions when converting the value from int to string and back.

    Anyway, quite encouraging results.

  • davewill

    To all participants on this thread, thanks for taking the time to post.  Please continue.

  • BitFlipper

    OK, I spent a lot of time fixing various issues with the .Net -> C++ compiler. Basically, I ran into issues related to how different code paths through the same function can load different values onto the stack from different locations (and different variables), so I had to change my code to evaluate each possible code path. If two or more instructions load different values onto the stack, then a temp variable must be used, and every place where that value can be popped must work back and ensure that all the places where that value gets pushed use the same temp variable.

    I had to rewrite a lot of the code because it just wasn't flexible enough to handle these kinds of scenarios. The updated code allows one to go back and retroactively modify variables (change constants to temps, swap a set of stack values for a temp, etc).

    This has complicated the code a lot but I guess that is typical of any type of compiler. Anyway, it seems to work fine now, and in the process it automatically fixed the issue mentioned above where the "jump to" location doesn't really exist and therefore the result could be incorrect.

    Off to the next issue... I want to implement context switching, keeping in mind that ultimately this will run on a microcontroller that has just one hardware thread. For now I'll do this on x86, since that's what I use to test all of this stuff. Once I get it to work, I just need to add ifdef sections for the different supported processors.

    For x86, I found this old web page. I'm not familiar with AT&T assembly syntax, so I'm not 100% sure how to convert it to the Intel syntax that VS understands. Here's my first attempt based on the linked page:

    void VirtualMachine::SwitchContext(VirtualThread* pOld, VirtualThread* pNew)
    {
        Int32 oldStack = (Int32)pOld->NativeStack; 
        Int32 newStack = (Int32)pNew->NativeStack; 
        
        __asm
        {
            mov eax, oldStack;
            mov ecx, newStack;
    
            // Save registers
            push ebp
            push ebx
            push esi
            push edi
    
            // Save old stack
            mov eax, esp
    
            // Load new stack
            mov esp, ecx 
    
            // Restore registers
            pop edi
            pop esi
            pop ebx
            pop ebp
            ret
        }
    }

    I'm a bit rusty on my x86 assembly, but what I don't understand is that the code moves oldStack into eax, and then a few lines later moves esp into eax, overwriting it. Is my translation of the original code even correct?

    Also, there seems to be a bit of a chicken-and-egg problem here. Obviously this code should only be called to switch between two threads that have already been running. So what I'm missing right now is a way to set up these stacks for the initial run (in my simulated Thread.Start call, which is implemented except that it doesn't yet set up the native stack).

    In addition, my assumption here is that my VirtualThread class will not only hold the "virtual stack", but also this native stack that will be used to do context switching for each thread. Up until now there was only one "thread" so it was just using whatever the stack was of the running thread.

    Interestingly, the compiler gives me this warning:

    warning C4731: 'VirtualMachine::SwitchContext' : frame pointer register 'ebp' modified by inline assembly code

    I assume this is a good thing because that is exactly what the intention was? Or is this a legit warning that I'm actually doing something wrong?

  • Dexter

    Interestingly, the compiler gives me this warning:

    I'll answer this first because the rest can't be done properly with this in the way. The compiler can generate prolog/epilog code for a function:

    SwitchContext:
                   push ebp         ; save the previous frame pointer
                   mov  ebp, esp    ; setup the frame pointer
                   sub  esp, 8      ; space for local variables
                   ...              ; function code
                   mov  esp, ebp    ; restore the stack pointer
                   pop  ebp         ; restore the frame pointer
                   ret


    Now it should be obvious what the warning is about. The compiler uses ebp for its own purposes and warns you if you change it in inline asm. This compiler-generated code can be a problem when you want to implement context switching because it affects the stack layout. You can implement SwitchContext even with such code present, but creating the stack for a new thread will be a problem because you don't know what stack layout SwitchContext expects.

    A possible solution is to use __declspec(naked) (which prevents such code from being generated) and __fastcall (which causes the first 2 arguments of the function to be passed in registers ecx and edx).

    I'm a bit rusty on my x86 assembly, but what I don't understand is that the code is moving oldStack into eax, then a few lines later, esp is moved into eax. Is my translation of the original code even correct?

    Nope. In the original version there were some parentheses which you ignored. Something like mov %esp, (%eax) converts to mov [eax], esp.

    Here's an example that uses __declspec(naked) and __fastcall:

    __declspec(naked) void __fastcall SwitchContext(VirtualThread* pOld, VirtualThread* pNew)
    {
    // N.B. the following code assumes that NativeStack is at offset 4 in VirtualThread. If that's not true
    // then the appropriate offset needs to be used when storing/loading the stack.
        __asm
        {        
            // Save registers
            push ebp
            push ebx
            push esi
            push edi
            // Save old stack
            mov [ecx+4], esp    // store to pOld->NativeStack
            // Load new stack
            mov esp, [edx+4]    // load from pNew->NativeStack
            // Restore registers
            pop edi
            pop esi
            pop ebx
            pop ebp
            ret
        }
    }

    So what I'm missing right now is a way to set up these stacks for the initial run

    To create a new thread you have to allocate memory for its native stack and set up that stack exactly as SwitchContext does. Once the thread stack is properly initialized you can simply call SwitchContext to start the new thread.

    In addition, my assumption here is that my VirtualThread class will not only hold the "virtual stack", but also this native stack that will be used to do context switching for each thread.

    Yes, if you create threads you'll also need to allocate the native stacks yourself.

    void InitializeContext(VirtualThread* pNew, void *startFunction)
    {    
        int stackSize = 4096;
        char *stack = reinterpret_cast<char *>(malloc(stackSize)); // allocate some space, malloc is just an example
        stack += stackSize; // go to the top of the stack because the stack grows down
        
        __asm
        {        
            mov ecx, esp // save current stack pointer
            mov esp, stack

            mov eax, startFunction
            push eax    // push the start function address as the return address (of SwitchContext)

            // Save registers
            xor eax, eax // let's always start a thread with zeroed registers
            push eax
            push eax
            push eax
            push eax

            mov stack, esp
            mov esp, ecx // restore the stack pointer
        }
       
        pNew->NativeStack = stack;
    }

  • BitFlipper

    @Dexter:

    Most excellent, sir. I will try out your suggestions and let you know how it goes. Your help is greatly appreciated.

    [EDIT] Just one observation... Since the native stack grows down, I can probably share the same block of memory for both the virtual and native stacks (so they'll grow towards each other). Then I can easily check at strategic points how close the virtual and native stack pointers are getting to each other; when they are within some predefined limit of each other, I can throw a stack overflow exception. This way I only need to deal with one area of "wasted space" per thread instead of two, and a single check covers both stacks.
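    A sketch of that shared-block check, with illustrative field names (the margin value is an arbitrary placeholder):

```cpp
#include <cstddef>
#include <cstdint>

// One memory block per thread: the virtual stack grows up from the
// bottom, the native stack grows down from the top. Overflow is reported
// when the two pointers come within a safety margin of each other.
struct ThreadStacks
{
    uint8_t* blockStart;  // virtual stack grows up from here
    uint8_t* blockEnd;    // native stack grows down from here
    uint8_t* virtualSP;   // current virtual stack pointer
    uint8_t* nativeSP;    // current native stack pointer

    static constexpr ptrdiff_t SafetyMargin = 256; // placeholder value

    bool IsOverflow() const
    {
        // Single check covers both stacks, since either one growing
        // shrinks the gap between the pointers.
        return nativeSP - virtualSP < SafetyMargin;
    }
};
```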

  • BitFlipper

    Since I don't need anything other than the native stack pointers in SwitchContext, I can probably just use the following and not worry about the offset of NativeStack into VirtualThread:

    __declspec(naked) void __fastcall SwitchContext(void* pOld, void* pNew)
    {
        __asm
        {        
            // Save registers
            push ebp
            push ebx
            push esi
            push edi
            // Save old stack
        mov [ecx], esp    // store through pOld (which points at NativeStack)
        // Load new stack
        mov esp, [edx]    // load through pNew (which points at NativeStack)
            // Restore registers
            pop edi
            pop esi
            pop ebx
            pop ebp
            ret
        }
    }

    Then I just call it like this (where both pOld and pNew are VirtualThread*):

    SwitchContext(&pOld->NativeStack, &pNew->NativeStack);

    EDIT: I changed the title to reflect the current part of the project I'm working on.

  • BitFlipper

    OK, for my StartMainThread function, I came up with this:

    void NativePlatform::StartMainThread(RuntimeThread* pThread, void* pData, void *pStartAddress)
    {    
        void* stack = pThread->NativeStack;
    
        __asm
        {        
            mov eax, pThread 
            mov ebx, pData
            mov ecx, pStartAddress
            mov esi, ebp
            mov edi, esp
    
            mov ebp, stack
            mov esp, ebp
    
            push esi
            push edi
            push ebx
            push eax
    
            call ecx
    
            pop eax
            pop ebx
            pop edi
            pop esi
    
            mov esp, edi
            mov ebp, esi
        }
    }

    I need to pass pThread and pData as two parameters into the start function. This seemed to work just fine until I started using std::wcout in my code, after which I get a heap corruption error when the application exits. If I call the start function directly without going through StartMainThread, I don't get the corruption dialog on exit. Obviously something is still wrong with the above asm code.

    Any idea what I'm doing wrong?

  • Dexter

    How is the startAddress function declared?

  • BitFlipper

    Like this:

    void (*m_startAddressData)(RuntimeThread*, VirtualObject**);

    You are thinking that maybe the calling convention is wrong?

  • Dexter

    "You are thinking that maybe the calling convention is wrong?"

    Unlikely. It can only be wrong if you changed the default from cdecl to stdcall or fastcall.

    Hmm, the code looks correct. The 4 pops after the call aren't really needed, but they don't hurt.

    Heap corruption plus a stack allocated using malloc (I assume)... are you sure the initial native stack pointer is correct?
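    One thing worth checking along those lines is the alignment of the initial stack pointer: malloc's result is suitably aligned, but the value esp ends up with after the pushes may not match what the compiled code assumes (code using SSE instructions, as parts of the CRT can, wants 16-byte alignment). A hypothetical helper to round a raw top-of-stack down to a chosen boundary:

```cpp
#include <cstdint>

// Round a raw top-of-stack pointer DOWN to the given power-of-two
// alignment (the stack grows down, so rounding down stays inside the
// allocated block). Illustrative helper, not from the actual project.
inline uint8_t* AlignStackTop(uint8_t* rawTop, uintptr_t alignment)
{
    uintptr_t p = reinterpret_cast<uintptr_t>(rawTop);
    return reinterpret_cast<uint8_t*>(p & ~(alignment - 1));
}
```

    Calling this on `stack + stackSize` before pushing the initial frame would at least rule alignment out as the cause.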
