Tech Off Thread

77 posts

IL to C++ compiler - Need advice on implementing context switching

Back to Forum: Tech Off
  • User profile image
    BitFlipper

    After playing around with writing an IL interpreter in C# (for fun, not profit - see here), I got the idea to write an IL to C++ compiler that will take a .Net *.exe and do the following:

    • Do a static analysis of the code to determine which methods and types are actually in the code path, and ignore everything else. Done
    • Create substitute types for all the commonly used types like Int8, Int16, Int32 etc, object, string, DateTime, TimeSpan, etc. Since a lot of the "native" types make internal calls, they need to be replaced with "pure" types. Unfortunately I would also need to create equivalents to other commonly used classes, since any methods that call into internal calls would be flagged as an error. Mostly done (although not all of the base type methods and properties have been implemented yet, just some to test the feasibility of the approach).
    • Ensure the type system is fully managed. This means that the actual type of an "object" (or void*) can be determined at runtime. I can take this all the way to what reflection can do right now in .Net, it just depends on how much metadata I care to store in the exe. For instance, right now I don't store a type's namespace, its name or its member layouts etc, but it would be trivial to add it if I find I need it. Done.
    • Implement generics. Done.
    • Implement boxing/unboxing. Not done yet but should not be too difficult.
    • Implement PInvoke. Not implemented yet but this should be relatively trivial since we are already in C++ land.
    • Compile a C++ header file, declaring each used type and method. Done
    • Compile a C++ source file with method implementations. Of course this involves the actual compiling from IL instructions to C++ code but from my previous experience with the IL interpreter I think this would be relatively easy although time consuming. Not done yet.

    Advantages of this approach:

    • You can take the C++ files and compile them for a microcontroller using any supported C++ compiler.
    • This should run at almost "native" speed. Currently the .Net framework runs about 100 to 1000 times slower on a microcontroller than the equivalent native code.
    • You can use the "full" .Net as opposed to just the .Net Micro Framework, although most of the framework classes' methods would not be available due to them making internal calls (this is a per-method limitation, so some methods might work just fine).
    • The exe generated by this should be very small, since only the methods and types that are actually used in the call graph are compiled. For instance, if you never use the method DateTime.Now, that code will never make it into the final exe. This has a big advantage to even the Micro Framework, which still requires all of the .Net runtime to be present on the microcontroller.

    I don't have a problem writing the part of the GC that walks the object graph marking all referenced objects and then deleting unreferenced ones. Since I have full control over the C++ class generation, I intend to implement something like this:

    class VirtualObject
    {
    public:
        Bool m_mark;
    
        virtual void Mark()
        {
            m_mark = True;
        }
    };
    
    class VirtualMyClass : VirtualObject
    {
    public:
        VirtualObject m_variableA;
        SomeOtherObject m_variableB;
    
        virtual void Mark()
        {
            if (m_mark)
                return;   // prevent infinite loops if two objects reference each other directly or indirectly
    
            VirtualObject::Mark();
    
            if (m_variableA)
                m_variableA->Mark();
    
            if (m_variableB)
                m_variableB->Mark();
        }
    };

    The above code would be trivial to create. There would be a "virtual stack" where all "ref types" will be created and referenced to. Now all the GC has to do is to look at all the static variables and the "virtual stack", and call Mark() on all variables derived from VirtualObject that are not null. Then walk all of the currently allocated VirtualObject types and delete the ones that do not have the mark flag set, and clear the mark flag on the ones that do have it set (for the next pass).

    OK, that was way more than I intended to write, but it helps to get the full picture of what I'm doing. Now to the questions I have...

    So marking and deleting objects are taken care of. But what about compacting the remaining objects? I'm wondering how one would go about implementing VirtualObjects that are "movable". I have some ideas by they seem way too complex, and it involves having multiple levels of indirection. This doesn't seem ideal either because even if you have an array of objects and use an index as a "handle" for each, if you then allocate 1,000,000 objects, but only end up keeping a reference to the last one, you are stuck with a lookup array that has 999,999 items that are null but just the last one is not null. You can't compact the array since something is referencing to the item at index 999,999.

    So what is the correct way to implement a system where object are fully "movable"? Even JITted code would still need to implement this. How do the smart guys do it?

  • User profile image
    Dexter

    Now all the GC has to do is to look at all the static variables and the "virtual stack", and call Mark() on all variables derived from VirtualObject that are not null.

    This means that you know where the references to the objects are. If you know this then you can simply update the references when the objects move. An additional level of indirection is not required and it's not efficient anyway.

  • User profile image
    BitFlipper

    ,Dexter wrote

    *snip*

    This means that you know where the references to the objects are. If you know this then you can simply update the references when the objects move. An additional level of indirection is not required and it's not efficient anyway.

    I think you are right. I was thinking that a currently executing function might have a local reference that could not be updated in such cases but if done correctly then this might not be a problem. So all "local" references to objects should really be references to objects on the "virtual object stack" as either a pointer to the location on the virtual stack, or an index to the virtual stack.

    I'll need to give this some more thought but as I said I believe you are correct.

    Somewhat related: Since microcontrollers don't support preemptive multithreading (or at least not the ARM microcontroller I'm playing with) I need to emulate that. The way I'm thinking of doing that is something like this:

    void SomeClass::SomeFunction(VirtualThread* thread)
    {
        // ...code...
        
        if (VirtualMachine.PendingContextSwitch)
            VirtualMachine.SwitchContext(thread);
        
        // ...code...
    }

    Usually the static VirtualMachine.PendingContextSwitch value would only be checked at the end of a function, or inside a loop (since there is no way to know how long the loop might take to complete). Also if the function is really long then I might add some additional checks.

    The VirtualMachine.SwitchContext() call will do a context switch on the CPU. This is obviously very CPU dependent, and on the ARM processor you call the SWI instruction. Not sure how to do it on an x86 processor but it would not be too difficult to figure out (I'll stay away from x64 for now).

    The VirtualMachine.PendingContextSwitch value itself would be set to true either periodically by a timer in VirtualMachine on x86, or by a hardware timer interrupt on the ARM processor.

    The above mechanism has the advantage that I can control exactly where the code can be preempted, and maybe update all local references (if it is even needed) if VirtualMachine.SwitchContext returns true. Something like:

    void SomeClass::SomeFunction(VirtualThread* thread)
    {
        // ...code...

        if (VirtualMachine.PendingContextSwitch) && VirtualMachine.SwitchContext(thread)
        {
            // update all local references
        }

        // ...code...
    }

    Since most of the time PendingContextSwitch will be false, the check should not add much overhead, even inside tight loops. Once again, would this be the way the smart guys do it, or am I completely missing something here?

    EDIT: Does anyone know what is up with the code formatting? Sometimes it insists putting everything on the same line, no matter whether you put newlines or Shift+newlines. I removed the code from the code blocks to make them readable.

  • User profile image
    Dexter

    Hmm, there's one problem I forgot to mention. In addition to stack and static variables there's one more place where you need to look for references: CPU registers.

    This seems to be rather complicated to solve even in the absence of multithreading due to the fact that you generate C++ code, not assembly. Imagine this:

    void SomeMethod() {
    ...
    foo bar = new foo();
    ...
    }

    Let's say that new runs out of memory and triggers a GC. Let's also say that there's no threading and that the GC happens on the same thread. Even in this simple case you have no idea what the CPU registers contain when the GC code is entered, it's up to the compiler.

    As for threading: I'm not familiar with these microcontrollers but I'm not sure why you would need all that machinery. As long as the CPU supports interrupts you should be able to implement preemptive multithreading without the need for this Pending/Switch stuff. And if the CPU doesn't have interrupts then who is going to set Pending anyway? You may need some form of cooperative multithreading in that case.

  • User profile image
    BitFlipper

    Well, in this case it might be something along these lines:

    [Edit: I changed the code since there were some errors with the stack pointers]

    class VirtualThread
    {
        VirtualObject** StackBase;
        VirtualObject** StackPointer;
    }
       

    int SomeMethod(VirtualThread* thread)
    {
        int retVal;
        VirtualObject** stackStart = thread->StackPointer;
        *thread->StackPointer = new foo();
        VirtualObject** bar = thread->StackPointer++;
        ((foo*)*bar)->DoSomething();
        // ...code...
       
        // The following lines of code would be changed to the code below it
        // if (someValue)
        //     return 123;

        if (someValue)
        {
            retVal = 123;
            goto Exit:
        }
       
        // ...code...

    Exit:
        thread->StackPointer = stackStart;
        return retVal;
    }

    Or something along those lines. I do think there would need to be at least one level of indirection for local and static ref types.

    Edit: I don't think with the above code there should be an issue with CPU registers since I am explicitly casting what is on the "virtual stack" to an instance of the actual object each time. The object can move between those calls. This is especially true if I use the context switching mechanism I proposed above.

    Edit Edit: About the preemptive multitheading: If you think about it, even if you use a timer or interrupt, you still need to know which virtual thread to switch to. I'm not exactly sure about the details of context switching and register saving etc, but it will basically need to come down to saving all of the registers into some storage (preferably in the VirtualThread class), and then restoring the state from another VirtualThread and continuing execution where it left off.

    So as far as the GC is concerned, now it just needs to enumerate through all VirtualThreads' stacks starting at StackBase up to StackPointer, and if it finds any non-null values there, it just calls VirtualObject::Mark() on it. Now you know exactly which objects are still being used as "locals" withing the executing functions (in addition to also calling VirtualObject::Mark() for each non-null static object).

    Edit Edit Edit: I just realized that the "this" can also move while the class is executing code so it means that a "this" pointer will need to be passed on the stack as well, and all "this" references will need to go through this pointer as well.

    Obviously this is getting more complex but I still think it would be way faster than the current performance we see with the Micro Framework.

  • User profile image
    Dexter

    I don't think with the above code there should be an issue with CPU registers since I am explicitly casting what is on the "virtual stack" to an instance of the actual object each time.

    That doesn't matter too much. If you have code like the following:

    ((foo*)*bar)->DoSomething();((foo*)*bar)->DoSomething();

    the compiler can (and usually will) perform the cast once and store the result into a register. However, this should work fine if you're allocating GC memory between those 2 lines. The compiler can't reuse the value from the previous cast because it has no way to prove that *bar wasn't changed by the memory allocation function.

    Unfortunately this compiler optimization means that you have no way of doing normal preemptive multithreading, you have to use SwitchContext. I'm not what that will do to performance but I see no other way of doing this given the constraints.

    I'm not exactly sure about the details of context switching and register saving etc, but it will basically need to come down to saving all of the registers into some storage (preferably in the VirtualThread class),

    Pretty much yes. One possibility is to save the registers on the current thread's stack and then switch the stack. It's also possible that you won't need to save the registers if you use the SwitchContext approach because the compiler will probably save the "used" registers when it calls the function.

  • User profile image
    BitFlipper

    My question is what does the "real" framework do with this when it JITs the code? Surely the same issues apply? We know that ref types can move around memory. How does it handle it?

    BTW I now make all functions static, even the instance functions, since I now also pass my own custom "this" pointer and there is no need for the real this pointer.

    So, the following C# method...

    class Foo
    {
        int Multiply(int x, int y)
        {
            return x * y;
        }
    }

    ...now becomes (at tleast the declaration)...

    class Foo : public VirtualObject
    {
        static int Multiply(VirtualThread* pThread, VirtualObject** pThis, int x, int y);
    }

    Edit: I can't make virtual functions static. I don't believe this will be a problem since AFAIC the virtual tables themselves are static, so it doesn't matter if the class instance moves around memory.

  • User profile image
    Dexter

    Surely the same issues apply?

    Not really. Since CLR's JIT outputs assembly code it has knowledge about what's in the registers or about points where it is safe to stop the code for GC because there's no pointer in any register. Since you're generating C++ code you don't have this luxury, you're at the mercy of the compiler.

  • User profile image
    evildictait​or

    ,Dexter wrote

    *snip*

    Not really. Since CLR's JIT outputs assembly code it has knowledge about what's in the registers or about points where it is safe to stop the code for GC because there's no pointer in any register. Since you're generating C++ code you don't have this luxury, you're at the mercy of the compiler.

    Actually the CLR doesn't make any assumptions like that. During a garbage collect on a particular thread, it suspends the thread (either by suspending it from another thread or during a call into the runtime, e.g. during a new object allocation). The registers are stored onto the stack and the stack is then inspected for any value that lies within the bounds of the current heap. Any value that falls within an object is marked as live for this round. The root-set computed is therefore a super-set of the "true" root-set, since an integer which happens to hold a value that would also be a valid pointer into the heap might cause a value to live longer than it strictly should, but this doesn't matter since this is much faster than using type information to derive the true root set.

    At this point the thread can be resumed.

    Added to the root set (which is rounded to the nearest object and now has type information which is stored with the object) is all of the objects from the other thread's root sets, all of the static variables (using type information) and thread local variables.

    1) Mark all objects as "black"

    2) Mark all objects from the current live set (computed above) as "gray"

    3) While there are still objects in the gray set:

      4) Add all objects reachable from that object which are in the black set to the gray set.

      5) Add the object to the white set.

    6) Goto 3 until the gray set is empty.

    7) Now all objects reachable from the live set are colored white, and all objects colored black are not reachable from the live set.

  • User profile image
    evildictait​or

    ,BitFlipper wrote

    Edit: I can't make virtual functions static. I don't believe this will be a problem since AFAIC the virtual tables themselves are static, so it doesn't matter if the class instance moves around memory.

    Virtual tables aren't static:

    class myClass {
      int a;
      static int b;
      virtual void myFunc();
      myClass() {}
    }

    looks like this in C:


    static int _myClass_b;
    typedef struct
    {
      int m_a;
      void* myFunc
    } MyClass; 

    void _myClass(MyClass* cls)
    {
      cls->myFunc = &MyClass_myFunc;

  • User profile image
    BitFlipper

    @evildictaitor:

    Thanks for the info about the GC. I'm using a "virtual" stack that only contains ref types so I don't have the problem with objects being incorrectly flagged due to matching int values, etc.

    As far as the vtables are concerned, I believe the vtables themselves are "static" in the sense that there is just one vtable for each type. So even if you create 100 instances of the same type, they all have a hidden pointer that point to the exact same vtable. Hence if you move the class around in memory, it won't matter since it will still point to the same vtable.

    From Wikipedia: "Typically, the compiler creates a separate vtable for each class. When an object is created, a pointer to this vtable, called the virtual table pointer, vpointer or VPTR, is added as a hidden member of this object" http://en.wikipedia.org/wiki/Vtable

     

  • User profile image
    Dexter

    Actually the CLR doesn't make any assumptions like that.

    Actually it does.

    The registers are stored onto the stack

    Actually no. While the kernel may indeed store the registers on the stack when the thread is suspended the only way for an application to get those registers is to call GetThreadContext. Sure, the CONTEXT that you pass to that function can be stored on stack but that's irrelevant here.

    the stack is then inspected for any value that lies within the bounds of the current heap. Any value that falls within an object is marked as live for this round. The root-set computed is therefore a super-set of the "true" root-set, since an integer which happens to hold a value that would also be a valid pointer into the heap might cause a value to live longer than it strictly should

    Nope, this is not what MS's CLR does. What you're describing here sounds like coservative GC, not compacting GC.

    but this doesn't matter since this is much faster than using type information to derive the true root set.

    Actually it matters a lot because it prevents objects from being moved which was one the main questions in this thread. You can't move objects unless you know exactly where pointers are. Or are you suggesting to modify random values in memory that happen to look like pointers?

  • User profile image
    BitFlipper

    @Dexter:

    Yes I'm finding a lot of complexities while trying to implement a compacting GC. The way I'm planning on doing it is with the "virtual stack" I mentioned previously. Each thread will have such a stack, and each of a function's ref type variables and parameters are really a pointer to a location on this stack like this:

    MyRefType** foo = (MyRefType**)&(++pThread->StackPointer);

    So while the value that is stored in the virtual stack can change as the class is moved around, foo itself never needs to change. When moving classes around, what I will need to do is check each value on each thread's virtual stack and update any pointers that have changed. Still trying to think what will be the most efficient way of doing this, but I believe in concept it will work.

    For static ref type variables, I will just create one C function that is hardwired to check each class's static ref variables and call the Mark mentioned in the original post. Also there will be one C function that is hardwired to update any static ref pointers that have changed.

    For class ref type variables, each class will have a hardwired function that can be used to update any of it's ref type pointers that could have changed. Something like:

    void MyRefType::UpdatePointer(VirtualObject* oldPointer, VirtualObject* newPointer)
    {
        if (m_foo == oldPointer)
            m_foo = newPointer;

        if (m_foo != NULL)
            m_foo->UpdatePointer(oldPointer, newPointer);
       
        // Do the same for all other class ref types...
    }

    [Sorry, I'm not doing the code blocks again, sometimes C9 loses it's mind and insists that all code should be on one line only. I'll just indent the text for now (which apparently doesn't seem to work either)]

    Hmm, now that I think about it, that might not work because classes that are not reachable will never be called. If those classes are not rooted but they are still performing some sort of operations (maybe in some long running loop or a timer callback), their class variables will not be updated. So probably I will need to call UpdatePointer on each class while walking the object heap. In that case the UpdatePointer function will only update its own ref type variables, not call into UpdatePointer for each of its non-null ref objects. So it simplifies to:

    void MyRefType::UpdatePointer(VirtualObject* oldPointer, VirtualObject* newPointer)
    {
        if (m_foo == oldPointer)
            m_foo = newPointer;
        
        // Do the same for all other class ref types...
    }

    On a different topic, I ran into an interesting problem. So I have this concept that I will need to create "pure" classes due to the fact that many of the standard classes make internal calls, which I cannot decompile and create C++ code for. So just as an experiment, I added these lines of code to my C# test app:

    var list = new List<int>();
    list.Add(1);
    list.Add(2);

    And my C++ code suddenly got hugely bloated with all sorts of seemingly unrelated classes like the CultureInfo, etc with all of their huge amounts of static string resources etc. It seems that List is referencing those classes somewhere directly or indirectly, possibly through its static constructor. So I'm pulling in all of those types and doing a call graph analysis on their static constructors (which I need to call for all types I include), which is probably why it is detecting that all of those need to be included. Also in most cases it eventually hits an internal call and all bets are off.

    Not sure how to work around that problem. Maybe I can do a better call graph analysis and only include static variables if the call graph actually references them (I might even do this for all variables - if they are never referenced in the call graph, I think they can be completely ignored, no?).

    It won't be fun if I have to create "pure" versions of all the generic types, etc. But remember that the bar in this case is extremenly low: The .Net Micro Framework doesn't even support generics at all, so the fact that I do is already a big advantage given the ultimate goal of creating C++ code that can be compiled for a microcontroller.

    Yet another different topic: Right now you can debug your .Net MF code running on a microcontroller using VS, just like any other .Net app (no edit and continue though). Since the .Net MF code is open source, I plan to figure out how it is done and ultimately (far down the line), I'd like to support debugging from VS. I can store as much or as little metadata for each type as I want, so I could make it such that using some VS plugin, I can map a current instance of any type back to the original type in the original .Net code (and pass it's member values to the debugger as well), so I think it should be possible to make a VS debugging plugin. Has anyone had experience with this, and how difficult would such a project be?

  • User profile image
    Dexter

    and each of a function's ref type variables and parameters are really a pointer to alocation on this stack like this:

    One thing I don't understand is what are you going to do about value types that contain pointers and are stored on the stack. Did I miss something?

    that might not work because classes that are not reachable will never be called. If those classes are not rooted but they are still performing some sort of operations (maybe in some long running loop or a timer callback), their class variables will not be updated.

    That should never happen, if an object is not reachable then it's garbage and it will be collected. If you pass a GC pointer to native code then you know what you need to do: use "fixed" or GCHandle to prevent the GC from moving/collecting the object.

    So probably I will need to call UpdatePointer on each class while walking the object heap.

    Watch out that moving objects and updating pointers can be extremly tricky. I'm not talking about finding the roots which can be itself a problem, I'm talking about how can you update pointers in an efficient way. One thing is clear: you can't move one object and then call UpdatePointers on all the other objects in the heap, that has quadratic complexity.

    Since evildictait​or mentioned it: have you considered using Boehm GC instead?http://en.wikipedia.org/wiki/Boehm_garbage_collector

    And my C++ code suddenly got hugely bloated with all sorts of seemingly unrelated classes like the CultureInfo

    Hmm, not sure what CultureInfo has to do with List<>. Maybe the Sort method uses it somehow. Since you wrote that code you should be able to modify it to log some information about why a particular type has been included, right? Big Smile If I understand correctly now you're including all the methods of an used type, you may consider excluding methods that are never called.

    so I think it should be possible to make a VS debugging plugin. Has anyone had experience with this, and how difficult would such a project be?

    Absolutely no idea but if you can make all the rest work (compiler, GC etc.) then you should be able to write such a plugin too Smiley

     

  • User profile image
    BitFlipper

    ,Dexter wrote

    One thing I don't understand is what are you going to do about value types that contain pointers and are stored on the stack. Did I miss something?

    Yes, a very good point. I've thought of two ways to solve that problem:

    1. The "easy way out" would be to check whether a struct has any ref type fields and then treat it as a ref type (convert it to a class, which should be trivial). Not ideal since the point of using structs might be to put them in an array. But this could be the initial "solution".
    2. A better but more difficult solution: Any structs that have ref type fields also get the same hard-coded functions to manage their pointers just like classes get (Mark and UpdatePointer). Then when the structs get initialized as locals in a function, a pointer to them is pushed onto the virtual stack. Now when the GC walks the stack, those structs will also have their pointer management functions called. For arrays, they can be treated as structs (not boxed), but the array class that holds structs (or ref types), will know how to call each of its array items' pointer management functions.

    That should never happen, if an object is not reachable then it's garbage and it will be collected. If you pass a GC pointer to native code then you know what you need to do: use "fixed" or GCHandle to prevent the GC from moving/collecting the object.

    Are you sure? Let's say the class has a method like this:

    void MyClass.DoSomething()
    {
        while (!someBoolThatWillAlwaysBeTrue)
            someRefObjField.DoSomethingElse();
    }

    That is certainly not "illegal" code and the compiler can't guard against it. Now if the original ref to MyClass goes out of scope, then it is no longer reachable and it will be GC'd (and hence someRefObjField won't be reachable either and will be GC'd).

    Watch out that moving objects and updating pointers can be extremly tricky. I'm not talking about finding the roots which can be itself a problem, I'm talking about how can you update pointers in an efficient way. One thing is clear: you can't move one object and then call UpdatePointers on all the other objects in the heap, that has quadratic complexity.

    Yes, and I was wondering about that too. Not sure if it would be faster to send a Dictionary<,> of changed pointers or not. I was thinking that I might just move one class at a time, hence making the GC "incremental". On a microcontroller it might be more important for the GC to not lock everything up for long periods of times instead of absolute performance. There is often somewhat of a "realtime" nature to microcontrollers with its hardware interrupts etc.

    Since evildictait​or mentioned it: have you considered using Boehm GC instead?http://en.wikipedia.org/wiki/Boehm_garbage_collector

    I'll take a look, thanks.

    Hmm, not sure what CultureInfo has to do with List<>. Maybe the Sort method uses it somehow. Since you wrote that code you should be able to modify it to log some information about why a particular type has been included, right?Big Smile

    It possibly came in via one of the exception classes. I'll need to create substitutes for all of those too, so once I do, it might end up filtering out a lot of the unneeded stuff.

    If I understand correctly now you're including all the methods of an used type, you may consider excluding methods that are never called.

    That is what I'm already doing with my "call graph". I start at the Assembly.EntryPoint and look at all the IL instructions for any method calls. Any types that get referenced along the way gets added too. This process is repeated recursively for all methods found within the IL.

    What I'm suggesting is improving on this process by stripping the fields out of types that were never referenced during all of the code paths. So if a class has a member field called SomeHugeClass, but the code path never references it, it can be safely removed it altogether. How else is it going to be referenced (other than via the static constructors which I'l treat in the same way)? Of course this makes reflection impossible but that is fine, I think reflection can be left out for now - it is a microcontroller after all.

  • User profile image
    Dexter

    A better but more difficult solution: Any structs that have ref type fields also get the same hard-coded functions to manage their pointers just like classes get

    That sounds good. And I might have a slightly different idea, treat the stack frame as objects:

    • for each method create a struct that contains all the method parameters and locals
    • when you call a method allocate such a struct on the stack (by using alloca or other means)
    • fill the parameters
    • call the method with a pointer to the allocated struct
    • you'll also need to store pointers to these stack frames on that virtual stack of yours or simply chain the stack frames into a linked list (pointed to by the thread)

    Pros:

    • "everything" is an object, you don't need separate cases for "real" objects and stack frames
    • this works transparently with value types that contain references
    • this might avoid the register problem completly. if a method makes a "alloc" call the compiler won't be able to prove that the stack frame object hasn't been modified and it will have to reload any registers that have been previously read from stack frame

    Cons:

    • This will very likely disable some compiler optimizations. Usually some of the parameters are passed through registers, this can't happen anymore since the only real parameter will be the pointer to stack frame. Similarly for local variables.

    That is certainly not "illegal" code and the compiler can't guard against it. Now if the original ref to MyClass goes out of scope, then it is no longer reachable and it will be GC'd

    MyClass can't go out of scope, if one of its methods is running then there must be a stack frame for it and that stack frame must contain the "this" pointer for MyClass.

    On a microcontroller it might be more important for the GC to not lock everything up for long periods of times instead of absolute performance.

    Sounds like you want generational GC too. Even more complicated Big Smile.

    What I'm suggesting is improving on this process by stripping the fields out of types that were never referenced during all of the code paths.

    Sure, you could do that. Not sure if it helps with the List<T> case. I looked at the code and it's indeed possible that CultureInfo gets dragged in by an exception (see the Capacity prop). But it's unlikely that you can eliminate that by doing some static code analysis.

  • User profile image
    BitFlipper

    @Dexter:

    Your idea about the structure per function call is interesting but it is going to bloat the code a lot (or at least the source files) since I'll need to create a unique structure for each call.

    My IL to C++ converter is coming along nicely. I have implemented roughly 70% of all IL instructions, enough to get a very good idea about performance. I tested it with a method that calls into another method with various parameters that then loads a string and does some basic math. It ran at about 80% - 90% of the C# version. It was able to complete ~71M calls/sec. Not too bad, considering I have not tried to optimize the code yet.

    The following have been implemented:

    • Generics
    • Arrays
    • Function calls (static, instance, virtual). P/Invoke should be easy to add.
    • "Proxy" functions, used to handle static and instance calls on native types, like Int32, bool, etc. I chose not to create structs for these because I think it would be faster to just treat them as native types and use proxy functions (like Int32 Int32_Parse("123")).
    • Strings. This includes the static (internal) strings, which are really fast to load (IL ldstr). I just pre-create an array of all of the static strings and specify the index to the array in the C++ code.
    • The first part of the GC which includes making all ref types accessible in order to mark them before the sweep operations, and allowing updating of any pointers that have changed due to compacting. Classes are movable in theory, but I have not tested this yet.
    • About 70% of all IL instructions.
    • Some other stuff I forgot about.

    I did run into a bizarre problem though, maybe someone can help me out.

    I have the following C# method:

    public class VirtualString : VirtualObject
    {
        private UInt16 m_length;
        private unsafe char* m_buffer;
    
        public unsafe VirtualString(char* pStr)
            : base()
        {
            m_buffer = pStr;
            m_length = 0;
    
            if (m_buffer == null)
                return;
    
            while (*pStr++ != 0)
                m_length++;
        }
    }
    

    The IL looks like this (I modified ILSpy to show the instruction operand as well): 

        IL_0000: 0x02                    ldarg.0
        IL_0001: 0x28 0x060000a2         call instance void PostCompiler.VirtualTypes.VirtualObject::.ctor()
        IL_0006: 0x00                    nop
        IL_0007: 0x00                    nop
        IL_0008: 0x02                    ldarg.0
        IL_0009: 0x03                    ldarg.1
        IL_000a: 0x7d 0x040001c8         stfld char* PostCompiler.VirtualTypes.VirtualString::m_buffer
        IL_000f: 0x02                    ldarg.0
        IL_0010: 0x16                    ldc.i4.0
        IL_0011: 0x7d 0x040001c7         stfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length
        IL_0016: 0x02                    ldarg.0
        IL_0017: 0x7b 0x040001c8         ldfld char* PostCompiler.VirtualTypes.VirtualString::m_buffer
        IL_001c: 0x16                    ldc.i4.0
        IL_001d: 0xe0                    conv.u
        IL_001e: 0xfe01                    ceq
        IL_0020: 0x16                    ldc.i4.0
        IL_0021: 0xfe01                    ceq
        IL_0023: 0x0a                    stloc.0
        IL_0024: 0x06                    ldloc.0
        IL_0025: 0x2d 0x02               brtrue.s IL_0029
    
        IL_0027: 0x2b 0x24               br.s IL_004d
    
        IL_0029: 0x2b 0x0f               br.s IL_003a
        // loop start (head: IL_003a)
            IL_002b: 0x02                    ldarg.0
            IL_002c: 0x25                    dup
            IL_002d: 0x7b 0x040001c7         ldfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length
            IL_0032: 0x17                    ldc.i4.1
            IL_0033: 0x58                    add
            IL_0034: 0xd1                    conv.u2
            IL_0035: 0x7d 0x040001c7         stfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length
    
            IL_003a: 0x03                    ldarg.1
            IL_003b: 0x25                    dup
            IL_003c: 0x18                    ldc.i4.2
            IL_003d: 0xd3                    conv.i
            IL_003e: 0x58                    add
            IL_003f: 0x10 0x01               starg.s pStr
            IL_0041: 0x49                    ldind.u2
            IL_0042: 0x16                    ldc.i4.0
            IL_0043: 0xfe01                    ceq
            IL_0045: 0x16                    ldc.i4.0
            IL_0046: 0xfe01                    ceq
            IL_0048: 0x0a                    stloc.0
            IL_0049: 0x06                    ldloc.0
            IL_004a: 0x2d 0xdf               brtrue.s IL_002b
        // end loop
    
        IL_004c: 0x00                    nop
    
        IL_004d: 0x2a                    ret
    }

    My IL -> C++ compiler creates the following C++ code: 

    /////////////////////
    // T8012_VirtualString.T8012_Ctor_VirtualString()
    /////////////////////
    Void T8012_VirtualString::T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, Char* pStr)
    {
    
        // Value type locals
        Bool local0 = false;
    
    IL_0000: // ldarg.0
    IL_0001: // call: Void .ctor()
        (*pThis)->T8003_VirtualObject::T8003_Ctor_VirtualObject(pThread, (T8003_VirtualObject**)pThis);
    
    IL_0006: // nop
    IL_0007: // nop
    IL_0008: // ldarg.0
    IL_0009: // ldarg.1
    IL_000a: // stfld: Char* m_buffer
        (*pThis)->m_buffer = pStr;
    
    IL_000f: // ldarg.0
    IL_0010: // ldc.i4.0
    IL_0011: // stfld: UInt16 m_length
        (*pThis)->m_length = 0;
    
    IL_0016: // ldarg.0
    IL_0017: // ldfld: Char* m_buffer
    IL_001c: // ldc.i4.0
    IL_001d: // conv.u
    IL_001e: // ceq
    IL_0020: // ldc.i4.0
    IL_0021: // ceq
    IL_0023: // stloc.0
        local0 = (((*pThis)->m_buffer == (Int32)(UInt32)(0)) == 0);
    
    IL_0024: // ldloc.0
    IL_0025: // brtrue.s: 2
        if (local0 != 0)
            goto IL_0029;
    
    IL_0027: // br.s: 36
        goto IL_004d;
    
    IL_0029: // br.s: 15
        goto IL_003a;
    
    IL_002b: // ldarg.0
    IL_002c: // dup
    IL_002d: // ldfld: UInt16 m_length
    IL_0032: // ldc.i4.1
    IL_0033: // add
    IL_0034: // conv.u2
    IL_0035: // stfld: UInt16 m_length
        (*pThis)->m_length = (Int32)(UInt16)(((Int32)(*pThis)->m_length + 1));
    
    IL_003a: // ldarg.1
    IL_003b: // dup
    IL_003c: // ldc.i4.2
    IL_003d: // conv.i
    IL_003e: // add
    IL_003f: // starg.s: Char* pStr
        pStr = (Char*)(((Int32)pStr + (Int32)2));
    
    IL_0041: // ldind.u2
    IL_0042: // ldc.i4.0
    IL_0043: // ceq
    IL_0045: // ldc.i4.0
    IL_0046: // ceq
    IL_0048: // stloc.0
        local0 = ((((UInt16)*(UInt16*)pStr) == 0) == 0);
    
    IL_0049: // ldloc.0
    IL_004a: // brtrue.s: -33
        if (local0 != 0)
            goto IL_002b;
    
    IL_004c: // nop
    IL_004d: // ret
        return;
    }

    EDIT: Here is the C++ class declaration. Note that "Char" is defined as "wchar_t".

    /////////////////////
    // Original Name:  System.String
    // Native Name:    T8012_VirtualString
    // Native TypeRef: 0x8012
    /////////////////////
    class T8012_VirtualString : public T8003_VirtualObject
    {
    public:
        UInt16 m_length;
        Char* m_buffer;
    
        // Constructors
        T8012_VirtualString();
        Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis);
        Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, Char* pStr);
        Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, UInt16 length);
    
        // Special Functions
        virtual UInt16 GetTypeId() { return 0x8012; }
        virtual Bool IsInstanceOf(UInt16 otherId) { return otherId == 0x8012 || otherId == 0x8003; }
    
        static Int32 get_Length(VirtualThread* pThread, T8012_VirtualString** pThis);
    };

     

    The problem is that the code does something different from what the original one does. It always sets m_length to one less than what it should be. The C# version works correctly (pass in "123", and m_length is set to 3), but the C++ version leaves m_length one too small. I'm confused as to how that can happen (pass in "123" and m_length is set to 2). As far as I can see, my C++ code does exactly what the IL tells it to do. Any ideas?

    Note I can turn the IL labels and instruction comments on/off but turned both on to see the relationship to the original IL better.

  • User profile image
    BitFlipper

    It seems to me it would work right if the instruction at IL_0029 was removed. Of course I can't change the generated code manually to fix the bug since the real bug is obviously in my compiler, but I can't figure out what it is supposedly doing wrong.

    EDIT: OK I found that if I change the C# code to this:

            public unsafe VirtualString(char* pStr)
                : base()
            {
                m_buffer = pStr;
                m_length = 0;
    
                if (m_buffer == null)
                    return;
    
                while (*pStr != 0)
                {
                    m_length++;
                    pStr++;
                }
            }
    

    I get the right results in the C++ version. This indicates that the problem is in how I handle "*pStr++".

    EDIT EDIT:I wonder if it has something to do the the "dup" instructions. The way I handle dup is to clone the value of the item on the top of the "virtual" stack and push it so that there are two. Also, the way this virtual stack works is that it contains text that represents the local variable, parameter name or a previous operation. So in the case of the dup at IL_003b, the text on the stack will be "pStr". So when I perform the dup operation, the two top items on the stack now contain "pStr". I wonder whether I need to create a completely new local variable when I come across a dup operation.

Comments closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.