@Dexter:

Your idea of one structure per function call is interesting, but it would bloat the code a lot (or at least the source files), since I'd need to create a unique structure for each call.

My IL to C++ converter is coming along nicely. I have implemented roughly 70% of all IL instructions, enough to get a good idea of performance. I tested it with a method that calls another method with various parameters, which then loads a string and does some basic math. The generated C++ ran at about 80-90% of the speed of the C# version, completing ~71M calls/sec. Not too bad, considering I have not tried to optimize the code yet.

The following have been implemented:

  • Generics
  • Arrays
  • Function calls (static, instance, virtual). P/Invoke should be easy to add.
  • "Proxy" functions, used to handle static and instance calls on native types, like Int32, bool, etc. I chose not to create structs for these because I think it would be faster to just treat them as native types and use proxy functions (like Int32 Int32_Parse("123")).
  • Strings. This includes the static (internal) strings, which are really fast to load (IL ldstr). I just pre-create an array of all the static strings and emit the index into that array in the C++ code.
  • The first part of the GC which includes making all ref types accessible in order to mark them before the sweep operations, and allowing updating of any pointers that have changed due to compacting. Classes are movable in theory, but I have not tested this yet.
  • About 70% of all IL instructions.
  • Some other stuff I forgot about.
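
For illustration, here is a minimal sketch of the proxy-function and static-string-table ideas from the list above. The names g_staticStrings and LoadStaticString are made up for this example, and having Int32_Parse take a wide string is an assumption; the actual converter output may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>

typedef int32_t Int32;
typedef wchar_t Char;

// Proxy function: a static call on a native type (Int32.Parse) becomes a
// plain free function, so Int32 itself stays a native integer.
// Assumption: the argument is a null-terminated wide string.
inline Int32 Int32_Parse(const Char* text)
{
    return static_cast<Int32>(std::wcstol(text, nullptr, 10));
}

// Hypothetical static string table: every string literal in the assembly is
// created once up front, and each ldstr is emitted as an index into it.
static const Char* const g_staticStrings[] = { L"123", L"hello" };
static const Int32 g_staticStringCount =
    static_cast<Int32>(sizeof(g_staticStrings) / sizeof(g_staticStrings[0]));

// What an ldstr could compile down to: a bounds-checked array lookup.
inline const Char* LoadStaticString(Int32 index)
{
    assert(index >= 0 && index < g_staticStringCount);
    return g_staticStrings[index];
}
```

With this layout, something like Int32.Parse("123") in C# would turn into Int32_Parse(LoadStaticString(0)), with no string allocation at the call site.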

I did run into a bizarre problem, though. Maybe someone can help me out.

I have the following C# class with an unsafe constructor:

public class VirtualString : VirtualObject
{
    private UInt16 m_length;
    private unsafe char* m_buffer;

    public unsafe VirtualString(char* pStr)
        : base()
    {
        m_buffer = pStr;
        m_length = 0;

        if (m_buffer == null)
            return;

        while (*pStr++ != 0)
            m_length++;
    }
}

The IL for the constructor looks like this (I modified ILSpy to show the raw opcode and operand bytes as well):

    IL_0000: 0x02                    ldarg.0
    IL_0001: 0x28 0x060000a2         call instance void PostCompiler.VirtualTypes.VirtualObject::.ctor()
    IL_0006: 0x00                    nop
    IL_0007: 0x00                    nop
    IL_0008: 0x02                    ldarg.0
    IL_0009: 0x03                    ldarg.1
    IL_000a: 0x7d 0x040001c8         stfld char* PostCompiler.VirtualTypes.VirtualString::m_buffer
    IL_000f: 0x02                    ldarg.0
    IL_0010: 0x16                    ldc.i4.0
    IL_0011: 0x7d 0x040001c7         stfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length
    IL_0016: 0x02                    ldarg.0
    IL_0017: 0x7b 0x040001c8         ldfld char* PostCompiler.VirtualTypes.VirtualString::m_buffer
    IL_001c: 0x16                    ldc.i4.0
    IL_001d: 0xe0                    conv.u
    IL_001e: 0xfe01                  ceq
    IL_0020: 0x16                    ldc.i4.0
    IL_0021: 0xfe01                  ceq
    IL_0023: 0x0a                    stloc.0
    IL_0024: 0x06                    ldloc.0
    IL_0025: 0x2d 0x02               brtrue.s IL_0029

    IL_0027: 0x2b 0x24               br.s IL_004d

    IL_0029: 0x2b 0x0f               br.s IL_003a
    // loop start (head: IL_003a)
        IL_002b: 0x02                    ldarg.0
        IL_002c: 0x25                    dup
        IL_002d: 0x7b 0x040001c7         ldfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length
        IL_0032: 0x17                    ldc.i4.1
        IL_0033: 0x58                    add
        IL_0034: 0xd1                    conv.u2
        IL_0035: 0x7d 0x040001c7         stfld uint16 PostCompiler.VirtualTypes.VirtualString::m_length

        IL_003a: 0x03                    ldarg.1
        IL_003b: 0x25                    dup
        IL_003c: 0x18                    ldc.i4.2
        IL_003d: 0xd3                    conv.i
        IL_003e: 0x58                    add
        IL_003f: 0x10 0x01               starg.s pStr
        IL_0041: 0x49                    ldind.u2
        IL_0042: 0x16                    ldc.i4.0
        IL_0043: 0xfe01                  ceq
        IL_0045: 0x16                    ldc.i4.0
        IL_0046: 0xfe01                  ceq
        IL_0048: 0x0a                    stloc.0
        IL_0049: 0x06                    ldloc.0
        IL_004a: 0x2d 0xdf               brtrue.s IL_002b
    // end loop

    IL_004c: 0x00                    nop

    IL_004d: 0x2a                    ret
}

My IL to C++ converter generates the following C++ code:

/////////////////////
// T8012_VirtualString.T8012_Ctor_VirtualString()
/////////////////////
Void T8012_VirtualString::T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, Char* pStr)
{

    // Value type locals
    Bool local0 = false;

IL_0000: // ldarg.0
IL_0001: // call: Void .ctor()
    (*pThis)->T8003_VirtualObject::T8003_Ctor_VirtualObject(pThread, (T8003_VirtualObject**)pThis);

IL_0006: // nop
IL_0007: // nop
IL_0008: // ldarg.0
IL_0009: // ldarg.1
IL_000a: // stfld: Char* m_buffer
    (*pThis)->m_buffer = pStr;

IL_000f: // ldarg.0
IL_0010: // ldc.i4.0
IL_0011: // stfld: UInt16 m_length
    (*pThis)->m_length = 0;

IL_0016: // ldarg.0
IL_0017: // ldfld: Char* m_buffer
IL_001c: // ldc.i4.0
IL_001d: // conv.u
IL_001e: // ceq
IL_0020: // ldc.i4.0
IL_0021: // ceq
IL_0023: // stloc.0
    local0 = (((*pThis)->m_buffer == (Int32)(UInt32)(0)) == 0);

IL_0024: // ldloc.0
IL_0025: // brtrue.s: 2
    if (local0 != 0)
        goto IL_0029;

IL_0027: // br.s: 36
    goto IL_004d;

IL_0029: // br.s: 15
    goto IL_003a;

IL_002b: // ldarg.0
IL_002c: // dup
IL_002d: // ldfld: UInt16 m_length
IL_0032: // ldc.i4.1
IL_0033: // add
IL_0034: // conv.u2
IL_0035: // stfld: UInt16 m_length
    (*pThis)->m_length = (Int32)(UInt16)(((Int32)(*pThis)->m_length + 1));

IL_003a: // ldarg.1
IL_003b: // dup
IL_003c: // ldc.i4.2
IL_003d: // conv.i
IL_003e: // add
IL_003f: // starg.s: Char* pStr
    pStr = (Char*)(((Int32)pStr + (Int32)2));

IL_0041: // ldind.u2
IL_0042: // ldc.i4.0
IL_0043: // ceq
IL_0045: // ldc.i4.0
IL_0046: // ceq
IL_0048: // stloc.0
    local0 = ((((UInt16)*(UInt16*)pStr) == 0) == 0);

IL_0049: // ldloc.0
IL_004a: // brtrue.s: -33
    if (local0 != 0)
        goto IL_002b;

IL_004c: // nop
IL_004d: // ret
    return;
}

EDIT: Here is the C++ class declaration. Note that "Char" is defined as "wchar_t".

/////////////////////
// Original Name:  System.String
// Native Name:    T8012_VirtualString
// Native TypeRef: 0x8012
/////////////////////
class T8012_VirtualString : public T8003_VirtualObject
{
public:
    UInt16 m_length;
    Char* m_buffer;

    // Constructors
    T8012_VirtualString();
    Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis);
    Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, Char* pStr);
    Void T8012_Ctor_VirtualString(VirtualThread* pThread, T8012_VirtualString** pThis, UInt16 length);

    // Special Functions
    virtual UInt16 GetTypeId() { return 0x8012; }
    virtual Bool IsInstanceOf(UInt16 otherId) { return otherId == 0x8012 || otherId == 0x8003; }

    static Int32 get_Length(VirtualThread* pThread, T8012_VirtualString** pThis);
};

 

The problem is that the C++ code behaves differently from the original: it always leaves m_length one less than it should be. Passing in "123" sets m_length to 3 in the C# version, but to 2 in the C++ version. I'm confused as to how that can happen. As far as I can see, my C++ code does exactly what the IL tells it to do. Any ideas?
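
One thing that may be worth checking: tracing the IL by hand, the dup at IL_003b leaves a copy of the old pStr on the evaluation stack, so the ldind.u2 at IL_0041 would read through the pointer as it was before the increment, whereas the generated C++ dereferences pStr after the store at IL_003f. Below is a minimal sketch of the two readings. The function names are made up for this example, and it uses element-wise pointer arithmetic instead of the byte arithmetic in the generated code.

```cpp
#include <cassert>
#include <cstdint>

typedef wchar_t Char;
typedef uint16_t UInt16;

// Reading 1: what the generated C++ does. The pointer is stored first and
// the loop test then dereferences the NEW value.
// Caution: for an empty string this reads past the terminator.
UInt16 CountDerefAfterIncrement(const Char* pStr)
{
    assert(pStr != nullptr);
    UInt16 length = 0;
    for (;;)
    {
        pStr = pStr + 1;        // add, starg.s pStr
        if (*pStr == 0)         // ldind.u2 through the updated pointer
            break;
        length++;
    }
    return length;
}

// Reading 2: what the IL stack seems to do. dup keeps a copy of the OLD
// pointer, starg.s consumes the incremented copy, and ldind.u2 reads
// through the old one, which matches while (*pStr++ != 0) in the C#.
UInt16 CountDerefBeforeIncrement(const Char* pStr)
{
    assert(pStr != nullptr);
    UInt16 length = 0;
    for (;;)
    {
        const Char* old = pStr; // ldarg.1, dup
        pStr = pStr + 1;        // add, starg.s pStr
        if (*old == 0)          // ldind.u2 through the duplicated old pointer
            break;
        length++;
    }
    return length;
}
```

For L"123", the first function returns 2 and the second returns 3, which is the same off-by-one as between the C++ and C# versions above. If that is the cause, the converter would need to keep the duplicated value in a temporary rather than re-reading pStr after the starg.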

Note: I can turn the IL labels and instruction comments on or off. I turned both on here to make the relationship to the original IL easier to see.