Coffeehouse Post

Single Post Permalink

View Thread: Has anyone considered writing an MSIL interpreter in C#?
  • User profile image
    bdesmet

    @evildictaitor: Don't underestimate the C# compiler wrt the switch statement. A few examples:

    • Switches with a low number of cases can become if statements.
    • Switches with reasonably contigious ranges can be turned into the switch instruction. Holes can be patched with a jump target to the default case.
    • Switches with a non-zero case can get rebased using an add or sub instruction in order to use the switch instruction again.
    • Switches with sparse case values can be turned into nested/hierarchical switch instructions, using any of the above techniques applied.
    • Switches on string cases can become if statements, but if there are more than a treshold number (currently 6 IIRC), a Dictionary<string, int> is lazily constructed, using the int values for switch instruction use.

    Just a little example:

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Bar(int i, out int j)
    {
        switch (i)
        {
            case 17:
                j = 1;
                break;
            case 19:
                j = 2;
                break;
            case 20:
                j = 3;
                break;
            case 25:
                j = 4;
                break;
            default:
                j = 0;
                break;
        }
    }

    Code above got compiled without /o+, so you'll see redundant nop instruction and excessive st/ld pairs in the IL. Using SOS, we dump the IL and the generated assembler:

    0:004> !name2ee switch.exe Program.Bar
    Module:      000007ff000330d8
    Assembly:    switch.exe
    Token:       0000000006000002
    MethodDesc:  000007ff000340e0
    Name:        Program.Bar(Int32, Int32 ByRef)
    JITTED Code Address: 000007ff001501d0
    0:004> !dumpil 000007ff000340e0
    ilAddr = 000000000137207c
    IL_0000: nop 
    IL_0001: ldarg.0 
    IL_0002: stloc.0 
    IL_0003: ldloc.0 
    IL_0004: ldc.i4.s 17
    IL_0006: sub 
    IL_0007: switch (IL_0023, IL_0037, IL_0028, IL_002d)
    IL_001c: ldloc.0 
    IL_001d: ldc.i4.s 25
    IL_001f: beq.s IL_0032
    IL_0021: br.s IL_0037
    IL_0023: ldarg.1 
    IL_0024: ldc.i4.1 
    IL_0025: stind.i4 
    IL_0026: br.s IL_003c
    IL_0028: ldarg.1 
    IL_0029: ldc.i4.2 
    IL_002a: stind.i4 
    IL_002b: br.s IL_003c
    IL_002d: ldarg.1 
    IL_002e: ldc.i4.3 
    IL_002f: stind.i4 
    IL_0030: br.s IL_003c
    IL_0032: ldarg.1 
    IL_0033: ldc.i4.4 
    IL_0034: stind.i4 
    IL_0035: br.s IL_003c
    IL_0037: ldarg.1 
    IL_0038: ldc.i4.0 
    IL_0039: stind.i4 
    IL_003a: br.s IL_003c
    IL_003c: ret 

    Notice the sub instruction to rebase the switch statement and the insertion of case 18 equals default case by a jump to IL_0037 in the switch instruction's table. Case 25 which is further away from the switch range encountered here is handled by fall-through after the switch instruction and a jump to IL_0032. If you'd have a cluster of cases close to 25 again, you'd see another switch instruction being born.

    I'll leave it as a good SOS exercise to the reader to perform a !U on the JITTED code address for different platform architectures and unravvle the assember generated...

    0:004> !u 000007ff001501d0
    Normal JIT generated code
    Program.Bar(Int32, Int32 ByRef)
    Begin 000007ff001501d0, size 62
    ...

    Also don't forget order of cases in a switch statement doesn't matter, so the techniques described above can be applied even if cases don't appear in a sorted lexical order.