@AndyC: in theory, yes. In practice, every time we experimented with that we ended up with a solution that was arguably worse (and more complex) than writing the whole parser by hand. To be fair, I'm observing the technology from a pretty odd angle...
@vesuvius: may be specific to my line of work, but you wouldn't believe the amount of structured data I come across that lives in some custom format, often not fully documented and "improved" over the years by successive augmentations of the original syntax. Lots of scientific instruments tend to produce this kind of junk, it seems.
Sure, you can always write your own parsers using one of the many fine languages you mentioned, but it's a slow and pedantic job that often results in poorly maintainable code (as a reflection of the poorly defined syntax). A language - or rather, a meta-language - that allows you to clearly map the syntax would be a great help. And apparently there isn't a lot of choice in that area, especially if you expect any sort of user-friendliness or even just sanity.
Since Oslo got scattered among various projects, the M language kind of fell off the radar, and whatever documentation was available is now either defunct or obsolete.
It was an interesting language and I thought it had some potential, for instance as a cleaner yacc replacement. Does anyone know if there is any activity on the language, either by Microsoft or some independent party?
Considering that NTFS is less sensitive to fragmentation and that on SSD and Flash drives in general defragmentation ranges from useless to harmful, I think that's a technology that is earmarked for oblivion, as was the case for RAM and disk compressors a few years ago.
@Charles: interesting discussion, even though I find the basic point rather moot: JS is not necessarily the best language to act as an assembly language, but it's the only choice we have; it would take forever to have a viable alternative.
Also, I doubt that creating a new language on top of JS would solve much; any improvement is obviously welcome, but I think that the real game changer would be enabling a whole class of languages (as in all those implemented on .NET) to be used on the web. That's what would finally enable us to use the right tool for the job and get developers more likely to embrace the web as an application platform.
Just my 2 cents, of course...
P.S. And I caught you red-handed defining JS as "a simple scripting language"
If performance allows it, I would rather see an IL to JS converter, not the translation of a specific language. One size doesn't fit all, and neither do two or three sizes... I wonder what Chakra (or other highly optimizing JS engines) would do in this case.
@BitFlipper: actually, I asked a similar question on this forum and you kindly provided that figure, thanks again for that.
While that was more than good enough for my purposes, I don't think that number can be extended to the general case.
It's impossible to answer your question correctly just comparing small chunks of code... they are in no way representative of a real scenario. Anyway, just to get a general idea, let's look at the issue a little closer... I already know that I'm going to be as inaccurate as it gets, bear with me.
Let's assume we need to flip the first pin of some IO port. The fastest native code loop would be composed of three instructions, something like:
loop: move GPIOAddress, 0x1
      move GPIOAddress, 0x0
      jump loop
Most ARM instructions execute in just one cycle (on average), except that jumps flush the three-stage pipeline. This means each iteration of our little loop takes five cycles (two single-cycle moves plus three for the jump), which gets us a 9.6 MHz toggle frequency on a 48 MHz processor. Awful duty cycle and all that, but that's fine for now.
The IL code of the main loop you used would require more than just three instructions; I'd expect something like this:
... // initialization skipped as it is just a one-time tax
IL_0001 ldloc.0 // load the instance of the IO pin class
IL_0002 ldc.i4.1 // push "true" on the stack
IL_0003 call IO.Write // call the native method
IL_0004 ldloc.0 // load the instance of the IO pin class
IL_0005 ldc.i4.0 // push "false" on the stack
IL_0006 call IO.Write // call the native method
IL_0007 br.s IL_0001
That's seven IL instructions. The code expansion is partially due to the use of a stack machine, but the main difference is that we used an OO approach, where the pin is represented by an instance of some IO class. The implementation of the Write method would require something like this:
retrieve the bool argument from the stack
retrieve the "this" pointer from the stack
get the address of the port this pin instance is linked to
get the bit mask for this pin
read the current status of the port
if the value is true, OR the bit mask with the port status
otherwise, AND it with the complement
write back the new status
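In C, the steps above boil down to a classic read-modify-write. This is only a sketch of what such a native Write might look like: the pin_t layout and the names (pin_write, port, mask) are mine, not the actual MF implementation.

```c
#include <stdint.h>

/* Hypothetical pin instance; field names invented for illustration. */
typedef struct {
    volatile uint32_t *port;  /* address of the GPIO data register */
    uint32_t mask;            /* bit mask for this pin */
} pin_t;

/* The eight steps listed above, after the interpreter has already
   popped the bool argument and the "this" pointer off the stack. */
static void pin_write(pin_t *self, int value)
{
    uint32_t status = *self->port;   /* read the current port status */
    if (value)
        status |= self->mask;        /* OR the bit mask with the status */
    else
        status &= ~self->mask;       /* AND it with the complement */
    *self->port = status;            /* write back the new status */
}
```

Note that even this trivial body needs the indirection through the instance (two loads) before it can touch the port, which is part of the OO tax discussed above.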
I don't have the ARM specs on hand now, but let's assume that all instructions take just one cycle... that's 11 instructions, the last of which flushes the pipeline (so it ends up costing 3 cycles).
Let's make a little thought experiment here... let's assume that each IL instruction can be executed as fast as an ARM opcode. That would take our code up to some (theoretical) 39 cycles per loop, which would yield a frequency of about 1.23 MHz. That's still 128 times faster than the 9.6 kHz you reported for managed code... at 48 MHz, 9.6 kHz means about 5000 cycles per loop, so even excluding the native methods (which would stay native), the interpreter is spending roughly 700 cycles on average per IL instruction. That's not too bad all considered...
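The divisions above are easy to get wrong, so here they are spelled out; this is just a sanity check of the arithmetic, using the cycle counts from the discussion (not a measurement of anything).

```c
/* Back-of-the-envelope figures from the post, 48 MHz clock assumed. */
enum {
    CLOCK_HZ      = 48000000,
    NATIVE_CYCLES = 5,    /* 2 single-cycle moves + 3-cycle jump */
    IL_CYCLES     = 39    /* theoretical best case discussed above */
};

/* Toggle frequency of the hand-written native loop. */
static int native_toggle_hz(void)  { return CLOCK_HZ / NATIVE_CYCLES; }

/* Toggle frequency if each IL op cost one cycle. */
static int theoretical_il_hz(void) { return CLOCK_HZ / IL_CYCLES; }

/* Cycles actually spent per loop at a measured toggle frequency. */
static int cycles_per_loop(int measured_hz) { return CLOCK_HZ / measured_hz; }
```

Plugging in the numbers: the native loop runs at 9.6 MHz, the idealized interpreted loop at ~1.23 MHz, and the measured 9.6 kHz corresponds to 5000 cycles per iteration.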
(sorry for the long rambling... it's just a subject I happen to be quite fond of)
@BitFlipper: I didn't claim it was difficult to solve, just that it was expensive. As you correctly deduced, you need to keep track of the type of every value that gets pushed onto the stack, either using a separate stack or by "decorating" each stack slot (memory alignment might make this quite wasteful). Either way you are consuming more precious RAM and more CPU cycles. Might not seem like much, but once you consider the sheer number of stack operations involved, you see how it piles up.
A second interesting point is that MSIL essentially describes a pure stack machine (as in "without registers"). That was a good choice as it allows high level compilers to generate code that is CPU agnostic and because it yields compact code. The obvious disadvantage of stack machines is that they require roughly twice as many operations to execute as compared to the equivalent register machine. On the desktop this is ok as the JIT is CPU-specific and knows how to handle registers, but an interpreter will be stuck with the inefficiency.
Just to give you a sense of what I mean: as you probably know, in order to execute a simple "ADD" instruction you need to:
pop the first operand from the stack and store it into a register
pop the second operand from the stack and store it into another register
perform the addition
push the result onto the stack
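To make the overhead concrete, here's a minimal stack-machine sketch in C; the opcode names and layout are invented, not any real interpreter's. Note how the ADD handler pushes its result only for the very next instruction to pop it straight back off.

```c
#include <stdint.h>

/* Toy stack machine; opcodes and encoding invented for illustration. */
enum { OP_LDC, OP_ADD, OP_HALT };

typedef struct {
    int32_t stack[64];
    int     sp;              /* index of the next free slot */
} vm_t;

static int32_t vm_run(vm_t *vm, const int32_t *code)
{
    for (;;) {
        switch (*code++) {
        case OP_LDC:                          /* push an immediate */
            vm->stack[vm->sp++] = *code++;
            break;
        case OP_ADD: {
            int32_t b = vm->stack[--vm->sp];  /* pop second operand */
            int32_t a = vm->stack[--vm->sp];  /* pop first operand */
            vm->stack[vm->sp++] = a + b;      /* push the result... */
            break;                            /* ...which the next ADD
                                                 pops right back off */
        }
        case OP_HALT:
            return vm->stack[--vm->sp];
        }
    }
}
```

For something like 1 + 2 + 3 this touches the stack eight times, where a register machine would just keep the running sum in a register. A JIT sees the whole sequence and eliminates the traffic; a pure interpreter, one opcode at a time, cannot.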
Seems harmless, but once you consider that executing the next operation will most likely start with "pop the first operand into a register", it screams bloody murder. Unfortunately, an interpreter cannot predict what the next operation will do, so it has to go through the dance of pushing and popping values uselessly.
There are several other concerns (field access, reevaluation of the "this" pointer, parameter passing etc.) and optimizations that cannot be performed on a stack machine, but I'll leave them for another time.
These are shortcomings of the whole idea of interpreting MSIL, not flaws in the implementation of the MF interpreter, which I'm assuming to be quite good (MS has far more brilliant developers than it's generally credited for). So, if you are after big performance improvements, I think it would be more productive to change the whole game rather than try to shave cycles off the interpreter. The only issue I can foresee is that you may end up trading code size for RAM and performance; while this is generally OK, as RAM and CPU muscle are scarcer than Flash space, it might be an interesting challenge to fit all the libraries on your device. You may have to go for a hybrid solution.
At any rate, even considering all the inefficiencies I listed, I still cannot figure out how you might get the 1:1000 figure you mention. What are you comparing the performance to, exactly?
@BitFlipper: I suspect one of the sources of the poor performance you are witnessing is that you are not just comparing native code vs interpreted IL; you are also missing out on all the optimizations the JIT is in charge of.
Another issue is that MSIL is not very interpreter-friendly. For instance, while Java bytecode has different opcodes depending on the type of the operands (e.g. dadd, fadd, iadd), MSIL does not, as it relies on the fact that the JIT can always determine the stack state statically at any given point and emit the correct sequence of machine instructions.
Unfortunately, a pure interpreter cannot do that, which means it must keep track of the stack contents somehow and that's expensive both in terms of memory and CPU cycles.
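One way an interpreter can keep track of the stack contents is to tag every slot. This is only a sketch of the idea (the layout and names are mine, not the MF interpreter's), but it shows both costs at once: the untyped "add" has to dispatch on runtime tags where Java's iadd would just add two int32s, and alignment bloats every slot.

```c
#include <stdint.h>

/* Tagged stack slot: a sketch, not the actual MF layout. */
typedef enum { T_I4, T_R8 } tag_t;   /* more types in a real VM */

typedef struct {
    tag_t tag;                /* what the slot currently holds */
    union {
        int32_t i4;
        double  r8;
    } v;                      /* the 8-byte union plus alignment pads
                                 this to 16 bytes on many 32-bit ABIs:
                                 a 4x RAM overhead for a plain int32 */
} slot_t;

/* The untyped IL "add" must inspect the tags at runtime, where a
   Java-style "iadd" could go straight to the integer addition. */
static slot_t il_add(slot_t a, slot_t b)
{
    slot_t r;
    if (a.tag == T_I4 && b.tag == T_I4) {
        r.tag  = T_I4;
        r.v.i4 = a.v.i4 + b.v.i4;
    } else {                  /* widen to double (simplified) */
        r.tag  = T_R8;
        r.v.r8 = (a.tag == T_R8 ? a.v.r8 : (double)a.v.i4)
               + (b.tag == T_R8 ? b.v.r8 : (double)b.v.i4);
    }
    return r;
}
```

Every single arithmetic opcode pays that tag check, on every execution; the JIT pays it exactly once, at compile time.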
So, maybe you can shave a few cycles here and there in the interpreter, but I suspect you won't get significant performance gains that way; at least not the kind of improvements you couldn't get by upgrading your hardware.
I would probably try either to translate MSIL to a different bytecode (trading code size for performance) or, better, to compile chunks of code to native. This doesn't necessarily rule out cooperative multithreading: you could always inject code that keeps track of some sort of timeslice and traps to the VM when it runs out.
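The timeslice idea might look something like this: the IL-to-native translator injects a cheap counter check at every backward branch, and the check traps into the VM scheduler when the budget runs out. The names and the refill mechanism here are invented purely for illustration.

```c
#include <stdint.h>

/* Sketch of cooperative scheduling in translated code; everything
   here (names, budget size, refill policy) is made up. */
static volatile int32_t g_timeslice = 1000;  /* cycles-ish budget */
static int g_yields;                         /* how often we trapped */

static void vm_yield(void)       /* trap back into the VM scheduler */
{
    g_yields++;
    g_timeslice = 1000;          /* scheduler refills the budget */
}

static void injected_check(void) /* emitted at every back-branch */
{
    if (--g_timeslice <= 0)      /* fast path: one decrement + branch */
        vm_yield();
}

/* What a translated busy loop would look like after injection: */
static void translated_loop(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        /* ... translated body of the IL loop ... */
        injected_check();        /* the injected back-branch check */
    }
}
```

The fast path costs one decrement and an untaken branch per iteration, which is a far smaller tax than interpreting every opcode, and the VM still regains control deterministically, so cooperative multithreading keeps working.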