For example, based on my understanding of how Windows works, I could choose a slightly disingenuous example to beat even Singularity's ABI count:
#include "stdafx.h"
#include <stdio.h>
#include <tchar.h>
#include <Windows.h>
#include <intrin.h>

#define ITERATION_COUNT 1000000

int _tmain(int argc, _TCHAR* argv[])
{
    // __rdtsc() returns a 64-bit counter, so keep the arithmetic in 64 bits
    unsigned __int64 start, end, cycles;
    size_t i;
    DWORD _result = 0;

    while(TRUE)
    {
        // set up the thread so the scheduler doesn't get in the way of our measurements:
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
        Sleep(10); // yield the thread
        // we now have a full timeslice to play with:
        start = __rdtsc();
        for(i = 0; i < ITERATION_COUNT; i++)
        {
            _result ^= GetTickCount();
        }
        end = __rdtsc();
        cycles = (end - start) / ITERATION_COUNT;
        printf("Cycles: %llu\n", cycles); // %llu matches the 64-bit cycle count
    }
    return 0;
}
Which prints
Cycles: 11
Cycles: 11
Cycles: 11
Cycles: 12
Cycles: 11
Cycles: 11
Cycles: 11
Cycles: 10
Cycles: 11
on my machine - six times faster than the number that Singularity is claiming victory with.
It's slightly disingenuous because GetTickCount() doesn't actually switch into kernel mode. The result is computed by the kernel, but the function just reads it from a special region of memory designed precisely for sharing precomputed values between the kernel and user mode. That seems like a fair comparison if the Singularity team are going to play shenanigans with "kernel APIs" that just return precomputed results in order to cheat on benchmarks.
But even if I choose something that actually does do a proper kernel-mode switch, such as NtClose(NULL), you'll see a nearly six-fold difference between what a syscall actually costs and what that paper is reporting it to cost.
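If you want to try the NtClose(NULL) comparison yourself, here is a rough sketch of how I'd set it up. The helper names (average_cycles, noop_call, call_NtClose, call_GetTickCount) are made up for this example, and I'm assuming NtClose is resolved at run time from ntdll.dll with GetProcAddress rather than linked directly. Every call goes through a function pointer, which adds a few cycles of its own, so the no-op baseline is printed alongside the real measurements. Treat the numbers as ballpark figures for your own machine, not as a rigorous benchmark.

```c
#include <stdio.h>

#if defined(_MSC_VER)
#include <intrin.h>      /* __rdtsc on MSVC */
#else
#include <x86intrin.h>   /* __rdtsc on GCC/Clang (x86 only) */
#endif

/* Hypothetical helper type: any call we want to time. */
typedef void (*bench_fn)(void);

/* Does nothing; used to estimate loop and indirect-call overhead. */
void noop_call(void) {}

/* Average TSC ticks per call of fn over `iterations` calls. */
unsigned long long average_cycles(bench_fn fn, unsigned long long iterations)
{
    unsigned long long start, end, i;

    if (iterations == 0)
        return 0;

    start = __rdtsc();
    for (i = 0; i < iterations; i++)
        fn();
    end = __rdtsc();

    return (end - start) / iterations;
}

#ifdef _WIN32
#include <windows.h>

#define ITERATION_COUNT 1000000ULL

/* NtClose returns an NTSTATUS, which is a LONG. */
typedef LONG (NTAPI *NtClose_t)(HANDLE);

static NtClose_t g_NtClose;
static volatile DWORD g_sink;

/* Performs a real transition into kernel mode; the NULL handle is rejected. */
static void call_NtClose(void) { g_NtClose(NULL); }

/* Stays in user mode; reads a value the kernel has already published. */
static void call_GetTickCount(void) { g_sink ^= GetTickCount(); }

int main(void)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");

    g_NtClose = ntdll ? (NtClose_t)GetProcAddress(ntdll, "NtClose") : NULL;
    if (g_NtClose == NULL)
    {
        fprintf(stderr, "could not resolve NtClose\n");
        return 1;
    }

    /* Same trick as before: raise priority and yield to start on a fresh timeslice. */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
    Sleep(10);

    printf("baseline (no-op): %llu cycles/call\n",
           average_cycles(noop_call, ITERATION_COUNT));
    printf("GetTickCount():   %llu cycles/call\n",
           average_cycles(call_GetTickCount, ITERATION_COUNT));
    printf("NtClose(NULL):    %llu cycles/call\n",
           average_cycles(call_NtClose, ITERATION_COUNT));
    return 0;
}
#endif
```

The gap between the GetTickCount() and NtClose(NULL) lines is the interesting part: it is roughly what the user-to-kernel round trip costs on your hardware, once you subtract the baseline from both.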
So in summary: benchmarks presented without a good understanding of what they are actually measuring, and without careful analysis of whether they make a valid comparison, have a tendency to be devious and to bias strongly in favour of whatever the author wants them to say.
So yeah. Again, I call shenanigans.