For example, based on my understanding of how Windows works, I could choose a slightly disingenuous example to beat even Singularity's ABI cycle count:
#include <windows.h>
#include <intrin.h>
#include <stdio.h>
#include <tchar.h>

#define ITERATION_COUNT 1000000

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned __int64 start, end, cycles;
    DWORD _result = 0;
    int i;

    // set up the thread so the scheduler doesn't get in the way of our measurements:
    SetThreadAffinityMask(GetCurrentThread(), 1); // pin to one core so the TSC readings are consistent
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    Sleep(10); // yield the thread
    // we now have a full timeslice to play with:
    start = __rdtsc();
    for (i = 0; i < ITERATION_COUNT; i++)
        _result ^= GetTickCount(); // XOR the results so the calls can't be optimized away
    end = __rdtsc();

    cycles = (end - start) / ITERATION_COUNT;
    printf("Cycles: %llu\n", cycles);
    return 0;
}
This reports a per-call cost on my machine some six times faster than the number that Singularity is claiming victory with.
It's slightly disingenuous because GetTickCount() doesn't actually perform a switch into kernel mode. The result is computed by the kernel, but it's published through a special region of memory designed precisely for sharing precomputed values between the kernel and user mode. (That still seems like a fair comparison if the Singularity team are going to play shenanigans with "kernel APIs" that just return precomputed results in order to cheat on benchmarks.)
But even if I choose something that actually does perform a proper kernel-mode switch, such as NtClose(NULL), you'll still see a nearly six-fold difference between what a syscall actually costs and what that paper reports it to cost.
So, in summary: benchmarks run without a good understanding of what is actually being measured, and without careful analysis of whether the comparison is valid, have a tendency to be devious and to bias strongly in favour of whatever the author wants them to say.