Coffeehouse Thread

9 posts

The Singularity Project (with results)

  • LiquidBoy

    I found a nice Singularity paper that actually goes into some performance statistics.

    It shows across-the-board cycle reductions at the OS level for many systems within the Singularity OS.

    I really want to see the Midori project's managed Singularity OS work (with a dev platform)... It really does sound exciting...

    http://www.cs.rochester.edu/~sandhya/csc256/seminars/ryates_singularity.pdf

    e.g. results:

    Software Isolated Process: Creation
    ============================
    [Process create and start]

    OS - Cycles
    Singularity - 353,000
    FreeBSD 5.3 - 1,030,000
    Linux 2.6.11 (Red Hat FC4) - 719,000
    Windows XP (SP2) - 5,380,000

    Software Isolated Process: Syscall
    ============================
    [Minimum Kernel API Call]

    OS - Cycles
    Singularity - 91
    FreeBSD 5.3 - 878
    Linux 2.6.11 (Red Hat FC4) - 437
    Windows XP (SP2) - 627

    etc. etc. (lots more stats in the PDF above)

  • Bass

    That's pretty cool. So it allows the OS to run without an MMU while keeping some (if not all?) of the process isolation benefits. I'd be interested to learn a bit more about Singularity's limitations, though, as they don't seem to be touched upon that much in these viewgraphs (two bullets) - I'm not entirely convinced software-based process isolation can be made as robust as hardware-based process isolation without some sacrifices in the kind of software you can write.

  • magicalclick

    eally hope to see this OS being used in the wild. Managed OS solved so many problems. R

    Leaving WM on 5/2018 if no apps, no dedicated billboards where I drive, no Store name.

  • evildictaitor

    LiquidBoy wrote:

    I found a nice Singularity paper that actually goes into some performance statistics.

    It shows across-the-board cycle reductions at the OS level for many systems within the Singularity OS.

    Not wanting to rain on this kid's parade, but Singularity processes can't be trivially compared to Windows or FreeBSD processes.

    For one, as Bass points out, Singularity makes no serious attempt to enforce security boundaries between processes; a single memory corruption anywhere in the runtime gives root access to the system.

    Similarly, because there's no process separation, there's no protection against side-channel attacks that leak crypto secrets, passwords and other private data from the kernel or other processes.

    There's also no backwards compatibility requirement for Singularity, which means that unlike Windows, Singularity gets to basically choose what it defines a process create to mean. It certainly doesn't have to bother with session management in Win32k or set up GDI tables, or initialize the process for TLS slots and create PEBs and TEBs and NLS sections, and it doesn't have to parse prefetch tables or initialize app-compat shims. So claiming that Singularity is faster by noting that it does less is a bit of an unfair comparison.

    And finally, since there's no actual syscall mechanism, Singularity can't really pretend that it has one and claim a win there.

    All in all, I simply can't take Singularity seriously until it addresses the glaring problem: Singularity cannot run backwards-compatible apps, and until it decides to seriously address this issue, it simply isn't an OS that can be compared apples-to-apples with OSes like Windows, Mac and Linux, and instead probably deserves comparisons more akin to the XBox OS or microcontroller OSes.
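
    (If you want to put a rough number on the Windows side of that "process create and start" row yourself, here's a minimal sketch using the same __rdtsc timing trick as the loops further down. The choice of cmd.exe as the child process is just an assumption - the paper doesn't say what it actually launched.)

    #include <Windows.h>
    #include <intrin.h>
    #include <stdio.h>

    int main(void)
    {
        STARTUPINFOA si = { sizeof(si) };
        PROCESS_INFORMATION pi;
        char cmdline[] = "cmd.exe /c exit"; // CreateProcess may modify the command-line buffer, so it must be writable
        unsigned __int64 start, end;

        start = __rdtsc();
        if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
            return 1;
        end = __rdtsc(); // the child has been created and its initial thread started by the time we get here

        printf("Cycles for CreateProcess: %llu\n", end - start);

        // tidy up: wait for the child to exit and close our handles to it
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }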

  • JohnAskew

    magicalclick wrote:

    eally hope to see this OS being used in the wild. Managed OS solved so many problems. R

    What's with the sentence-based pig latin?

  • Blue Ink

    evildictaitor wrote:

    *snip*

    Not wanting to rain on this kid's parade, but Singularity processes can't be trivially compared to Windows or FreeBSD processes.

    For one, as Bass points out, Singularity makes no serious attempt to enforce security boundaries between processes; a single memory corruption anywhere in the runtime gives root access to the system.

    Similarly, because there's no process separation, there's no protection against side-channel attacks that leak crypto secrets, passwords and other private data from the kernel or other processes.

    As far as I understand, Singularity only allows execution of code it can prove to be well-behaved. If that works as advertised, any further check or protection mechanism is redundant. I'm not sure why this would be any less secure than any other isolation system.

    ...

    All in all, I simply can't take Singularity seriously until it addresses the glaring problem: Singularity cannot run backwards-compatible apps, and until it decides to seriously address this issue, it simply isn't an OS that can be compared apples-to-apples with OSes like Windows, Mac and Linux, and instead probably deserves comparisons more akin to the XBox OS or microcontroller OSes.

    I don't understand which backwards-compatible apps you are talking about here. I see this as a v1.0 OS (a singularity, indeed), so it's not surprising that it doesn't run old code. Why should this make it less of an OS?

  • LiquidBoy

    @evildictaitor: I found this other document that talks about the architecture of Singularity, including background information on how they came up with those cycle figures.

    ftp://ftp.research.microsoft.com/pub/tr/TR-2005-135.pdf

    Would love your input, seeing as you know this stuff intimately (assuming you have time of course, as it's 44 pages long), but it's a great read. :)

    Quote from the document:

    Singularity is a micro-kernel operating system that uses advances in programming languages and compilers to build lightweight, software-isolated processes, which provide code with protection and failure isolation at lower overhead than conventional, hardware-supported processes. Singularity provides an isolation boundary by running verifiably safe programs and by preventing object pointers from passing between processes' object spaces.

    SIPs, in turn, enable a new solution to the problem of code extension in systems and applications. In Singularity's model, extensions are not loaded into their parent process, but instead run in their own process and communicate over strongly typed channels. This model fixes some of the major problems with extensions, since in Singularity, they cannot directly access their parents' data or interfaces, and, if they fail, they can be easily terminated by killing their process.

    Singularity is above all a laboratory for exploring interactions among system architecture, programming languages, compilers, specification, and verification. Advances in each of these areas enable and reinforce advances in the other domains, which limits the benefit and impact of studying an area in isolation. Singularity is small and well structured, so it is possible to make changes that span the arbitrary boundaries between these domains. At the same time, it is large and realistic enough to demonstrate the practical advantages of new techniques.

  • evildictaitor

    LiquidBoy wrote:

    @evildictaitor: I found this other document that talks about the architecture of Singularity, including background information on how they came up with those cycle figures.

    One of the problems I usually have with benchmarks like this is that what someone thinks of as a valid comparison often isn't. For example, in this case they've chosen the ProcessService.GetCyclesPerSecond() function in Singularity as their benchmark for a fast "syscall", but chosen SetFilePointer for the Windows side-by-side comparison.

    The problem is that unless you actually understand what Windows is doing, you probably wouldn't realize quite how unfair that comparison really is, particularly if you own ProcessService.GetCyclesPerSecond() and can tweak it for the benchmark.

    For example,

    double GetCyclesPerSecond() { return _previouslyComputedValue; }

    is a valid implementation.

    Let's look at what SetFilePointer needs to do, in contrast. Let's take a 32-bit process running on 64-bit Win7, just because that's the default C++ project that Visual Studio gives me on my machine.

    So first of all, SetFilePointer calls not into the kernel - but into kernel32.dll in usermode. This first checks whether the handle is a console handle via a comparison, in order to fail quickly. Next, it inspects the move method chosen (the benchmarker hasn't told us which value they used). If either FILE_CURRENT or FILE_END is chosen, this triggers a call to NtQueryInformationFile in the 32-bit ntdll to get the base address that we can add our offset to - the kernel doesn't have a concept of relative moves, so we need to do this stage first.

    But wait - we're on a 64-bit kernel, so what actually happens is a call to the Wow64 thunking layer, which causes a processor switch into 64-bit mode, triggering a natural flush of the pipeline and cache and an expensive reload of the entire processor's GDT before eventually landing us in wow64.dll.

    wow64.dll then does a big switch statement to jump to Wow64!NtQueryInformationFileImpl, which then calls the kernel via a syscall. This goes through the syscall switch statement that takes us to nt!NtQueryInformationFile, which then calls nt!ZwQueryInformationFile, which then checks the handle via nt!ObReferenceObjectByHandle to get a kernel-mode pointer out of the handle, and then calls nt!IoQueryInformationFile, which gets the file pointer out of the handle - though not before recording that all of this happened, because the system collects IO usage statistics for things like Task Manager and entropy information for things like CryptGenRandom.

    This then returns down the chain back to usermode which gives us a FILE_POSITION_INFORMATION64 in wow64, which then gets mashed into a FILE_POSITION_INFORMATION32 for our 32-bit process. We then do a full processor switch back into 32-bit with all of the associated cost of doing so.

    Yay, now we're back in 32-bit land, and we've resolved the FILE_CURRENT / FILE_END issue, so we now do a 64-bit add to add our offset (we don't have 64-bit registers anymore, so this is more expensive too). Now we need to actually set the file pointer, so we call into ntdll!NtSetInformationFile, which does a processor switch, a wow64 jump table, a mash of our FILE_POSITION_INFORMATION32 into a FILE_POSITION_INFORMATION64, followed by a syscall, followed by a syscall table lookup, followed by nt!NtSetInformationFile, followed by nt!ZwSetInformationFile, followed by nt!ObReferenceObjectByHandle, followed by nt!IoSetInformationHandle.

    But wait - what if someone else is using the handle?

    So what then happens is that we have to lock the file via IopLockFileObject() - a full blown acquire of a critical section.

    Only after all of this jiggery-pokery can we go ahead and set the value on the handle to actually set the file pointer. We can only then return our success condition by performing a massive return all the way back out via wow64 and a processor switch to kernel32, which eventually returns to our tiny C program.

    Which feels like a bit of an unfair comparison to me.
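
    (For reference, here's a minimal sketch of how you could time SetFilePointer yourself, using the same __rdtsc technique as in the next post. The scratch-file setup and the FILE_CURRENT move method are assumptions - the paper doesn't say which parameters its benchmark used.)

    #include <Windows.h>
    #include <intrin.h>
    #include <stdio.h>

    #define ITERATION_COUNT 1000000

    int main(void)
    {
        unsigned __int64 start, end;
        size_t i;

        // any seekable handle will do; a scratch file in the current directory is assumed here
        HANDLE hFile = CreateFileA("scratch.tmp", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                   CREATE_ALWAYS, FILE_ATTRIBUTE_TEMPORARY, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return 1;

        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
        Sleep(10); // start at the top of a fresh timeslice

        start = __rdtsc();
        for (i = 0; i < ITERATION_COUNT; i++)
        {
            // FILE_CURRENT is one of the move methods that forces the extra
            // NtQueryInformationFile round-trip described above
            SetFilePointer(hFile, 0, NULL, FILE_CURRENT);
        }
        end = __rdtsc();

        printf("Cycles per SetFilePointer: %llu\n", (end - start) / ITERATION_COUNT);
        CloseHandle(hFile);
        return 0;
    }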

  • evildictaitor

    For example, based on my understanding of how Windows works, I could choose a slightly disingenuous example to beat even Singularity's ABI count:

    #include "stdafx.h"
    #include <stdio.h>
    #include <Windows.h>
    #include <intrin.h>

    #define ITERATION_COUNT    1000000
    int _tmain(int argc, _TCHAR* argv[])
    {
        unsigned __int64 start, end, cycles;   // __rdtsc() returns a 64-bit cycle count
        size_t i;
        DWORD _result = 0;

        while(TRUE)
        {
            // setup the thread so the scheduler doesn't get in the way of our measurements:
            SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
            Sleep(10); // yield the thread so we start at the top of a fresh timeslice

            // we now have a full timeslice to play with:
            start = __rdtsc();
            for(i = 0; i < ITERATION_COUNT; i++)
            {
                _result ^= GetTickCount(); // xor into _result so the compiler can't optimize the call away
            }
            end = __rdtsc();

            cycles = end - start;
            cycles /= ITERATION_COUNT;

            printf("Cycles: %llu\n", cycles);
        }
        return 0;
    }

    Which prints

    Cycles: 11
    Cycles: 11
    Cycles: 11
    Cycles: 12
    Cycles: 11
    Cycles: 11
    Cycles: 11
    Cycles: 10
    Cycles: 11

    on my machine - roughly eight times faster than the number that Singularity is claiming victory with.

    It's slightly disingenuous because GetTickCount() doesn't actually perform a switch into kernel-mode, even though the result is computed by the kernel; rather, it reads a special region of memory designed precisely for sharing precomputed values between the kernel and usermode (which seems like a fair comparison if the Singularity team are going to play shenanigans with "kernel APIs" that just return precomputed results in order to cheat on benchmarks).

    But even if I choose something that actually does do a proper kernel mode switch such as NtClose(NULL), you'll see a nearly six-fold difference between what a syscall actually costs and what that paper is reporting it to cost.
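
    For concreteness, here's a minimal sketch of that NtClose(NULL) measurement. Resolving NtClose via GetProcAddress is my assumption - you could equally link against ntdll.lib - and note that closing an invalid handle may raise an exception if a debugger is attached, so run it standalone.

    #include <Windows.h>
    #include <intrin.h>
    #include <stdio.h>

    #define ITERATION_COUNT 1000000

    typedef LONG (NTAPI *NtCloseFn)(HANDLE);

    int main(void)
    {
        unsigned __int64 start, end;
        size_t i;

        // pull NtClose straight out of ntdll so we measure the raw syscall stub
        NtCloseFn pNtClose = (NtCloseFn)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtClose");
        if (pNtClose == NULL)
            return 1;

        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
        Sleep(10); // start at the top of a fresh timeslice

        start = __rdtsc();
        for (i = 0; i < ITERATION_COUNT; i++)
        {
            // NULL isn't a valid handle, so this fails with STATUS_INVALID_HANDLE,
            // but it still pays the full cost of the user-to-kernel transition
            pNtClose(NULL);
        }
        end = __rdtsc();

        printf("Cycles per NtClose(NULL): %llu\n", (end - start) / ITERATION_COUNT);
        return 0;
    }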

    So in summary: benchmarks taken without a good understanding of what the benchmark is actually benchmarking, and without careful analysis of whether the benchmark is a valid comparison, have a tendency to be devious and to bias strongly in favour of whatever the author wants them to say.

    So yeah. Again, I call shenanigans.
