Need help making simplest UMS scheduler work!

  • sergeyn

    I've read many articles about User Mode Scheduling (UMS), but I'm having trouble getting it to work. I've put together a minimal sample that uses UMS, and I'd appreciate any tips on making it work.

    Please find the code of the sample in this forum:

    http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/c9cc2361-dab4-45cb-ab51-4b10a029b3f8

     

    Thanks,

    Sergey.

  • Charles

    I've alerted the right folks to take a look at the problem you're running into (based on your code sample in the MSDN forums, which is where I pointed them to....). Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...

    C

  • Charles

    You should have an answer now. Feel free to post the resolution here to close the loop on this specific issue, as well as providing useful knowledge in multiple places on the web.

    C

  • sergeyn

    Thanks Charles, I've finally got the answer!

     

    My mistake was assuming that, upon return, CreateRemoteThreadEx has a UMS thread ready for execution. As was explained to me (see the forum thread linked above for details), CreateRemoteThreadEx does the work asynchronously, and you need to wait explicitly for the thread to be created.
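
The point about asynchronous creation can be modeled without any Windows calls. The following is only a portable sketch of the idea, not the Win32 UMS API: `CompletionList` and `create_worker_async` are hypothetical names, and `std::thread` plus a condition variable stand in for the real worker thread and its completion list. The creator returns immediately, and the scheduler side must block until the worker actually reports itself as ready.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a UMS completion list: a worker is only
// schedulable once its id has been pushed here.
class CompletionList {
public:
    // Called by a worker when it has finished starting up.
    void push(int workerId) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_.push_back(workerId);
        }
        changed_.notify_one();
    }

    // Blocks until at least one worker is ready, then returns its id.
    int wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !ready_.empty(); });
        const int id = ready_.front();
        ready_.pop_front();
        return id;
    }

    bool empty() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ready_.empty();
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::deque<int> ready_;
};

// Models asynchronous creation: this returns right away, but the worker
// shows up on the completion list only after a (simulated) startup delay.
std::thread create_worker_async(CompletionList& list, int workerId) {
    return std::thread([&list, workerId] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        list.push(workerId);
    });
}
```

A scheduler built on this model would call `wait_and_pop()` after `create_worker_async()` instead of assuming the worker is runnable as soon as the creation call returns. In the real API the waiting is done against the UMS completion list (for example with `DequeueUmsCompletionListItems`), which this sketch does not attempt to reproduce.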

     

    >Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...

    I'm one of those guys who write their own rasterizer ;) even when there are plenty of alternatives to use. I simply feel the urge to try every hard problem myself.

     

     

    Also, does anyone here happen to know what the UMS bug in Win7 is about? (It is mentioned in the ConcRT sources.)

     

    Thanks,

    Sergey.

  • Charles

    UMS bug? Can you be more specific? The guy who wrote UMS is the person I had engage with your problem, BTW. He mentioned nothing about a known bug related to your issue, which was a coding mistake, or rather a misunderstanding of behavior, on your part :)

     

    Keep rolling your own solutions to hard problems. This will just make you a great(er) engineer!

     

    Do send me the details (ctorre [at] microsoft <dot> com) of the bug you mention and I'll FW it along...

    C

  • sergeyn

    I'm talking about a separate code path for some Win7 UMS issue, controlled by the ResourceManager::RequireUMSWorkaround() function in the ConcRT sources. I'm just wondering whether this is something I should also be aware of when rolling my own scheduler. At least on my Win7 box, ConcRT detects that it needs the workaround.

     

    By the way, when playing with the ConcRT fibonacci sample, I noticed that the application can't exit properly: it hangs forever in some deadlock at exit.

     

    And if by any chance you are curious what my task-parallel solution will look like - I could poke you when I'm done - it's always nice to have alternative implementations!

  • Charles

    Please do. Would love to see your implementation. The beauty of an open platform is that there is no limit to the "houses" you can build on top of it :) I'll FW your bug issues to the ConcRT people.

     

    Happy coding,

    C

  • Charles

    Sergey, can you point us to the Fibonacci sample code that contains the deadlock? (There are multiple Fib samples in the ConcRT documentation.) Thanks.

    C

  • thompet

    Hi sergeyn,

     

    Can you point us to the specific Fibonacci sample you were looking at, and let us know what modifications (if any) you made that caused the deadlock? If you were looking at the MSDN documentation, the URL would be great. Thanks!

  • sergeyn

    I have modified it to use the UMS scheduler.

     

    //--------------------------------------------------------------------------
    // 
    //  Copyright (c) Microsoft Corporation.  All rights reserved. 
    // 
    //  File: fibonacci.cpp
    //
    //--------------------------------------------------------------------------
    #include <windows.h>
    #include <stdio.h>   // printf
    #include <stdlib.h>  // malloc, free
    #include <ppl.h>
    using namespace Concurrency;
    int SPINCOUNT = 25;
    //Spins for a fixed number of loops
    #pragma optimize("", off)
    void delay()
    {
        for(int i=0;i < SPINCOUNT;++i);
    };
    #pragma optimize("", on)
    //Times execution of a functor in ms
    //(takes a const reference so that temporary lambdas bind in standard C++)
    template <class Functor>
    __int64 time_call(const Functor& fn)
    {
        __int64 begin, end;
        begin = GetTickCount();
        fn();
        end = GetTickCount();
        return end - begin;
    }
    //Computes the fibonacci number of 'n' serially
    int fib(int n)
    {
        delay();
        if (n< 2)
            return n;
        int n1, n2;	
        n1 = fib(n-1);
        n2 = fib(n-2);
        return n1 + n2;
    }
    //Computes the fibonacci number of 'n' in parallel
    int struct_fib(int n)
    {
        delay();
        if (n< 2)
            return n;
        int n1, n2;
        //declare a structured task group
        structured_task_group tasks;
        //invoke the first half as a task
        auto task1 = make_task([&n1,n]{n1 = struct_fib(n-1);});
        tasks.run(task1);
        //run the second recursive call inline
        n2 = struct_fib(n-2);
        //wait for completion
        tasks.wait();
        return n1 + n2;
    }
    //Computes the fibonacci number of 'n' allocating storage for integers on heap
    int struct_fib_heap(int n)
    {
        delay();
        if (n< 2)
            return n;
        //n1 and n2 are now allocated on the heap
        int* n1;
        int* n2;
        //declare a structured task group
        structured_task_group tg;	
        auto t1 = make_task([&]{
            n1 = (int*) malloc(sizeof(int));
            *n1 = struct_fib_heap(n-1);
        });
        tg.run(t1);
        n2 = (int*) malloc(sizeof(int));
        *n2 = struct_fib_heap(n-2);
        tg.wait();
        int result = *n1 + *n2;
        free(n1);
        free(n2);
        return result;
    }
    //Computes the fibonacci number of 'n' using the ConcRT suballocator
    int struct_fib_concrt_heap(int n)
    {
        delay();
        if (n< 2)
            return n;
        int* n1;
        int* n2;
        structured_task_group tg;	
        auto t1 = make_task([&]{
            n1 = (int*) Concurrency::Alloc(sizeof(int));
            *n1 = struct_fib_concrt_heap(n-1);
        });
        tg.run(t1);
        n2 = (int*) Concurrency::Alloc(sizeof(int));
        *n2 = struct_fib_concrt_heap(n-2);
        tg.wait();
        int result = *n1 + *n2;
        Concurrency::Free(n1);
        Concurrency::Free(n2);
        return result;
    }
    int main()
    {
      CurrentScheduler::Create(SchedulerPolicy(1, SchedulerKind, UmsThreadDefault));
      
        int num = 30;
        SPINCOUNT = 500;
        double serial, parallel;
        //compare the timing of serial vs parallel fibonacci
        printf("computing fibonacci of %d serial vs parallel\n",num);
        printf("\tserial:   ");
        serial= (double)time_call([=](){fib(num);});
        printf("%4.0f ms\n",serial);
        printf("\tparallel: ");
        parallel = (double)time_call([=](){struct_fib(num);});
        printf("%4.0f ms\n",parallel);
        printf("\tspeedup: %4.2fX\n",serial/parallel);
        //compare the timing of malloc vs Concurrency::Alloc,
        //where we expect to get speedups because there are a large
        //number of small malloc and frees.
        
        //increase the number of tasks
        num = 34;
        //reduce the amount of 'work' in each task
        SPINCOUNT = 0;
        //execute fib using malloc & free
        printf("computing fibonacci of %d using heap\n",num);
        printf("\tusing malloc:             ");
        serial= (double)time_call([=](){struct_fib_heap(num);});
        printf("%4.0f ms\n",serial);
        //execute fib using the concurrent suballocator
        printf("\tusing Concurrency::Alloc: ");
        parallel = (double)time_call([=](){struct_fib_concrt_heap(num);});
        printf("%4.0f ms\n",parallel);
        printf("\tspeedup: %4.2fX\n",serial/parallel);
        return 0;
    }
    

     

    I can also provide stack trace of the hang if you need it (or a full memory dump).

     

    Sergey.

  • sergeyn

    And I think I have downloaded the sample using this link:

    http://code.msdn.microsoft.com/Project/Download/FileDownload.aspx?ProjectName=concrtextras&DownloadId=9496

     

    which I found here:

    http://blogs.msdn.com/nativeconcurrency/archive/2010/03/10/samples-updated-for-concrt-ppl-and-agents.aspx

  • Charles

    Yes. We need a memory dump, since your sample code does not hang on the machine of the ConcRT dev who's engaged here. Send it to me at ctorre at microsoft dot com.

    Thanks,

    C

  • sergeyn

    Done, check your mail.

     

    Sergey

  • Charles

    Got it. Thanks!
    C

  • sergeyn

    Hello Charles,

     

    How is it going with the hang at exit?

     

    I also wonder whether you could share Pedro Teixeira's e-mail address with me. I have more issues using UMS under the debugger, and I would like to ask him about the UMS workaround used in ConcRT, since he is a bit slow to respond on the forum at social.microsoft.com.

     

    Thanks,

    Sergey.

  • Charles

    The workaround is due to a minor bug in Windows 7. It is fixed in SP1. For simple schedulers, like yours, you probably won't run into the bug...

     

    In terms of the dump file you sent in, I haven't heard anything from the developer yet. I'll ping him.


    C

  • sergeyn

    It's not going to be that simple all the time ;), so I'd prefer to know the details.

     

    I also get error 0x57 (ERROR_INVALID_PARAMETER) when I step through the code inside the UMS scheduler. Is this a known issue?

     

     

    Thanks,

    Sergey.

  • sergeyn

    Actually, when a debugger is attached, ignoring error 0x57 (treating it the same way as ERROR_RETRY) seems to work fine. Without the debugger, error 0x57 is not generated.

     

    Sergey.
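
The workaround described in the last post could be expressed as a small predicate. This is a minimal sketch under assumptions: the helper name `should_retry` is hypothetical, the thread does not say which UMS call returns the error, and the constants are spelled out locally (with the values from winerror.h) so the snippet compiles without the Windows headers.

```cpp
#include <cstdint>

// Win32 error codes, with the values defined in winerror.h.
constexpr std::uint32_t kErrorInvalidParameter = 0x57;  // ERROR_INVALID_PARAMETER (87)
constexpr std::uint32_t kErrorRetry = 1237;             // ERROR_RETRY

// Hypothetical helper: decides whether a failed UMS call should be retried.
// ERROR_RETRY is always retried; 0x57 is retried only when a debugger is
// attached, mirroring the observation in the post above.
bool should_retry(std::uint32_t lastError, bool debuggerPresent)
{
    if (lastError == kErrorRetry) {
        return true;
    }
    return debuggerPresent && lastError == kErrorInvalidParameter;
}
```

On Windows the arguments would presumably come from `GetLastError()` and `IsDebuggerPresent() != FALSE`. Restricting the 0x57 case to debugger sessions keeps a genuine invalid-parameter failure visible in normal runs.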
