Need help making simplest UMS scheduler work!

I've read many articles about User Mode Scheduling (UMS), but I'm having trouble making it work. I've put together a minimal sample that exercises UMS, and I'd appreciate any tips on making it work.
Please find the code of the sample in this forum:
http://social.msdn.microsoft.com/Forums/enUS/vcgeneral/thread/c9cc2361dab445cbab514b10a029b3f8
Thanks,
Sergey.

I've alerted the right folks to take a look at the problem you're running into (based on your code sample in the MSDN forums, which is where I pointed them to....). Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...
C

You should have an answer now. Feel free to post the resolution here to close the loop on this specific issue, as well as to provide useful knowledge in multiple places on the web.
C

Thanks Charles, I've finally got the answer!
My mistake was assuming that, upon return, CreateRemoteThreadEx has produced a UMS thread that is ready for execution. Apparently, as was explained to me (see the above-mentioned forum for details), CreateRemoteThreadEx does the work asynchronously, and you need to wait explicitly for the thread to be created.
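A minimal portable sketch of that pitfall follows. It is a model only and does not call the Win32 UMS API: `CompletionList` and `create_worker_async` are hypothetical names, and the 20 ms delay just stands in for asynchronous creation. In real UMS code the analogous wait would presumably be on the completion list (for example via `DequeueUmsCompletionListItems` with a timeout, or the event returned by `GetUmsCompletionListEvent`), but the thread does not spell out which call was used.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a UMS completion list: a context shows up here
// only once the asynchronously created worker is actually ready.
class CompletionList {
public:
    // Called by the "system" side when a worker has finished being created.
    void push(int ctx) {
        {
            std::lock_guard<std::mutex> lock(m_);
            ready_.push_back(ctx);
        }
        cv_.notify_one();
    }

    // Blocks until at least one worker context is ready, then returns it.
    int wait_and_pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !ready_.empty(); });
        assert(!ready_.empty());  // guaranteed by the wait predicate
        int ctx = ready_.front();
        ready_.pop_front();
        return ctx;
    }

    // Non-blocking variant: returns false when nothing is ready yet.
    // Calling this right after the create call is the wrong assumption.
    bool try_pop(int& ctx) {
        std::lock_guard<std::mutex> lock(m_);
        if (ready_.empty()) return false;
        ctx = ready_.front();
        ready_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> ready_;
};

// Models an asynchronous create call: it returns right away, and the
// context is published to the list some time later.
// The list must outlive the returned thread.
std::thread create_worker_async(CompletionList& list, int ctx) {
    return std::thread([&list, ctx] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        list.push(ctx);
    });
}
```

A scheduler built on this model would call `create_worker_async(list, id)`, then `list.wait_and_pop()` before trying to run the worker, and finally `join()` the returned thread.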
>Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...
I'm one of those guys who write their own rasterizer even if there are plenty of other alternatives to use. I simply feel the urge to try out every hard problem myself.
And does anyone here by any chance know what the UMS bug in Win7 is about? (It is mentioned in the ConcRT sources.)
Sergey.

UMS bug? Can you be more specific? The guy who wrote UMS is the person I had look into your problem, BTW. He mentioned nothing about a known bug (related to your issue, which was a coding mistake, or rather a misunderstanding of behavior, on your part).
Keep rolling your own solutions to hard problems. This will just make you a great(er) engineer!
Do send me the details (ctorre [at] microsoft <dot> com) of the bug you mention and I'll FW it along...
C

I'm talking about a separate code path for some Win7 UMS issue, controlled by the ResourceManager::RequireUMSWorkaround() function in the ConcRT sources. I'm just wondering whether this is something I should also be aware of when rolling my own scheduler. At least on my Win7 box, ConcRT detects that it needs the workaround.
By the way, when playing with the ConcRT fibonacci sample I noticed that the application can't exit properly: it hangs forever in some deadlock at exit.
And if by any chance you are curious what my task-parallel solution will look like, I could poke you when I'm done. It's always nice to have alternative implementations!

Please do. Would love to see your implementation. The beauty of an open platform is that there is no limit to the "houses" you can build on top of it. I'll FW your bug issues to the ConcRT people.
Happy coding,
C

Sergey, can you point us to the Fibonacci sample code that contains the deadlock? (There are multiple Fib samples in the ConcRT documentation...) Thanks.
C 
Hi sergeyn,
Can you point us to the specific Fibonacci sample you were looking at, and let us know what modifications (if any) you made that caused the deadlock? If you were looking at the MSDN documentation, the URL would be great. Thanks!

I have modified it to use the UMS scheduler:
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// File: fibonacci.cpp
//
#include "windows.h"
#include <stdio.h>   // printf
#include <stdlib.h>  // malloc, free
#include <ppl.h>

using namespace Concurrency;

int SPINCOUNT = 25;

//Spins for a fixed number of loops
#pragma optimize("", off)
void delay()
{
    for(int i=0;i < SPINCOUNT;++i);
};
#pragma optimize("", on)

//Times execution of a functor in ms
template <class Functor>
__int64 time_call(Functor& fn)
{
    __int64 begin, end;
    begin = GetTickCount();
    fn();
    end = GetTickCount();
    return end - begin;
};

//Computes the fibonacci number of 'n' serially
int fib(int n)
{
    delay();
    if (n< 2)
        return n;
    int n1, n2;
    n1 = fib(n-1);
    n2 = fib(n-2);
    return n1 + n2;
}

//Computes the fibonacci number of 'n' in parallel
int struct_fib(int n)
{
    delay();
    if (n< 2)
        return n;
    int n1, n2;
    //declare a structured task group
    structured_task_group tasks;
    //invoke the first half as a task
    auto task1 = make_task([&n1,n]{n1 = struct_fib(n-1);});
    tasks.run(task1);
    //run the second recursive call inline
    n2 = struct_fib(n-2);
    //wait for completion
    tasks.wait();
    return n1 + n2;
}

//Computes the fibonacci number of 'n' allocating storage for integers on heap
int struct_fib_heap(int n)
{
    delay();
    if (n< 2)
        return n;
    //n1 and n2 are now allocated on the heap
    int* n1;
    int* n2;
    //declare a task_group
    structured_task_group tg;
    auto t1 = make_task([&]{
        n1 = (int*) malloc(sizeof(int));
        *n1 = struct_fib_heap(n-1);
    });
    tg.run(t1);
    n2 = (int*) malloc(sizeof(int));
    *n2 = struct_fib_heap(n-2);
    tg.wait();
    int result = *n1 + *n2;
    free(n1);
    free(n2);
    return result;
}

//Computes the fibonacci number of 'n' using the ConcRT suballocator
int struct_fib_concrt_heap(int n)
{
    delay();
    if (n< 2)
        return n;
    int* n1;
    int* n2;
    structured_task_group tg;
    auto t1 = make_task([&]{
        n1 = (int*) Concurrency::Alloc(sizeof(int));
        *n1 = struct_fib_concrt_heap(n-1);
    });
    tg.run(t1);
    n2 = (int*) Concurrency::Alloc(sizeof(int));
    *n2 = struct_fib_concrt_heap(n-2);
    tg.wait();
    int result = *n1 + *n2;
    Concurrency::Free(n1);
    Concurrency::Free(n2);
    return result;
}

int main()
{
    //the modification: run the sample on the UMS scheduler
    CurrentScheduler::Create(SchedulerPolicy(1, SchedulerKind, UmsThreadDefault));

    int num = 30;
    SPINCOUNT = 500;
    double serial, parallel;

    //compare the timing of serial vs parallel fibonacci
    printf("computing fibonacci of %d serial vs parallel\n",num);
    printf("\tserial: ");
    serial= (double)time_call([=](){fib(num);});
    printf("%4.0f ms\n",serial);
    printf("\tparallel: ");
    parallel = (double)time_call([=](){struct_fib(num);});
    printf("%4.0f ms\n",parallel);
    printf("\tspeedup: %4.2fX\n",serial/parallel);

    //compare the timing of malloc vs Concurrency::Alloc,
    //where we expect to get speedups because there are a large
    //number of small malloc and frees.

    //increase the number of tasks
    num = 34;
    //reduce the amount of 'work' in each task
    SPINCOUNT = 0;

    //execute fib using malloc & free
    printf("computing fibonacci of %d using heap\n",num);
    printf("\tusing malloc: ");
    serial= (double)time_call([=](){struct_fib_heap(num);});
    printf("%4.0f ms\n",serial);

    //execute fib using the concurrent suballocator
    printf("\tusing Concurrency::Alloc: ");
    parallel = (double)time_call([=](){struct_fib_concrt_heap(num);});
    printf("%4.0f ms\n",parallel);
    printf("\tspeedup: %4.2fX\n",serial/parallel);

    return 0;
}
I can also provide a stack trace of the hang if you need it (or a full memory dump).
Sergey.

And I think I have downloaded the sample using this link:
which I found here:

Yes. We need a memory dump, since your sample code does not hang on the machine of the ConcRT dev who's engaged here. Send it to me at ctorre at microsoft dot com.
Thanks,
C

Done, check your mail.
Sergey

Got it. Thanks!
C 

Hello Charles,
How is it going with the hang at exit?
I also wonder if you could share Pedro Teixeira's email with me. I have more issues using UMS under a debugger, and I would like to poke him about the UMS workaround used in ConcRT, since he is a bit slow to respond on the forum at social.microsoft.com.
Thanks,
Sergey.

The workaround is due to a minor bug in Windows 7. It is fixed in SP1. For simple schedulers, like yours, you probably won't run into the bug...
In terms of the dump file you sent in, I haven't heard anything from the developer yet. I'll ping him.
C 

It's not going to be that simple all the time, so I'd prefer to know the details.
I also get error 0x57 when I step through the code inside the UMS scheduler. Is this a known issue?
Thanks,
Sergey.

Actually, when a debugger is attached, ignoring error 0x57 (ERROR_INVALID_PARAMETER), that is, treating it the same way as ERROR_RETRY, seems to work fine. Without the debugger, error 0x57 is not generated.
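The retry rule described above could be sketched roughly like this. It is a portable illustration under assumptions, not the actual scheduler code: `should_retry` and `run_with_retry` are hypothetical helpers, the error constants are redefined locally (in a Windows build they come from `<windows.h>`), and "0 means success" is a simplification, since a successful `ExecuteUmsThread` does not return to its caller. In a real scheduler, `attempt` would presumably wrap `ExecuteUmsThread` plus `GetLastError`, and `debugger_present` would come from `IsDebuggerPresent()`.

```cpp
#include <cassert>

// Win32 error codes, redefined here so the sketch compiles anywhere.
constexpr unsigned long kErrorInvalidParameter = 87;    // 0x57
constexpr unsigned long kErrorRetry            = 1237;  // ERROR_RETRY

// Decides whether a failed attempt to run a worker should be retried.
// ERROR_RETRY is always retried; 0x57 is retried only under a debugger,
// which is the workaround described in the post above.
bool should_retry(unsigned long error, bool debugger_present) {
    if (error == kErrorRetry) return true;
    return debugger_present && error == kErrorInvalidParameter;
}

// Calls `attempt` until it reports success (0), reports an error that
// should not be retried, or max_attempts is exhausted.
// Returns the last error code seen (0 on success).
template <class Attempt>
unsigned long run_with_retry(Attempt attempt, bool debugger_present,
                             int max_attempts) {
    assert(max_attempts > 0);
    unsigned long error = 0;
    for (int i = 0; i < max_attempts; ++i) {
        error = attempt();
        if (error == 0 || !should_retry(error, debugger_present)) break;
    }
    return error;
}
```

Bounding the number of attempts is a design choice made here so that an unexpected, persistent 0x57 still surfaces as an error instead of spinning forever.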
Sergey.