Programming in the Age of Concurrency: The Accelerator Project


David Tarditi and Sidd Puri are doing some really cool work over in Microsoft Research. They've built a development technology, Accelerator, that "provides a high-level data-parallel programming model as a library that is available for all .NET programming languages. The library translates the data-parallel operations on-the-fly to optimized GPU pixel shader code and API calls. Future versions will target multi-core CPUs." Watch this video!

Download the Accelerator library and SDK!

Check out the Accelerator Wiki for more info.
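To give a flavor of the programming model, here's a minimal sketch of an element-wise operation. It's not taken from the SDK: the namespace and the FloatParallelArray/ParallelArrays names come from a sample stack trace quoted in the discussion below, and the exact signatures may differ.

    using Microsoft.Research.DataParallelArrays;

    class AddDemo
    {
        static void Main()
        {
            float[,] a = new float[480, 640];
            float[,] b = new float[480, 640];
            // ... fill a and b with data ...

            // Wrapping the arrays marks them as data-parallel; no GPU work happens yet.
            FloatParallelArray da = new FloatParallelArray(a);
            FloatParallelArray db = new FloatParallelArray(b);

            // This builds a deferred expression graph instead of looping over elements.
            FloatParallelArray sum = ParallelArrays.Add(da, db);

            // Forcing the result compiles the graph to a pixel shader,
            // runs it on the GPU, and copies the data back.
            float[,] result;
            ParallelArrays.ToArray(sum, out result);
        }
    }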


Follow the Discussion

  • ktr (two sides to everything)

    watching it now..

    12:35 - hahaha I love Charles
  • Chris Pietschmann (CRPietschmann)

    Awesome! I'll have to download it and check it out.

  • I like it. I was doing something similar to this 2 years ago, except with mine you had to write the PS code by hand, so it wasn't nearly as cool.

  • jvervoorn (Caption)
    This is pretty cool, it looks like you guys are thinking ahead some.

    I am troubled by how often Charles is the one introducing groups to what other people are working on. It would be great to improve the communication that should be going on, while at the same time cutting down on the e-mail that is overwhelming you all there.

    If you can improve communication and get all these ideas working together, I see a real future for some of the things that may be possible in the next 10-15 years. Also, if Microsoft can learn from its mistakes, be more agile, ship more CTPs (community technology previews), and take the feedback from those, maybe Microsoft can get it great in the second version instead of the third.
  • Andrew Davey (www.aboutcode.net)
    Deferred evaluation is definitely a very interesting subject. The work being done with LINQ is along the same lines: the compiler generates an expression tree that can then be passed around as data and transformed before evaluation.
    I wonder if it's possible to take expression trees generated by LINQ and transform them into parallelisable computations. I suppose it really comes down to "map" and "reduce" functions in the end. Whilst you are kind of limited to pure arithmetic operations on the GPU, the future of multi-cores certainly could widen the scope.

    Of course, I can't talk about abstract syntax trees without once again mentioning syntactic macros ;) It would be interesting to look at using syntactic macros to perform staged computation. I'm sure some of the parallelising of operations can be decided at compile time. That could make for even more performance increases, since you can take some weight off the JIT compiler. :D

    Anyway... Awesome work and great video Charles.
    BTW: Charles, you need to get a secondary job at MSR being "social glue"! We need to get all these academics down to the bar to mix their ideas.
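Andrew's point about expression trees as transformable data can be made concrete with the Expression API as it later shipped in .NET 3.5 (at the time of this thread it existed only in LINQ previews). A minimal sketch:

    using System;
    using System.Linq.Expressions;

    class ExpressionTreeDemo
    {
        static void Main()
        {
            // The compiler turns this lambda into a data structure
            // (an expression tree), not directly into executable code.
            Expression<Func<float, float, float>> madd = (a, b) => a * b + 1.0f;

            // The tree can be inspected, and in principle rewritten
            // into a data-parallel form, before it is ever evaluated.
            var body = (BinaryExpression)madd.Body;
            Console.WriteLine(body.NodeType);      // Add
            Console.WriteLine(body.Left.NodeType); // Multiply

            // Only Compile() turns it into runnable code.
            Func<float, float, float> f = madd.Compile();
            Console.WriteLine(f(2.0f, 3.0f));      // 7
        }
    }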
  • Fascinating stuff. The potential for this is truly astounding!
    How does programming with this library compare with other concurrent languages such as Occam-pi?

    BTW, is that a pyramid of 14 patent cubes in the background??

    - Jonathan
  • Thanks for the video. It should be noted, however, that one can create an array "computation" type library today; it is merely a matter of syntax and abstracting the loops away from the end user of your functions. Whether it would take advantage of a multi-core setup is another thing entirely, so good work here.

    The brief GPU discussion/explanation was also interesting. I'm curious, though, as to what the conversion routines between data-parallel arrays and regular arrays look like. Perhaps I should check out the SDK? Is much code shared in the SDK, or is it pretty much a "black box" approach?

  • Andrew Davey (www.aboutcode.net)
    Even if the SDK has no source, it's managed code, so you can get Reflector in there and have a snoop around :P
  • Charles (Welcome Change)
    I'd highly recommend that you download the bits and play with the Accelerator "platform". Research needs your input: you represent the real world, and you will find problems and/or needs that will help the team build what ultimately serves your purposes.

    Play with it!
    C
  • Andrew Davey (www.aboutcode.net)
    Data parallelism for big numerical problems is kind of obvious. I think the next challenge is bringing parallelism to regular business apps. For example, if I have a list of business objects and want to validate them all, or maybe check for changes against a web service, doing a simple "foreach" loop is dumb when I have 2 or more CPUs. Maybe one day we will have languages and compilers smart enough to just express "validate all these objects" and work out the most efficient way to do it...
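What Andrew wishes for here is roughly what .NET's Task Parallel Library later provided, years after this thread. A minimal sketch, with a made-up BusinessObject type standing in for real work:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class BusinessObject
    {
        public int Id;
        public bool IsValid;
    }

    class ValidateAll
    {
        static void Main()
        {
            var objects = new List<BusinessObject>();
            for (int i = 0; i < 1000; i++)
                objects.Add(new BusinessObject { Id = i });

            // "Validate all these objects": the runtime, not the
            // programmer, decides how to spread work across the cores.
            Parallel.ForEach(objects, obj =>
            {
                obj.IsValid = obj.Id % 2 == 0; // stand-in for real validation
            });

            Console.WriteLine("Validated " + objects.Count + " objects.");
        }
    }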
  • Andrew Davey (www.aboutcode.net)
    I wonder if I can justify a shiny new graphics card under the guise of "research" ;)
  • rhm
    It's an interesting project and an interesting video.

    I like the idea of having a library that makes general purpose computation on GPU really easy.

    However, the 5x5 convolve example used in the video is illusory, and it hints at the core problem with this approach to making parallelism easy to code in the wider world.

    If you load a big (1600x1200) image into Photoshop and do a radius-5 Gaussian blur, you're looking at about 1 second of processing even on my relatively old PC (AMD Athlon XP 2000). And that's because Adobe have hand-optimized their filter routines using the most efficient approach for a CPU. That is, it performs the whole matrix operation on a small area of the image at a time, it 'touches' areas of memory before it needs them so they'll be in the cache when it does need them, and of course it uses MMX/SSE to exploit the small amount of SIMD power current CPUs have.

    The routine shown to us in the video takes a different approach: it composites the whole image repeatedly, offset by a given number of pixels each time. That's the definitive way to perform that operation on a GPU, but it's devastatingly inefficient on a CPU compared to the conventional way of doing it (as shown by Adobe).

    Now, it was kind of glossed over in the video, but I believe the interviewees were saying that they are trying to come up with a way of making data-level parallelism easy to code for both GPU and multi-core scenarios. They also touted their library approach as being a lot simpler than the conventional high-performance computing approach of having a special compiler pick apart the loops in the problem and work out what to run where. They state that by encoding higher-level operations in library calls, the intention of the program is captured and the library then works out what to do.

    The problem is that the intention here - to perform a 5x5 convolve by repeatedly compositing an offset image - is right for the GPU, but wrong for the CPU.

    Now, I suppose you could be really clever with your deferred computation, 'unpick' the intention from the series of composition calls that the nested for loops in the example produce, and then work out a more efficient way to execute them on a CPU. But that's likely to work only in limited situations, where the same operation is executed over and over. I think it would be better to admit that there's no way to succinctly encode the intention of a program to a computer (this is a problem that mathematicians have grappled with since before there were computers) and just concentrate on producing useful libraries for the two different scenarios. But hey, you're the researchers!
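For readers who haven't seen the two styles rhm contrasts, here's an illustrative plain-C# sketch of a radius-2 box blur written both ways. It isn't Accelerator or Adobe code, just the shape of the two loop structures:

    using System;

    class BlurStyles
    {
        // GPU-style: combine whole shifted copies of the array.
        // Each pass conceptually reads and writes the entire image.
        static float[] BlurByShifting(float[] src)
        {
            int n = src.Length;
            var dst = new float[n];
            for (int offset = -2; offset <= 2; offset++)
                for (int i = 0; i < n; i++)
                {
                    int j = Math.Min(Math.Max(i + offset, 0), n - 1);
                    dst[i] += src[j] / 5.0f;
                }
            return dst;
        }

        // CPU-style: for each output element, gather its neighbours once,
        // touching each region of memory while it is still hot in cache.
        static float[] BlurByGathering(float[] src)
        {
            int n = src.Length;
            var dst = new float[n];
            for (int i = 0; i < n; i++)
            {
                float sum = 0;
                for (int offset = -2; offset <= 2; offset++)
                    sum += src[Math.Min(Math.Max(i + offset, 0), n - 1)];
                dst[i] = sum / 5.0f;
            }
            return dst;
        }

        static void Main()
        {
            var image = new float[] { 1, 2, 3, 4, 5, 6, 7, 8 };
            Console.WriteLine(string.Join(", ", BlurByShifting(image)));
            Console.WriteLine(string.Join(", ", BlurByGathering(image)));
        }
    }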
  • jvervoorn wrote:
    I am troubled by how often Charles is the one introducing groups to what other people are working on. It would be great to improve the communication that should be going on, while at the same time cutting down on the e-mail that is overwhelming you all there.

    I think the exact same thing every time. A lot of projects with overlapping goals. I know Microsoft is a big company, but a keyword-searchable database of current/past projects might do wonders.
  • Charles (Welcome Change)
    JonR800 wrote:
    
    jvervoorn wrote: I am troubled by how often Charles is the one introducing groups to what other people are working on. It would be great to improve the communication that should be going on, while at the same time cutting down on the e-mail that is overwhelming you all there.

    I think the exact same thing every time. A lot of projects with overlapping goals. I know Microsoft is a big company, but a keyword-searchable database of current/past projects might do wonders.


    I'm not sure I fully understand the problem here. Programming concurrent applications is hard and there is no single silver bullet to make it easy (it's a hard problem). Accelerator is but one approach to a specific subset of the problem, just as Software Transactional Memory (video to appear on C9 next week) and language-level solutions are. I am just investigating what's being done around the company to address this important programming topic.

    Can you elaborate more on what you see as the problem with this approach? I'm open to suggestions, as always. In fact, I'd love some more feedback.

    C
  • Andrew Davey (www.aboutcode.net)
    What happens for those lucky people with dual video cards? Can Accelerator use both in parallel? :D

    (No I don't have dual cards, I just like the idea!)
  • Charles wrote:
    I'm not sure I fully understand the problem here. Programming concurrent applications is hard and there is no single silver bullet to make it easy (it's a hard problem). Accelerator is but one approach to a specific subset of the problem, just as Software Transactional Memory (video to appear on C9 next week) and language-level solutions are. I am just investigating what's being done around the company to address this important programming topic.

    Can you elaborate more on what you see as the problem with this approach? I'm open to suggestions, as always. In fact, I'd love some more feedback.

    C

    Sorry, that must have come out wrong. I'm not complaining that there are videos covering similar topics.

    I'm just surprised that projects with overlapping goals are unaware of each other. It seems to me that the CCR team and the Accelerator team might be able to share some useful information with each other. I think it's great that you are able to drop in the recommendation that they take a look at each other's solutions. Again, I'm just surprised it doesn't happen automatically.

    However, and let this be my disclaimer, I'm not a Microsoft insider and further have no clue how these things work. :)
  • Andrew Davey wrote:
    Deferred evaluation is definitely a very interesting subject. The work being done with LINQ is along the same lines: the compiler generates an expression tree that can then be passed around as data and transformed before evaluation.
    I wonder if it's possible to take expression trees generated by LINQ and transform them into parallelisable computations. I suppose it really comes down to "map" and "reduce" functions in the end. Whilst you are kind of limited to pure arithmetic operations on the GPU, the future of multi-cores certainly could widen the scope.

    Of course, I can't talk about abstract syntax trees without once again mentioning syntactic macros. It would be interesting to look at using syntactic macros to perform staged computation. I'm sure some of the parallelising of operations can be decided at compile time. That could make for even more performance increases, since you can take some weight off the JIT compiler.


    Yes, staged computation is definitely an interesting way to go. As you point out, some of the work done by the library could be done at "compile time" (or at least earlier than it currently is).

    In general, this would also fit with the LINQ work that is going on. There is some interesting work by Don Syme on connecting F# (another MSR project) to Accelerator. See "Leveraging .NET Meta-programming Components from F#: Integrated Queries and Interoperable Heterogeneous Execution", to be published at the ML Workshop, 2006, Portland, Oregon, available from Don's web page at http://research.microsoft.com/~dsyme/publications.aspx.



  • rhm wrote:
    It's an interesting project and an interesting video.

    I like the idea of having a library that makes general purpose computation on GPU really easy.

    However, the 5x5 convolve example used in the video is illusory, and it hints at the core problem with this approach to making parallelism easy to code in the wider world.

    If you load a big (1600x1200) image into Photoshop and do a radius-5 Gaussian blur, you're looking at about 1 second of processing even on my relatively old PC (AMD Athlon XP 2000). And that's because Adobe have hand-optimized their filter routines using the most efficient approach for a CPU. That is, it performs the whole matrix operation on a small area of the image at a time, it 'touches' areas of memory before it needs them so they'll be in the cache when it does need them, and of course it uses MMX/SSE to exploit the small amount of SIMD power current CPUs have.

    The routine shown to us in the video takes a different approach: it composites the whole image repeatedly, offset by a given number of pixels each time. That's the definitive way to perform that operation on a GPU, but it's devastatingly inefficient on a CPU compared to the conventional way of doing it (as shown by Adobe).

    Now, it was kind of glossed over in the video, but I believe the interviewees were saying that they are trying to come up with a way of making data-level parallelism easy to code for both GPU and multi-core scenarios. They also touted their library approach as being a lot simpler than the conventional high-performance computing approach of having a special compiler pick apart the loops in the problem and work out what to run where. They state that by encoding higher-level operations in library calls, the intention of the program is captured and the library then works out what to do.

    The problem is that the intention here - to perform a 5x5 convolve by repeatedly compositing an offset image - is right for the GPU, but wrong for the CPU.

    Now, I suppose you could be really clever with your deferred computation, 'unpick' the intention from the series of composition calls that the nested for loops in the example produce, and then work out a more efficient way to execute them on a CPU. But that's likely to work only in limited situations, where the same operation is executed over and over. I think it would be better to admit that there's no way to succinctly encode the intention of a program to a computer (this is a problem that mathematicians have grappled with since before there were computers) and just concentrate on producing useful libraries for the two different scenarios. But hey, you're the researchers!


    Actually, if we computed all the intermediate arrays implied by the high-level code, performance would be disastrous on the GPU too, because you'd use way too much memory bandwidth and destroy the spatial locality.

    All of the C# for-loops are unrolled, and you end up with one large expression graph being passed to the library. The graph would imply lots of intermediate arrays being computed.

    We actually convert the graph to something of the following form:

    1.  For each output pixel of the convolution, execute a sequential piece of code.
    2. The sequential piece of code fetches the neighboring pixels and adds them together.

    The sequential piece of code corresponds to the body of the pixel shader. Now, if you want good performance, you need to traverse the output pixels in the correct order to preserve spatial locality. Fortunately, the GPU traverses the output pixels in a reasonable order (these are 2-D images, after all).

    Details of how we do this are described in our technical report (accessible from the Accelerator Wiki). The TR will soon be superseded by a paper appearing in ASPLOS '06 that we hope does a better job of describing the details.

    You are correct that it is quite difficult to capture the "intention" of a programmer. Our point was simple: a good start would be to avoid over-specifying the behavior of the program, which is what happens if you write the code in C/C++ using for loops that specify the exact order in which individual array elements are accessed. One must wonder why Adobe had to hand-code the blocking that you describe and why a compiler couldn't do that. The answer, as you allude to, is that in the conventional high-performance computing approach, the compiler has to do some pretty heroic stuff.

    To argue the other side, you could say that our approach results in a program that is too underspecified... the area in between overspecified and underspecified is the interesting area to investigate.
  • Andrew Davey wrote:
    What happens for those lucky people with dual video cards? Can Accelerator use both in parallel?

    (No I don't have dual cards, I just like the idea!)


    No, Accelerator can't use both in parallel. We wish we could :)

    It's a neat idea, but it's a harder problem because it changes the memory hierarchy: you have to figure out how to partition the program across that hierarchy. With a single GPU, you are accessing the local memory on the graphics card, which is very high-bandwidth (>50 GB/s). With multiple GPUs, unless you partition the problem just right, you may need to access memory on another card across the bus. PCI-Express is fast, but not nearly as fast as the memory on the graphics card.
  • Minh (WOOH! WOOH!)
    I'm curious: why not implement this as a library on top of multi-core CPUs (which seems a much more useful scenario) rather than a GPU?

    (Or perhaps you find the limited PS instruction set easier to start out with.)
  • Now, is this entire library written in managed code (i.e. on top of the .NET Framework)? If so, how can your code specifically target the GPU (as opposed to the CPU) without modifications to the underlying structure of the .NET Framework? Perhaps I am not familiar enough with the deep internal structure and design of the .NET Framework; if I were, I might not ask such a question. Wouldn't performance also be impeded if the garbage collector was involved? Could this be written to execute even faster in something unmanaged like C or C++ (or any compiled language)?
  • Andrew Davey (www.aboutcode.net)
    They are using Managed DirectX. So that takes care of talking to the video card for them.
  • Andrew Davey (www.aboutcode.net)
    Minh wrote:
    I'm curious: why not implement this as a library on top of multi-core CPUs (which seems a much more useful scenario) rather than a GPU?

    (Or perhaps you find the limited PS instruction set easier to start out with.)

    Parallel data and parallel instructions are two different beasts, I guess. Trying to operate on a single dataset from multiple processors causes all kinds of memory/cache issues. When you can split the data up and work independently, it's fine. However, when you can't, the only performant way to operate is on one processor - in this case taking advantage of the data parallelism inside a single GPU.
    Of course, I'm not an expert by any means in this area... hopefully the boffins at MSR are finding clever solutions to these tricky problems.
  • David Tarditi wrote:
    
    rhm wrote:[...]
    The problem is that the intention here - to perform a 5x5 convolve by repeatedly compositing an offset image - is right for the GPU, but wrong for the CPU.
    [...]
    [...]
    Now, if you want good performance, you need to traverse the output pixels in the correct order to preserve spatial locality.
    [...]


    You're quite right, rhm, that different target platforms have different issues, and that you have to adapt the structure of your program to your processor if you want the comparison to be meaningful. I can assure you that in our convolution benchmark, the CPU version we compare against is quite clever about how it iterates. :)

    For our multi-core backend, we are indeed being as ambitious as you suggest.  Our goal is to tailor the loop ordering to suit the machine.  There have been decades of research into automatic loop transformations (strip-mining, tiling, skewing, ...), so the idea of doing this in a compiler isn't novel.  As David points out, the advantage we have is that the program is specified at a higher level, so we don't have to burn cycles trying to figure out which transformations we can legally apply without breaking a data dependency.

    Sidd
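As a concrete picture of one transformation Sidd names, here's the classic shape of loop tiling, illustrated with a matrix transpose in plain C# (not Accelerator code):

    using System;

    class LoopTiling
    {
        // Naive transpose: reads a row-major but writes b column-major,
        // so one of the two arrays is always accessed with poor locality.
        static void TransposeNaive(float[,] a, float[,] b)
        {
            int n = a.GetLength(0);
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    b[j, i] = a[i, j];
        }

        // Tiled transpose: the same iterations regrouped into cache-sized
        // blocks so both arrays stay hot while a block is being processed.
        // A compiler may only reorder like this after proving no data
        // dependency breaks; a higher-level program spec makes that easy.
        static void TransposeTiled(float[,] a, float[,] b, int tile)
        {
            int n = a.GetLength(0);
            for (int ii = 0; ii < n; ii += tile)
                for (int jj = 0; jj < n; jj += tile)
                    for (int i = ii; i < Math.Min(ii + tile, n); i++)
                        for (int j = jj; j < Math.Min(jj + tile, n); j++)
                            b[j, i] = a[i, j];
        }

        static void Main()
        {
            var a = new float[512, 512];
            var b = new float[512, 512];
            TransposeNaive(a, b);
            TransposeTiled(a, b, 64);
            Console.WriteLine("done");
        }
    }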
  • If I look at this video I cannot prevent myself from thinking that there is nothing new here, and I am quite surprised to see that Microsoft is so late to this. Yes, Apple (now I am sure that many Windows fanboys will call me a Mac troll, but anyway!!) has been doing a lot of work on data parallelism for many years. I mean, Apple has been working on APIs for SIMD programming for years that provide data parallelism for image processing, scientific applications, signal processing, math computing, etc. This API is called the Accelerate framework, and it just does all the job for the developer. No need to worry about which architecture your program will run on (PowerPC or Intel); the API does the optimisation for you, the vectorizing for you, and the architecture-dependent optimisation for you. No need to worry about data alignment, vector instructions, etc. It just provides the whole abstraction, and this is certainly why SIMD computing has been far more widespread on the Mac compared to Windows. On the PC you could use Intel's vectorizing tools, but that's expensive, and still the level of abstraction is not as high as a developer would like.

    Now, talking about GPU processing, I cannot see anything impressive in this video. Apple (yes, again Apple, sorry!!) is already offering TODAY (not a research project) an object-oriented API for high-end applications and data-parallel computing. Core Image and Core Video do just that. They provide an abstraction model for GPU programming; Core Image uses OpenGL and the OpenGL Shading Language and works on programmable GPUs. Developers do not need to know how the GPU or OpenGL works; Core Image and Core Video provide all the abstraction with an object-oriented programming model built with Cocoa. You don't need to know about graphics programming or computer-graphics mathematics either; Core Image/Video abstracts all of these. Moreover, Core Image/Video does the optimization on the fly for a given application, depending on the architecture the program runs on. It optimizes and scales performance depending on the resources you have. In other words, it optimizes for the GPU if the hardware allows it; otherwise it optimizes for Altivec (SIMD computing) on the G4/G5, or for SSE on Intel. It will also optimize for multi-processor or multi-core machines if it needs to and can. Core Image/Video also provides a set of built-in Image Units that perform general graphical effects: blur, distortion, morphology, you name it, all running on the GPU. Core Image/Video uses a non-destructive mechanism and 32-bit floating-point numbers. The architecture is completely modular; any developer can build their own Image Unit. Anyone can download a test application named "FunHouse" in the Apple development tools that performs REAL-TIME image processing using the GPU - much more impressive than their demo, I would say. And, more important, high-end applications like Motion and Final Cut Pro 5's Dynamic RT technology leverage Core Image and Core Video: you get real-time graphics and video processing!!! So I don't really think that what is shown in this video is new or a breakthrough (sorry!!), particularly when it is still a research project while Core Image and Core Video already do even more and have been available for more than a year now. I would really advise people interested in Accelerator to have a look at Core Image and Core Video too; they will find a state-of-the-art GPU-based data-processing and data-parallelism technology. It's not the future, it's now....

    Last point: in the video there is something I don't agree with. One of the guys said that scientific computing could be done on GPUs. I don't really think so, at least depending on your needs. I am a geophysicist, specialising in fluid modelling and continuum mechanics. In most (if not all) scientific modelling work, double-precision math is required to achieve acceptable precision in the results. The problem is that GPUs do not provide double-precision floating-point support in their execution units. They provide only (so far!!) single-precision math, as that is enough for 3D modelling and games. What I mean is that the vector units in the GPU (yes, GPUs use a SIMD model for their execution units; that's why they can achieve a high order of parallelism in data processing) only support single-precision floating-point numbers. This is not enough for most scientific applications today. Now, there is a lot of research out there on how to use GPUs for non-graphical calculation involving large data sets, but so far nothing really usable for scientific computing. Apple had a similar problem with Altivec because it does not support double-precision floating-point vectors, which prevents the G4 from providing vector computing for double-precision numbers. Some of the Accelerate APIs can do some double-precision operations on Altivec, but that was limited to specific operations like the double-precision Fourier transform. So GPUs have a similar problem: they do not scale well for double-precision floating-point computing, which limits their use in scientific computing. On the other hand, this does not mean that interesting work cannot be done with GPUs outside of the graphics world. There are proposals for taking advantage of GPU power to encode or decode MP3 files, MPEG-4 files, etc. Some ATI cards do H.264 decoding in hardware, but we could imagine using the GPU to also encode H.264. Another application is of course animation: animation requires a lot of data-parallel computing, and GPUs can help a lot there. Leopard's Core Animation is a good example of what can be done.
  • Hakime wrote:
    If I look at this video I cannot prevent myself from thinking that there is nothing new here, and I am quite surprised to see that Microsoft is so late to this.

    I mean, Apple has been working on APIs for SIMD programming for years that provide data parallelism for image processing, scientific applications, signal processing, math computing, etc. This API is called the Accelerate framework, and it just does all the job for the developer.


    Hakime -

    The libraries that you mention are pre-compiled functions that use short-vector instruction sets (such as SSE3 or Altivec). For example, they include a function that does convolution. In contrast, Accelerator provides you with primitive operations a level below a domain-specific library function. For example, you can do element-wise addition of two data-parallel arrays of one or two dimensions. These operations can be used to construct domain-specific library functions, such as the convolution function.

    We have a paper available on our Wiki that describes in detail the kinds of primitives that Accelerator provides and the compilation approach that we use to generate reasonably efficient GPU code.

    The point of Accelerator is to use data-parallelism to provide an easier way of programming GPUs and multi-cores, not to provide a set of domain-specific libraries.

    You are correct that single-precision arithmetic will limit the use of GPUs for scientific computation. However, there are still lots of interesting things that you can do. You can look at http://www.gpgpu.org for more information (under "categories", look at "scientific computation"). There has also been some recent work on emulating double-precision floating-point numbers using single-precision floating-point numbers.
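A sketch of what David describes, building a domain-specific function from primitives. This is illustrative only, not SDK code: the FloatParallelArray/ParallelArrays names come from the stack trace quoted below in this thread, and the Shift/Multiply/Add primitives are assumptions based on the operations described here.

    using Microsoft.Research.DataParallelArrays;

    class Convolve1D
    {
        // A 1-D convolution built from assumed primitives: each kernel tap
        // becomes a whole-array Shift, a scalar Multiply, and an Add.
        // The loop only builds a deferred expression graph; nothing
        // executes until the result is forced (e.g. via ToArray).
        static FloatParallelArray Convolve(FloatParallelArray input, float[] kernel)
        {
            FloatParallelArray sum = null;
            for (int k = 0; k < kernel.Length; k++)
            {
                FloatParallelArray shifted =
                    ParallelArrays.Shift(input, new int[] { k - kernel.Length / 2 });
                FloatParallelArray term = ParallelArrays.Multiply(shifted, kernel[k]);
                sum = (sum == null) ? term : ParallelArrays.Add(sum, term);
            }
            return sum;
        }
    }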

  • I am pretty interested in this stuff, but unfortunately I cannot download it. Is it only me?

    Thanks!
  • First a question: what were all the programs being used? I think I saw cygwin and emacs, but what was the shell that would highlight the old commands on mouse-over?

    And then a comment: my problem with offloading stuff to the GPU is that the numerical environment is a joke. You don't know anything about the radix, the range of fp values, whether +, -, *, /, and sqrt follow the sane rounding rules of IEEE, how to control any reorderings or fusions (i.e. a*b+c -> fma(a,b,c)) that are allowed to take place, NaNs, -0, Inf (if so, affine or projective?), what happens on overflow/underflow/etc., all the nice functions in the latest draft IEEE standard or C99, controlling directed rounding, etc., etc., etc.

    It's fine for doing things like CoreImage or accelerating game physics, but not for some of the things I'd love to offload that require careful analysis. I hope you guys nag the DX people (at least) for some pragmas or a mode that will tighten up the fp environment, and/or the ability to set/query anything interesting (see limits.h or float.h from C99 as an example).


    And the use of functional type stuff scares me. Do you guys automatically break data down into smaller tiles to keep the memory usage more manageable?
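The fusion worry is easy to demonstrate. Math.FusedMultiplyAdd is from much later .NET than this thread, but it shows why silently rewriting a*b+c as fma(a,b,c) changes results:

    using System;

    class FusionDemo
    {
        static void Main()
        {
            double x = 134217729.0; // 2^27 + 1: x*x needs 55 mantissa bits

            // A plain multiply rounds the product to 53 bits.
            double rounded = x * x;

            // A fused multiply-add keeps the exact product internally,
            // so it can recover the rounding error of the multiply.
            double residual = Math.FusedMultiplyAdd(x, x, -rounded);

            Console.WriteLine(rounded);  // 18014398777917440
            Console.WriteLine(residual); // 1: fused and unfused disagree
        }
    }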
  • When I run your Life program and then switch to the Task Manager, the code errors out:

    =====================================

    See the end of this message for details on invoking
    just-in-time (JIT) debugging instead of this dialog box.

    ************** Exception Text **************
    Error in the application.
    -2005530520 (D3DERR_DEVICELOST)
       at Microsoft.DirectX.Direct3D.Device.GetRenderTargetData(Surface renderTarget, Surface destSurface)
       at AcceleratorDX.DxMachine.ConvertStreamToArray(AcceleratorFloatStream s1, Single[,]& afl)
       at AcceleratorDX.DxMachine.ConvertStreamToBitmap(AcceleratorStream s1, Bitmap& bmp)
       at Microsoft.Research.DataParallelArrays.ParallelArrays.ToBitmap(FloatParallelArray a, Bitmap& bm)
       at LifeDemo.Display(Graphics g, Rectangle rc) in C:\Program Files\Microsoft\Accelerator\samples\life.cs:line 61
       at LifeWindowsForm.OnPaint(PaintEventArgs e) in C:\Program Files\Microsoft\Accelerator\samples\life.cs:line 152
       at System.Windows.Forms.Control.PaintWithErrorHandling(PaintEventArgs e, Int16 layer, Boolean disposeEventArgs)
       at System.Windows.Forms.Control.WmPaint(Message& m)
       at System.Windows.Forms.Control.WndProc(Message& m)
       at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
       at System.Windows.Forms.ContainerControl.WndProc(Message& m)
       at System.Windows.Forms.Form.WndProc(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
       at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
       at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)


    ************** Loaded Assemblies **************
    mscorlib
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
    ----------------------------------------
    Life
        Assembly Version: 0.0.0.0
        Win32 Version: 0.0.0.0
        CodeBase: file:///C:/Program%20Files/Microsoft/Accelerator/samples/Life/bin/Debug/Life.exe
    ----------------------------------------
    System.Windows.Forms
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
    ----------------------------------------
    System
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
    ----------------------------------------
    System.Drawing
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
    ----------------------------------------
    System.Configuration
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Configuration/2.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
    ----------------------------------------
    System.Xml
        Assembly Version: 2.0.0.0
        Win32 Version: 2.0.50727.42 (RTM.050727-4200)
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml/2.0.0.0__b77a5c561934e089/System.Xml.dll
    ----------------------------------------
    Accelerator
        Assembly Version: 1.1.1.2
        Win32 Version: 1.1.1.2
        CodeBase: file:///C:/Program%20Files/Microsoft/Accelerator/samples/Life/bin/Debug/Accelerator.DLL
    ----------------------------------------
    Microsoft.DirectX.Direct3D
        Assembly Version: 1.0.2902.0
        Win32 Version: 9.05.132.0000
        CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX.Direct3D/1.0.2902.0__31bf3856ad364e35/Microsoft.DirectX.Direct3D.dll
    ----------------------------------------
    Microsoft.DirectX
        Assembly Version: 1.0.2902.0
        Win32 Version: 5.04.00.2904
        CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX/1.0.2902.0__31bf3856ad364e35/Microsoft.DirectX.dll
    ----------------------------------------
    Microsoft.DirectX.Direct3DX
        Assembly Version: 1.0.2906.0
        Win32 Version: 9.07.239.0000
        CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX.Direct3DX/1.0.2906.0__31bf3856ad364e35/Microsoft.DirectX.Direct3DX.dll
    ----------------------------------------
    Microsoft.VisualC
        Assembly Version: 8.0.0.0
        Win32 Version: 8.00.50727.42
        CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/Microsoft.VisualC/8.0.0.0__b03f5f7f11d50a3a/Microsoft.VisualC.dll
    ----------------------------------------

    ************** JIT Debugging **************
    To enable just-in-time (JIT) debugging, the .config file for this
    application or computer (machine.config) must have the
    jitDebugging value set in the system.windows.forms section.
    The application must also be compiled with debugging
    enabled.

    For example:

    <configuration>
        <system.windows.forms jitDebugging="true" />
    </configuration>

    When JIT debugging is enabled, any unhandled exception
    will be sent to the JIT debugger registered on the computer
    rather than be handled by this dialog box.


    =====================================

    Also, where is the code to monitor CPU and GPU performance? How do I enable it? Are there any managed C++ samples yet?

    Thanks,
    Chuck

  • dmarsh (Knee draggin')
    This is a rather old post, but I'm wondering where this project stands these days.

    In light of projects like LINQ offering up expression trees that can now be interpreted and compiled into a completely different language and/or transferred off to be executed on a totally different piece of hardware, I'm kinda hoping this project picked up on that and basically implemented LINQ-to-GPUs.

    I started wondering when we'd see this, with specific respect to WPF and a true approach to writing custom shader effects, when I realized that LINQ could enable this kind of capability. I finally got around to writing a blog post about it, and somebody alerted me to this project.

    Anyway, love to hear where the project stands!

    Cheers,
    Drew
  • Allan Lindqvist (aL_, Kinect ftw)

    Super awesome :D
    I'd just like to bump this thread and ask how this project is doing now.
    How is this related to the shader stuff in WPF that's coming up?
    Charles, a new interview with these guys would be so cool :)

  • @aL_

    +1

     

    With LINQ one can write

        var grayImage = GPU.Compute(image, (Float4 color) =>
        {
            return color.R * 0.4f + color.G * 0.3f + color.B * 0.3f;
        });

    without Accelerator's array proxies.

    With the improved Expression trees in .NET 4.0, one can expect a very convenient API...

    Any news?
