Singularity IV: Return of the UI

Awesome! I'll have to download it and check it out.
I like it. I was doing something similar to this two years ago, except that with mine you had to write the pixel shader (PS) code by hand, so it wasn't nearly as cool.
Thanks for the video. It should be noted, however, that one can create an array "computation" type library today. It is merely a matter of syntax and of abstracting the loops away from the end user of your functions. Whether it would take advantage of a multi-core system is another thing entirely, so good work here.
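To make the loop-hiding point concrete, here is a minimal Python sketch. The `Vec` class and its methods are made up for illustration and are not Accelerator's API, and nothing here runs on more than one core; it only shows how the loops can disappear from the caller's code.

```python
class Vec:
    """Toy array type (hypothetical): callers combine whole arrays,
    and the element-by-element loops live inside the class."""

    def __init__(self, values):
        self.values = list(values)

    def _zip_with(self, other, op):
        # A plain number is applied against every element.
        if isinstance(other, (int, float)):
            return Vec(op(a, other) for a in self.values)
        if len(other.values) != len(self.values):
            raise ValueError("length mismatch")
        return Vec(op(a, b) for a, b in zip(self.values, other.values))

    def __add__(self, other):
        return self._zip_with(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._zip_with(other, lambda a, b: a * b)

    def sum(self):
        return sum(self.values)


x = Vec([1.0, 2.0, 3.0])
y = Vec([4.0, 5.0, 6.0])
z = x * y + 1.0            # no explicit loop at the call site
print(z.values, z.sum())   # [5.0, 11.0, 19.0] 35.0
```

Because the caller never writes the loop, the library is free to change how it runs (more cores, a GPU) without the calling code changing, which is the part that takes real work.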
The brief GPU discussion/explanation was also interesting. I'm curious, though, what the conversion routines between data parallel arrays and regular arrays look like. Perhaps I should check out the SDK? Is much code shared in the SDK, or is it pretty much a "black box" type approach?
jvervoorn wrote:I am troubled by how many times Charles is the one trying to introduce groups to what other people are working on. It would be great to improve the communication that should be going on while at the same time cutting down on the e-mail that is overwhelming you all there.
JonR800 wrote:
jvervoorn wrote: I am troubled by how many times Charles is the one trying to introduce groups to what other people are working on. It would be great to improve the communication that should be going on while at the same time cutting down on the e-mail that is overwhelming you all there.
I think the exact same thing every time. A lot of projects with overlapping goals. I know Microsoft is a big company, but a keyword searchable database of current/past projects might do wonders.
Charles wrote:I'm not sure I fully understand the problem here. Programming concurrent applications is hard, and there is no single silver bullet to make it easy. Accelerator is but one approach to a specific subset of the problem, just as Software Transactional Memory (video to appear on C9 next week) and language-level solutions are. I am just investigating what's being done around the company to address this important programming topic.
Can you elaborate more on what you see as the problem with this approach? I'm open to suggestions, as always. In fact, I'd love some more feedback.
C
Andrew Davey wrote:Deferred evaluation is definitely a very interesting subject. The work being done with LINQ is along the same lines. The compiler generates an expression tree that can then be passed around as data and transformed before evaluation.
I wonder if it's possible to take expression trees generated by LINQ and transform them into parallelisable computations. I suppose it really comes down to "map" and "reduce" functions in the end. Whilst you are kind of limited to pure arithmetic operations on the GPU, the future of multi-cores certainly could widen the scope.
Of course, I can't talk about abstract syntax trees without once again mentioning syntactic macros. It would be interesting to look at using syntactic macros to perform staged computation. I'm sure some of the parallelising of operations can be decided at compile time. That could make for even more performance increases, since you can take some weight off the JIT compiler.
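As a rough illustration of the deferred-evaluation idea, here is a small Python sketch. It is neither LINQ nor Accelerator; the names `Input`, `Const`, `BinOp`, `evaluate` and `total` are invented for this example. Operators only build a tree, and nothing is computed until `evaluate` walks it with a "map" over the indices.

```python
import operator
from functools import reduce


class Expr:
    """Base node: arithmetic builds a bigger tree instead of computing."""

    def __add__(self, other):
        return BinOp(operator.add, self, lift(other))

    def __mul__(self, other):
        return BinOp(operator.mul, self, lift(other))


class Input(Expr):
    def __init__(self, values):
        self.values = list(values)

    def at(self, i):
        return self.values[i]

    def size(self):
        return len(self.values)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def at(self, i):
        return self.value

    def size(self):
        return None  # a constant fits any length


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def at(self, i):
        return self.op(self.left.at(i), self.right.at(i))

    def size(self):
        n = self.left.size()
        return n if n is not None else self.right.size()


def lift(x):
    return x if isinstance(x, Expr) else Const(x)


def evaluate(expr):
    # "map": every index is independent, so a backend could split this
    # range across cores or hand it to a GPU. Here it runs serially.
    # Assumes the tree contains at least one Input.
    return list(map(expr.at, range(expr.size())))


def total(expr):
    # "reduce": fold the mapped results into a single value.
    return reduce(operator.add, evaluate(expr))


a = Input([1, 2, 3])
b = Input([10, 20, 30])
tree = a * 2 + b          # only a tree so far, nothing computed
print(evaluate(tree))     # [12, 24, 36]
print(total(tree))        # 72
```

Since the tree is plain data until `evaluate` runs, it could be inspected or rewritten first, which is where a transformation into a parallel plan would hook in.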
rhm wrote:It's an interesting project and an interesting video.
I like the idea of having a library that makes general purpose computation on GPU really easy.
However, the 5x5 convolve example used in the video is illusory, and it hints at the core problem with this approach to making parallelism easy to code in the wider world.
If you load a big (1600x1200) image into Photoshop and do a radius 5 Gaussian blur, you're looking at about 1 second of processing even on my relatively old PC (AMD Athlon XP 2000). And that's because Adobe have hand-optimized their filter routines using the most efficient approach for running on a CPU. That is, it performs the whole matrix operation on a small area of the image at a time, it 'touches' areas of memory before it needs them so they'll be in the cache when it does need them, and of course it uses MMX/SSE to exploit the small amount of SIMD power current CPUs have.
The routine shown to us in the video takes a different approach. It composits the whole image repeatedly, offset by a given number of pixels each time. That's the definitive way to perform that operation on a GPU, but it's devastatingly inefficient to do that on a CPU compared to the conventional way of doing it (as shown by Adobe).
Now it was kind of glossed over in the video, but I believe the interviewees were saying that they are trying to come up with a way of making data-level parallelism easy to code for both GPUs and multi-core scenarios. They also touted their library approach as being a lot simpler than the conventional high-performance computing approach of having a special compiler pick apart the loops in the problem and work out what to run where. They state that by expressing higher-level operations as library calls, the intention of the program is encoded and the library then works out what to do.
The problem there is that the intention here - to perform a 5x5 convolve by repeatedly compositing an offset image - is right for the GPU, but wrong for the CPU.
Now I suppose you could be really clever with your deferred computation and 'unpick' the intention from the series of composition calls that the nested for loops in the example produce, and then work out a more efficient way to execute them on a CPU. But that's likely to work only in limited situations, where the same operation is executed over and over. I think it would be better to admit that there's no way to succinctly encode the intention of a program to a computer (this is a problem that mathematicians have grappled with since before there were computers) and just concentrate on producing useful libraries for the two different scenarios. But hey, you're the researchers!
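For anyone who wants to see the two formulations side by side, here is a small Python sketch. It is not the code from the video and not Adobe's; the function names are invented, the image is a plain list of lists, pixels outside the image are treated as zero, and the kernel is applied without flipping, which makes no difference for a symmetric kernel. It only shows that both orderings compute the same result, not how fast either one is.

```python
def convolve_per_pixel(img, kernel):
    """CPU-style: visit each output pixel once and gather its neighbourhood."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # outside counts as zero
                        acc += kernel[dy + r][dx + r] * img[yy][xx]
            out[y][x] = acc
    return out


def shift(img, dy, dx):
    """Whole-image copy with out[y][x] = img[y + dy][x + dx], zero outside."""
    h, w = len(img), len(img[0])
    return [[img[y + dy][x + dx]
             if 0 <= y + dy < h and 0 <= x + dx < w else 0.0
             for x in range(w)]
            for y in range(h)]


def convolve_by_shifts(img, kernel):
    """GPU-style: one weighted, shifted copy of the whole image per kernel tap,
    accumulated into the output (25 whole-image passes for a 5x5 kernel)."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            k = kernel[dy + r][dx + r]
            shifted = shift(img, dy, dx)
            out = [[o + k * s for o, s in zip(orow, srow)]
                   for orow, srow in zip(out, shifted)]
    return out


# Unnormalised 5x5 box kernel and a small test image with integer-valued pixels.
kernel = [[1.0] * 5 for _ in range(5)]
img = [[float(y * 7 + x) for x in range(7)] for y in range(6)]

same = convolve_per_pixel(img, kernel) == convolve_by_shifts(img, kernel)
print(same)  # True
```

The second version touches the entire image once per kernel tap, which suits hardware built to stream whole textures; the first keeps each output pixel's reads close together, which is what a CPU cache wants.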
Andrew Davey wrote:What happens for those lucky people with dual video cards? Can Accelerator use both in parallel?
(No I don't have dual cards, I just like the idea!)
Minh wrote:I'm curious - why not implement this as a library on top of multi-core CPUs (which seems a much more useful scenario) rather than a GPU?
(Or perhaps you find the limited PS instruction set easier to start out with.)
David Tarditi wrote:[...]
rhm wrote:[...]
The problem there is that the intention here - to perform a 5x5 convolve by repeatedly compositing an offset image - is right for the GPU, but wrong for the CPU.
[...]
Now, if you want good performance, you need to traverse the output pixels in the correct order to preserve spatial locality.
[...]
Hakime wrote:When I watch this video I cannot help thinking that there is nothing new here, and I am quite surprised to see that Microsoft is so late in this.
I mean, Apple has been working for many years on APIs for SIMD programming that provide data parallelism for image processing, scientific applications, signal processing, math computing, etc. This API is called the Accelerate framework, and it does all the work for the developer.
When I run your Life program and then switch to the Task Manager, the code errors out:
=====================================
See the end of this message for details on invoking
just-in-time (JIT) debugging instead of this dialog box.
************** Exception Text **************
Error in the application.
-2005530520 (D3DERR_DEVICELOST)
at Microsoft.DirectX.Direct3D.Device.GetRenderTargetData(Surface renderTarget, Surface destSurface)
at AcceleratorDX.DxMachine.ConvertStreamToArray(AcceleratorFloatStream s1, Single[,]& afl)
at AcceleratorDX.DxMachine.ConvertStreamToBitmap(AcceleratorStream s1, Bitmap& bmp)
at Microsoft.Research.DataParallelArrays.ParallelArrays.ToBitmap(FloatParallelArray a, Bitmap& bm)
at LifeDemo.Display(Graphics g, Rectangle rc) in C:\Program Files\Microsoft\Accelerator\samples\life.cs:line 61
at LifeWindowsForm.OnPaint(PaintEventArgs e) in C:\Program Files\Microsoft\Accelerator\samples\life.cs:line 152
at System.Windows.Forms.Control.PaintWithErrorHandling(PaintEventArgs e, Int16 layer, Boolean disposeEventArgs)
at System.Windows.Forms.Control.WmPaint(Message& m)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
at System.Windows.Forms.ContainerControl.WndProc(Message& m)
at System.Windows.Forms.Form.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
************** Loaded Assemblies **************
mscorlib
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
----------------------------------------
Life
Assembly Version: 0.0.0.0
Win32 Version: 0.0.0.0
CodeBase: file:///C:/Program%20Files/Microsoft/Accelerator/samples/Life/bin/Debug/Life.exe
----------------------------------------
System.Windows.Forms
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
System.Drawing
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------
System.Configuration
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Configuration/2.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
----------------------------------------
System.Xml
Assembly Version: 2.0.0.0
Win32 Version: 2.0.50727.42 (RTM.050727-4200)
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Xml/2.0.0.0__b77a5c561934e089/System.Xml.dll
----------------------------------------
Accelerator
Assembly Version: 1.1.1.2
Win32 Version: 1.1.1.2
CodeBase: file:///C:/Program%20Files/Microsoft/Accelerator/samples/Life/bin/Debug/Accelerator.DLL
----------------------------------------
Microsoft.DirectX.Direct3D
Assembly Version: 1.0.2902.0
Win32 Version: 9.05.132.0000
CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX.Direct3D/1.0.2902.0__31bf3856ad364e35/Microsoft.DirectX.Direct3D.dll
----------------------------------------
Microsoft.DirectX
Assembly Version: 1.0.2902.0
Win32 Version: 5.04.00.2904
CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX/1.0.2902.0__31bf3856ad364e35/Microsoft.DirectX.dll
----------------------------------------
Microsoft.DirectX.Direct3DX
Assembly Version: 1.0.2906.0
Win32 Version: 9.07.239.0000
CodeBase: file:///C:/WINDOWS/assembly/GAC/Microsoft.DirectX.Direct3DX/1.0.2906.0__31bf3856ad364e35/Microsoft.DirectX.Direct3DX.dll
----------------------------------------
Microsoft.VisualC
Assembly Version: 8.0.0.0
Win32 Version: 8.00.50727.42
CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/Microsoft.VisualC/8.0.0.0__b03f5f7f11d50a3a/Microsoft.VisualC.dll
----------------------------------------
************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this
application or computer (machine.config) must have the
jitDebugging value set in the system.windows.forms section.
The application must also be compiled with debugging
enabled.
For example:
<configuration>
<system.windows.forms jitDebugging="true" />
</configuration>
When JIT debugging is enabled, any unhandled exception
will be sent to the JIT debugger registered on the computer
rather than be handled by this dialog box.
=====================================
Also, where is the code to monitor CPU and GPU performance, and how do I enable it? Are there any managed C++ samples yet?
Thanks,
Chuck
Super awesome.
I'd just like to bump this thread and ask how this project is doing now.
How is this related to the shader stuff in WPF that's coming up?
Charles, a new interview with these guys would be so cool.
@aL_
+1
With LINQ one can write
var grayImage = GPU.Compute(image, (Float4 color) => {
    return color.R * 0.4 + color.G * 0.3 + color.B * 0.3;
});
without Accelerator's array proxies.
With the improved Expressions in v4.0, one can expect a very convenient API...
Any news?