Hakime wrote: When I watch this video, I can't help thinking that there is nothing new here, and I am quite surprised to see that Microsoft is so late to this.
I mean, Apple has been working for many years on APIs for SIMD programming that provide data parallelism for image processing, scientific applications, signal processing, math computing, etc. This API is called the Accelerate framework, and it does all the work for the developer.
The libraries that you mention are pre-compiled functions that use short-vector instruction sets (such as SSE3 or AltiVec). For example, they include a function that does convolution. In contrast, Accelerator provides you with primitive operations that sit one level below a domain-specific library function. For example, you can do element-wise addition of two data-parallel arrays of one or two dimensions. These operations can be used to construct domain-specific library functions, such as the convolution function.
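To make the distinction concrete, here is a rough sketch in plain Python of how a convolution can be assembled from whole-array primitives. This is not Accelerator's API: the names `shift`, `add`, `scale` and `convolve` are made up for illustration, and the lists stand in for data-parallel arrays.

```python
# Hypothetical whole-array primitives; plain lists stand in for
# data-parallel arrays. None of these names come from Accelerator.

def shift(xs, k):
    """Shift xs right by k positions, filling vacated slots with 0.0."""
    n = len(xs)
    return [xs[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]

def add(xs, ys):
    """Element-wise addition of two arrays of equal length."""
    return [x + y for x, y in zip(xs, ys)]

def scale(xs, c):
    """Multiply every element by the scalar c."""
    return [c * x for x in xs]

def convolve(signal, kernel):
    """1-D convolution built only from the primitives above:
    out[i] = sum over j of kernel[j] * signal[i - j], zero-padded."""
    out = [0.0] * len(signal)
    for j, w in enumerate(kernel):
        out = add(out, scale(shift(signal, j), w))
    return out
```

Each step in the loop operates on an entire array at once, which is the property a data-parallel compiler can exploit. A library in the style of a pre-compiled convolution routine would only expose `convolve` itself.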
We have a paper available on our Wiki that describes in detail the kinds of primitives that Accelerator provides and the compilation approach that we use to generate reasonably efficient GPU code.
The point of Accelerator is to use data-parallelism to provide an easier way of programming GPUs and multi-cores, not to provide a set of domain-specific libraries.
You are correct that single-precision arithmetic will limit the use of GPUs for scientific computation. However, there are still lots of interesting things that you can do. You can look at http://www.gpgpu.org for more information (under "categories", look at "scientific computation"). There has also been some recent work on emulating double-precision floating point numbers using single-precision floating point numbers.
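As an illustration of the emulation idea, one common building block is an error-free addition that keeps the rounding error of a single-precision sum in a second single-precision number, so a value is carried as a (high, low) pair. The sketch below uses Knuth's TwoSum. It is not taken from any particular GPU paper; the helper `f32` is my own and simulates single-precision rounding in Python via `struct`.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth's TwoSum carried out in single precision.

    a and b must already be single-precision values. Returns (s, e), where
    s is the rounded single-precision sum and e is the rounding error, so
    that s + e equals a + b exactly (barring overflow).
    """
    s = f32(a + b)
    bb = f32(s - a)                               # the part of b that made it into s
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))   # what was rounded away
    return s, e

# A small addend that a plain single-precision sum would drop entirely.
small = f32(1e-8)
hi, lo = two_sum(1.0, small)
```

Here a plain single-precision addition of `1.0` and `small` rounds back to `1.0`, while the pair `(hi, lo)` still holds the small term in `lo`. A full double-single arithmetic needs more than this (renormalization, multiplication), so treat it as a starting point only.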