GoingNative 10: Welcome Ale Contenti, VC11 and Beyond with Steve Teixeira and Tarek Madkour

The VC++ 2012 auto-vectorizer tries to make loops in your code run faster by automatically vectorizing your code using the SSE instructions available in all current mainline Intel and AMD chips. In Visual C++ 2012, auto-vectorization is on by default and requires only that you write your code—that is, there are no compiler switches, #pragmas, or hints. It just works. Of course, it's one thing to say that, but how does it work, exactly? When does it vectorize and when doesn't it? Why?
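As a rough illustration of the kind of loop this is about, here is a minimal sketch. The helper name `add_arrays` is made up for this example and does not come from the lecture, and whether a given compiler actually vectorizes it depends on the optimization settings in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the lecture): element-wise sum of two arrays.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> sum(n);
    // Each iteration reads a[i] and b[i] and writes only sum[i], so no
    // iteration depends on another. A 128-bit SSE register holds four
    // floats, so a vectorizer could process four elements per step.
    for (std::size_t i = 0; i < n; ++i) {
        sum[i] = a[i] + b[i];
    }
    return sum;
}
```

Nothing in the source changes if the loop is vectorized. The difference shows up only in the generated machine code.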
Auto-vectorization is a powerful compiler feature. In Visual Studio 2012 it represents outstanding engineering by a few folks on the Microsoft Visual C++ compiler team. The engineering leader of this team is Jim Radigan. Fortunately for us, Jim has agreed to do a series of C9 lectures digging into the nuts and bolts of automatic vectorization in general and specifically as it relates to the latest version of VC++. Thank you, Jim!
In the first part of this n-part series, Jim introduces the series, describes improvements to the VC++ 2012 compilers, introduces auto-vectorization, demos a few apps that benefit from compiler-optimized performance via auto-vectorization, and begins to describe how/when user code is vectorized (typical and atypical patterns alike - more to come as the lectures progress, of course). Over the course of this series, Jim will present both the practical and theoretical foundations of auto-vectorization.
(You can learn more about auto-vectorization in VC++ by reading the blog posts by Jim Hogg, another member of the VC++ compiler team working on this technology.)
Tune in. Ask questions. Learn.
Very pleased to see this. Great start to the series, and looking forward to digging in more. I have a question though: now that it's been officially announced that some time later this autumn Visual Studio 2012 will be updated to directly create binaries for Windows XP, without the sledgehammer of having to have Visual Studio 2010 installed, will auto-vectorization extend to the binaries targeting Windows XP?
I've said it before and I'll say it again -- it's absolutely awesome that you guys are able to do all of this with the compiler.
This guy was amazing in the previous video and I'll be downloading this now to watch asap!
I wonder what Jim (or anyone else here who's interested) would think of my parallel execution engine/VM, which schedules work and tracks data dependence in a sort of VM environment with one extra layer of memory-location indirection, instead of requiring the compiler to know all data and locations at compile time?
http://toxprox.tumblr.com/post/184172056/the-many-core-answer-relativistic-computation
It is a little vague, and it breaks work up horizontally across cores instead of vertically to fill the vectors, but I did try to explain it for a broader audience, and I'd love feedback on whether my architecture and design seem feasible. If it's just wrong, or if any of you long-time researchers know it can't work as I tried to explain, please let me know.
Awesome video!
It's possible to check the auto-parallelizer in Task Manager, but how do I know if my code is auto-vectorized? This was briefly mentioned in the video, but it was not clear.
@r00k You can look at the disassembly and see what instructions the compiler generated.
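A minimal sketch of how that check might look in practice. The file name `scale.cpp` and the function `scale` are hypothetical, and the command lines in the comments assume the Visual C++ command-line compiler (`cl`) with its `/FA` listing option and, as I understand it, the `/Qvec-report` reporting option:

```cpp
// scale.cpp (hypothetical file name, used only for this example)
#include <cassert>
#include <cstddef>

// Multiplies every element of data[0..n) by factor, in place.
void scale(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// From a Visual Studio command prompt, one could try:
//
//   cl /c /O2 /FA scale.cpp
//     /FA writes an assembly listing (scale.asm). Packed SSE instructions
//     such as mulps in the loop body suggest it was vectorized; scalar
//     forms such as mulss alone suggest it was not.
//
//   cl /c /O2 /Qvec-report:1 scale.cpp
//     Assumption: with this option the compiler reports which loops it
//     vectorized; level 2 also reports loops it did not vectorize.
```

The listing is the ground truth. The report option, where available, is just a quicker way to get the same answer for many loops at once.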
DL is broken for me in Chrome, FF, and IE. Downloads a couple of kb and finishes.
Brilliant! Waiting for the deep dive! One thing that would be really cool is a document that defines the loop forms you auto-vectorize, so that we can use those forms in our code by default.
I am more interested in knowing which types of loops the compiler is able to auto-vectorize and which it cannot.
Maybe you could also think about support for 80-bit and/or 128-bit precision in Visual Studio?
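The video page does not list the exact loop forms, so the following is only a sketch of the general principle that vectorizers rely on, not a statement of what VC++ 2012 does with these specific loops. Both function names are made up for the example. The first loop has independent iterations; the second carries a dependence from one iteration to the next, which is the classic obstacle to straightforward vectorization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: v[i] depends only on its own old value, so
// several elements could in principle be computed at the same time.
void square_each(std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] * v[i];
    }
}

// Loop-carried dependence: v[i] needs the value of v[i - 1] that the
// previous iteration just wrote, so iterations cannot simply run in
// lockstep across vector lanes.
void running_sum(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}
```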
excellent
Can't wait to get my hands on this!
I do wonder how the compiler decides what it should try to vectorise.
I mean, a typical program will have thousands of loops, many of which will run over a small number of items. Vectorising all of those will only increase the exe size, which is bad for system perf as a whole.
And what if I use range-based fors (a C++11 feature) over STL vectors? Will the vectoriser recognize those loops too?
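To make the range-based for question concrete, here is a minimal sketch of the two loop shapes being compared. The function names are hypothetical, and nothing on this page says whether the VC++ 2012 vectorizer treats them the same way. The point is only that both express the same element-wise work over contiguous storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C++11 range-based for over a std::vector.
void halve_range_for(std::vector<float>& v) {
    for (float& x : v) {
        x *= 0.5f;
    }
}

// The same work written as a counted loop over the vector's raw storage.
void halve_indexed(std::vector<float>& v) {
    float* p = v.data();
    const std::size_t n = v.size();
    assert(p != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= 0.5f;
    }
}
```

Comparing the assembly listings of the two functions would show whether a given compiler vectorizes both forms.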