<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" media="screen" href="/styles/xslt/rss.xslt"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:c9="http://channel9.msdn.com">
<channel>
	<title>Comment Feed for Channel 9 - Getting the most out of the C++ compiler</title>
	<atom:link rel="self" type="application/rss+xml" href="http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer/rss"></atom:link>
	<image>
		<url>http://media.ch9.ms/ch9/a90e/3369c950-486f-48ec-93cd-aaa6bd15a90e/Win8CppEventJimHoggAutoV_220.jpg</url>
		<title>Channel 9 - Getting the most out of the C++ compiler</title>
		<link></link>
	</image>
	<description>The C++ compiler in Visual Studio 11 includes a new feature called auto-vectorization. It analyses the loops in C++ code and tries to make them run faster by using the vector registers and instructions inside the processor. This short talk explains what's going on. [This session was pre-recorded]</description>
	<link></link>
	<language>en</language>
	<pubDate>Thu, 23 May 2013 11:17:23 GMT</pubDate>
	<lastBuildDate>Thu, 23 May 2013 11:17:23 GMT</lastBuildDate>
	<generator>Rev9</generator>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[Jim, whatever happened to Phoenix? The last time it was spoken of (by yourself) it seemed it was on the verge of being used in the x86 backend for VC but not x64. That was years ago. I used the framework two or three years back and even then it seemed 'dead', although very powerful. LLVM has gone on to show the importance of a like-framework.<p>posted by Granville Barnett</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634732871833059519</link>
		<pubDate>Tue, 22 May 2012 12:39:43 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634732871833059519</guid>
		<dc:creator>Granville Barnett</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>I find the title misleading. I was expecting some tips on how to write code in order to increase the likelihood of the compiler vectorizing a loop. Furthermore, I am left with the following questions:</p><ol><li>Does the array size need to be known at compile time? </li><li>Do you need to use the index syntax (array[i]) or can you use pointers? What about iterators? </li><li>Can the compiler vectorize operations on a std::vector&lt;&gt; ? </li><li>Are there any operations that will prevent the compiler from vectorizing? ex. branching, trigonometry functions etc. </li><li>If the compiler detects a cross-iteration dependency on one of the many operations in a loop, will it split the work in one vectorized loop and one scalar loop? </li></ol><p>posted by philippecp</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733397171875187</link>
		<pubDate>Wed, 23 May 2012 03:15:17 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733397171875187</guid>
		<dc:creator>philippecp</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@<a href="/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733397171875187">philippecp</a>: I'd encourage you to check out Jim's excellent (and detailed) blog posts on this subject matter: <br><br><a href="http://blogs.msdn.com/b/nativeconcurrency/archive/2012/05/22/auto-vectorizer-in-visual-studio-11-did-it-work.aspx">http://blogs.msdn.com/b/nativeconcurrency/archive/2012/05/22/auto-vectorizer-in-visual-studio-11-did-it-work.aspx</a><br><br>C</p><p>posted by Charles</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733912189231619</link>
		<pubDate>Wed, 23 May 2012 17:33:38 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733912189231619</guid>
		<dc:creator>Charles</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@<a href="/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634732871833059519">Granville Barnett</a>:</p><p>Phoenix lives on quietly.&nbsp; A slimmed-down version of the framework was adopted into a Microsoft-internal project.</p><p>posted by jimhogg</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734218496380245</link>
		<pubDate>Thu, 24 May 2012 02:04:09 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734218496380245</guid>
		<dc:creator>jimhogg</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@<a href="/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733397171875187">philippecp</a>:</p><p>Here are brief answers:</p><ul><li>Does the array size need to be known at compile time? - No, but indexing into that array is limited to forms like [i * K + j], where K must be a compile-time constant. If you are flattening a 2-D array onto one dimension, then K appears as the row length.</li><li>Do you need to use the index syntax (array[i]) or can you use pointers? What about iterators? - Pointers work.</li><li>Can the compiler vectorize operations on a std::vector&lt;&gt;? - Sometimes.</li><li>Are there any operations that will prevent the compiler from vectorizing? ex. branching, trigonometry functions etc. - Conditionals, yes. Trig functions, no.</li><li>If the compiler detects a cross-iteration dependency on one of the many operations in a loop, will it split the work in one vectorized loop and one scalar loop? - For certain patterns, yes. But I'd shy away from saying we've conquered this in the first release.</li></ul><p>As Charles mentioned, please check out the blog - it covers most of these topics in more detail. (Just published episode 6 earlier today.)</p><p>posted by jimhogg</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734222305928437</link>
		<pubDate>Thu, 24 May 2012 02:10:30 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734222305928437</guid>
		<dc:creator>jimhogg</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@<a href="/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634733912189231619">Charles</a>: Great, that blog was exactly what I was looking for! Now if only an almighty moderator could add a link to this blog as a "see also" in the video description so that other curious minds that have had their interest piqued can dig deeper, that would be wonderful <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-5.gif?v=c9' alt='Wink' /></p><p>posted by philippecp</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734287547258330</link>
		<pubDate>Thu, 24 May 2012 03:59:14 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634734287547258330</guid>
		<dc:creator>philippecp</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>I have a question about AVX (YMM registers) support: does the new autovectorizer also support the wider 256-bit YMM registers (Sandy Bridge+)?</p><p>VC 10 also had support for SSE2 and AVX instructions (via a command-line switch). From the talk it seemed as if there wasn't any support for SIMD instructions.</p><p>Also the floating-point addition was a bit misleading, since *AX and *BX registers are integer registers, while ADDPS means "Add Packed Single-Precision FP Values". You should have used x87 FPU or SSE2 registers (as 32-bit float) in your nonvectorized example.</p><p>posted by JohnSawyer</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634736459149981591</link>
		<pubDate>Sat, 26 May 2012 16:18:34 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634736459149981591</guid>
		<dc:creator>JohnSawyer</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[I found this presentation lacking in several technical ways:<br><br>1) The first example showed the values 1.10 and 1.20 being loaded into RAX and RBX. These are integer registers, and 1.10 and 1.20 are not integers. Oops.<br>2) The optimization shown only works if there is guaranteed to be no aliasing between 'a' and 'b'. If 'a' and 'b' are pointers then the optimization cannot be done. This seems like an extremely important point and it makes me sad that it is glossed over.<br>3) Jim goes to great lengths to explain that this optimization will not change the results at all. Given that his example is a moderately complex example using floating-point math (sin and cos) it is extremely unlikely that his guarantee is correct. If nothing else, the change from x87 to SSE *will* change the results, even if the vectorization itself does not.<br>4) He suggested that the only reason he got a 2.9x speedup instead of a 4x speedup was because of compiler limitations. That suggests that developers should always expect a 4x speedup but that is not practical for all sorts of reasons such as processor detection overhead, memory bandwidth, etc.<br><p>posted by Bruce Dawson</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634737531633956823</link>
		<pubDate>Sun, 27 May 2012 22:06:03 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634737531633956823</guid>
		<dc:creator>Bruce Dawson</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@Bruce</p><p>1) Yes, the real story would compute these floats in the low 32 bits of the XMM registers. See the <a href="http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/24/auto-vectorizer-in-visual-studio-11-how-it-works.aspx">How it Works blog post</a> for an explanation that addresses this 'untruth'.</p><p>2) Yes, aliasing complicates the story. However, pointers don't always prevent vectorization. In general, the auto-vectorizer will include runtime checks against aliasing. Where possible, via whole-program optimization, it can sometimes prove the lack of aliasing, and therefore elide the runtime checks. (I mentioned aliasing a couple of times in the blog; the next episode covers it in a little more depth.)</p><p>3) In VS11, default floating-point calculations will use SSE instructions. Auto-vectorization will produce the same results as the scalar SSE instructions. (Yes, results might differ between 32-bit SSE and 80-bit x87. I should have been explicit: I was comparing scalar SSE versus vector SSE versions of the program.) In other cases, such as reductions, auto-vectorization CAN produce results different from scalar SSE code (due to non-associativity), but we only perform auto-vectorization for such cases under the /fp:fast flag.</p><p>4) I was speaking about this particular example - future compiler improvements should raise the speedup above 2.9X, heading towards 4X. For other loops, there are, of course, many factors that limit the speedup. (The topic of a future blog post, already drafted but not published.)</p><p>I'd encourage folks to read the <a href="http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/12/auto-vectorizer-in-visual-studio-11-overview.aspx">auto-vectorization blog</a> - it includes about 6 posts now, allowing more time to dig into details than the brief 15 minutes available in this video.</p><p>posted by jimhogg</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634738593032503789</link>
		<pubDate>Tue, 29 May 2012 03:35:03 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634738593032503789</guid>
		<dc:creator>jimhogg</dc:creator>
	</item>
	<item>
		<title>Re: Getting the most out of the C++ compiler</title>
		<description>
			<![CDATA[<p>@JohnSawyer:</p><p>No, I'm afraid the auto-vectorizer doesn't use AVX in this first release. (But it's high up on our TODO list!)</p><p>In VS2010, the /arch:SSE and /arch:SSE2 switches, on x86, tell the compiler to use XMM registers for floating-point calculations, rather than x87 registers. Similarly, the /arch:AVX switch tells the compiler to emit AVX instructions, rather than SSE instructions. But in all those cases, it emits just SCALAR instructions. It is only with the advent of the auto-vectorizer that the compiler emits SIMD instructions that make full use of the wide XMM vector registers.</p><p>*AX, *BX. Yes, sorry. No excuse! (As I replied to Bruce, above, I was more careful with this example in the blog.)</p><p>posted by jimhogg</p>]]>
		</description>
		<link>http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634740103449493931</link>
		<pubDate>Wed, 30 May 2012 21:32:24 GMT</pubDate>
		<guid isPermaLink="true">http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Getting-the-most-out-of-the-MSVC-compiler-AutoVectorizer#c634740103449493931</guid>
		<dc:creator>jimhogg</dc:creator>
	</item>
</channel>
</rss>