abcs

Niner since 2010


  • GoingNative 7: VC11 Auto-Vectorizer, C++ NOW, Lang.NEXT

    Windows XP is 11 years old, and VC is finally dropping support for it.

    SSE2 is also 11 years old, but VC is only now starting to support autovectorization.

    This raises the question: what changes in the landscape prompted Microsoft to make autovectorization a priority for VC11 now, in 2012? Compared with other optimizations, why was adding autovectorization to VC not justified in previous releases, especially considering that even the now 10-year-old Intel C++ 7.0 supported this performance feature?

    (20'30") VC11 recognizes non-unit stride array references. Does this imply that VC11 implements gather/scatter-style vectorization (movsd/movlpd + movhpd)?
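To make the question concrete, here is a hand-written SSE2 sketch (x86/x64 only) of the movsd + movhpd pattern applied to a stride-2 loop, such as summing the real parts of interleaved complex data. The function names are hypothetical; the code shows what such code generation would amount to, not what VC11 actually emits.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x64 only)

// Scalar reference: sums every other element (stride 2), e.g. the real
// parts of interleaved complex data [re0, im0, re1, im1, ...].
double sum_stride2_scalar(const double* a, std::size_t n_pairs) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n_pairs; ++i)
        sum += a[2 * i];
    return sum;
}

// Hand-written "gather" of two strided elements into one register:
// _mm_load_sd (movsd) fills lane 0, _mm_loadh_pd (movhpd) fills lane 1.
double sum_stride2_sse2(const double* a, std::size_t n_pairs) {
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 1 < n_pairs; i += 2) {
        __m128d v = _mm_load_sd(a + 2 * i);    // lane 0 = a[2i], lane 1 = 0
        v = _mm_loadh_pd(v, a + 2 * (i + 1));  // lane 1 = a[2i + 2]
        acc = _mm_add_pd(acc, v);
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];
    for (; i < n_pairs; ++i)                   // scalar remainder
        sum += a[2 * i];
    return sum;
}
```

Note that the two-lane version reorders the floating-point additions, so for general inputs its result can differ slightly from the scalar loop; a compiler would typically need relaxed floating-point semantics to vectorize such a reduction.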

    (23'30") VC11 is capable of replacing a loop with library calls. Besides memset/memcpy, what other idioms are recognized?
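For readers unfamiliar with idiom recognition, these are the two loop shapes the question already names. The function names are hypothetical, and whether VC11 rewrites them (and which other shapes it recognizes) is exactly what is being asked.

```cpp
#include <cstddef>

// memset idiom: a contiguous range filled with a value whose bytes are
// all equal (here 0), so the loop can become a single memset call.
void zero_fill(int* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = 0;
}

// memcpy idiom: element-wise copy between contiguous ranges. A compiler
// must prove that dst and src do not overlap before substituting memcpy.
void copy_ints(int* dst, const int* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```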

    (30'00") VC11 has an equivalent of Intel's SVML for vectorizing transcendental functions. Which functions are covered? What is their accuracy in ulps? How do they compare with SVML in performance? SVML requires the default floating-point environment (rounding mode, etc.). Does VC11 have the same limitation?
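The accuracy part of the question can be checked empirically by comparing vectorized results against a scalar reference and counting the ulps between them. A minimal sketch, assuming IEEE-754 binary64 doubles and finite inputs; `sin_all` and `ulp_distance` are hypothetical helpers, not part of any VC11 API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loop shape an SVML-style vector math library targets: an element-wise
// transcendental call with no loop-carried dependence.
void sin_all(double* out, const double* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(in[i]);
}

// Map a finite double to an integer key that is monotonic in its value,
// so adjacent doubles get adjacent keys (+0.0 and -0.0 both map to 0).
static std::int64_t ordered_key(double x) {
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits < 0 ? INT64_MIN - bits : bits;
}

// Distance between a and b in ulps: how many representable doubles apart they are.
std::uint64_t ulp_distance(double a, double b) {
    const std::int64_t ka = ordered_key(a);
    const std::int64_t kb = ordered_key(b);
    // Unsigned subtraction avoids signed overflow for widely separated values.
    return ka >= kb ? static_cast<std::uint64_t>(ka) - static_cast<std::uint64_t>(kb)
                    : static_cast<std::uint64_t>(kb) - static_cast<std::uint64_t>(ka);
}
```

With such a helper, one could run a vectorized build of `sin_all` over many inputs and report the maximum `ulp_distance` against a high-precision reference, repeating the run under a non-default rounding mode to probe the floating-point environment question.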

    (35'20") The issue of data alignment was brought up. When data alignment is unknown, does VC11 generate multiple versions of a loop, or does it simply use unaligned load/store sequences (movsd + movhpd, or movupd)? How does it cater to the microarchitectural characteristics of processors from different generations and/or vendors, especially when their instruction set support is identical? For example, assuming only SSE2 support, pre-Nehalem Intel processors incur high latency on unaligned accesses, whereas K10 and later AMD processors handle them well.
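For reference, here is a hand-written SSE2 sketch (x86/x64 only) of the first option the question mentions: versioning a loop on a runtime alignment check, with an aligned path (movapd via `_mm_load_pd`/`_mm_store_pd`) and an unaligned fallback (movupd via `_mm_loadu_pd`/`_mm_storeu_pd`). The function name is hypothetical. A compiler might instead peel leading iterations until one pointer becomes aligned, which this sketch does not do.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x64 only)

// dst[i] = a[i] + b[i], versioned on whether all three pointers are 16-byte aligned.
void add_arrays(double* dst, const double* a, const double* b, std::size_t n) {
    auto aligned16 = [](const void* p) {
        return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
    };
    std::size_t i = 0;
    if (aligned16(dst) && aligned16(a) && aligned16(b)) {
        // Aligned version: movapd loads and stores.
        for (; i + 2 <= n; i += 2)
            _mm_store_pd(dst + i, _mm_add_pd(_mm_load_pd(a + i), _mm_load_pd(b + i)));
    } else {
        // Unaligned version: movupd loads and stores.
        for (; i + 2 <= n; i += 2)
            _mm_storeu_pd(dst + i, _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    }
    for (; i < n; ++i)  // scalar remainder for odd n
        dst[i] = a[i] + b[i];
}
```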

    (37'00") Charles asked about targeting GPUs with pure C++, without language extensions. What is Microsoft's vision for OpenACC? Furthermore, OpenMP is expected to absorb OpenACC once the latter matures. Is Microsoft considering going beyond OpenMP 2.0 and supporting more declarative parallel programming models?

    (45'00") SPEC2006 was mentioned. How does the performance of VC11-generated code on SPEC2006 compare with that of state-of-the-art vectorizing compilers such as Intel C++? Beyond benchmarks, how much benefit does autovectorization bring to Microsoft products compiled with VC11?