Derp

Derp Derp

Niner since 2012

  • GoingNative 7: VC11 Auto-Vectorizer, C++ NOW, Lang.NEXT

    @jimhogg:

    > If you are asking whether MSVC gets the right answer for the following snippet, the answer is yes - for both a Debug (/Od) and Release (/O2) build.  ie, it correctly handles both no-overlap, and exact-overlap.

    Okay, but did VC generate code that handles exact overlap because the compiler detected the exact overlap idiom? Or did it generate code based on the assumption that the two arrays don't overlap at all, which just happens to work for exact overlap by coincidence?

    You can't answer this question by examining the assembly generated. You can only answer it by examining the passes inside the compiler and/or by asking the team members who implemented these passes if they're aware of the exact overlap idiom and if the passes they wrote implemented it correctly.

    e.g. if you asked a random compiler team member "what is vadd1 supposed to do, and why was it written that way?", could they answer "it adds arrays which either overlap exactly or don't overlap at all" and fully understand why?

    > I'm not sure whether you are concerned that MSVC produces wrong answer in the presence of __restrict (we don't know of any).  Or whether we ignore opportunities for optimizations (as permitted by the standard) that __restrict makes possible?

    My concern is that the standard uses very tricky wording and the exact overlap idiom is not that obvious, so I'm wondering whether VC generated the correct code by design or by coincidence. I'm also trying to make the VC team members aware of this subtlety in the standard so they can continue to generate fast and correct code in the future.

  • GoingNative 7: VC11 Auto-Vectorizer, C++ NOW, Lang.NEXT

    @jimhogg: (I'm assuming the compiler extension __restrict has exactly the same semantics as restrict in C99.)

    T foo;
    vadd1(&foo, &foo, 1);

    is NOT undefined. The C99 standard doesn't care what's assigned to a restricted pointer; it only cares about tracking the flow of "expressions based on restricted pointers" when it comes time to dereference them. Pay attention to the variable s in vadd1. Under the semantics specified in C99, s is treated either as "an expression based on the restricted pointer dest" or as "an expression based on the restricted pointer src".

    e.g. This code is fine:
    T * restrict a = &whatever;
    T * b = a;
    *a = *b;

    But this code is not:
    T * restrict a = &whatever;
    T * restrict b = a;
    *a = *b;

    This is because, in the first case, b is treated as "an expression based on the restricted pointer a". In the second case, b is a new restrict pointer. This only matters at dereference time, not at assignment time: if the last statement *a = *b weren't there, both snippets would be fine.
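
    As a minimal sketch of the first (valid) case, assuming T is int and using a hypothetical function name that does not appear in the thread, the pattern can be written so that its effect is observable. The read is changed to *b + 1 only so the result can be checked; the invalid variant is mentioned in a comment and deliberately not compiled.

    ```c
    #include <assert.h>

    /* Hypothetical helper for illustration; the name is not from the thread. */
    int bump_via_derived_pointer(int value)
    {
        int * restrict a = &value;
        int * b = a;          /* ordinary pointer: an expression "based on" a */

        assert(b == a);       /* same address, and b inherits a's based-on-ness */

        *a = *b + 1;          /* both accesses go through lvalues based on a */

        /* Declaring b as `int * restrict b = a;` and performing the same
           dereferences is the case the post describes as invalid. */

        return *a;            /* read back through a, not through `value` */
    }
    ```

    Note that the result is returned through *a rather than through the name value, so every access to the object inside the block goes through an lvalue based on a.
    
    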

    In the vadd1 example, s is NOT a restrict pointer but an expression based on a restrict pointer. We assign it using the ternary operator so that it is either "an expression based on the restricted pointer src" or "an expression based on the restricted pointer dest". That is the subtle trick that makes this code valid for exact overlap.

    So my 4 questions were:

    1. Are the members of the Visual C++ compiler team aware of the "exact overlap using restrict" idiom, and do they understand the subtlety it's based on?
    2. Is the Visual C++ optimizer written to NOT transform:
      T * restrict a = whatever1;
      const T * restrict b = whatever2;
      const T * c = (a == b ? a : b);
      *a = *c;

      As if you had written:
      T * restrict a = whatever1;
      const T * restrict b = whatever2;
      const T * c = b;
      *a = *c;

      What I mean is, does the Visual C++ optimizer algebraically simplify the expression (a == b ? a : b) to just b? If so, this is an invalid transformation because even though the values compare equal, the "based on-ness" of c would change from "either based on a or based on b" to "based on b" which would be wrong for exact overlap.

      In other words, C99 added an implicit property to pointers called "based on-ness", which compilers can use to augment their alias analysis. If a compiler optimizes aggressively using restrict, it must carefully keep track of an expression's "based on-ness" in its optimization passes and NOT assume that because two pointer expressions evaluate to the same value, their based on-ness is the same too.

    3. If the team is aware of this, do they recognize this code as the programmer intending to inform the compiler that the code they are working on only handles exact overlap or complete non-overlap?

      In other words, I expect the compiler to recognize this idiom, elide the expression (a == b ? a : b) and then generate fast code which handles both exact overlap and complete non-overlap (and does something undefined with partial overlap).
    4. Is PGO aware of this idiom? I know it's notorious for screwing up tricky but valid code.

    My questions boil down to this: (a) does Visual C++ handle "based on-ness" correctly in its optimization passes, especially PGO; (b) does Visual C++ correctly handle the exact overlap idiom; and (c) does Visual C++ generate great code for the exact overlap idiom? And if it does generate great code, is that because it correctly recognized the idiom, or because it botched an optimization pass and generated code that assumes no overlap and just happens to work with exact overlap as well?

  • GoingNative 7: VC11 Auto-Vectorizer, C++ NOW, Lang.NEXT

    @jimhogg: does VC correctly recognize the "exact overlap with restrict" idiom (even with tools like PGO)?

    void vadd1(T * restrict dest, const T * restrict src, const size_t n)
    {
        const T * s = dest == src ? dest : src;
    
        for (size_t i = 0; i != n; ++i) {
            *dest++ += *s++;
        }
    }


    s is based either on dest or on src, which means vadd1 supports ranges that either do not overlap at all or overlap exactly (no partial overlap). If you wrote this instead:

    void vadd2(T * restrict dest, const T * restrict src, const size_t n)
    {
        for (size_t i = 0; i != n; ++i) {
            *dest++ += *src++;
        }
    }


    This code would be undefined when you pass in two exactly overlapping ranges (i.e. vadd2 only supports non-overlapping ranges). In other words, Visual C++ is not allowed to rewrite the first line of vadd1 as either of the following:

    const T * s = dest == src ? src : src;
    const T * s = src;


    These expressions no longer carry the "based on dest" property that the original vadd1 does, so the rewritten function would be undefined for exact overlap.
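
    For anyone who wants to try the two supported cases, here is a small sketch, assuming T is int (the thread never fixes T) and using hypothetical check_* helper names. It only exercises the no-overlap and exact-overlap calls on one particular build; a passing run does not show whether the compiler handled the idiom by design or by coincidence, which is the actual question.

    ```c
    #include <assert.h>
    #include <stddef.h>

    typedef int T;  /* assumption: int chosen purely for illustration */

    /* vadd1 as posted: s is based on dest or on src, depending on the comparison. */
    static void vadd1(T * restrict dest, const T * restrict src, const size_t n)
    {
        const T * s = dest == src ? dest : src;

        for (size_t i = 0; i != n; ++i) {
            *dest++ += *s++;
        }
    }

    /* Hypothetical helper: returns 1 if the non-overlapping case adds element-wise. */
    int check_no_overlap(void)
    {
        T a[4] = {1, 2, 3, 4};
        const T b[4] = {10, 20, 30, 40};

        vadd1(a, b, 4);
        return a[0] == 11 && a[1] == 22 && a[2] == 33 && a[3] == 44;
    }

    /* Hypothetical helper: returns 1 if the exact-overlap case doubles each element. */
    int check_exact_overlap(void)
    {
        T a[4] = {1, 2, 3, 4};

        vadd1(a, a, 4);  /* dest == src, so s is based on dest */
        return a[0] == 2 && a[1] == 4 && a[2] == 6 && a[3] == 8;
    }
    ```

    Calling both helpers from main at different optimization levels (e.g. /Od and /O2) is a quick sanity check, but partial overlap is intentionally left untested because vadd1 does not define it.
    
    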