
Comments

Matt Matt_PD
  • C++ and Beyond 2012: Herb Sutter - You don't know [blank] and [blank]

    @SteveRichter: Clang gives a more readable diagnostic message:

    source.cpp:6:79: error: call to deleted constructor of 'std::unique_ptr<std::wstring>'
    void CallTester2( ) { std::unique_ptr<std::wstring> pString ; Tester2( pString ) ; }
                                                                         ^~~~~~~
    note: function has been explicitly marked deleted here
    unique_ptr(const unique_ptr&) = delete;

    See: http://liveworkspace.org/code/1zPqd0$1

    In other words, the copy constructor is deleted, i.e., unique_ptr is non-copyable.
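    For illustration, here is a minimal sketch of the two usual ways around the error. Tester2 is not shown in the thread, so its signature here (taking the unique_ptr by value, i.e., taking ownership) is an assumption, and Peek and CallTester2Fixed are hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Tester2 from the snippet above; assumed to take
// its argument by value, i.e., to take ownership of the string.
inline bool Tester2(std::unique_ptr<std::wstring> p) { return p != nullptr; }

// Observing without taking ownership: pass by const reference, no copy needed.
inline bool Peek(const std::unique_ptr<std::wstring>& p) { return p != nullptr; }

// Hypothetical corrected caller.
inline bool CallTester2Fixed() {
    std::unique_ptr<std::wstring> pString(new std::wstring(L"hello"));
    // Tester2(pString);                           // error: copy constructor is deleted
    const bool seen = Peek(pString);                // OK: binds a reference, no copy
    const bool owned = Tester2(std::move(pString)); // OK: ownership moves into Tester2
    return seen && owned && pString == nullptr;     // a moved-from unique_ptr is null
}
```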

    BTW, regarding enums, you can always implement something like this: http://www.ishani.org/web/2012/fancy-c-enums/

  • YOW! 2012: Fred George - Programmer Anarchy

    Hi Charles!

    Really liked the questions on the DirectX/GameDev scenario, on bridging the gap between CS education and real-world practice, and on legacy software!

    Thanks for all the great interviews!

  • Stephan T. Lavavej - Core C++, 7 of n

    Thank you for all your work, Stephan! Merry Christmas & Happy New Year! :)

  • Stephan T. Lavavej - Core C++, 4 of n

    @STL: Thanks for the lecture! // Awesome as always :)

    Regarding NVI (the Non-Virtual Interface idiom) -- what are some good examples where we'd prefer "protected" access over "private" for the virtual functions (and vice versa)?

    I guess it boils down to choosing between protected-virtuals [23.3] and private-virtuals [23.4] in the C++ FAQ, but unfortunately there's no direct comparison in there:
    http://www.parashift.com/c++-faq-lite/protected-virtuals.html

    Another question -- is NVI the same as or different from the Template Method pattern?

    This source calls the TM pattern "more generic" (without specifying what exactly is more generic about it):
    http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Non-Virtual_Interface

    While this one seems to show TM as exactly the same thing:
    http://www.parashift.com/c++-faq-lite/private-virtuals.html
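    To make the question concrete, here is a small sketch (class names are hypothetical) of NVI with both kinds of hooks. A commonly cited guideline (from Herb Sutter's "Virtuality" article) is to make virtual functions private by default and protected only when derived classes need to call the base implementation; NVI is often described as a C++-specific application of Template Method, with the public non-virtual function playing the role of the template method:

```cpp
#include <cassert>
#include <string>

// Non-Virtual Interface: the public entry point is non-virtual and fixes the
// overall algorithm; customization happens through virtual hooks.
class Report {
public:
    virtual ~Report() {}
    std::string render() const {  // non-virtual "template method"
        return header() + body();
    }
protected:
    // protected: derived classes may call the base version explicitly.
    virtual std::string header() const { return "[report]"; }
private:
    // private: derived classes may override it but cannot call it.
    virtual std::string body() const = 0;
};

class SalesReport : public Report {
protected:
    // Extends rather than replaces the base behavior, which needs protected access.
    std::string header() const override { return Report::header() + "[sales]"; }
private:
    std::string body() const override { return " total=42"; }
};
```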

  • Alexandrescu, Meyers, Sutter: On Static If, C++11 in 2012, Modern Libraries, and Metaprogramming

    Thanks for passing on the (admittedly long-winded) question, Charles!
    And, of course, thanks for the insightful answers from all! :)

    In particular, Andrei's sentiments are quite compatible with mine here -- for instance, the observation that more metaprogramming facilities go hand in hand with greater use of (and need for) supporting infrastructure ("a web of supporting facilities in the language and in the std. lib.", such as CTFE in the example of tabulating tangent function values at compile time). That's also why I'm looking into this area with renewed interest now, given C++11's progress. In fact, constexpr, also mentioned in the discussion, is one of the new C++11 features I'm most happy about in this context, and I already think of it as part of this very supporting infrastructure :)

    Specifically, CTFE (in D) seems to be one of the requirements for the aforementioned example, and constexpr (C++11) seems to provide just that: http://en.wikipedia.org/wiki/Compile_time_function_execution
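    As a rough C++11 analogue of that tabulation example (not the D code from the discussion): std::tan is not constexpr, so this sketch assumes hand-written Taylor series for sin and cos, truncated at an arbitrarily chosen 12 terms, and lets the compiler fill the table. The recursion is there because C++11 constexpr functions are limited to a single return statement:

```cpp
#include <cassert>
#include <cmath>

// Taylor series for sin: sum_k (-1)^k x^(2k+1) / (2k+1)!, each term derived
// from the previous one. C++11 constexpr functions need a single return
// statement, hence recursion instead of a loop.
constexpr double sin_series(double x, double term, int k, int terms) {
    return k == terms ? 0.0
         : term + sin_series(x, -term * x * x / ((2 * k + 2) * (2 * k + 3)), k + 1, terms);
}

// Taylor series for cos: sum_k (-1)^k x^(2k) / (2k)!.
constexpr double cos_series(double x, double term, int k, int terms) {
    return k == terms ? 0.0
         : term + cos_series(x, -term * x * x / ((2 * k + 1) * (2 * k + 2)), k + 1, terms);
}

constexpr double ct_sin(double x) { return sin_series(x, x, 0, 12); }
constexpr double ct_cos(double x) { return cos_series(x, 1.0, 0, 12); }
constexpr double ct_tan(double x) { return ct_sin(x) / ct_cos(x); }

constexpr double kPi = 3.14159265358979323846;

// The whole table is computed by the compiler (angles 0, 15, 30, 45, 60 degrees).
constexpr double tan_table[] = {
    ct_tan(0.0), ct_tan(kPi / 12), ct_tan(kPi / 6), ct_tan(kPi / 4), ct_tan(kPi / 3)
};

// Checked at compile time: tan(45 degrees) should be 1.
static_assert(tan_table[3] > 0.999999 && tan_table[3] < 1.000001, "tan(pi/4) ~ 1");
```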

    // Incidentally, constexpr being another form of compile-time language is also a very good point; there are already some nice examples out there illustrating the code simplification (relative to C++03 TMP) that it allows: http://kaizer.se/wiki/log/post/C++_constexpr/ & http://kaizer.se/wiki/log/post/C++_constexpr_foldr/
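    A generic illustration of that kind of simplification (not taken from the linked posts): the same compile-time factorial written once as a C++03-style template metaprogram and once as a C++11 constexpr function:

```cpp
#include <cassert>

// C++03-style template metaprogram: recursion through template instantiation,
// with the base case expressed as an explicit specialization.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

// C++11 constexpr equivalent: an ordinary (single-return) recursive function,
// usable both at compile time and at run time.
constexpr unsigned long long factorial(unsigned n) {
    return n == 0 ? 1ULL : n * factorial(n - 1);
}

static_assert(Factorial<10>::value == 3628800ULL, "TMP result");
static_assert(factorial(10) == 3628800ULL, "constexpr result");
```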

    Charles has already linked to the questions thread with the link to Agner's example, so I won't spam it here again, but I can also add the Compile-Time Language (CTL) available in High Level Assembly (HLA) as another illustration:
    http://www.phatcode.net/res/260/files/html/HLACompileTimeLanguagea3.html#999072
    // See also 8.3 Writing Compile-Time "Programs": http://www.phatcode.net/res/260/files/html/Macros2.html#1009074

    I think the point about making the language accessible to everyone ("language should be comprehensible and usable for 100% of its users") is also a great one, and I fully agree with it. In fact, that's an important motivating factor: ideally, TMP should *not* be a feature just for the "select few", and I think we agree that this is more a result of historical coincidence in the case of TMP than of an inherent difficulty of TMP itself. Hence, Herb's and Scott's points (as in the worries about repelled-not-attracted or run-screaming reactions ;]) are fully and well taken (it's also true that TMP is just one area of C++ and not in quite as widespread use as the others). At the same time, simplifying the syntax and removing the awkwardness are meant to address exactly this point :)

    In other words, this is how I view the role of extensions like "static_if" (instead of a hand-crafted template specialization equivalent) or, say, "static_while" (instead of a hand-crafted recursive template instantiation equivalent): similarly to "constexpr", they let us achieve what's already possible (at least theoretically, or in some cases), but in a natural fashion, providing accessibility and ease of use to a wider base of developers.
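    For reference, a sketch of what the "hand-crafted" C++11 equivalents look like today (all names here are hypothetical): a compile-time branch via tag dispatch on std::true_type/std::false_type, and a compile-time loop via recursive template instantiation terminated by a partial specialization:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hand-crafted "static if": the branch is picked at compile time by overload
// resolution, since std::is_integral<T> derives from true_type or false_type.
template <typename T>
std::string describe_impl(const T&, std::true_type) { return "integral"; }

template <typename T>
std::string describe_impl(const T&, std::false_type) { return "non-integral"; }

template <typename T>
std::string describe(const T& value) {
    return describe_impl(value, std::is_integral<T>());
}

// Hand-crafted "static while": calls f(I), f(I + 1), ..., f(N - 1), unrolled
// by recursive instantiation; the partial specialization for I == N stops it.
// Assumes I <= N (otherwise the recursion never terminates).
template <unsigned I, unsigned N>
struct StaticLoop {
    template <typename F>
    static void run(F& f) {
        f(I);
        StaticLoop<I + 1, N>::run(f);
    }
};

template <unsigned N>
struct StaticLoop<N, N> {
    template <typename F>
    static void run(F&) {}
};
```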

  • Stephan T. Lavavej: Core C++, 2 of n

    @STL: Thanks for the answer (and the great lectures)! Now I'm looking forward to seeing the next episode even more :)

  • Stephan T. Lavavej: Core C++, 2 of n

    @NotFredSafe: Take a look at the following three-parter:

    http://codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/
    http://codesynthesis.com/~boris/blog/2012/06/26/efficient-argument-passing-cxx11-part2/
    http://codesynthesis.com/~boris/blog/2012/07/03/efficient-argument-passing-cxx11-part3/

    Incidentally, STL, would you agree with the above guidelines (summarized in part 3)?
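    Without claiming to summarize the series' own conclusions, one common C++11 guideline in this area is to take "sink" arguments (ones the callee stores) by value and move from them, and to take read-only arguments by const reference. A minimal sketch with a hypothetical Person class:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class used only for illustration.
class Person {
public:
    // "Sink" parameter: taken by value, then moved into the member.
    // An lvalue argument costs one copy plus one move; an rvalue costs only moves.
    explicit Person(std::string name) : name_(std::move(name)) {}

    // Read-only access: no ownership transfer, so return a const reference.
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Read-only parameters: const references avoid copies entirely.
inline bool has_name(const Person& p, const std::string& expected) {
    return p.name() == expected;
}
```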

     

  • C++ Accelerated Massive Parallelism in Visual C++ 2012

    Kate: I have a question about the call to the member function "synchronize", made in order to include the copy-out time when measuring total execution time for benchmarking purposes.

    I understand this is recommended due to the asynchronicity of "parallel_for_each" and the associated copy-(only-)on-demand optimization of the captured concurrency::array modified in the lambda passed to "parallel_for_each" (which could prevent a copy-out from occurring in the benchmarked execution path).

    I'm wondering: would a deep copy operation (from a GPU-bound array to a CPU-bound vector) performed before stopping the timer also count? For instance:

    std::vector<double> CPU_V;
    concurrency::array<double> GPU_V(1024); // array has no default constructor; the extent (1024 here) is arbitrary
    // ...
    CPU_V = GPU_V;

    A sub-question, just to make sure I understand this correctly -- I assume the above call to the assignment operator invokes a (synchronous) copy (as opposed to copy_async) because it has to go through the following implicit conversion operator present in "amp.h" (in the definition of the "concurrency::array" class template):

        /// <summary>
        ///     Implicitly converts this array into a vector by copying.
        /// </summary>
        operator std::vector<_Value_type>() const __CPU_ONLY
        {
            std::vector<_Value_type> _return_vector(extent.size());
            Concurrency::copy(*this, _return_vector.begin());
            
            return _return_vector;
        }


    Is this correct?

  • GoingNative 7: VC11 Auto-Vectorizer, C++ NOW, Lang.NEXT

    Great interviews!

    Definitely, more compiler guys in the future sounds fantastic! :)

    I'll use this opportunity to ask a few questions -- feel free to sneak them into future interviews, although answers in the comments are great too, of course! I'll try to tie them to the video for better context :)


    0. Loop vectorization limitations and requirements (the example for-loop; also reduce/sum around 26:50)
    Are there any divisibility requirements? XMM registers are 128 bits wide, so what happens if the amount of data isn't a multiple of that? Is there support for automatic duplication/truncation/padding, is such an irregular loop (i.e., one whose data length doesn't satisfy the divisibility property) excluded from the optimizer's consideration for now, or are there techniques to handle this?
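    For context, one standard way to handle a trip count that isn't a multiple of the vector width is to run the vector loop over the largest multiple of the width and finish with a scalar remainder ("epilogue") loop. The sketch below does this by hand with SSE intrinsics (x86/x64 only, 4 floats per 128-bit XMM register); it illustrates the technique and makes no claim about the code the VC11 auto-vectorizer actually generates:

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Sums n floats: a 4-wide SSE main loop over the largest multiple of 4,
// then a scalar remainder loop for the 0-3 leftover elements.
inline float sum_sse(const float* data, std::size_t n) {
    const std::size_t vec_end = n - n % 4;  // largest multiple of 4 <= n
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < vec_end; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned 128-bit load
    }
    // Horizontal reduction of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Scalar epilogue for elements that don't fill a whole register.
    for (std::size_t i = vec_end; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

    Note that the vector version adds the elements in a different order than a plain sequential loop, which is why compilers are typically cautious about vectorizing floating-point reductions.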

    1. Beyond {SSE(1), SSE2, AVX}; in particular, SSE4.1 (mentioned around 22:50)

    a. Are there any plans for SSE4.1 support?
    I admit I'm asking because I have a vested interest here: numerical linear algebra is quite useful in what I do, and there are some interesting instructions in this set, such as:
    DPPS, DPPD (dot product a.k.a. inner product) // they're useful for a lot of applications, in fact: http://www.virtualdub.org/blog/pivot/entry.php?id=150
    // correspond to the _mm_dp_ps, _mm_dp_pd intrinsics -- http://msdn.microsoft.com/en-us/library/bb514034%28v=vs.110%29.aspx

    Intel published a mini-benchmark some time ago demonstrating the speed-up of a dot product computation: the SSE3 version (using HADDPS) was already 26% faster, while the SSE4 version (using DPPS) was 72% faster than the base case:
    http://www.intel.com/technology/itj/2008/v12i3/3-paper/6-examples.htm

    This makes SSE4.x very exciting, AutoVec support would be great here!

    b. Another, perhaps more far-reaching question: if/when this gets supported, will there be integration with the STL, such that, say, std::inner_product would automatically make use of the above instructions where applicable?
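    To make the comparison concrete, a sketch of a 4-element dot product using the _mm_dp_ps intrinsic next to the portable std::inner_product. It assumes the compiler signals SSE4.1 availability through the __SSE4_1__ macro (the GCC/Clang convention, e.g. with -msse4.1); otherwise it falls back to the standard algorithm:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>  // std::inner_product
#if defined(__SSE4_1__)
#include <smmintrin.h>  // _mm_dp_ps (SSE4.1, i.e. DPPS)
#endif

// Portable reference: the standard-library dot product.
inline float dot_std(const float* a, const float* b, std::size_t n) {
    return std::inner_product(a, a + n, b, 0.0f);
}

// Dot product of exactly 4 floats. With SSE4.1, a single DPPS does the
// multiply-and-sum: in mask 0xF1, the high nibble 0xF multiplies all four
// lanes and the low nibble 0x1 writes the sum to lane 0 only.
inline float dot4(const float* a, const float* b) {
#if defined(__SSE4_1__)
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
#else
    return dot_std(a, b, 4);  // fallback when SSE4.1 isn't enabled at compile time
#endif
}
```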


    2. Comparison of the current features and future evolution thereof with GCC (benchmarking w/ GCC mentioned around 41:50)

    Some topics of interest:
    a. GCC Graphite comparison -- how does the AutoVec fare, relatively?

    Example: http://openwall.info/wiki/internal/gcc-local-build#Parallel-processing-with-GCC
    Features/flags: -floop-parallelize-all -ftree-parallelize-loops=8

    There's a nice discussion of some topics for GCC that could be interesting to relate to:
    - limitations // http://gcc.gnu.org/wiki/Graphite/Parallelization
    - behind the scenes // http://gcc.gnu.org/wiki/Graphite?action=AttachFile&do=view&target=graphite_lambda_tutorial.pdf
    What were the analogous implementation choices and the resulting limitations in AutoVec?

    // out of curiosity -- is the polyhedral model /* http://en.wikipedia.org/wiki/Frameworks_supporting_the_polyhedral_model */ also employed by AutoVec or is it something different here?
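    As a generic illustration of the kind of limitation such documents discuss (not specific to either compiler): a loop with independent iterations is a natural candidate for parallelization or vectorization, while a loop-carried dependence blocks a naive transformation. Both functions below are hypothetical examples:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: each c[i] depends only on a[i] and b[i], so the
// loop can be split across threads or widened into SIMD lanes.
// Assumes a and b are at least as long as c.
inline void add_arrays(const std::vector<double>& a, const std::vector<double>& b,
                       std::vector<double>& c) {
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = a[i] + b[i];
    }
}

// Loop-carried dependence: x[i] needs the freshly updated x[i - 1], so the
// iterations cannot simply run concurrently as written.
inline void prefix_sum(std::vector<double>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        x[i] += x[i - 1];
    }
}
```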

    b. Profile Mode // "Goal: Give performance improvement advice based on recognition of suboptimal usage patterns of the standard library."

    This is actually pretty cool and integrates nicely with the C++ standard library -- e.g., if you try a sub-optimal insertion pattern with std::vector, you'll get nice, human-readable advice suggesting std::list:
    http://gcc.gnu.org/onlinedocs/libstdc++/manual/profile_mode.html#manual.ext.profile_mode.using

    Is there a similar feature in plans?
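    For readers unfamiliar with it, this is a sketch of the kind of usage pattern the linked page describes flagging (function names are hypothetical): repeated insertion at the front of a std::vector versus the same work on a std::list:

```cpp
#include <cassert>
#include <list>
#include <vector>

// Inserting at the front of a std::vector shifts every existing element,
// so n insertions cost O(n^2) element moves overall.
inline void fill_front(std::vector<int>& v, int n) {
    for (int i = 0; i < n; ++i) v.insert(v.begin(), i);
}

// std::list::push_front is O(1) per insertion.
inline void fill_front(std::list<int>& l, int n) {
    for (int i = 0; i < n; ++i) l.push_front(i);
}
```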


    3. Compilation back-end parallelization and inlining

    Does the parallel compilation work with inlining? For instance, in the discussed case of the {main-foo-a1, main-bar-a2} call tree (around 39:20), if "foo" or "a2" gets inlined (note the depth change), does the compiler have to recompile anything in either case?


    4. Devirtualization (around 40:50) -- limits/changes.

    Some devirtualization was already available a while ago: http://msdn.microsoft.com/en-us/magazine/cc301407.aspx
    What are the most interesting changes in the current release / what limits have been pushed / what limits remain?
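    As a reminder of what devirtualization opportunities look like, a sketch with hypothetical types: a call through a reference to a C++11 `final` class has only one possible target and can be resolved statically, while a call through the base reference generally cannot, unless the optimizer proves the dynamic type (e.g., after inlining). Whether a given compiler actually performs either optimization is up to its optimizer:

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

// 'final' (C++11) guarantees no further overriders, so a call through a
// Square& or Square* has exactly one possible target.
struct Square final : Shape {
    int sides() const override { return 4; }
};

// Dynamic type unknown here: a genuine virtual dispatch unless the optimizer
// can prove otherwise (e.g., after inlining into a caller).
inline int count_sides(const Shape& s) { return s.sides(); }

// Static type is a final class: a candidate for devirtualization (and then
// inlining), since Square::sides is the only possible target.
inline int count_square_sides(const Square& s) { return s.sides(); }
```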


    Once again, thanks for the great episode!

  • C++ AMP: Yossi Levanoni - Architecture and Design

    Nice interview! I've got one question, perhaps you can pass it along: is there any overhead involved in the rank-N to rank-3 translation done by the GPU stub (e.g., is it compile-time or run-time)?
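    To show where such an overhead could come from, here is a hypothetical sketch (not C++ AMP's actual stub) of one way a rank-N extent could be folded into rank 3: multiply the leading dimensions together. Folding the extent is a one-time computation, but recovering the rank-N index inside a kernel needs a div/mod per leading dimension, per thread, unless the compiler can evaluate or eliminate it:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration: fold a rank-N extent into rank 3 by multiplying
// together all leading dimensions (row-major order).
template <std::size_t N>
std::array<int, 3> fold_extent(const std::array<int, N>& e) {
    static_assert(N >= 3, "this sketch only handles N >= 3");
    int leading = 1;
    for (std::size_t d = 0; d + 2 < N; ++d) leading *= e[d];
    return {{leading, e[N - 2], e[N - 1]}};
}

// Recover the rank-N index from a rank-3 index: the folded component is split
// back into the leading dimensions with one div/mod per dimension.
template <std::size_t N>
std::array<int, N> unfold_index(const std::array<int, N>& e, const std::array<int, 3>& idx) {
    std::array<int, N> out{};
    int rest = idx[0];
    for (std::size_t d = N - 2; d-- > 0;) {  // leading dims, last to first
        out[d] = rest % e[d];
        rest /= e[d];
    }
    out[N - 2] = idx[1];
    out[N - 1] = idx[2];
    return out;
}
```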