Louis Lafreniere - VC++ backend compiler


Description

Louis Lafreniere has been a developer on the VC++ compiler team for a long time; 15 years, to be exact. Specifically, Louis works on the backend compiler. What's a backend compiler? How's it evolved over the years? Where's it going? Watch and listen. Good stuff.


    The Discussion

    • Pon
      Cool, rather interesting Smiley

      I must say, I enjoy these compiler videos. Keep 'em coming Big Smile
    • pierreleclercq

      Great interview!

      Concerning the IA64 architecture, it was mentioned that
      the compiler has to do more of the smart work to optimize code
      layout. What is the reasoning for this change? Is it about
      making the architecture simpler? (Assuming it's more complex
      in other respects.)

      I also really appreciate the improvements in back-end code
      generation for VC++. It is nice to see a video like this, as there
      are good surprises in code generation that we could otherwise only
      discover by stepping through the disassembly window.

      Additions to the language, or new libraries, change the way we
      write code, but discovering new optimizations really gives a
      different perspective. For example, the removal of the copy of an
      object being returned from a function allows writing code that
      makes much more use of automatic variables (and therefore frees
      us from a lot of pointer management).
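      The copy-elision behavior described above can be sketched in C++; the type and copy counter below are purely illustrative, not from the interview:

```cpp
#include <cassert>

// Illustrative type that counts copies so copy elision can be observed.
struct Buffer {
    static int copies;
    int data[1024] = {};
    Buffer() = default;
    Buffer(const Buffer& other) {
        ++copies;
        for (int i = 0; i < 1024; ++i) data[i] = other.data[i];
    }
};
int Buffer::copies = 0;

// The named return value can be constructed directly in the caller's
// storage (NRVO), so returning this large automatic variable need not
// copy it, and no heap allocation or pointer management is involved.
Buffer make_buffer() {
    Buffer b;
    b.data[0] = 42;
    return b;   // copy typically elided by MSVC/GCC/Clang
}
```

      With the copy elided, stack objects of this size become cheap to pass around by value.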

      I guess that someone who was writing C++ code 10 or 15 years
      ago, and is still doing so now, would certainly have the feeling
      he or she is using a different language, even though it's still C++.

      By the way, as more developers get familiar with the C# coding
      style, it may be that more and more C++ classes will be written in
      a header, rather than the usual .h/.cpp pair. If a Visual Studio
      person reads this, it would be nice to factor this into smart
      indent.



    • Sven Groot
      Interesting video. One thing though, Charles: in the beginning your wording kind of implies that this frontend/backend setup is something unique to C++, while in fact every compiler works this way. Heck, I wrote a compiler for a subset of Pascal in third-year Computer Science, and even that had a separate frontend and backend. Smiley I'm sure you didn't mean it like that though, it just sounded that way.

      In the end you talked about making the compiler multithreaded. I think it's worth mentioning that although the compiler in VS2005 isn't, msbuild is. If you have a solution with more than one project, msbuild/VS2005 will build more than one project at the same time (if possible, based on the projects' dependencies) depending on the number of CPUs in your system.
    • Charles
      Sven Groot wrote:
      Interesting video. One thing though, Charles: in the beginning your wording kind of implies that this frontend/backend setup is something unique to C++, while in fact every compiler works this way. Heck, I wrote a compiler for a subset of pascal in third year Computer Science, and even that had a separate frontend and backend. I'm sure you didn't mean it like that though, it just sounded that way.


      The topic at hand is C++, so I related the frontend/backend statement to, well, C++...  Smiley

      C
    • louisl
      Hi Pierre,
      Glad you liked the interview.  

      Memory speed has not kept up with CPU speed increases over the past few decades, so memory latency has become a big bottleneck.  There are currently two different ways of approaching this problem.

      One is to rely on the hardware to dynamically figure out the dependencies between instructions, and allow them to execute out of order as soon as their inputs are ready.  This is the approach used by most chips today.

      The IA64 took a different approach, adding flexibility to the instruction set to give compilers the tools to schedule instructions so that loads can be executed far from their uses.  For example, given "if (x) { y = *p; }", the compiler would normally not be able to hoist the load of *p outside of the if(), since the if() might be protecting a load that would cause an exception.  IA64 provides a way to hoist this load and defer the exception until you get inside the if().  If you never do, no exception is generated.
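      The control-speculation idea can be simulated in C++; the names SpecLoad, ld_s, and chk_s below are invented for illustration (real IA64 hardware tracks the deferred fault in the register's NaT bit):

```cpp
#include <cassert>
#include <cstddef>

// A speculative load records that it would have faulted instead of
// faulting immediately.
struct SpecLoad {
    int value;
    bool deferred_fault;
};

// ld.s stand-in: load speculatively, deferring any fault.
SpecLoad ld_s(const int* p) {
    if (p == nullptr) return {0, true};
    return {*p, false};
}

// chk.s stand-in: surface the fault (here, re-run the load as recovery
// code) only when the speculated value is actually used.
int chk_s(SpecLoad s, const int* p) {
    if (s.deferred_fault) return *p;
    return s.value;
}

// Original: the load of *p cannot be hoisted above the if(x), because
// x may be guarding an invalid pointer.
int original(bool x, const int* p) {
    int y = 0;
    if (x) y = *p;
    return y;
}

// Speculated: the load runs far ahead of its use; a fault can only
// surface inside the if().
int speculated(bool x, const int* p) {
    SpecLoad s = ld_s(p);
    int y = 0;
    if (x) y = chk_s(s, p);
    return y;
}
```

      Note that speculated(false, nullptr) completes without faulting, exactly the guarantee the hardware scheme provides.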

      For "*q = x; y = *p;", the compiler would also not normally be able to hoist the *p load above the *q store, in case they point to the same address.  The IA64 however provides a way to do this load ahead of time, and then check at the y= whether the load was invalidated by the intervening store.
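      Similarly, the advanced-load mechanism (ld.a/chk.a) can be sketched in C++; the explicit alias comparison below stands in for the ALAT lookup the hardware performs, and the function names are illustrative:

```cpp
#include <cassert>

// Original order: the load of *p cannot normally move above the store
// to *q, because p and q might alias.
int original(int* q, int* p, int x) {
    *q = x;
    return *p;
}

// IA64-style data speculation: load early (ld.a records the address in
// the ALAT), perform the store, then check (chk.a) whether the store
// invalidated the loaded address and reload if so.
int advanced(int* q, int* p, int x) {
    int y = *p;          // ld.a: advanced load, hoisted above the store
    *q = x;              // possibly-aliasing store
    if (q == p) y = *p;  // chk.a stand-in: reload if invalidated
    return y;
}
```

      When p and q do not alias, the load has already completed long before its use; when they do alias, the check catches it and the reload restores correctness.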

      Branch misprediction is also a problem for CPUs with deep pipelines.  But IA64 instructions can be conditionally executed based on true/false register predicates, which allows us to generate straight-line code for if/else constructs, avoiding the chance of mispredicted branches.
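      The predication point can be illustrated with a branchless select in C++: both arms are computed and a mask picks the result, so there is no branch to mispredict. This mask trick is only an analogy for what IA64 predicate registers (or x86 CMOV) do in hardware:

```cpp
#include <cassert>

// Branchy form: the CPU must predict which way the if goes, and a
// misprediction flushes the pipeline.
int with_branch(bool c, int a, int b) {
    if (c) return a;
    return b;
}

// Straight-line form: evaluate both sides and select with a mask,
// roughly what predicated execution achieves without any branch.
int branchless(bool c, int a, int b) {
    int mask = -static_cast<int>(c);   // all ones if c, all zeros if not
    return (a & mask) | (b & ~mask);
}
```

      The trade-off is that both sides are always executed, which is why predication pays off mainly for short, hard-to-predict if/else bodies.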

      This approach avoids a lot of the complexity of out-of-order execution, but these tools add a lot of complexity of their own.

      The belief back when the IA64 was designed was that x86 speed was approaching its peak, that out-of-order execution wouldn't be enough to avoid the memory bottleneck, and that they couldn't keep cranking up the clock speed on x86.  The thought was that they would be able to crank it up higher on IA64.

      But doing a good job of generating code for IA64 is a very hard problem.  Using these "tools" isn't usually free, so they involve a lot of trade-offs.  Profile-guided optimization does provide a lot of info to the compiler to help make these decisions, but it is still very hard to take full advantage of the machine.


      -- Louis Lafreniere
    • billh

      Again, great video. More! You should interview some assembly language people...I would like to hear about the differences and changes over the years in the Pentium architecture and how your teams have adapted to that on very low levels. You kind of hit on that a bit with the multicore discussion here. I've thought a lot about getting back into some assembly programming just for fun (I did a fair amount of it back in the days of the 6502 chips), but am wondering how easy that will be considering the optimization that occurs on the chip itself, the caches, etc.

      Question: how do you target your compiler for different Pentium architectures? From what I remember, Intel seems to alter a few instructions with every generation (from the Pentium to the Pentium II, on up to the current ones). Does your compiler recognize the user's chip and pick the best optimization? How about for programs that are shipped? How do those recognize the user's chip? Or do you not take advantage of the latest additions made by Intel?

      Unfortunately, I do not own a copy of Visual Studio, so maybe those are options in the IDE, I don't know.

    • louisl
      Hi Bill,
      We are working very closely with Intel and AMD to stay on top of the latest architecture changes, and adjust/tune the compiler accordingly.

      We've stopped giving customers the ability to pick which particular chip flavor they want to directly target, since most people want their apps to run fast on the variety of chips on people's desks at the time.  So instead, we try to tune the compiler for the set of chips we think will be dominant not only after we ship, but after our customers ship their own apps.  This usually means the chip Intel/AMD is currently working on, plus the current shipping generation, and maybe the one before that as well.  We do provide the /arch:SSE and /arch:SSE2 switches to let the compiler use these new instructions (as well as CMOV), but the generated program will not run on older architectures which don't support them.

      Tuning the generated code (or your assembly code) is a lot harder than it used to be, mainly because of out-of-order execution.  Back with the 386/486 and even the first-generation Pentiums, we used to be able to pick up the instruction manual and figure out exactly how many cycles a particular instruction sequence would take, but you can't do that anymore.  You need to know how the machine works and identify the patterns that might cause problems for the out-of-order execution.

      As far as runtime detection of the architecture we run on, the CRT does look at it and takes advantage of the SSE/SSE2 instructions when available to speed up some computations and to move larger chunks of memory at a time.  The code generated by the compiler doesn't do this, however.  Doing so would cause a lot of code duplication, and our experience has shown that code size is very important for medium to large apps.
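      A hedged sketch of that kind of CRT-style runtime dispatch, under stated assumptions: the function names are invented, the "tuned" path is a stand-in, and __builtin_cpu_supports is a GCC/Clang intrinsic rather than what the VC++ CRT actually uses:

```cpp
#include <cstring>
#include <cstddef>

// Stand-ins for an SSE2-tuned copy and a generic fallback; a real CRT
// fast path would use wider SIMD moves.
static void copy_sse2(void* d, const void* s, std::size_t n)    { std::memcpy(d, s, n); }
static void copy_generic(void* d, const void* s, std::size_t n) { std::memcpy(d, s, n); }

// Pick an implementation based on what the host CPU supports.  Only the
// dispatch is duplicated, not the whole program's generated code.
void fast_copy(void* d, const void* s, std::size_t n) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2")) {
        copy_sse2(d, s, n);
        return;
    }
#endif
    copy_generic(d, s, n);
}
```

      Confining the dispatch to a few hot library routines is what keeps the code-size cost mentioned above under control.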

      -- Louis Lafreniere
    • pierreleclercq
      louisl wrote:


      As far as runtime detection of the architecture we run on, the CRT does look at it and takes advantage of the SSE/SSE2 instructions when available to speed up some computations and to move larger chunks of memory at a time.  The code generated by the compiler doesn't do this, however.  Doing so would cause a lot of code duplication, and our experience has shown that code size is very important for medium to large apps.

      -- Louis Lafreniere


      How interesting. One would think the JIT should be able to take
      advantage of runtime detection of the hardware to generate code
      specific to the current processor. Still, as Brandon Bray was
      pointing out, the JIT has stricter time constraints than a regular
      compiler, and therefore cannot spend too much time optimizing.
      One could also wonder how this would impact performance in
      general, as most of the time the difference should be small. (?)

      Are these considerations part of the Phoenix project?

    • louisl
      Yes, JIT throughput is very important; still, instruction selection is quick to do and would be quite appropriate for a JIT.  The win wouldn't be very big though, and I could be wrong, but I don't believe our JITs do any optimization dependent on the host CPU.

      We are currently working on the high-level optimizations in Phoenix, and will tune the low-level machine-dependent code generation later on.  This is certainly something we'll consider if we see opportunities.

         -- Louis Lafreniere
    • ldbfrank
      Hi All,
      Could anybody tell me which one is better for developing applications on Windows Mobile Smartphone?
      We want better performance and power.
