Louis Lafreniere - VC++ backend compiler
- Posted: May 11, 2006 at 1:24 PM
- 45,442 Views
- 13 Comments
I must say, I enjoy these compiler videos. Keep 'em coming.
Great interview!
Concerning the IA64 architecture, it was mentioned that
the compiler has to do more of the smart work to optimize code layout.
So what is the reasoning behind this change? Is it about
making the architecture simpler? (Assuming it's more complex
in other respects.)
I also really appreciate the improvements in back-end code generation for VC++. It is nice to see a video like this, as there are good
surprises in code generation that we could otherwise only discover by
stepping through the disassembly window.
Additions to the language or new libraries change the way we
write code, but discovering new optimizations really gives a
different perspective. For example, eliminating the copy of an
object returned from a function lets us write code
that makes much more use of automatic variables (and therefore
frees us from a lot of pointer management).
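To make the point about returned objects concrete, here is a minimal sketch, assuming a C++11 or later compiler. The `Tracked` type and `make_names` function are hypothetical names invented for illustration; the counters simply record whether the object was ever copied or moved on its way out of the function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    std::vector<std::string> names;

    Tracked() = default;
    Tracked(const Tracked& other) : names(other.names) { ++copies; }
    Tracked(Tracked&& other) noexcept : names(std::move(other.names)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Builds an automatic (stack) object and returns it by value.
// No new/delete and no owning pointer is involved.
Tracked make_names() {
    Tracked result;
    result.names.push_back("front end");
    result.names.push_back("back end");
    return result;  // candidate for the named return value optimization
}
```

Calling `Tracked t = make_names();` and then inspecting `Tracked::copies` should show that no copy was made. Whether `Tracked::moves` also stays at zero depends on the compiler and its settings, since eliding the move of a named local is permitted but not required.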
I guess that someone who was writing C++ code 10 or 15 years
ago, and is still doing so now, would certainly have the feeling that he or she
is using a different language, even though it's still C++.
By the way, as more developers get familiar with the C# coding style,
it may be that more and more C++ classes will be written entirely in a
header, rather than as the usual .h/.cpp pair. If a Visual Studio person
reads this, it would be nice to factor this into the smart indent.
At the end you talked about making the compiler multithreaded. I think it's worth mentioning that although the compiler in VS2005 isn't, msbuild is. If you have a solution with more than one project, msbuild/VS2005 will build several projects at the same time (where the projects' dependencies allow it), based on the number of CPUs in your system.
The topic at hand is C++, so I related the frontend backend statement to, well, C++...
C
Glad you liked the interview.
Memory speed has not kept up with CPU speed increases over the past few decades, so memory latency has become a big bottleneck. There are currently two different ways of approaching this problem.
One is to rely on the hardware to dynamically figure out the dependencies between instructions, and allow them to execute out of order as soon as their inputs are ready. This is the approach used by most chips today.
The IA64 took a different approach: it adds flexibility to the instruction set that gives compilers the tools to schedule instructions so that loads can be executed far ahead of their uses. For example, for "if (x) { y = *p; }", the compiler would normally not be able to hoist the load of *p outside of the if(), in case the if() was guarding a load that would otherwise cause an exception. IA64 provides a way to hoist this load and defer the exception until you get inside the if(). If you never get there, no exception is generated.
For "*q = x; y = *p;", the compiler would also not normally be able to hoist the *p load above the *q store in case they point to the same address. The IA64 however provides a way to do this load ahead, and then check at the y= if the load was invalidated by the subsequent store.
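The two source patterns above can be written out as ordinary C++ to show why a compiler has to be conservative. This is only a source-level sketch: the function names are made up for illustration, and it does not show the IA64 instructions themselves (these two mechanisms are commonly called control speculation and data speculation).

```cpp
#include <cassert>

// Control dependence: the load of *p is only legal when `guard` is true.
// Moving the load above the test would fault whenever p is null and
// guard is false, so a plain load cannot be hoisted.
int guarded_load(bool guard, const int* p, int fallback) {
    int y = fallback;
    if (guard) {
        y = *p;
    }
    return y;
}

// Data dependence: if p and q point to the same address, the load must
// observe the value just stored, so it cannot simply move above the store.
int store_then_load(int* q, const int* p, int x) {
    *q = x;
    int y = *p;
    return y;
}
```

Calling `guarded_load(false, nullptr, -1)` is perfectly valid and returns the fallback, which is exactly the case a hoisted load would break. Likewise `store_then_load(&v, &v, 9)` has to return 9, while with two distinct addresses the load is independent of the store.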
Branch misprediction is also a problem for CPUs with deep pipelines. But IA64 instructions can be set to execute conditionally based on true/false predicate registers, which allows us to generate straight-line code for if/else constructs when we want to, avoiding the chance of mispredicted branches.
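As a rough source-level analogy for turning an if/else into straight-line code, the sketch below computes both outcomes and selects one with a mask instead of a branch. It is not IA64 predication (that happens per instruction, in the generated code), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Ordinary if/else: the compiler may emit a conditional branch here.
std::uint32_t select_branchy(bool cond, std::uint32_t a, std::uint32_t b) {
    if (cond) {
        return a;
    } else {
        return b;
    }
}

// Straight-line version: build an all-ones or all-zeros mask from the
// condition and use it to pick one of the two values.
std::uint32_t select_branchless(bool cond, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same result for every input. The trade-off mentioned in the thread still applies: the straight-line form always does the work for both sides, so it only pays off when the branch would be hard to predict.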
This approach does avoid a lot of the complexity of the out-of-order execution, but these tools themselves do add a lot of complexity as well.
The belief back when the IA64 was designed was that x86 speed was approaching its peak, that out-of-order execution wouldn't be enough to avoid the memory bottleneck, and that they couldn't keep cranking up the clock speed on x86. The thought was that they would be able to crank it up higher on IA64.
But doing a good job of generating code for IA64 is a very hard problem. Using these "tools" isn't usually free, so they involve a lot of trade-offs. Profile-guided optimization does provide a lot of information to help the compiler make these decisions, but it is still very hard to take full advantage of the machine.
-- Louis Lafreniere
Again, great video. More! You should interview some assembly language people...I would like to hear about the differences and changes over the years in the Pentium architecture and how your teams have adapted to that on very low levels. You kind of hit on that a bit with the multicore discussion here. I've thought a lot about getting back into some assembly programming just for fun (I did a fair amount of it back in the days of the 6502 chips), but am wondering how easy that will be considering the optimization that occurs on the chip itself, the caches, etc.
Question: how do you target your compiler for different Pentium architectures? From what I remember, Intel seems to alter a few instructions with every generation (from the Pentium to the Pentium II, on up to the current ones). Does your compiler recognize the user's chip and pick the best optimization? How about for programs that are shipped? How do those recognize the user's chip? Or do you not take advantage of the latest additions made by Intel?
Unfortunately, I do not own a copy of Visual Studio, so maybe those are options in the IDE, I don't know.
We are working very closely with Intel and AMD to stay on top of the latest architecture changes, and adjust/tune the compiler accordingly.
We've stopped giving customers the ability to pick which particular chip flavor they want to directly target, since most people want their apps to run fast on the variety of chips on people's desks at that time. So instead, we try to tune the compiler for the set of chips we think will be dominant not only after we ship, but after our customers ship their own apps. This usually means the chip that Intel/AMD is currently working on, plus the current shipping generation, and maybe the one before that as well. We do provide the /arch:SSE and /arch:SSE2 switches to let the compiler use these newer instructions (as well as CMOV), but the generated program will not run on older architectures that don't support them.
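For readers who want to see which instruction set a given build was told it may assume, here is a small sketch based on predefined macros. `compiled_simd_level` is a hypothetical helper; `_M_IX86_FP` is the macro MSVC defines for 32-bit x86 builds to reflect the /arch switch, and the `__SSE__`/`__SSE2__` checks are the GCC/Clang counterparts, added here as an assumption about which other compilers might be used.

```cpp
#include <cassert>

// Returns 0 (no SSE assumed), 1 (SSE) or 2 (SSE2 or newer), based only on
// what the compiler was allowed to assume about the target at build time.
int compiled_simd_level() {
#if defined(_M_IX86_FP)                      // MSVC, 32-bit x86 builds
    return _M_IX86_FP;
#elif defined(_M_X64) || defined(__SSE2__)   // x64 always has SSE2
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}
```

This is a compile-time property of the binary, not a check of the machine it runs on, which is why a program built with /arch:SSE2 fails on older chips instead of falling back.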
Tuning the generated code (or your assembly code) is a lot harder than it used to be, mainly because of out-of-order execution. Back in the 386/486 days, and even with the first-generation Pentiums, we used to be able to pick up the instruction manual and figure out exactly how many cycles a particular instruction sequence would take, but you can't do that anymore. You need to know how the machine works and identify the patterns that might cause problems for out-of-order execution.
As far as runtime detection of the architecture we run on, the CRT does look at it and takes advantage of the SSE/SSE2 instructions when available to speed up some computations, and to move larger chunks of memory at a time. The generated code from the compiler doesn't do this, however. Doing so would cause a lot of code duplication, and our experience has shown that code size is very important for medium to large apps.
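The kind of runtime detection described above can be sketched as follows. This is not the CRT's actual code: `cpu_has_sse2`, `copy_bytes`, `copy_wide` and `select_copy` are hypothetical names, and `copy_wide` is only a stand-in for a routine that would really use SSE2 loads and stores. The sketch assumes MSVC's `__cpuid` intrinsic or the GCC/Clang `__builtin_cpu_supports` builtin, and falls back to the baseline routine elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#endif

// Hypothetical helper: true when the running CPU reports SSE2 support.
bool cpu_has_sse2() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                    // CPUID leaf 1: feature flags
    return (regs[3] & (1 << 26)) != 0;   // EDX bit 26 = SSE2
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;                        // unknown platform: baseline path
#endif
}

using CopyFn = void (*)(void* dst, const void* src, std::size_t n);

// Baseline routine: one byte at a time.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
}

// Stand-in for a wide-move routine; a real one would use SSE2 instructions.
void copy_wide(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Detect once, then reuse the chosen routine on every later call.
CopyFn select_copy() {
    static const CopyFn chosen = cpu_has_sse2() ? copy_wide : copy_bytes;
    return chosen;
}
```

A caller would write `select_copy()(dst, src, n)`. Note that both routines have to be present in the binary, which is a small-scale version of the code duplication that makes this unattractive for compiler-generated code in general.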
-- Louis Lafreniere
How interesting. One would think the JIT should be able to take
advantage of runtime detection of the hardware to generate code
specific to the current processor. Still, as Brandon Bray pointed
out, the JIT has stricter time constraints than a regular
compiler, and therefore cannot spend too much time optimizing.
One could also wonder how much this would affect performance in
general, as most of the time the difference should be small. (?)
Are these considerations part of the Phoenix project?
We are currently working on the high-level optimizations in Phoenix, and will tune the low-level, machine-dependent code generation later on. This is certainly something we'll consider if we see opportunities.
-- Louis Lafreniere
Could anybody tell me which one is better for developing applications on Windows Mobile Smartphone?
We want better performance and more power.