Going Deep

Silviu Calinoiu: Inside Windows 7 - Fault Tolerant Heap


The Fault Tolerant Heap (FTH) is a subsystem of Windows 7 responsible for monitoring application crashes and autonomously applying mitigations to prevent future crashes on a per-application basis. For the vast majority of users, FTH will function with no need for intervention or change on their part.

Principal Development Lead and rock star developer Silviu Calinoiu is the mastermind behind FTH. Here, we go deep into how FTH works and why it's designed the way it is.

The Fault Tolerant Heap is another example of the low-level efficiency built into the system: FTH automatically mitigates the memory faults that cause an application to crash, which has the pleasant side effect of preventing future crashes. How does FTH work, exactly? What types of memory problems does it address, specifically? How do developers monitor FTH events, and can they override FTH's behavior? What does this all mean to the average user?

FTH, as an autonomous monitoring and correction system, represents a step in the right direction for the evolution of a more homeostatic general-purpose operating system. Simply put, Windows is getting smarter: it's becoming better at self-regulation and self-healing. Yes, there's a very long way to go, but we're making real progress.

You will continue to learn about recoverability in Windows over the coming months here on C9.  

Tune in.




    The Discussion

    • pdidenko
    • JohnFrum

      Neat feature and good video.  I learned about this feature a few weeks ago while debugging my service.  It had been crashing and then it seemed fixed and I couldn't repro it again.  Moved it to another box and it started crashing again.  Finally I noticed a line in the debugger that I knew wasn't mine about FTH and thought "WTH?  Ahh, nice!".  I like it but I'll turn it off on my test systems.

    • efremovda

      Question: with FTH on, will it show me the stack trace at the moment the buffer overrun occurs? Or does it just hide the overrun (because of the additional memory), so that I only discover later that it happened at some point? We did something similar for our app (a wrapper around the malloc, realloc, and free functions), but we can't control pointers, so we only saw that a buffer had been overrun when the block was freed.

    • doncote

      I think the article pdidenko recommended is useful

    • Silviu Calinoiu

      Unfortunately it detects it during free. In order to detect it when the buffer overrun happens, you need to play tricks with how the blocks are laid out in memory (each in a different page) and then play tricks with the protection of the following page. There is actually something that does this, called page heap, which is part of Application Verifier, and you can download it from here: http://www.microsoft.com/DownLoads/details.aspx?familyid=C4A25AB9-649D-4A1B-B4A7-C9D8B095DF18&displaylang=en


      This, however, has drastic performance implications and is therefore used only in testing environments. So the deal with FTH is just to shave off some of these issues and try to detect as much as possible. It is still better than crashing with an innocent victim on the stack, because now at least you know the entity freeing the block had some issue in the code paths manipulating that block. Essentially it is no different from the approach you used in your project; it just generalizes it for everybody. And debugging still requires lots of thinking, but less than before.
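
For readers following this exchange, the detect-at-free idea (a wrapper that pads each block and checks the padding when the block is released) can be sketched roughly as below. This is only an illustration under stated assumptions: the names `guarded_alloc`, `guarded_free`, `GUARD_LEN`, and `GUARD_BYTE` are hypothetical, and the code is neither FTH's nor page heap's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not the FTH implementation. */
#define GUARD_LEN  8      /* bytes of padding placed after the user block */
#define GUARD_BYTE 0xAB   /* known pattern written into the padding */

/* Header stored in front of the user block. The union with max_align_t
 * keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t size;          /* user-requested size, used to locate the guard */
    max_align_t align;
} guard_header;

/* Allocate `size` bytes followed by GUARD_LEN guard bytes. */
void *guarded_alloc(size_t size) {
    if (size > SIZE_MAX - sizeof(guard_header) - GUARD_LEN) return NULL;
    guard_header *h = malloc(sizeof(guard_header) + size + GUARD_LEN);
    if (h == NULL) return NULL;
    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Free a block from guarded_alloc. Returns 1 if the guard bytes are
 * intact, 0 if something wrote past the end of the block. The overrun
 * is only noticed here, at free time, not when it actually happened. */
int guarded_free(void *user_ptr) {
    assert(user_ptr != NULL);
    guard_header *h = (guard_header *)user_ptr - 1;
    const unsigned char *guard = (const unsigned char *)user_ptr + h->size;
    int intact = 1;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (guard[i] != GUARD_BYTE) { intact = 0; break; }
    }
    free(h);
    return intact;
}
```

A small overrun lands in the padding instead of a neighboring block, so the program keeps running, and the check at free time identifies which block was damaged but not which write damaged it. Catching the write itself needs the page-protection layout described above, at a much higher cost.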

    • raptor3676

      While I think it is great to have such tolerance systems in the OS, I can't help thinking about the hordes of lazy/mediocre developers that are going to leave those bugs behind.

    • efremovda

      Thanks for the reply. Anyway, it's great to have such things. It will help us calm customers down while we search through issues. And I hope that it will not make us lazy, as raptor3676 said.

    • Charles

      This isn't a mechanism for creating lazy developers. If anything, it should help you find and fix bugs faster while the customer experiencing the execution of your problematic code does not have to suffer through the Crash Experience...


      Also, as Silviu made very clear in the discussion, this is not a silver bullet and is only the first version of the technology. Still, Silviu should be commended for his great engineering and we all look forward to what he's cooking up for next time...



    • tomkirbygreen

      Finally got round to watching this, it’s a classic Channel 9 video. I couldn’t help but grin when Charles became sufficiently involved to say that this was part of the awesome reality behind the marketing cr*p. So true, it’s this kind of deep thinking that hopefully results in fewer frowning users and who knows, maybe some of ‘em will even actually smile. But back to the video: the perfect balance of real and fascinating detail, humanity (you got to love it when someone doing work as important as Silviu has to prop his displays up on printer paper), humour and deep honesty. Kudos to all involved, not least Silviu Calinoiu himself.

    • Charles

      Thank you. In some sense, sometimes I feel that the honest conversational approach to Channel 9 is undervalued (or just taken for granted). It's really nice to hear this kind of feedback. Makes me want to keep doing this stuff.



    • zian

      Thanks so much for the deep dive into this effectively automagically-repairing feature in Windows.
