Tech Off Thread

This is a pretty serious bug in Windows (worse in x64) - any hope for a fix?

  • User profile image
    BitFlipper

    I have had this one particular bug in our product assigned to me for a very long time now, and I was not able to figure out what was going on until yesterday. Basically what happens is that when you resize the main window, some child controls stop resizing and you get corrupted areas on the screen where the controls should have resized to.

    In the past I spent a lot of time debugging our app, thinking it must be something in one of our custom controls like a SuspendLayout/ResumeLayout getting out of sync or something. At that time I was able to determine that for some reason the native Win32 window was resizing, but the .NET control was still reporting the old size. I wasn't getting any OnSizeChanged events even though the native window was resizing.

    Then yesterday I took another stab at this bug since in Windows 7 this turned from a "sometimes reproduces" to "reproduces 100% of the time". I did a search and found out that it is a "limitation" in Windows in the sense that the kernel runs out of stack space if you reach a certain nested child window depth, throws an exception, and (thankfully) doesn't blue-screen but just stops sending any more window size messages to child controls. This issue becomes worse on 64-bit OSes because pointers are now 64-bit and hence the stack runs out of space sooner. Some people report that child windows stop resizing at a depth of 7, but in my case it happens at a depth of around 15.

    This is absurd. Some questions, observations:

    1. I was not under the impression that the kernel was involved in any UI related activities.
    2. How on earth could there be such a limitation in Windows? Supposedly the "official" supported depth is 50, but 7 to 15 is well below that limit.
    3. This must be a source of older applications breaking on Windows 7, so wasn't this issue highlighted during the backwards-compatibility testing phase?
    4. There is no way I can modify our application at this point, since refactoring the UI to use fewer nested controls will be almost impossible without a redesign, and in addition, it will break all of our automated testing that relies on the current hierarchy. The proposed "fixes" in the link below all seem like pretty nasty hacks to me. How should I "fix" our application?

    Did anyone else run into this issue, and do you know of a good work-around that doesn't involve changing the hierarchy?

    To find out more about this issue, here is a pretty detailed description of it.

     

  • User profile image
    W3bbo

    This is absurd. Some questions, observations:

    1. I was not under the impression that the kernel was involved in any UI related activities.
    2. How on earth could there be such a limitation in Windows? Supposedly the "official" supported depth is 50, but 7 to 15 is well below that limit.
    3. This must be a source of older applications breaking on Windows 7, so wasn't this issue highlighted during the backwards-compatibility testing phase?
    4. There is no way I can modify our application at this point, since refactoring the UI to use fewer nested controls will be almost impossible without a redesign, and in addition, it will break all of our automated testing that relies on the current hierarchy. The proposed "fixes" in the link below all seem like pretty nasty hacks to me. How should I "fix" our application?

    Did anyone else run into this issue, and do you know of a good work-around that doesn't involve changing the hierarchy?

    • You'd be surprised, the windowing system is a key part of the operating system. That's why Windows and Mac OS both had lower system requirements than Linux as a desktop operating system in the early to mid-1990s: there wasn't any overhead of running an X-like environment.
    • Even so, having even 7 nested layers of windows means you're probably doing something wrong.
    • The blog post you linked to says it's been around since XP x64, and I think it only applies to 64-bit executables. The post does say the issue exists on x86, but only for greater levels of nesting because less stack space is consumed by 4-byte pointers vs. 8-byte pointers.
    • What about running your .NET code in 32-bit mode?
  • User profile image
    BitFlipper

    @W3bbo:

    Come now, saying 7 nested windows are too many is like saying "You are holding it wrong". It is just the wrong answer on so many levels. The documentation doesn't even mention any limit from what I can tell. The 50-level limit is what was mentioned in the blog I linked to; I'm not sure where that value came from.

    Our particular product is a "plugin" to an existing application, and as such we are at the mercy of the host and how deep the container it presents to us is already nested. In our case that parent container is already 4 levels deep. That means that for our own interface, which contains tabs, user controls, collapsible panels, etc., we only have 3 levels to play with? This is the first time I have run into any such limit, and I have been doing Windows programming for ~15 years. Where in the documentation does it state this limit?

    Seems the only practical "solution" in our case is to implement one of those nasty hacks mentioned in the blog to get around this Windows bug.

    Our application is already running in 32-bit mode, so that won't help us. Would it even have made a difference?
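
    For anyone who wants to confirm which mode their process really runs in (a plugin inherits the bitness of its host process), a minimal sketch along these lines should do. The class and method names are made up for illustration; Environment.Is64BitProcess and Environment.Is64BitOperatingSystem assume .NET 4.0 or later. Note that this only reports the pointer width of the user-mode process and says nothing about how the kernel handles the windows internally.

```csharp
using System;

// Hypothetical helper, not part of the product discussed in this thread.
static class ProcessBitness
{
    // Pointer width in bytes for the current process:
    // 4 in a 32-bit process, 8 in a 64-bit process.
    public static int PointerBytes()
    {
        return IntPtr.Size;
    }

    // Human-readable summary of process and OS bitness.
    public static string Describe()
    {
        return string.Format("{0}-bit process on a {1}-bit OS",
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32);
    }
}
```

    Logging ProcessBitness.Describe() once at startup is enough to see whether the 4-byte or 8-byte pointer case from the blog post applies to the process itself.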

  • User profile image
    BitFlipper

    Check out this old post. I'm not sure where they get their figures from, but apparently the limit in NT was 100, which was then dropped to 50 in XP because it stopped working at around 75. So where does a ridiculously low value of 7 suddenly come from?

    Yes, shoot the messenger, don't acknowledge the root problem. 

  • User profile image
    W3bbo

    Our particular product is a "plugin" to an existing application, and as such we are at the mercy of the host and how deep the container it presents to us is already nested. In our case that parent container is already 4 levels deep. That means that for our own interface, which contains tabs, user controls, collapsible panels, etc., we only have 3 levels to play with?

    Sorry, I didn't think of that scenario.

    Our application is already running in 32-bit mode, so that won't help us. Would it even have made a difference?

    The blog post describes the problem as being related to the stack space consumed by the larger pointers, so I assumed running in x86 mode would use less stack space.

  • User profile image
    BitFlipper

    W3bbo wrote:

     *snip*

    The blog post describes the problem being related to the size taken up on the stack thanks to the increased pointer size, I assumed running in x86 mode would use less stack space.

    I was just wondering whether it would make a difference on x64 since maybe all windows are handled as 64-bit inside the kernel, but I am no expert on kernel stuff so I don't know whether that is true or not. But since our app is already running as 32-bit it is somewhat irrelevant in this case.

    Here is an interesting KB article that is somewhat related. It is a hotfix that limits nesting depth in XP to 50, preventing a BSOD that previously occurred due to too deep nesting. The fact that there are actually applications that try to create "up to 100 nested windows" just points out how woefully inadequate the 7-15 levels in x64 are.

    As far as I'm concerned, if the design limit is 50 but Windows can't even get close to it (and silently stops working), it is a bug. If I create the 8th child window without any error, but Windows then can't manage that window properly, it is a bug.

  • User profile image
    BitFlipper

    Ok, I found what I think is an acceptable work-around for this bug. One of the posters that replied on that blog gave a solution to use BeginInvoke on OnSizeChanged, and it works in our case. Fortunately we have our own custom Panel controls, and I found a strategically nested panel that, when applying the fix below, solves the problem. Basically, I did this:

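    private bool m_fixResize;

    // When true, size-change handling is deferred through the message queue.
    public bool FixResize
    {
        get { return m_fixResize; }
        set { m_fixResize = value; }
    }

    protected override void OnSizeChanged(EventArgs e)
    {
        // IsHandleCreated checks for a handle without forcing its creation,
        // which reading this.Handle would do.
        if (m_fixResize && this.IsHandleCreated)
        {
            // Post the call so it runs after the current resize message
            // has finished unwinding.
            this.BeginInvoke((MethodInvoker)delegate
            {
                AsyncOnSizeChanged(e);
            });
        }
        else
        {
            base.OnSizeChanged(e);
        }
    }

    // Kept as a separate method because calling base.OnSizeChanged directly
    // inside the anonymous method produces a compiler warning.
    private void AsyncOnSizeChanged(EventArgs e)
    {
        base.OnSizeChanged(e);
    }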

    Then all I do at startup is set strategicallyPlacedPanel.FixResize = true and things start working properly.

    I got a weird compiler warning when I put base.OnSizeChanged inside the anonymous method (calling base methods from anonymous methods is bad apparently), so I moved it out of there into its own method.

  • User profile image
    BarbaraC1977

    The workaround does NOT work consistently. We've tested it repeatedly, which wasted even more of our time. This is a serious setback to our team for product release--we have to move the nested controls that WERE resizing nicely in 32-bit out of the hierarchy and "fake" their attachment points.

    We're having to hand-tune margins and settings since putting the controls into tables adds layers to the UI. I'm extremely frustrated with Microsoft on this topic, and the apparent closure of bugs with "team priorities are such that they've decided not to fix this."

    We've been a loyal MS partner for a very long time, but are now wondering what the point is if we design to the controls provided and the object-oriented model suggested, only to have to UNDO our work as we try to complete our application.

    Thanks for listening. 

    Barbara Crane, CEO
    Auction Systems

  • User profile image
    BitFlipper

    Yea I don't understand why this issue isn't getting more attention. How many existing applications are breaking now that 64-bit Windows is getting more commonplace?

    The correct way to fix this would be to convert the kernel function from a recursive to a heap-based stack implementation. This is simple programming 101 stuff. Here is an example of a non-recursive heap-based stack implementation that scans a directory structure.
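
    The linked example is not reproduced in this thread. As a stand-in, here is a minimal sketch of the same technique, assuming nothing about how the kernel function is actually written: the directories still waiting to be visited are kept in a heap-allocated Stack<string> instead of on the call stack, so call-stack usage stays constant no matter how deep the tree goes. DirectoryScanner and ListFiles are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration of non-recursive tree traversal.
static class DirectoryScanner
{
    // Returns every file under root. No method calls itself; pending
    // directories are stored in an explicit stack that lives on the heap.
    public static List<string> ListFiles(string root)
    {
        var files = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            try
            {
                files.AddRange(Directory.GetFiles(dir));

                // Queue the children instead of recursing into them.
                foreach (string sub in Directory.GetDirectories(dir))
                {
                    pending.Push(sub);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Skip directories we are not allowed to read.
            }
        }

        return files;
    }
}
```

    The same shape applies to any parent/child tree, including a window hierarchy: the depth limit becomes a matter of available heap rather than of a fixed-size stack.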

  • User profile image
    DeathByVisualStudio

    I've seen this with Office as well. Sometimes the calendar will stop resizing when you have too many calendar layers (say three!) or you'll get some areas around the border that don't update. I first saw it in Office 2007 and it still exists in Office 2010. No wonder it was never fixed. Yep, even Microsofties can "hold it wrong".

    Connect Issue #11213132, Status Fixed (it's too hard for us to fix -- plus you're holding it wrong anyway)  Microsoft - Winning! 

     

    If we all believed in unicorns and fairies the world would be a better place.