Site Feedback Thread

5 posts

Channel9 unavailable 1st day TechEd

  • MichaelCzepiel

    Others and I were trying to get onto Channel9 yesterday during TechEd to do our evaluations, and the system was pretty much unusable. At times I couldn't even bring up my schedule on my phone. I found it ironic that Microsoft's keynote and other sessions talked about how easy it is to spin up additional resources to meet the needs of a business, yet it seems Microsoft can't manage its own workloads.

    I didn't see anything on the site commenting on the performance issues. Maybe MS will comment on this thread.

  • wkempf

    I had no issues with C9 at all yesterday. Far more likely the problem was with your access. The WiFi at conferences is almost always overloaded. That's at ALL conferences, and there's not much they can do about it.

  • Duncanma

    @wkempf: @MichaelCzepiel: Hey Michael (and wkempf :) ), I'm the Channel 9 guy who you can beat up about yesterday's outage. It wasn't the conference Wi-Fi; it was completely us. Scaling up in Azure is easy, and we did scale up before the day started, and we definitely scaled up once the CPU usage started to spike. It was that CPU usage spiking up to 100% that caused the outage.

    The key, though, is that we couldn't determine the *cause* of this CPU usage increase, and increasing the number of instances was *not* reducing the CPU usage significantly; we were just bringing up new nodes, which would then quickly ramp up to 100%. The load on our site was actually higher before the CPU started to spike, and even when the load went down, the CPU usage stayed at 100%.

    So, scaling up (easy as it was) was not solving our problem. Our theory is that some particular bit of code that only TechEd attendees were hitting was causing the issue, not the overall site load. We spent most of the day and last night digging through code trying to figure out which event-specific feature was causing the issue, and we found quite a few areas where improvement was possible. We are watching the CPU load carefully today and have several people still digging into code, running traces and profiles.

    Essentially, our failure is that our pre-event testing and load were not a true representation of the load that the real event puts onto our site. We definitely feel your pain on this. Trust me, we were actively engaged in trying to fix it, and we stayed on site until after midnight last night attempting to debug.

    Hopefully today, and the rest of the conference, will proceed more smoothly.

  • felix9

    ah, good story from the trenches ;)

    hope your code gets better for BUILD

  • MichaelCzepiel

    @Duncanma: Thanks for the insight. While we don't like to see anyone fail, it's good to hear that you were able to scale up and that the issue was truly deeper. It's also good to see that the MS folks are human like the rest of us and that things happen. I, and I'm sure others, appreciate the honesty.
