.NET Debugging for the Production Environment

Case Study: Debugging the Load Test - 07



You may wish to watch the video using one of the high quality links on the right so the tool output in the case study is readable.


Ever have a situation in which your load test isn't passing the SLA, or the test results simply "look wrong"?  Even when your test hardware is exactly like your production hardware?  Is the problem somewhere in the code or... maybe something no one else has considered?  How do you troubleshoot this when - like in the production environment - you can't afford to have invasive troubleshooting tools in place?






    The Discussion

    • artisticcheese

      So, if all the requests are in the ASP.NET queue but none are in the application-specific ASP.NET queue, what are they all doing? Why are they not submitted to the application-specific queue, and instead queue up in a holding chamber? What stage of ASP.NET page processing are they spending their time in?

    • BradL

      @gt65345: The way the perfmon counters are updated has a say in this.  Requests Queued is incremented when the request is posted from native code to the CLR threadpool, and then decremented when the callback is invoked.  This all happens in native code.  With a heavy load or a "burst", you may see this counter go above zero.

      On the other side, if the CLR threadpool is draining these requests very quickly (e.g., very lightweight requests), then they'll never have to wait in the app-specific queue, and therefore Requests In Application Queue won't go above 0.

      Hence, "Requests Queued by itself isn't an indicator of a problem, per se."  A healthy ASP.NET server can have Requests Queued > 0.

    • artisticcheese

      So all those requests did not even enter "BeginRequest" and this handoff between managed/unmanaged code is causing high CPU?

    • Frank

      Why use perfmon instead of Visual Studio Load Testing to collect all the remote counters and consolidate them into one report?

    • BradL

      @gt65345: The handoff has little/nothing to do with the high CPU.  I didn't look at the dumps or have a profiler trace to verify what threads were consuming the CPU.  But based on past experience, it was the worker threads (from the CLR worker threadpool) - they are the ones doing all the work & executing the requests.  And maybe the GC threads, too (depending on how much memory was allocated).

    • BradL

      @Frank: That's definitely an option, and a good one at that.  Though, this test was done against one server.

      But the customer didn't ask me what tool to use to run the test; they already had their tools in place.  They just asked me to help them find the root cause of the problem.

    • artisticcheese

      I don't understand why no further troubleshooting was done to isolate the reason for the high CPU, apart from telling the customer not to overwhelm the server with unreasonable load. I assume that if there is high CPU and the customer complains about it, you need to dive into the reason for the high CPU.

    • BradL

      @gt65345: The customer engaged me b/c they were up against a release date, and that was in danger of slipping due to this issue in the test environment.  Root cause for the high CPU *was* found - too much load.  Knowing this, their options at that point were A) continue to troubleshoot as an academic exercise, increasing the potential that their release date would slip, or B) re-test with a realistic load, keeping alive the goal of their planned ship date.

      Taking all things into consideration, they chose what most/all customers would - choice A. 

    • artisticcheese

      You mean they chose choice B, not A.

      I understand that it is the customer's preference, but this series you are doing (which is excellent, by the way) is actually an academic experience for all of us. I'd really be interested in choice A, since having a high queue count and CPU utilization in the common queue but not in the individual queues is, I would assume, an interesting subject to explore.

    • BradL

      @gt65345: Whoops... yeah, meant choice B. ;)

      I never looked at the dumps from this customer for this issue - I solved it by looking at perfmon.  So I don't have any dump analysis to share.  But if I did, I would be employing the steps in Episode 5 to find what threads are consuming the cpu.  All I would expect to find are GC threads which may or may not be in the midst of a GC, and worker threads with or without custom code doing whatever work they're supposed to do on the stack.  I say "with or without" b/c the dump may be captured at a time when threads are performing certain work or waiting for work.
