I never looked at the dumps from this customer for this issue - I solved it by looking at perfmon, so I don't have any dump analysis to share. If I did, though, I would be employing the steps in Episode 5 to find which threads are consuming the CPU. All I would expect to find are GC threads that may or may not be in the midst of a GC, and worker threads, with or without custom code on the stack, doing whatever work they're supposed to do. I say "with or without" because the dump may be captured at a moment when threads are performing work or waiting for work.
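For anyone following along, the Episode 5 approach boils down to something like this in WinDbg/cdb (a sketch from memory, not taken from this customer's dumps - thread 5 and the SOS module name are just examples):

```
$$ per-thread user-mode CPU time - the top consumers are your suspects
!runaway
$$ switch to one of the suspect threads (thread 5 is just an example)
~5s
$$ load SOS to see managed stacks (".loadby sos clr" on .NET 4 and later)
.loadby sos mscorwks
$$ managed stack for the current thread
!clrstack
$$ native stack - useful if the suspect turns out to be a GC thread
kb
```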
@gt65345: The customer engaged me because they were up against a release date, and that date was in danger of slipping due to this issue in the test environment. Root cause for the high CPU *was* found - too much load. Knowing this, their options at that point were A) continue to troubleshoot as an academic exercise, increasing the risk that their release date would slip, or B) re-test with a realistic load, keeping alive the goal of their planned ship date.
Taking all things into consideration, they chose what most (if not all) customers would - choice B.
@gt65345: The handoff has little or nothing to do with the high CPU. I didn't look at the dumps or have a profiler trace to verify which threads were consuming the CPU, but based on past experience it was the worker threads (from the CLR worker threadpool) - they are the ones doing all the work and executing the requests. And maybe the GC threads too, depending on how much memory was being allocated.
@gt65345: The way the perfmon counters are updated has a say in this. Requests Queued is incremented when a request is posted from native code to the CLR threadpool, and decremented when the callback is invoked. This all happens in native code. Under a heavy load or a "burst", you may see this counter go above zero.
On the other side, if the CLR threadpool is draining these requests very quickly (e.g., very lightweight requests), then they never have to wait in the app-specific queue, and therefore Requests In Application Queue won't go above 0.
Hence, "Requests Queued by itself isn't an indicator of a problem, per se." A healthy ASP.NET server can have Requests Queued > 0.
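To make the distinction concrete, here's a toy sketch (plain Python, nothing to do with the actual native implementation - the names and numbers are made up) of why a burst shows up in Requests Queued while the app-specific queue stays empty whenever requests drain quickly:

```python
import queue

def simulate_burst(burst_size):
    """Toy model: requests land in the CLR threadpool queue first;
    only work the workers can't keep up with would ever wait in the
    app-specific queue."""
    clr_queue = queue.Queue()   # what "Requests Queued" is counting
    app_queue = queue.Queue()   # what "Requests In Application Queue" is counting

    # A burst is posted from native code: the counter jumps above zero.
    for r in range(burst_size):
        clr_queue.put(r)
    requests_queued_peak = clr_queue.qsize()

    # Lightweight requests drain quickly, so nothing ever spills over
    # into the app-specific queue - it stays at zero the whole time.
    while not clr_queue.empty():
        clr_queue.get()

    return requests_queued_peak, app_queue.qsize()

peak, app_depth = simulate_burst(50)
print(peak, app_depth)  # 50 0
```

In other words, seeing Requests Queued spike during a burst tells you requests arrived faster than they were dispatched for a moment - not that the application itself is backed up.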
@Jehanzeb: I've never heard of clrdump before, but from debuginfo.com it sounds like the default is a minidump - a variation of .dump /m with cdb or the other debuggers from Microsoft's Debugging Tools package. Honestly, I never use these minidumps, as they're very limited in what they provide and essentially useless for troubleshooting memory issues. I tried running !address against a dump obtained with .dump /m (see debugger.chm from our Debugging Tools package) and !address wouldn't even run.
In any case, I'd suggest following the sympath changes I suggested above, then tell me the results.
1. What did you use to get this dump - what tool, and what command? You need to ensure you have a full user-mode dump. While you can line up symbols correctly with a minidump (which is typically ~1%-5% of the file size of a full user-mode dump), minidumps have limited use when it comes to debugging production application issues.
2. I don't know what your sympath is, but adding c:\windows\symbols isn't likely to help. Try using our public symbol server:
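For both points, the commands I'd normally suggest look like this (the local cache path c:\symbols and the dump path below are just examples - use whatever locations suit you):

```
$$ capture a FULL user-mode dump (note /ma, not /m) from cdb/windbg
.dump /ma c:\dumps\full.dmp

$$ point the debugger at Microsoft's public symbol server, with
$$ c:\symbols as a local downstream cache, then re-resolve symbols
.sympath srv*c:\symbols*http://msdl.microsoft.com/download/symbols
.reload
```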
@JorgeF: Great minds think alike? One of the sessions I'm preparing for next time is directly related to proper load testing for ASP.NET. One of the other sessions, if I can complete it in time, may involve some live debugging. Because this series focuses on the production environment, and ideally you'd never want to do a live debug in prod, I don't plan to cover too much of this (unless, of course, folks keep asking for it).