How to Identify a Disk Performance Bottleneck Using the Microsoft Server Performance Advisor (SPA) Tool
Applies To
* Microsoft Server Performance Advisor (SPA)
* Performance Testing
* Performance Analysis
* Microsoft Windows Server 2003
Summary
This How-To shows how to use the Microsoft Server Performance Advisor (SPA) tool to identify which processes and files may be causing a disk subsystem performance bottleneck on Windows Server 2003.
Contents
How to Identify a Disk Performance Bottleneck Using the Microsoft Server Performance Advisor (SPA) Tool
Applies To
Summary
Contents
Objectives
Overview
Download
Summary of Steps
Step 1. Run and Configure the Microsoft Server Performance Advisor (SPA) Tool
Step 2. Collect Data
Step 3. Compile the Report
Step 4. Analyze the Report
Conclusion
Production Server Considerations
Feedback
Technical Support
Community and Newsgroup
Contributors and Reviewers
Appendix
Objectives
In this module, you will learn to do the following:
* Identify a disk subsystem bottleneck
* Identify which processes are causing highest disk usage
* Identify which files are causing the highest disk usage
* Determine the data pattern (read/write bytes and I/Os) of the disk usage
Overview
Microsoft Performance Monitor (perfmon) can gather performance counter data and Event Tracing for Windows (ETW) data, but you must analyze that data manually. This is where the Microsoft Server Performance Advisor (SPA) tool picks up. SPA collects performance data in the same manner as Performance Monitor. In addition, it analyzes the data and generates a detailed report on its findings.
Here are the disk-related sections of the SPA report:
* Hot Files: Files Causing Most Disk I/O: This section of the report identifies the specific files that are causing the most disk I/O, the processes involved, and the read/write bytes and I/Os per second.
* Disk Breakdown: Disk Totals: This section of the report identifies the specific processes that are causing the most disk I/O on each physical disk.
In this How-To, we will use the SPA tool on Windows Server 2003 to identify a disk subsystem bottleneck, identify which processes are causing the highest disk usage, identify which files are causing the highest disk usage, and determine the data pattern (read/write bytes and I/Os) of the disk usage.
Download
You can download the Microsoft Server Performance Advisor (SPA) from the following location:
http://www.microsoft.com/downloads/details.aspx?FamilyID=09115420-8c9d-46b9-a9a5-9bffcd237da2&DisplayLang=en
If you are not familiar with SPA, review the help files that come with it.
Summary of Steps
Here is a summary of steps:
- Run and configure the Microsoft Server Performance Advisor (SPA) Tool.
- Collect data.
- Compile the report.
- Analyze the report.
Step 1. Run and Configure the Microsoft Server Performance Advisor (SPA) Tool
In order for the SPA tool to properly diagnose a performance problem, it must collect performance data from the computer when the problem is occurring.
- Run the SPA Tool
To run the SPA tool, click Start, point to All Programs, and then click Server Performance Advisor (at the root of All Programs).
The SPA tool start page will appear.
If you are not familiar with SPA, then consider taking the quick tour of the SPA tool by clicking “Quick Tour”. Otherwise, continue to the next step.
- Open the “System Overview” Data Collector Group
The data collected by Server Performance Advisor and the reports it can generate are specified by data collector groups. Data collector groups enable collection of data that is relevant to the server role of the computer, and when you install Server Performance Advisor, it automatically detects the server roles currently configured for the computer. When a role matches a data collector group included with Server Performance Advisor, that data collector group is installed automatically. You can also create your own data collector groups.
In this case, we are only interested in disk subsystem analysis, which is provided by the “System Overview” data collector group.
Note: Not all data collector groups analyze data on the disk subsystem.
Click View, then Scope Tree. The scope tree will show.
Under Local Computer, Data Collectors and Reports, locate the System Overview role.
- Configure the “System Overview” Data Collector Group
- Optional: Set the report generation to Manual
If this is a production server, then configure the report generation for the “System Overview” data collector group to be manual. SPA’s data collection mechanisms have very low overhead and are designed to be run in production, but report generation/compilation takes up a lot of resources and should be run on another, non-production server.
- In the scope tree, right-click the “System Overview” role, and then click Properties. The System Overview Properties sheet appears.
- Set Generate to Manual.
- Optional: Set the Data Collection Interval
- Click the Schedule tab and set the Duration to the desired collection period in seconds. Keep in mind that SPA gathers a large amount of data quickly, so keep the collection period as short as possible.
- Click OK on the System Overview Properties window.
- Set the Disk Utilization Thresholds
Prior to using SPA v2.0 for disk analysis, it is necessary to set the disk utilization thresholds according to the I/Os per second that your physical disks are expected to sustain. The following steps show how to set the disk utilization thresholds:
- Click Edit, then select Rules.
- Locate the Disk Utilization thresholds and set them to the performance specifications of your locally attached physical disks.
- Scroll to the bottom and click Apply. This will persist the new threshold settings. Keep in mind, this change affects all of the data collectors in SPA.
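If the vendor specification for a disk lists only rotational speed and average seek time, a rough I/Os-per-second figure for the threshold can be estimated. The following is a minimal sketch, assuming a single rotating disk serving random I/O, where each I/O costs one average seek plus half a rotation. The function name and the sample drive values are hypothetical and are not part of SPA.

```python
def estimate_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling for one rotating disk.

    Assumption: each I/O costs one average seek plus half a rotation.
    Caching, queuing, and transfer time are ignored.
    """
    if rpm <= 0 or avg_seek_ms < 0:
        raise ValueError("rpm must be positive and avg_seek_ms non-negative")
    # Half a rotation in milliseconds: (60,000 ms per minute / rpm) / 2
    avg_rotational_latency_ms = 60_000.0 / rpm / 2.0
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Hypothetical drive: 7,200 RPM with an 8.5 ms average seek time.
    print(f"{estimate_random_iops(rpm=7200, avg_seek_ms=8.5):.1f} I/Os per second")
```

Treat the result as a starting point only. A measured value from the actual disk under a representative workload is a better basis for the thresholds.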
Step 2. Collect Data
SPA’s analysis focuses only on the data collected during the collection period. Therefore, it is important to choose an appropriate collection duration and start time. Ideally, you want to start SPA just before the performance problem begins and stop it just after the problem is gone.
Configure SPA to collect data just before or during high disk activity:
- Start the System Overview Data Collector Group
- Select the System Overview data collector group, then click the green play button. Alternatively, you can click Record.
- Wait for SPA to automatically stop
- The SPA tool will automatically stop collecting data when the elapsed time equals the Duration setting of the data collector.
Step 3. Compile the Report
After SPA has finished collecting data, it will automatically begin generating/compiling the report unless you set the report generation to Manual. SPA has finished generating/compiling the report when an icon with a red clock appears under Reports.
Optional: If you set the report generation to Manual, follow these steps to compile the report on another server:
- Prior to capturing data, set the role report generation to “Manual”.
- After capturing data, copy the contents of the SPA Data directory to the respective Data directory on another computer with SPA installed. For example, if both servers are using default installations, then copy the data in “C:\PerfLogs\Data” to “C:\PerfLogs\Data” on the server where you intend to compile the report.
- At a command prompt on the server where you want to compile the report, change directory to the SPA installation directory, then type:
spacmd compile "System Overview"
Compilation of the report can take a long time depending on how long your data collection period was. In addition, compiling a report can take up a large amount of resources.
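The copy-and-compile steps above can be scripted on the server where the report is compiled. The following is a minimal sketch, not a supported SPA utility. It uses only the `spacmd compile "System Overview"` command quoted in this How-To, and every path passed to these functions, including the SPA installation directory, is an assumption that you must replace with your own.

```python
import shutil
import subprocess
from pathlib import Path


def copy_spa_data(source_dir: Path, target_dir: Path) -> None:
    """Copy collected SPA data into the Data directory on the compile server."""
    # dirs_exist_ok=True (Python 3.8+) merges into an existing Data directory.
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)


def build_compile_command(spa_install_dir: Path, group: str = "System Overview") -> list:
    """Build the 'spacmd compile' command line used in the steps above."""
    return [str(spa_install_dir / "spacmd"), "compile", group]


def compile_report(spa_install_dir: Path, group: str = "System Overview") -> None:
    """Run spacmd from the SPA installation directory and fail loudly on error."""
    subprocess.run(
        build_compile_command(spa_install_dir, group),
        cwd=spa_install_dir,
        check=True,
    )
```

For example, on the compile server you might call `copy_spa_data` with a share on the production server that exposes its “C:\PerfLogs\Data” directory as the source and the local “C:\PerfLogs\Data” as the target, and then call `compile_report` with the local SPA installation directory.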
Step 4. Analyze the Report
Once the report is generated, we need to review the report to see what is causing our disk bottleneck.
- Locating the report: To review the report, click the icon with the red clock to see a list of the reports that the server has generated.
The reports are listed by computer, year, month, day, and time corresponding to when the data was collected. Select the report that corresponds with when the performance problem occurred.
Note: The symbols in the Status column are modeled on weather forecast symbols. For example, a cloudy symbol represents a server under distress while a sunny symbol represents a relatively idle server.
- Overview of the Report: After selecting the report, the report shows. The Summary section of the report shows us that cidaemon.exe is taking up 11% CPU and a file in the catalog.wci directory is using the most disk I/O.
The SPA tool will analyze the performance of the system. The Performance Advice section appears only if SPA has significant findings, and it shows SPA’s recommendations.
Next, the System Health section is an overview of the overall health of the four subsystems of the computer.
As you can see here in the System Health section, SPA has detected a disk performance bottleneck. Normally, 78 I/Os per second is not considered to be high usage for a fast, locally attached hard disk. In this case, we ran our tests on a slow, externally attached hard disk and adjusted SPA’s thresholds accordingly.
- Analyzing the Disk SubSystem Performance: In this section we will look at more details of the disk response times and discover which processes and files are involved.
- Analyzing Disk Response Times: To determine if the disk subsystem is responding poorly, we need to look at the response times of the disks. These details are in the System Monitor view of the report. Click the System Monitor icon at the top of the report.
- Clear the existing counters by clicking the “New counter set” button in the upper-left corner.
- Next, click the “Add” (plus sign) button to add counters.
- Add all of the instances for the “PhysicalDisk\Avg. Disk sec/Read” and “PhysicalDisk\Avg. Disk sec/Write” counters. These counters report the average time, in seconds, that the disk took to complete a read or a write.
- The System Monitor will show the counter values. We are looking for times when the response times were greater than 15 ms (0.015 seconds). 15 ms is not a hard threshold for deciding whether a disk is slow, but it is a useful rule of thumb. For example, some consider 10 ms or even 20 ms to be the deciding point.
In the chart below, all values above the black line (15 ms) are considered long response times and indicate a bottleneck.
- Based on this data, we can conclude that the C: drive (thin red and thin green lines) has significant disk latency and is a performance bottleneck on the system.
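The same check can be applied to counter values outside the System Monitor view, for example after copying them into a list. The following is a minimal sketch, assuming the samples are “Avg. Disk sec/Read” or “Avg. Disk sec/Write” values in seconds. The function names are hypothetical, and the 15 ms default is only the rule of thumb described above.

```python
THRESHOLD_SECONDS = 0.015  # 15 ms rule of thumb; adjust for your disks


def slow_samples(samples, threshold=THRESHOLD_SECONDS):
    """Return (index, value) pairs for samples strictly above the threshold."""
    return [(i, value) for i, value in enumerate(samples) if value > threshold]


def fraction_slow(samples, threshold=THRESHOLD_SECONDS):
    """Return the share of samples above the threshold (0.0 for no samples)."""
    if not samples:
        return 0.0
    return len(slow_samples(samples, threshold)) / len(samples)
```

A disk that exceeds the threshold in only a few isolated samples is a weaker signal than one that stays above it for most of the collection period, which is why the sketch reports the fraction as well as the individual samples.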
- Identify the files and processes consuming the most disk I/O: Now that we have identified a disk bottleneck, let’s see which processes and files are involved with the bottleneck.
- Navigate to the Disk, Disk Breakdown, Disk Totals section.
In this section, we see a breakdown of each of the physical disks on the system and the processes that are most active on each disk. In this case, we see cisvc.exe (Indexing Service) consuming the most I/O on physical disk 0 (C: drive).
- Next, navigate to the Disk, Hot Files, “Files Causing Most Disk IOs” section.
In this section, we see a breakdown of the files consuming the most disk I/O. Each breakdown shows the respective processes involved with that file and its data patterns (Read/sec, Kb/Read, Writes/sec, and Kb/Write). In this case, the Indexing Service’s catalog files are causing the most I/O on the disk.
Note: The Summary Section at the beginning of the report shows the file taking up the most I/O.
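If you transcribe rows from the Hot Files section for further analysis, ranking them and computing the read/write mix is straightforward. The following is a minimal sketch. The `FileIo` record and its field names are hypothetical and do not reflect an SPA export format.

```python
from typing import NamedTuple


class FileIo(NamedTuple):
    """One hand-transcribed row from the Hot Files section (hypothetical layout)."""
    path: str
    process: str
    reads_per_sec: float
    writes_per_sec: float

    @property
    def total_io_per_sec(self) -> float:
        return self.reads_per_sec + self.writes_per_sec

    @property
    def write_share(self) -> float:
        """Fraction of this file's I/Os that are writes (0.0 if the file is idle)."""
        total = self.total_io_per_sec
        return self.writes_per_sec / total if total else 0.0


def hottest(rows, count=3):
    """Return the rows with the highest combined reads and writes per second."""
    return sorted(rows, key=lambda row: row.total_io_per_sec, reverse=True)[:count]
```

The write share matters for the next step, because a write-heavy pattern and a read-heavy pattern call for different storage changes.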
- Next Steps
The next steps are to first try to make the process taking up the most disk I/O more efficient. After the process is made as efficient as possible, then consider additional hardware to make the physical disk faster for this kind of disk I/O. For example, if high write I/O is the problem, then consider RAID-0+1, because RAID-5 can turn each small write request into as many as four disk I/Os (a 4:1 ratio). For more information on RAID type considerations, see “RAID Type Considerations” below.
Disk optimization is a large subject on its own and beyond the scope of this document. In this case, the Indexing Service was misconfigured to index its own catalogs, so changing its catalog settings would make it more efficient.
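To see why a write-heavy workload favors RAID-0+1 over RAID-5, it helps to estimate the back-end disk I/Os that a front-end workload generates. The following is a minimal sketch. The RAID-5 penalty of 4 follows the 4:1 figure in this How-To, the penalty of 2 for mirrored levels is a common rule of thumb assumed here (each write goes to both mirrors), and the function name is hypothetical.

```python
# Disk I/Os generated per front-end write request.
# RAID-5: 4, per the 4:1 figure for small writes in this How-To.
# RAID-1 and RAID-0+1: 2, an assumed rule of thumb (one write per mirror).
WRITE_PENALTY = {"RAID-0": 1, "RAID-1": 2, "RAID-0+1": 2, "RAID-5": 4}


def backend_ios_per_sec(reads_per_sec, writes_per_sec, raid_level):
    """Estimate the disk I/Os per second that the array must absorb."""
    if raid_level not in WRITE_PENALTY:
        raise ValueError(f"unknown RAID level: {raid_level}")
    return reads_per_sec + writes_per_sec * WRITE_PENALTY[raid_level]
```

Under these assumptions, a workload of 20 reads and 60 writes per second generates 260 disk I/Os per second on RAID-5 but 140 on RAID-0+1. Full-stripe writes and controller caches can reduce the RAID-5 cost considerably, so treat the estimate as a worst case for small writes.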
Conclusion
The Microsoft Server Performance Advisor (SPA) tool is very good at showing which files and processes are causing the most disk I/O.
RAID Type Considerations
RAID Trade-Offs (excerpt from “Performance Tuning Guidelines for Windows Server 2003”)
| | RAID-0 (Striped) | RAID-1 (Mirrored) | RAID-5 (Striped with Parity) | RAID-0+1 (Striped Mirrors) |
| Minimum number of disks | 2 | 2 | 3 | 4 |
| Usable storage capacity | 100% | 50% | (N-1)/N, where N is the number of disks | 50% |
| Fault tolerance | None. Losing a single disk causes all data on the volume to be lost. | Can lose multiple disks as long as a mirrored pair isn’t lost. | Can tolerate the loss of one disk. | Can lose multiple disks as long as a mirrored pair is not lost. Varies according to the number of mirrored pairs in the array. (See note 1.) |
| Read performance | Generally improved by increasing concurrency. | Good read performance | Generally improved by increasing concurrency. | Improvement from increasing concurrency and dual sources for each request. |
| Write performance | Generally improved by increasing concurrency. | Worse than JBOD (between 20% and 40% for most workloads) | Poor unless full-stripe writes (large requests). Can be as low as ~25% of JBOD (4:1 requests). | Can be better or worse depending on request size, hot spots (static or dynamic), and so on. |
| Best uses | Temporary data only | Operating system; log files | Operating system; user and shared data; application files | Operating system; user and shared data; application files; log files |
Note 1: If a disk fails, failure of its mirrored partner prior to replacement will cause data loss. However, the failure of any other member disk does not cause data loss.
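The capacity and minimum-disk rows of the table can be expressed as a small calculation when comparing layouts. The following is a minimal sketch that encodes only the values in the table above. The function name is hypothetical.

```python
# Minimum number of disks per RAID level, taken from the table above.
MIN_DISKS = {"RAID-0": 2, "RAID-1": 2, "RAID-5": 3, "RAID-0+1": 4}


def usable_fraction(raid_level, disks):
    """Return the usable share of raw capacity for the given RAID level."""
    if raid_level not in MIN_DISKS:
        raise ValueError(f"unknown RAID level: {raid_level}")
    if disks < MIN_DISKS[raid_level]:
        raise ValueError(f"{raid_level} needs at least {MIN_DISKS[raid_level]} disks")
    if raid_level == "RAID-0":
        return 1.0  # striping only, no redundancy
    if raid_level == "RAID-5":
        return (disks - 1) / disks  # one disk's worth of capacity holds parity
    return 0.5  # RAID-1 and RAID-0+1 keep a mirror copy of all data
```

For example, four disks in RAID-5 leave 75% of the raw capacity usable, while the same four disks in RAID-0+1 leave 50%.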
Feedback
Pending response from the SPA Team
Technical Support
Pending response from the SPA Team
Community and Newsgroup
Pending response from the SPA Team
Contributors and Reviewers
<< List the names of people who have reviewed or contributed >>
Example:
* External Contributors and Reviewers: Name
* Microsoft Product Group: Name (Product Group Name)
* Microsoft IT Contributors and Reviewers: Name
* Microsoft Services and PSS Contributors and Reviewers: Name
* Microsoft patterns & practices Contributors and Reviewers: Name
* Test team: Name
* Edit team: Name
* Release Management: Name
Appendix
Schema
This table contains each major section in the document and explains its purpose.
How To Schema Summary Table
| Section | Purpose |
| Title | Title of the document |
| MSDN Information | Team, logo, date. |
| Applies To | List the relevant products or technologies this document applies to. |
| Summary | Summarize purpose, key insights and solution that will be detailed in the document |
| Contents | List each major section of the document |
| Objectives | List the tasks that will be accomplished by following the steps in the document |
| Overview | Provide background information necessary in order to be successful with steps in the document |
| Summary of Steps | List each step in the How To |
| Step 1. | Expand on each step telling the user what to do |
| Step 2. | |
| Step 3. | |
| Considerations | Additional issues to consider when applying the How To |
| Additional Resources | |
| Feedback | |
| Technical Support | |
| Community and Newsgroup | |
| Contributors and Reviewers | |