In an HPC cluster, which can include thousands of machines and many different hardware components and software packages, errors and failures are unavoidable. As a result, most HPC administrators spend the large majority of their time on diagnostics. Windows HPC Server 2008, the current version, offers 16 built-in system tests, along with a GUI and a scripting interface to run tests, receive and clear alerts, view results, and preserve result history. In the next release, Windows HPC Server 2008 R2, this diagnostics platform becomes extensible: administrators and partners can create custom diagnostic tests that run in the same way as the built-in tests. This makes it easier for cluster administrators to troubleshoot the custom and non-Microsoft software or hardware added to their HPC cluster.
This session will cover:
- What you can do with the extensible diagnostics framework
- How to create a custom diagnostic test, add it to Windows HPC Server, run it, and view the results
- How to aggregate test results for a large-scale cluster so that administrators can easily spot the outliers
- Ideas for tests to help validate or troubleshoot ISV applications
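
As a rough illustration of the aggregation topic above, the following Python sketch flags nodes whose measurement sits far from the rest of the cluster. It is not part of Windows HPC Server and does not use its API: the `find_outliers` function, the node-name-to-number input format, and the sample values are all assumptions made for this example. It uses a modified z-score based on the median absolute deviation, one common choice among several.

```python
from statistics import median


def find_outliers(results, threshold=3.5):
    """Return the names of nodes whose value is far from the cluster median.

    `results` maps a node name to a single numeric measurement. This input
    format is hypothetical; it is not the result format of any real product.
    A node is flagged when its modified z-score, computed from the median
    absolute deviation (MAD), exceeds `threshold`.
    """
    values = list(results.values())
    if len(values) < 3:
        # Too few nodes to say what "typical" looks like.
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Most nodes agree exactly, so flag anything that differs at all.
        return sorted(name for name, v in results.items() if v != center)
    return sorted(
        name
        for name, v in results.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


# Made-up latency measurements in microseconds, for illustration only.
sample = {
    "node01": 2.1,
    "node02": 2.0,
    "node03": 2.2,
    "node04": 9.8,
    "node05": 2.1,
}
print(find_outliers(sample))
```

With the made-up sample above, only `node04` is reported. A median-based score is used here because a single badly behaving node barely shifts the median, whereas it can distort a mean and standard deviation enough to hide itself.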