Summary: Tips and strategy for debugging Hopper failures.




Hopper Debugging Strategy


You've been running Hopper and hitting failures. You may wonder what to do next. Unfortunately, there is no magic bullet that will solve all your Hopper problems. This article, however, will cover the techniques and tools that can assist you with debugging. This is certainly not a comprehensive list. What you take away from here can be used as the starting point to devise your own debugging strategy. Be creative and good luck!

Contents

*Prerequisite
*Basic Strategy
*Divide and Conquer
*KITL and Kernel Debugger
*Working with Radio
*Debugging Tools
*CeDebugX
*Application Verifier
*JTAG Debugger
*Advanced Topics
*Focusing on an App
*Watson and Post-Mortem Debugger
*Understanding the Hopper Log
*Log File Content
*Exit Conditions
*Resources

Prerequisite

Make sure to read this article first to develop a good process for running Hopper. Having a solid foundation will make the process more efficient and can save you a lot of time.

Basic Strategy


Divide and Conquer

The simplest strategy is to remove components until you get to a stable baseline. Begin with as few components as possible when you first start running Hopper. Components that should be removed are third-party apps, the radio, and any driver that isn’t required to boot the image and run Hopper. After achieving a stable baseline, slowly add components back into the image. Start with drivers and work your way up to third-party apps. If your Hopper numbers begin to drop as components are added or updated, you can employ the same strategy to isolate the component that’s affecting the stability of your platform.
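The add-back process can be sketched in C as a loop that enables one component at a time and stops at the first one that pulls the result below your baseline. This is only an illustration: `find_first_unstable`, `stability_fn`, `simulated_run`, and the idea of scoring a run in hours are hypothetical names and assumptions, not part of Hopper. In practice each call to the callback stands for building an image and doing a full Hopper run by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: build an image with components [0, count) added
 * on top of the stable baseline, run Hopper, and return the run time in
 * hours. In real life this is a manual build-and-test cycle. */
typedef double (*stability_fn)(size_t count, void *ctx);

/* Adds components back one at a time, in the order given (drivers first,
 * third-party apps last). Returns the index of the first component whose
 * addition drops the result below min_hours, or total if none does. */
size_t find_first_unstable(size_t total, double min_hours,
                           stability_fn run, void *ctx)
{
    assert(run != NULL);
    for (size_t count = 1; count <= total; ++count) {
        if (run(count, ctx) < min_hours) {
            return count - 1;   /* the component just added is the suspect */
        }
    }
    return total;               /* every component passed */
}

/* Stand-in for a real Hopper run, for demonstration only: ctx points to
 * the index of a "bad" component, and any image that includes it scores
 * 2 hours instead of 24. */
double simulated_run(size_t count, void *ctx)
{
    size_t bad_index = *(const size_t *)ctx;
    return (count > bad_index) ? 2.0 : 24.0;
}
```

With five components, a 10-hour threshold, and the bad component at index 2, `find_first_unstable(5, 10.0, simulated_run, &bad)` returns 2. Adding one component per run costs more runs than a binary search would, but it matches the slow add-back order described above and keeps each result easy to interpret.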

KITL and Kernel Debugger

While you may use retail standalone devices to obtain Hopper runtime metrics, they are not the ideal environment for development. Use images with KITL and the kernel debugger to debug Hopper issues. You will have access to the debugger for live debugging, a rich set of tools from Platform Builder, and powerful debugger extensions such as CeDebugX.

Working with Radio

Radio issues are common and are almost always to be expected. Hopper can generate dial strings and make voice calls. Use fakeril to eliminate the effect of radio issues when obtaining a baseline number. When you switch to the true RIL driver, you may want to initially start running Hopper with the radio turned off or use an invalid SIM to prevent calls. After that, you will eventually need to run Hopper with a real SIM and work out any problems in the RIL or radio firmware.

Note: Don’t leave valid contacts in the SIM because Hopper can make calls to the contacts.

Debugging Tools

CeDebugX

CeDebugX is an extension to the Platform Builder kernel debugger. It was originally developed to assist in debugging difficult Hopper problems and has since been extended to include comprehensive diagnostic capabilities. To facilitate efficient debugging, CeDebugX has a rich set of commands that present detailed information about important elements of the system. Its diagnostic capabilities include detecting common system failures such as deadlocks, thread starvation, exceptions, stack overflows, heap corruption, and memory leaks.

It is a good idea to use CeDebugX’s auto diagnosis as the first step in your Hopper investigation. When you have a device failure:

  1. Put the debugger into break state.
  2. Load CeDebugX as a Platform Builder debugger extension. From the "Debug" menu, select "Load Extension" and navigate to CeDebugX.dll.
  3. Run "diagnose all" from the CeDebugX command prompt to diagnose the failure.

The "diagnose all" command will attempt to diagnose exception, deadlock, heap corruption, low memory, and thread starvation conditions. You can begin your investigation from this point using the kernel debugger or other commands in CeDebugX. For CeDebugX command usage, please refer to the documentation included with the CeDebugX utility.

To find the cause of a heap corruption or a memory leak, you can shim the heap APIs and perform a new test run to reproduce the problem. The next section describes using Application Verifier to shim APIs.

Application Verifier

Application Verifier is a tool that allows system APIs to be intercepted using a technique known as shimming. At the heart of Application Verifier is the kernel shim engine. The engine allows a shim DLL to be inserted between the calling function and the target API. Calls are routed to the shim DLL before going to the target API.

Application Verifier comes with three shim DLLs:

*shim_heap.dll – For finding heap corruption and memory leaks.
*shim_hleak.dll – For finding handle leaks.
*shim_usergdi.dll – For finding leaks in GDI objects.

You can use Application Verifier during Hopper runs to track down heap corruptions and memory or object leaks. You will need to rebuild your image with Application Verifier enabled. This article describes how to configure your image to use Application Verifier: http://blogs.msdn.com/hopperx/archive/2007/03/30/application-verifier-on-drivers-windows-mobile-6.aspx

JTAG Debugger

If for some reason you cannot use KITL and the kernel debugger, your best option is a hardware debugger such as the Trace32 JTAG debugger, provided the target hardware supports it.

Advanced Topics

Focusing on an App

When you want Hopper to stress an app or want fast repro of a known problem in an app, you want Hopper to focus on that app and not move away. While it may not be possible to prevent Hopper from moving away, you can bring the app back into the foreground to let Hopper hammer away at it. This article shows how you can write a simple app that will sit in a tight loop and bring your targeted app to the foreground: http://blogs.msdn.com/hopperx/archive/2005/11/30/498113.aspx

Watson and Post-Mortem Debugger

When using standalone retail devices, you will have limited resources available for debugging. Luckily, there are Watson and the post-mortem debugger. Watson is the Windows CE error reporting mechanism that captures the state of the device into a dump file to be uploaded to the Microsoft Watson server for analysis. However, you can also view the Watson dump file in the kernel debugger using the post-mortem debugger add-on for Platform Builder for Windows Mobile 5.0. Note that CeDebugX will also work during a post-mortem debugging session.

The following articles describe various ways to configure Watson for Hopper runs:

Part I: http://blogs.msdn.com/hopperx/archive/2005/10/07/478306.aspx
Part II: http://blogs.msdn.com/hopperx/archive/2005/10/12/480132.aspx
Part III: http://blogs.msdn.com/hopperx/archive/2006/02/21/536075.aspx

Watson dumps are generated automatically when an unhandled exception occurs. If you want to view the state of the device when no exception has occurred, such as during a hang, you can generate a Watson dump file manually. A simple way to do this is to instrument a key press on the device to generate a dump file by calling ReportFault:

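		 // INSTRUMENTED_EXCEPTION_CODE is a placeholder for an exception code that you define.
		 __try
		 {
		     // Raise an exception so we can call ReportFault from the exception filter.
		     RaiseException(INSTRUMENTED_EXCEPTION_CODE, 0, 0, NULL);
		 }
		 // The comma operator calls ReportFault first, then yields EXCEPTION_EXECUTE_HANDLER.
		 __except (ReportFault(GetExceptionInformation(), 0), EXCEPTION_EXECUTE_HANDLER)
		 {
		     // Exception handler. ReportFault has already been called by the filter.
		 }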
	

You can add this code to the keyboard driver or any other driver that handles user input. The priority of the thread executing this code should be set relatively high to allow the thread to run if a thread starvation condition occurs.

Understanding the Hopper Log

Hopper stores data to the log file on the device periodically. The log file contains a system snapshot at the moment prior to logging. Hopper overwrites the log file each time it logs data, so the log file will always contain the last state before Hopper stopped running.

Log File Content

The Hopper log file contains the following sections:

*Summary - This is the first section of the Hopper log file. It contains the OS build, random seed used to run Hopper, last runtime recorded, exit condition, average actions per minute (keypresses), and the total number of windows visited.

*System Info - The next section contains a list of running processes, loaded modules, and the call stack for all threads.

*App Statistics - This section contains statistics for Hopper's interaction with applications. The amount of time spent in the app, number of actions, and visits to the app are logged here.

*Free Disk and Memory - This section logs the amount of free storage space and memory. You can use this information to detect a memory leak or a condition that causes storage space to fill up.
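As a rough illustration of how the free-memory figures can be used, the sketch below takes free-memory readings that you have collected from successive Hopper logs and reports whether they fall steadily. The function name `looks_like_leak`, the kilobyte units, and both thresholds are assumptions made for this example. It does not parse the Hopper log, and because Hopper overwrites its log file, you would need to copy each snapshot off the device yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Heuristic only: returns 1 if free memory fell by at least min_drop_kb
 * between the first and last sample and never recovered by more than
 * tolerance_kb between two consecutive samples; otherwise returns 0. */
int looks_like_leak(const unsigned long *free_kb, size_t n,
                    unsigned long min_drop_kb, unsigned long tolerance_kb)
{
    assert(free_kb != NULL || n == 0);
    if (n < 2) {
        return 0;   /* not enough data to see a trend */
    }
    for (size_t i = 1; i < n; ++i) {
        if (free_kb[i] > free_kb[i - 1] &&
            free_kb[i] - free_kb[i - 1] > tolerance_kb) {
            return 0;   /* memory was given back: not a steady decline */
        }
    }
    if (free_kb[0] <= free_kb[n - 1]) {
        return 0;   /* no net drop */
    }
    return (free_kb[0] - free_kb[n - 1]) >= min_drop_kb;
}
```

Readings of 9000, 8600, 8650, 8100, and 7500 KB with a 1000 KB minimum drop and a 100 KB tolerance are flagged. A series that repeatedly climbs back toward its starting value is not. The same check can be applied to the free storage figures.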

Exit Conditions

Hopper logs the reason why it stopped. The default message is "Default system crash unless terminated by user". Hopper writes this message to the log file unless it encounters a different reason to stop. A log file containing this message means that the system stopped before Hopper had a chance to exit. You will usually see this message if there is a terminal crash in the system that prevented Hopper from running, or if the device was reset unexpectedly or by the user.

Other exit conditions:

*Start menu not responding - Hopper is able to send keystrokes and touch events, but is not able to launch the start menu to switch to another app. This failure typically indicates a deadlock that is blocking the shell UI thread.

*Test was skipped for more than 30 minutes - Hopper was not able to run for more than 30 minutes. Hopper runs one priority level above normal priority, so this failure typically indicates a thread starvation condition caused by a higher-priority thread spinning.

*Stuck in the same window - Hopper is able to send keystrokes and touch events, but it detects that it is stuck in the same window for more than 15 minutes. This is not a typical failure and can occur if you have an app that disables the start menu and soft keys.

*Failed to launch application - Hopper failed to launch an application.

*Over target runtime - Hopper ran past the target runtime specified by the /t option. This indicates a successful run.

*Radio reboot - Hopper detected a radio reboot condition.

*Received a stop event '/k' from user - Another instance of Hopper was started using the /k option to stop the current Hopper run.
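If you triage many logs, the exit conditions above can be turned into a small lookup table. The following C sketch maps an exit-condition line to the typical cause given in this article. The names `triage_t` and `triage_exit_condition` are hypothetical, and the match strings are copied from the descriptions above on the assumption that the log uses the same wording; adjust them to whatever your logs actually contain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TRIAGE_UNKNOWN,
    TRIAGE_SYSTEM_CRASH_OR_RESET,   /* default message was never replaced */
    TRIAGE_DEADLOCK,                /* shell UI thread likely blocked */
    TRIAGE_THREAD_STARVATION,       /* higher-priority thread spinning */
    TRIAGE_APP_BLOCKS_NAVIGATION,   /* app disables start menu/soft keys */
    TRIAGE_APP_LAUNCH_FAILURE,
    TRIAGE_SUCCESS,                 /* ran past the /t target runtime */
    TRIAGE_RADIO_REBOOT,
    TRIAGE_STOPPED_BY_USER          /* another instance started with /k */
} triage_t;

/* Assumed log wording, taken from the list of exit conditions above. */
static const struct { const char *text; triage_t hint; } k_exit_hints[] = {
    { "Default system crash unless terminated by user", TRIAGE_SYSTEM_CRASH_OR_RESET },
    { "Start menu not responding",                      TRIAGE_DEADLOCK },
    { "Test was skipped for more than 30 minutes",      TRIAGE_THREAD_STARVATION },
    { "Stuck in the same window",                       TRIAGE_APP_BLOCKS_NAVIGATION },
    { "Failed to launch application",                   TRIAGE_APP_LAUNCH_FAILURE },
    { "Over target runtime",                            TRIAGE_SUCCESS },
    { "Radio reboot",                                   TRIAGE_RADIO_REBOOT },
    { "Received a stop event",                          TRIAGE_STOPPED_BY_USER },
};

/* Case-sensitive substring match against the table; first hit wins. */
triage_t triage_exit_condition(const char *exit_line)
{
    assert(exit_line != NULL);
    for (size_t i = 0; i < sizeof k_exit_hints / sizeof k_exit_hints[0]; ++i) {
        if (strstr(exit_line, k_exit_hints[i].text) != NULL) {
            return k_exit_hints[i].hint;
        }
    }
    return TRIAGE_UNKNOWN;
}
```

The result is only a starting hint. For example, `TRIAGE_DEADLOCK` suggests checking for a blocked shell UI thread in the debugger, not that a deadlock has been confirmed.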

Resources

Additional resources can be found here:
*The article "MTTF Testing – Hopper Demystified" in the Windows Mobile OEM documentation
*The HoppeRx site: http://blogs.msdn.com/hopperx
*The CeDebugX documentation included with the tool



