Virtual Memory Optimization

Before attempting to optimize virtual memory (VM), be sure to understand the Windows CE 5.0 memory map.
VM Review for CE 5.0 (Windows Mobile 5.0 and 6.0)
A CE device has 4GB of VM: the upper 2GB are reserved for the kernel, and the lower 2GB are used for application space. The 2GB application space is divided into 64 slots of 32MB each. Slot 0 is reserved for the active process. Slot 1 is home to XIP DLLs. Slots 2-32 are process slots. Slots 33-62 are for the object store and memory-mapped files (shared memory), and slot 63 is for resource DLLs. See the MSDN article http://msdn2.microsoft.com/en-us/library/aa450572.aspx for more information.
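
The slot layout above can be sketched as a small address classifier. This is only an illustrative model of the numbers given in this section (64 slots of 32MB, kernel space from 2GB upward); the names slot_index, classify_address, and slot_kind are hypothetical and are not part of any Windows CE API.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT     25u          /* 32MB = 2^25 bytes per slot */
#define USER_SPACE_END 0x80000000u  /* kernel space starts at 2GB */

/* Hypothetical labels for the slot ranges described in the text. */
typedef enum {
    SLOT_ACTIVE_PROCESS,  /* slot 0 */
    SLOT_XIP_DLLS,        /* slot 1 */
    SLOT_PROCESS,         /* slots 2-32 */
    SLOT_SHARED,          /* slots 33-62: object store, memory-mapped files */
    SLOT_RESOURCE_DLLS,   /* slot 63 */
    SLOT_KERNEL           /* upper 2GB */
} slot_kind;

/* Slot number (0-63) of an application-space address. */
unsigned slot_index(uint32_t addr)
{
    assert(addr < USER_SPACE_END);
    return (unsigned)(addr >> SLOT_SHIFT);
}

/* Map any 32-bit virtual address to the region it falls in. */
slot_kind classify_address(uint32_t addr)
{
    unsigned slot;

    if (addr >= USER_SPACE_END)
        return SLOT_KERNEL;

    slot = slot_index(addr);
    if (slot == 0)  return SLOT_ACTIVE_PROCESS;
    if (slot == 1)  return SLOT_XIP_DLLS;
    if (slot <= 32) return SLOT_PROCESS;
    if (slot <= 62) return SLOT_SHARED;
    return SLOT_RESOURCE_DLLS;
}
```

Under these assumptions, an address such as 0x42000000 lands in slot 33 (shared memory), and 0x7E000000 lands in slot 63 (resource DLLs). This can be handy when reading addresses out of “mi full” output.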

XIP DLLs
In Windows CE, modules are built with relocatable addresses. For certain DLLs, addresses can be statically fixed up during image generation. The ROMIMAGE utility (or DiskImage for Image Update builds) fixes up the addresses of these DLLs and places them into slot 1, with any overflow going into parts of slot 0. These DLLs are referred to as XIP DLLs. Non-XIP DLLs are either compressed in the ROM MODULES section or loaded from the file system. They load into slot 0, and their addresses are relocated dynamically at load time.



VM Optimization Software

Shell: The shell (target control in PB or VS) has a couple of commands for quickly analyzing device memory usage.
*“mi” quickly shows which processes are running in which slot. It also includes vital statistics such as the pages in use for code, data, and stack. Here is an example of “mi” output (BSPVMMIOutput).
*“mi kernel” dumps a brief listing of kernel data usage. Growth in the data line associated with HData can indicate handle leaks. Growth in the data line associated with Crit/Evt/Sem/Mut can indicate leaks of critical sections, events, semaphores, and mutexes. Here is an example of “mi kernel” output (BSPVMMIKernelOutput).
*“mi full” dumps a full listing of VM usage on a slot-by-slot basis. It is good at showing which areas are in use by heaps, stacks, writable data, code segments, and so on. Unfortunately, it does not do a good job of associating the used space with specific DLLs. The tool “devHealth” lists all of this information and also correlates it to DLL names. Here is an example of “mi full” output (BSPVMMIFullOutput).
DevHealth (BSPVMDevHealth) enhances the output captured by the shell’s “mi full” command. It uses the memory format of the mi command and correlates the data with information about which process loads each DLL and the address at which the DLL is loaded. Additionally, it provides:
*A listing of each process slot, including slot 0 and slot 1, with heap and stack sizes and locations as well as the DLLs and their memory locations for each process.
*The DLLs loaded within each process.
*Memory-mapped files.
*Shared memory usage.
*Resource-only DLL slot information.
*Summaries of processes and pages of VM used.
*A heap report.
*A file dependency report and reference count.
MemoRx (named for memory doctor) is a small, simple utility that displays VM usage visually so the user can quickly see which processes use the most and the least VM. This blog site contains more information. Here is an example of the tool's output (BSPVMMemoRxOutput). MemoRx is available to Windows Mobile licensees on Jetstream.
Makeimg (BSPVMMakeImg). Makeimg output is useful when trying to reduce the number of XIP DLLs overflowing from slot 1 into slot 0. XIP DLLs can use the entire 32MB of slot 1 and then start using space in slot 0. This overflow into slot 0 eats into the 32MB of VM that each process starts out with. Makeimg’s output clearly shows which files go into slot 1 and which overflow into slot 0.


VM Optimization Techniques


Optimize the use of VirtualAlloc. Virtual memory allocations are 64KB aligned regardless of their size. Consider combining multiple allocation calls into a single call where possible. Smaller VM allocations that map hardware and chip resources can also be consolidated. Typically this means allocating a section of memory at boot time, when CEDDK initializes, for all hardware resources. Later, when MmMapIoSpace is called to map space within this range, a pointer into the common memory is returned. This can yield large VM savings when many drivers each independently allocate a 64KB block of VM to map 100 bytes of clock registers.
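
The cost of the 64KB alignment can be estimated with a little arithmetic. The sketch below is a simplified model, not device code: it assumes each separate request consumes a whole number of 64KB blocks, and that a consolidated region packs requests at 4KB page granularity (an assumption made here for illustration). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VM_GRANULARITY (64u * 1024u)  /* reservation alignment from the text */
#define PAGE_SIZE_4K   (4u * 1024u)   /* assumed packing unit when consolidating */

/* VM consumed by one request: its size rounded up to a 64KB multiple. */
uint32_t vm_reserved_for(uint32_t bytes)
{
    assert(bytes <= UINT32_MAX - (VM_GRANULARITY - 1u));
    return (bytes + VM_GRANULARITY - 1u) / VM_GRANULARITY * VM_GRANULARITY;
}

/* VM consumed when every request gets its own reservation. */
uint32_t vm_separate(const uint32_t *sizes, size_t count)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < count; ++i)
        total += vm_reserved_for(sizes[i]);
    return total;
}

/* VM consumed when the requests are packed, page by page, into one region. */
uint32_t vm_combined(const uint32_t *sizes, size_t count)
{
    uint32_t pages = 0;
    size_t i;

    for (i = 0; i < count; ++i)
        pages += (sizes[i] + PAGE_SIZE_4K - 1u) / PAGE_SIZE_4K;
    return vm_reserved_for(pages * PAGE_SIZE_4K);
}
```

With ten drivers that each map 100 bytes of registers, this model gives 640KB of VM for separate reservations versus a single 64KB block when consolidated.
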
Allocate large blocks of memory out of shared memory. Allocations of 2MB or more made with VirtualAlloc are automatically placed in shared memory. For allocations smaller than 2MB, you can use CeVirtualSharedAlloc instead of VirtualAlloc to allocate from shared memory. The drawback of allocating data in shared memory is that it is accessible to anyone, so secure or private data should not be placed there.
You can also create a heap that allocates from shared memory. Instead of using LocalAlloc for heap allocations from the default process heap, create a heap backed by shared memory to preserve process VM. Use CeHeapCreate to create the new heap and define custom allocator/deallocator functions that allocate and free blocks from shared memory. These functions are called whenever the heap size needs to change. Use HeapAlloc to allocate from your heap.

Analyze each driver. A driver's size, location in ROM, read-only data section (static data), read-write data section, pageable status, frequency of execution, and power management capabilities should all be considered when optimizing VM.
*Reduce the size of large DLLs. Large DLLs take up a lot of space in slot 1, and they typically use a lot of stack and heap space as well. These DLLs can occasionally be made smaller by statically linking fewer libraries (fewer SOURCELIBS) and by dynamically linking common code (TARGETLIBS) instead. SOURCELIBS should only be used when you want to ensure that ALL code in the library you're linking against is brought into your binary. SOURCELIBS prevents the linker from removing unused code from the libraries you're linking with, which means you get everything whether you want it or not. TARGETLIBS are used to dynamically link against other libraries that are available as DLLs at run time. TARGETLIBS allows the linker to resolve externals and optimize out unused code paths. Look for DLL sources files that use SOURCELIBS and evaluate whether TARGETLIBS can be used instead.
*Driver does not run that often? Camera drivers are a good example of drivers that run infrequently. These drivers can be marked as compressed in the MODULES section (using the 'C' flag), which prevents the DLL from being statically fixed up into slot 1/0. These DLLs load into slot 0 when they are used. DLLs can also be placed in the FILES section for the same effect.
*Make the driver pageable. Drivers larger than 100KB should be pageable. Isolate the code used in XXXPowerDown and XXXPowerUp and wrap it in pragmas to keep this code from paging out. This MSDN link outlines the needed pragmas.
*Move the driver out of device.exe if it is not power managed. Non-power-managed drivers can be moved from device.exe into services.exe. Device.exe is usually where VM is tightest, so moving DLLs out can free up stack and heap space.

Reduce the number of DLLs that go into slot 1 and overflow into slot 0:
*OEMDRIVERSHIGH: By default, the OEMDRIVERS package is one of the last packages to be processed. Because DLLs from other packages are processed first, nearly all the DLLs in the OEMDRIVERS package overflow into slot 0, where the alignment is 64KB. Slot 1 DLL alignment is 4KB. To make efficient use of DLL alignment, you can assign small DLLs to the OEMDRIVERSHIGH package, which is placed into slot 1. The remaining larger DLLs end up in slot 0.
*Remove unneeded DLLs. Search through the BIB files for BSP_NOXXX and IMGXXX flags that control components you do not need. For example, BSP_NOPCMCIA=1 and BSP_NOPCCARD=1 can be set to keep the PCMCIA serial driver (serial.dll) out of the image when it is not needed.
*Repeat steps. Review the driver analysis steps above. Decreasing XIP DLL code size by making the driver smaller, or making the DLL non-XIP, reduces DLL overflow into slot 0 and saves VM.
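
The reason small DLLs benefit most from slot 1 can be seen by comparing the two alignments mentioned above. The following sketch is a rough model only: it treats a DLL as a single block rounded up to the slot's alignment (4KB in slot 1, 64KB in slot 0) and ignores per-section layout. The names align_up and overflow_penalty are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT1_ALIGN (4u * 1024u)   /* DLL alignment in slot 1, per the text */
#define SLOT0_ALIGN (64u * 1024u)  /* DLL alignment in slot 0, per the text */

/* Round size up to the next multiple of a power-of-two alignment. */
uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (size + alignment - 1u) & ~(alignment - 1u);
}

/* Extra VM a DLL costs when it overflows into slot 0 instead of slot 1. */
uint32_t overflow_penalty(uint32_t dll_size)
{
    return align_up(dll_size, SLOT0_ALIGN) - align_up(dll_size, SLOT1_ALIGN);
}
```

In this model a 12KB DLL wastes 52KB when it lands in slot 0, while a 600KB DLL wastes only 40KB, which is why assigning the small DLLs to OEMDRIVERSHIGH tends to pay off most.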

Build Flags: Depending on the type of flag, these can be applied to your build with either makeimg or blddemo -qbsp.
*IMGVMCOMPACT=1 moves some packages from the MODULES section to the FILES section on PocketPC (WM6). To enable it, set the flag and run makeimg.
*OSModules->OSFiles(~756KB)
*MediaOSModules->MediaOSFiles(~121KB)
*BaseAppsModules->BaseAppsFiles(~515KB)
*Total ~1392KB
* Others to follow ...








