Seriously - I just saw on the Marc McDonald video (awesome video!) thread that someone said the first two bytes of Windows EXEs are ASCII "MZ", the initials of the guy who created the format.
So, what the heck are EXEs? I read that on the Mac, executables have some header info that tells the OS which code to load from where in the file: for Intel, a block that comes first; for PowerPC, a block that starts after it (is Dr. Dobb's Journal awesome or what?). On Windows, what the heck is it?
And how would I generate an EXE? If I took assembly code, how would I get that into machine code? My friend today was saying "Yeah, back in my day, we would edit the machine code by hand" - and my head basically popped.
I want to edit machine code by hand!
So what the heck is an EXE? I know how to open one, and I know that it's something I can't reference in a project, but something that is self-contained, runs, and has a main method. Where the heck is the main method? Is it always 20,000 bytes into the file or something? Can I build an EXE by hand in Notepad? What would I need to build one by hand? And how many days would it take?
A lot of your threads were already weird, but now it's getting ridiculous...
Start -> Run -> cmd -> debug -> go nuts!
Think of an EXE like flat-pack furniture. It's a kit of parts used to produce a running program, a process. And like a flat-pack, it includes assembly instructions: how to turn the kit into the working process.
A Windows executable is a PE (Portable Executable) file, and a PE file contains a number of sections. The simplest sections contain code, read-only data, and data that has an initial value and can be updated. An OS component called the loader reads the section headers and sets up the process's virtual address space appropriately. There can also be uninitialised data sections, where the file contains no data for that section and the loader simply allocates the right amount of memory at the appropriate address. Then the loader looks up the executable's entry point (start address) in the header and jumps to that address.
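To make that concrete, and to answer the "where is main?" question: it isn't at a fixed byte offset. The optional header stores the entry point as an RVA (an offset from wherever the image ends up in memory), and the section table maps RVAs back to positions in the file. Here's a rough Python sketch using only the standard struct module. The header offsets are the PE/COFF ones as I understand them; parse_pe, rva_to_file_offset and make_demo_image are names I made up, and the demo image is headers only, so it won't actually run. Also, for a typical C program the entry point is the C runtime's startup code, which calls main later.

```python
import struct

def parse_pe(data: bytes) -> dict:
    """Walk the MZ header, PE signature, COFF header, optional header and section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) says where the PE signature lives.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF file header: Machine, NumberOfSections, ..., SizeOfOptionalHeader, Characteristics
    machine, nsections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)  # AddressOfEntryPoint
    if magic == 0x10B:    # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+ (64-bit): 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    sections = []
    table = opt + opt_size  # the section table follows the optional header
    for i in range(nsections):
        name, vsize, vaddr, raw_size, raw_ptr, *_, chars = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i)
        sections.append({"name": name.rstrip(b"\0").decode("ascii"),
                         "virtual_address": vaddr, "virtual_size": vsize,
                         "raw_pointer": raw_ptr, "raw_size": raw_size,
                         "characteristics": chars})
    return {"machine": machine, "magic": magic, "image_base": image_base,
            "entry_point_rva": entry_rva, "sections": sections}

def rva_to_file_offset(info: dict, rva: int) -> int:
    """Map an RVA to a file offset using the section table."""
    for s in info["sections"]:
        delta = rva - s["virtual_address"]
        if 0 <= delta < s["raw_size"]:
            return s["raw_pointer"] + delta
    raise ValueError(f"RVA {rva:#x} is not backed by section data in the file")

def make_demo_image() -> bytes:
    """Hypothetical hand-built PE32 headers: parseable, but NOT a runnable program."""
    buf = bytearray(0x400)
    pe_offset, opt_size = 0x40, 0xE0  # PE32 optional header with 16 data directories
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    # Machine=0x14C (i386), 1 section, Characteristics = executable | 32-bit machine
    struct.pack_into("<HHIIIHH", buf, pe_offset + 4, 0x14C, 1, 0, 0, 0, opt_size, 0x0102)
    opt = pe_offset + 24
    struct.pack_into("<H", buf, opt, 0x10B)          # PE32 magic
    struct.pack_into("<I", buf, opt + 16, 0x1000)    # AddressOfEntryPoint
    struct.pack_into("<I", buf, opt + 28, 0x400000)  # ImageBase
    # One .text section: RVA 0x1000, file offset 0x200, flags = code | execute | read
    struct.pack_into("<8sIIIIIIHHI", buf, opt + opt_size,
                     b".text", 0x10, 0x1000, 0x200, 0x200, 0, 0, 0, 0, 0x60000020)
    return bytes(buf)

if __name__ == "__main__":
    info = parse_pe(make_demo_image())
    print(hex(info["image_base"] + info["entry_point_rva"]))      # 0x401000
    print(hex(rva_to_file_offset(info, info["entry_point_rva"])))  # 0x200
```

So in this made-up image the entry point would be at virtual address 0x401000 once loaded, but at byte 0x200 in the file. In a real EXE you'd read both numbers out of the headers rather than assume them.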
Some programs, particularly for older systems, are self-contained, not requiring any components apart from those in the executable itself. I remember machine-code programming on the ZX Spectrum, where all you really needed was a small BASIC program (that being what the ROM loader understood) which could then load your own loader; after that you could pretty much disregard the system-supplied software entirely. On DOS, the system code was somewhat abstracted: you invoked it through software interrupts (the INT instruction).
On Windows, every program needs to link to at least one dynamic-link library (DLL). Well, you could write a program that simply returns from its entry point, but that's not a very useful program! A DLL is also a PE file. It has additional metadata which tells the loader the names and addresses of the functions it exports. Any PE can import those functions using two structures, the Import Name Table and the Import Address Table. The loader reads the Import Name Table, loads the referenced DLLs, and sets the values in the Import Address Table to the addresses of the exported functions.
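The import binding step can be modelled roughly like this. It's a toy: LoadedDll and bind_imports are made-up names, the export table is just a dict of name to RVA, the addresses are invented, and it ignores things the real loader also handles (import by ordinal, forwarded exports, pre-bound imports). It does show the core idea: each Import Address Table slot ends up holding the DLL's actual load address plus the export's RVA.

```python
from dataclasses import dataclass, field

@dataclass
class LoadedDll:
    """Toy stand-in for a DLL the loader has mapped (hypothetical, not a real structure)."""
    base: int                                    # address the DLL actually landed at
    exports: dict = field(default_factory=dict)  # exported name -> RVA

def bind_imports(import_name_table, loaded):
    """Fill an Import Address Table: one slot per imported name, in table order.

    import_name_table: list of (dll_name, [function names]) pairs
    loaded: dict of lower-cased DLL name -> LoadedDll
    """
    iat = []
    for dll_name, names in import_name_table:
        dll = loaded[dll_name.lower()]  # DLL names are matched case-insensitively
        for name in names:
            if name not in dll.exports:
                # The real loader refuses to start the process in this case.
                raise LookupError(f"{name} not exported by {dll_name}")
            iat.append(dll.base + dll.exports[name])  # absolute address = base + RVA
    return iat

if __name__ == "__main__":
    # Made-up base address and RVAs, purely for illustration.
    dlls = {"kernel32.dll": LoadedDll(0x10000000, {"ExitProcess": 0x1234, "GetTickCount": 0x2000})}
    iat = bind_imports([("KERNEL32.dll", ["GetTickCount", "ExitProcess"])], dlls)
    print([hex(a) for a in iat])  # ['0x10002000', '0x10001234']
```

The compiled code then calls through those slots indirectly, which is why the EXE itself never needs to know where the DLL was loaded.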
The use of DLLs means that the system call numbers can be changed without needing to change the binaries. Indeed, even the mechanism for calling into the kernel can be changed: Windows 2000 and earlier used a software interrupt (INT 2Eh), while Windows XP and later use the SYSENTER instruction, which is a little quicker.
DLLs are loaded into the process's address space. The code in an executable is generated with some absolute, rather than relative, addresses (e.g. the code references address 10, rather than 'the address 10 bytes before this'). If the code is well written, and the DLLs are given base addresses (expected addresses to load at) that don't clash, the loader simply loads them at the expected addresses. If the addresses do clash, the loader has to load (at least) one of them at a different address and then fix up the absolute addresses. These fix-ups are called relocations, and the executable contains a relocation table which tells the loader where the addresses are and how to fix them, if necessary.
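Here's a sketch of how those fix-ups get applied, assuming the .reloc layout as I understand it: blocks made of a page RVA and a block size (two 32-bit values), followed by 16-bit entries whose top 4 bits are the relocation type and bottom 12 bits the offset within that page. Only the three most common types are handled; apply_relocations is a hypothetical helper, and the "image" is a plain bytearray indexed by RVA rather than real mapped memory.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, ignored
IMAGE_REL_BASED_HIGHLOW = 3   # 32-bit absolute address
IMAGE_REL_BASED_DIR64 = 10    # 64-bit absolute address

def apply_relocations(image: bytearray, reloc_data: bytes, delta: int) -> int:
    """Add delta (actual base - preferred base) to every listed address; return count patched."""
    pos, patched = 0, 0
    while pos + 8 <= len(reloc_data):
        page_rva, block_size = struct.unpack_from("<II", reloc_data, pos)
        if block_size < 8:  # malformed block; stop rather than loop forever
            break
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", reloc_data, pos + 8 + 2 * i)
            rtype, rva = entry >> 12, page_rva + (entry & 0xFFF)
            if rtype == IMAGE_REL_BASED_HIGHLOW:
                (value,) = struct.unpack_from("<I", image, rva)
                struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
                patched += 1
            elif rtype == IMAGE_REL_BASED_DIR64:
                (value,) = struct.unpack_from("<Q", image, rva)
                struct.pack_into("<Q", image, rva, (value + delta) & 0xFFFFFFFFFFFFFFFF)
                patched += 1
            elif rtype != IMAGE_REL_BASED_ABSOLUTE:
                raise NotImplementedError(f"relocation type {rtype}")
        pos += block_size
    return patched

if __name__ == "__main__":
    image = bytearray(0x2000)                          # pretend mapped image, indexed by RVA
    struct.pack_into("<I", image, 0x1010, 0x10001234)  # pointer assuming base 0x10000000
    # One block for page 0x1000: a HIGHLOW entry at offset 0x010 plus a padding entry.
    relocs = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0x010, 0)
    apply_relocations(image, relocs, 0x20000000 - 0x10000000)  # loaded at 0x20000000 instead
    print(hex(struct.unpack_from("<I", image, 0x1010)[0]))      # 0x20001234
```

Because every absolute address moves by the same amount, one delta is all the loader needs; that's also why a DLL that does get its preferred base needs no fix-ups at all.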
A Windows executable can contain resources. A resource is simply an embedded piece of data in a format that a particular Windows UI API is expecting. Because they're in a standard format, other applications can make use of these resources; for example, Explorer uses them when displaying an icon for your application.
The executable header also includes information about how much virtual memory should be reserved for your threads' stacks and how much should be committed initially, and the same for the initial default heap.
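Those four values sit next to each other in the optional header. A small sketch of reading them, assuming you've already sliced out the optional header (it starts right after the 20-byte COFF header): in PE32 they are 32-bit fields, in PE32+ (64-bit) they are 64-bit, but both layouts start them at offset 72. stack_and_heap_sizes is a made-up name and the values in the demo are arbitrary.

```python
import struct

def stack_and_heap_sizes(optional_header: bytes) -> dict:
    """Read SizeOfStackReserve/Commit and SizeOfHeapReserve/Commit from an optional header."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    # Both formats put SizeOfStackReserve at offset 72; PE32+ widens the four fields to 64 bits.
    fmt = {0x10B: "<4I", 0x20B: "<4Q"}.get(magic)
    if fmt is None:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    stack_reserve, stack_commit, heap_reserve, heap_commit = struct.unpack_from(
        fmt, optional_header, 72)
    return {"stack_reserve": stack_reserve, "stack_commit": stack_commit,
            "heap_reserve": heap_reserve, "heap_commit": heap_commit}

if __name__ == "__main__":
    opt = bytearray(0xE0)  # hypothetical PE32 optional header, mostly zeros
    struct.pack_into("<H", opt, 0, 0x10B)
    struct.pack_into("<4I", opt, 72, 0x100000, 0x1000, 0x100000, 0x1000)
    print(stack_and_heap_sizes(opt))
```

"Reserve" only sets aside address space; "commit" is how much is actually backed by memory up front, with the rest committed on demand as the stack or heap grows.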
The format also allows for descriptions of how to unwind the stack in the case of an exception, and the location of exception handlers. You won't see this used in a 32-bit x86 Windows executable, since the x86 model has a different, stack-based exception handling mechanism; however, it's used for all other platforms that Windows runs on, including x64.
.NET EXEs are a little different because they contain IL, not native code. IIRC Windows XP's loader can handle them directly, but older operating systems can't. So that they still work on older operating systems, the entry point is a tiny native stub that jumps through the Import Address Table entry for the function _CorExeMain in mscoree.dll (which is native code, part of the .NET Framework). This function then loads the appropriate version of the virtual machine (mscorwks.dll), which JIT-compiles the code and jumps to the actual entry point.
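If you want to tell a .NET EXE from a native one yourself, one common check is whether data directory 14 (the CLR runtime header, also called the COM descriptor) is filled in. A sketch, assuming you pass in just the optional header bytes; is_dotnet_image is a hypothetical helper, and it doesn't validate the CLR header itself.

```python
import struct

COM_DESCRIPTOR_INDEX = 14  # data directory slot for the CLR runtime header

def is_dotnet_image(optional_header: bytes) -> bool:
    """True if the CLR runtime header data directory is present and non-empty."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic == 0x10B:    # PE32: NumberOfRvaAndSizes at 92, directories at 96
        count_off, dirs_off = 92, 96
    elif magic == 0x20B:  # PE32+: NumberOfRvaAndSizes at 108, directories at 112
        count_off, dirs_off = 108, 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    (count,) = struct.unpack_from("<I", optional_header, count_off)
    if count <= COM_DESCRIPTOR_INDEX:
        return False
    # Each data directory entry is an (RVA, Size) pair of 32-bit values.
    rva, size = struct.unpack_from("<II", optional_header, dirs_off + 8 * COM_DESCRIPTOR_INDEX)
    return rva != 0 and size != 0

if __name__ == "__main__":
    opt = bytearray(0xE0)  # hypothetical PE32 optional header
    struct.pack_into("<H", opt, 0, 0x10B)
    struct.pack_into("<I", opt, 92, 16)  # 16 data directories
    print(is_dotnet_image(opt))          # False: directory 14 is empty
```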
To inspect an executable, use the dumpbin tool supplied with Visual C++.