Posted By: KosherCoder | Feb 19th, 2005 @ 8:15 AM
page 1 of 2
Comments: 36 | Views: 122436
My new machine has 4GB RAM, the BIOS sees it just fine, but Windows XP Pro only sees a little over 3GB. Googling this phenomenon turns up nuggets of info that explain PCI devices take up much of this space.

I verified this by taking out my second video card, and the Windows memory jumped to roughly 3.5GB. Conversely, when I take 1GB out, and go down to 3, I still see all 3.

Can anyone explain why PCI devices steal system RAM, but only when there is 4GB installed in the machine? Also, why wouldn't that memory appear to Windows at all? I would expect the device resources to be listed somewhere.
NeoTOM
OMG WTF REDESIGN

I'm assuming you have a 64-bit processor?

littleguru
<3 Seattle
KosherCoder wrote:
My new machine has 4GB RAM, ...


WOW! I can't say anything else. 4 Giga of RAM!
Sven Groot
My name has 9 letters. Coincidence? I think not...
I'm not entirely sure here, but I think it's this. PCI devices do not steal RAM, they steal address space. A 32-bit processor can address only 4GB worth of addresses, and since the PCI devices use up address space, not enough is left to address the whole of the RAM.

If you have a 32-bit processor, the problem can't be solved. I think most modern Intel processors support 36-bit addressing (for up to 64GB of RAM), but XP Pro doesn't support that. Windows Server 2003 does (maybe not all editions though, I'm not sure).

If you have a 64-bit processor, you can probably solve the problem by installing a 64-bit OS. You can download XP x64 RC2 evaluation here.
NeoTOM
OMG WTF REDESIGN
Sven Groot wrote:
I'm not entirely sure here, but I think it's this. PCI devices do not steal RAM, they steal address space. A 32-bit processor can address only 4GB worth of addresses, and since the PCI devices use up address space, not enough is left to address the whole of the RAM.

If you have a 32-bit processor, the problem can't be solved. I think most modern Intel processors support 36-bit addressing (for up to 64GB of RAM), but XP Pro doesn't support that. Windows Server 2003 does (maybe not all editions though, I'm not sure).

If you have a 64-bit processor, you can probably solve the problem by installing a 64-bit OS. You can download XP x64 RC2 evaluation here.


I thought 32-bit had a 2 gig max.
Sven Groot
My name has 9 letters. Coincidence? I think not...
NeoTOM wrote:
I thought 32-bit had a 2 gig max.

There's a 2GB max of per-process user-mode address space, but the system as a whole can handle 2^32=4294967296 bytes=4GB of memory.
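If you want to check the arithmetic yourself, here's a quick Python sketch. It only computes how many byte addresses a given address width can reach; the 2GB figure assumes the default even split of each process's virtual address space between user mode and the system.

```python
GIB = 1 << 30  # one "GB" as used in this thread (2^30 bytes)


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with this address width."""
    return 1 << address_bits


for bits in (32, 36):
    size = address_space_bytes(bits)
    print(f"{bits}-bit addressing: {size} bytes = {size // GIB} GB")

# Assumption: default split, half of the 4GB virtual space per process is user mode.
user_mode_space = address_space_bytes(32) // 2
print(f"user-mode space per process: {user_mode_space // GIB} GB")
```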
msemack
Embedded Systems Guy
PCI devices always "steal" RAM, even on a system with less memory.  You just don't notice because PCI devices are usually sitting at an address beyond the end of physical memory.

Remember, 32-bit processors have a 4GB address space (from 0x00000000 to 0xFFFFFFFF).  If you have 1GB of RAM, any address from 0x40000000 upward is free.

Most PCI devices are memory mapped.  This means they are mapped to a memory address.  If the CPU wants to talk to your Ethernet card, it reads and writes to a certain memory address, which happens to be where your Ethernet card is listening.

The memory address that your device uses is determined by the BIOS at boot time (although the operating system can remap the device later).  The device can claim anywhere from 0 to 6 memory regions, and they can be just about any size.

Exactly how many memory regions a device claims, and their size, can vary greatly by device.  Ethernet cards tend to use very little.  Graphics cards, however, can use a lot.  Graphics cards have a lot of onboard RAM, and that RAM can often be addressed over the PCI bus.  If you have a video card with 128MB of RAM, it may claim a 128MB window on the PCI bus.  Some high-end SCSI controllers claim a lot of address space as well.

If you want to see the addresses claimed on your system, go to Control Panel - System - Hardware - Device Manager.  Click View - Resources by Type.  Expand the memory selection.  You should see all the memory-mapped devices in your system, and what they claim.

Since you have a 4GB limit, your system needs to make all of the devices and your RAM fit.  That's why you see less RAM.  If it didn't allocate enough address space for your hardware, the device wouldn't work (which would be really bad).  So, it does the safe thing, and gives you less RAM.
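Here's a rough Python sketch of that fitting problem. It assumes a much simpler layout than a real BIOS produces: RAM fills upward from address 0 and all devices are packed into one window at the top of the 4GB space. The window sizes are made up to mirror the numbers reported in this thread, not measured from anyone's machine.

```python
GIB = 1 << 30
MIB = 1 << 20
ADDRESS_SPACE_32 = 1 << 32  # 4GB of physical addresses without PAE


def visible_ram(installed, device_window):
    """RAM that is still addressable once the device window is carved out.

    Simplifying assumption: one device window at the top of the 4GB space.
    """
    room_for_ram = ADDRESS_SPACE_32 - device_window
    return min(installed, room_for_ram)


# Hypothetical configurations: (installed GB, device window in MB)
for installed_gb, window_mb in ((3, 512), (4, 512), (4, 1024)):
    ram = visible_ram(installed_gb * GIB, window_mb * MIB)
    print(f"{installed_gb}GB installed, {window_mb}MB of device space: "
          f"{ram / GIB:.2f}GB visible")
```

With 3GB installed the devices still fit in the unused space above the RAM, so nothing is lost. With 4GB installed there is no unused space left, and the device window comes out of what the RAM would otherwise occupy.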

You can do one of the following:

- Live with the limit.  Maybe sell the spare 1GB module.
- Get a CPU with PAE.  PAE gives you a 36-bit address space.  This lets you use RAM at addresses above 0xFFFFFFFF.  You will need to use the "/PAE" switch during bootup.
- Switch to a 64-bit CPU.  You'll need a 64-bit operating system.

One little tidbit.  Most PCI devices are 32-bit, so they have to live at an address below 0xFFFFFFFF.  There are also 64-bit PCI devices that can be mapped to addresses above 4GB, but they're another issue entirely.
NeoTOM
OMG WTF REDESIGN
Beer28 wrote:
msemack,

I thought most devices handled their own memory, and the OS interaction was doing IN/OUT to ports which the device is registered on through the driver ?

So you're saying some devices have their memory offboard on the ram sticks as virtual and it's shared by the OS?

I always thought most devices had onboard memory like video cards.

Can you clear this up a little since you're an expert?


I got what he was saying. It just has to map that onboard memory somewhere, so it maps it where there's no memory. If every space has memory, then it unmaps RAM to map the onboard PCI RAM.
Beer28 wrote:
I thought most devices handled their own memory, and the OS interaction was doing IN/OUT to ports which the device is registered on through the driver ?

So you're saying some devices have their memory offboard on the ram sticks as virtual and it's shared by the OS?


Very little I/O is now done through explicit IN or OUT instructions - basically only legacy devices. These instructions are basically a novelty of the x86 architecture anyway, and are a lot slower than the memory copy commands (this is effect rather than cause - they're slower because they're used less frequently). The 68000, PowerPC, MIPS, SPARC, ARM, you name it, don't have IN or OUT. In those architectures, all devices are memory-mapped. That is, their control registers occupy space in the regular memory map.

Any given physical address can map to ROM, RAM, or a device's control registers (or nothing). You can see roughly what's where in your system by opening Device Manager and selecting the Resources By Type view, then expanding the Memory node. You'll see a number of items below 0x00100000 (1MB), then a big block allocated to 'System Board' between that address and some large number. On this system I have 512MB which is 0x20000000 in hex, so this block ends at 0x1FFFFFFF. Then there's a big gap - on this system - to 0xF2000000 which is where the graphics card's resources start.

The processor has special registers which tell it which areas of 'memory' space are cacheable. You don't want the processor to cache any device registers, nor try to combine writes, because the values of the memory locations can change without the processor noticing (DMA writes are 'snooped' by the processor to keep caches up to date), and the order in which writes to the registers occur may be significant.

You can't normally write directly to all of the video card's memory. AGP supports an 'aperture' of memory address space that can be mapped to areas of the video card RAM using the Graphics Address Remapping Table [GART]. The default configuration is normally 64MB, which may not be enough any more.

I'm surprised that the original poster indicated that only 3GB was available, but perhaps the address decoding logic on his motherboard can only cope with decoding to a whole bank of RAM.

Most I/O for modern devices is performed using Direct Memory Access transfers, where the device is instructed where to transfer the data from/to and how much to transfer. This happens directly by the device asking to become the Bus Master. Once it is the bus master, it drives the memory address and read/write lines directly (at least, logically - with HyperTransport and inter-hub transport the physical configuration is point-to-point with multiplexing, but you get the idea). On completion it returns the bus to the processor. In the meantime, the processor can get on with performing useful work.

The system software still has to tell the device where and how to transfer data. It does this by programming the device's registers, which as I said above are now often memory-mapped.

This is why Programmed I/O is so much slower than DMA. For Programmed I/O, the system software sets the values of device registers to indicate where to read - either with a memory write or an OUT, the difference is simply whether the IO/memory line is set or unset - then reads from the device registers into the CPU registers with a memory read or an IN. The software then writes the values read from the device into main memory. The data has to come further, the processor has to wait for the device registers to become ready, and this code often can't be interrupted.
msemack
Embedded Systems Guy
Beer28 wrote:

Can you clear this up a little since you're an expert?



One thing that I didn't make clear earlier:  Those PCI devices are using about 1GB of address space, not 1GB of memory itself.

All the CPU does is read and write to memory addresses (from 0x00000000 to 0xFFFFFFFF).  The motherboard chipset (usually the northbridge) determines whether the address refers to memory, or to the bus.  If the address points to the PCI bus, the access gets forwarded to the root PCI bridge.  Otherwise it goes on to DRAM.
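As a toy illustration of that decision, here's a Python sketch. The name top_of_ram is just something I made up for the example, not a real chipset register, and the legacy holes below 1MB are ignored. The addresses are the ones from the 512MB machine described earlier in the thread.

```python
def route_access(address, top_of_ram):
    """Say where a 32-bit physical access gets forwarded in this simplified model."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("outside the 32-bit physical address space")
    if address < top_of_ram:
        return "DRAM"
    return "PCI bus"  # everything above the end of RAM goes to the root PCI bridge


TOP_OF_RAM = 0x20000000  # 512MB installed, so RAM ends at 0x1FFFFFFF

for address in (0x00100000, 0x1FFFFFFF, 0x20000000, 0xF2000000):
    print(f"0x{address:08X} -> {route_access(address, TOP_OF_RAM)}")
```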

Very few devices these days are I/O mapped (using IN/OUT instructions).  I/O mapping more or less died out with the ISA bus.

I/O space is a unique feature of x86 (and is often considered one of its "warts").  The CPU maintains 2 address spaces, one for memory, and one for peripherals.  This was a useful thing back in the 16-bit days.  When you only had about 1MB of addressable memory, you really didn't want your precious address space eaten up by hardware.  So, there was a second address space for your devices.

With the advent of PCI, most devices started being memory mapped.  By this time, CPUs were 32-bit and 64-bit, but systems had only a few megabytes of RAM (8 to 32 usually).  There was plenty of address space to go around.

You still have some things that are I/O mapped in your computer:
- Keyboard Controller
- PS/2 Mouse controller
- Serial Ports
- Parallel ports

Now that we're approaching the limit of 32-bit addressing, we are starting to see problems like the original poster had.  With the switch to 64-bit CPUs, we'll be fine until systems start shipping with Exabytes of RAM (millions of Terabytes).
NeoTOM
OMG WTF REDESIGN
msemack wrote:
we'll be fine until systems start shipping with Exabytes of RAM (millions of Terabytes).


What do you need that for, running a virtual planet?
msemack
Embedded Systems Guy

Mike Dimmick did a great job explaining things (he always does).

Just to expand on what he said about DMA:

You use DMA when you're transferring blocks of data between memory and a device.  "Blocks of data" could be network packets, screen pixels, file contents, etc.  DMA is great because the device transfers the data automatically, with little intervention from the CPU.

Example of DMA:  My company makes a video frame grabber.  The device has an onboard buffer.  When it captures enough data (about one video frame), it initiates a DMA transfer, and dumps the data into memory.  The frame grabber sends an interrupt to the CPU to let it know that a new frame is ready.

Without DMA, the CPU would have to sit in a for/while loop, reading one pixel at a time.  The CPU would be very busy, and wouldn't be able to get much work done.  Even a fast CPU can get bogged down without DMA.

Now, DMA is great for block transfers.  If you're doing a simple operation, like writing a single value to a register, you probably aren't using DMA.
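To put rough numbers on the difference, here's a toy Python model. It's purely illustrative: it doesn't touch any hardware, the frame size is made up, and "CPU operations" is just a counter I invented for the sketch (one device read plus one memory write per value for programmed I/O; address, length and interrupt handling for DMA).

```python
def programmed_io(device_buffer):
    """CPU moves every value itself: one device read plus one memory write each."""
    memory = []
    cpu_operations = 0
    for word in device_buffer:      # the for/while loop described above
        memory.append(word)         # read from the device, then store to RAM
        cpu_operations += 2
    return memory, cpu_operations


def dma_transfer(device_buffer):
    """CPU only sets the transfer up and services the completion interrupt."""
    cpu_operations = 3              # write address, write length, handle interrupt
    memory = list(device_buffer)    # the copy itself is done by the device
    return memory, cpu_operations


frame = list(range(640 * 480))      # hypothetical frame, one value per pixel
pio_memory, pio_ops = programmed_io(frame)
dma_memory, dma_ops = dma_transfer(frame)
print(f"programmed I/O: {pio_ops} CPU operations")
print(f"DMA:            {dma_ops} CPU operations")
```

Both paths end up with the same data in memory. The difference is how much of the work the CPU had to do itself.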

In an earlier post, msemack mentioned PAE (Physical Address Extension). It's worth saying that nearly all Intel CPUs since the Pentium Pro and, I think, AMD CPUs since the Athlon support PAE. On the Intel family, this expands the CPU's physical address bus to 36 bits, giving a potential 64GB of physical address space.

However, it's not enough for the CPU to support it. The chipset has to support it also. Few - in fact, I think no - desktop/workstation chipsets do. Expensive server chipsets typically do.

For example, the venerable 440BX chipset's datasheet says this:

"The Pentium Pro processor family supports addressing of memory ranges larger than 4 GB. The 82443BX Host Bridge claims any access over 4 GB by terminating transaction (without forwarding it to PCI or AGP). Writes are terminated simply by dropping the data and for reads the 82443BX returns all zeros on the host bus."

The same is true for the 850E chipset in my home machine and the 925X in my new work system.

Enabling PAE has another side effect. Device drivers are, or may be, presented with 64-bit physical addresses. Many are not written to cope with this possibility and have problems. This had an interesting effect on XP SP2. The Data Execution Prevention [DEP] feature requires PAE to be enabled since the NX bit AMD added is bit 63 of a PAE-format page table entry [PTE]. There were no spare reserved bits in the 32-bit non-PAE PTE. To enable hardware DEP, the processor must run in PAE mode.

Incidentally, the use of bit 63 will prevent the x64 architecture from ever addressing a full 64-bits of address space (16 Exabytes or if you prefer the IEC's terminology for distinguishing binary from decimal units, 16 Exbibytes [yuch]). That's obviously not an issue at present! Indeed, in AMD's documentation bits 52 - 62 in the PTE are reserved for future use, limiting current implementations to 4 Petabytes (=4096 Terabytes).
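For the curious, here's a small Python sketch of how those bits carve up a 64-bit page table entry. It assumes 4KB pages with the physical page address in bits 12 to 51, and it only looks at the present bit, the frame address and NX. The example entry is made up.

```python
TIB = 1 << 40

NX_BIT = 1 << 63                       # no-execute flag, top bit of the entry
PRESENT_BIT = 1 << 0                   # entry is valid
FRAME_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 12..51: physical address of the page


def decode_pte(pte):
    """Pull out the fields discussed above from a 64-bit page table entry."""
    return {
        "present": bool(pte & PRESENT_BIT),
        "no_execute": bool(pte & NX_BIT),
        "frame_address": pte & FRAME_MASK,
    }


example_pte = NX_BIT | 0x123456000 | PRESENT_BIT  # made-up entry for illustration
fields = decode_pte(example_pte)
print(hex(fields["frame_address"]), "no-execute:", fields["no_execute"])

# 52 physical address bits is where the 4 Petabyte figure comes from.
print((1 << 52) // TIB, "Terabytes addressable with 52 bits")
```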

To salvage the situation and only present 32-bit physical addresses to drivers and hardware, XP SP2 defines the /NoExecute switch. Only if both /NoExecute and /PAE are specified does it enable 64-bit physical addressing.
ScanIAm
On a scale of 1 to 10, people are stupid.
KosherCoder wrote:
[I'm surprised that the original poster indicated that only 3GB was available]

I have two 256MB video cards, a TV card, a 24bit sound card, a 100Mb ethernet card, and a 1Gb ethernet card in this box. Looking up the resources list suggested in an earlier post, I was able to determine that they are all eating up my top 1GB.


You have too much disposable income.
PeterF
Early Adopter
Hi KosherCoder

As Mike Dimmick already mentioned, you should look into the mainboard specs. The hardware limitation probably lies there.

Which mainboard do you actually have?

Peter
ScanIAm
On a scale of 1 to 10, people are stupid.
KosherCoder wrote:
ScanIAm wrote:
You have too much disposable income.


heh. I have 5 kids - there's no such thing as disposable income.

Seriously, I run my business off this one machine. I keep it fast and beefy with incremental upgrades. The best part is with MS VPC, I can run an entire network domain from within this one machine. Maybe I shouldn't tell you I have 4 screens hanging off this baby too.


Yeah, you shouldn't have.  Now I have box envy.
msemack
Embedded Systems Guy
Your extra gigabyte of DRAM would get mapped to the higher address.  The PCI devices wouldn't.

As I mentioned earlier, the PCI bus (in general) is a 32-bit bus.  In general, the largest possible value you could assign to the Base Address Register (BAR) of a PCI device is 0xFFFFFFFF (4GB).

Your memory map would be something like this:

0 to 3GB: Your first three memory sticks.
3GB to 4GB: PCI devices.
4GB+: Your remaining DRAM.

NOTE: This is a gross simplification.  If someone wants to expand on the reserved parts of the lower 1 megabyte, feel free.
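The same simplified map can be written as a short Python sketch. It assumes the chipset can actually relocate the displaced RAM above 4GB and that the CPU and OS can address it (PAE or 64-bit), and it treats all PCI devices as one window just below 4GB. The 1GB window size is an example value.

```python
GIB = 1 << 30
FOUR_GB = 4 * GIB


def memory_map(installed, pci_window):
    """Return (start, end, description) tuples for the simplified layout above."""
    pci_start = FOUR_GB - pci_window
    low_ram_end = min(installed, pci_start)
    regions = [
        (0, low_ram_end, "DRAM"),
        (pci_start, FOUR_GB, "PCI devices"),
    ]
    displaced = installed - low_ram_end
    if displaced > 0:
        # Assumption: the chipset remaps the displaced RAM above 4GB.
        regions.append((FOUR_GB, FOUR_GB + displaced, "remaining DRAM"))
    return regions


for start, end, what in memory_map(4 * GIB, 1 * GIB):
    print(f"0x{start:09X} to 0x{end - 1:09X}: {what}")
```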

Now, there are 64-bit PCI devices.  These devices can be relocated to higher addresses.  I've seen some SCSI controllers that support this.  To use this, you need a motherboard with 64-bit PCI slots, and an OS that supports 64-bit PCI (not sure if XP does).

I'm pretty sure all PCI Express devices support 64-bit as well.
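Whether a particular device can live above 4GB is encoded in the low bits of its BAR. Here's a Python sketch of decoding them, written from my memory of the PCI Local Bus spec, so check it against the spec before relying on it. The BAR values are made-up examples, not read from real hardware.

```python
def decode_bar(bar_low):
    """Decode the low 32 bits of a PCI Base Address Register."""
    if bar_low & 0x1:
        # Bit 0 set: the region lives in I/O space, not memory space.
        return {"space": "io", "base": bar_low & 0xFFFFFFFC}
    type_bits = (bar_low >> 1) & 0x3
    width = {0b00: 32, 0b10: 64}.get(type_bits)  # other encodings: legacy/reserved
    return {
        "space": "memory",
        "width": width,                       # 64 means the next BAR holds the upper half
        "prefetchable": bool(bar_low & 0x8),
        "base": bar_low & 0xFFFFFFF0,
    }


# Hypothetical BAR values for illustration only.
for value in (0xF2000008, 0x0000000C, 0x0000E001):
    print(f"0x{value:08X} -> {decode_bar(value)}")
```

A device whose memory BAR reports a 32-bit width has to be placed below 4GB, which is the case for most of the cards discussed in this thread.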
rjdohnert
You will never know success until you know failure
I have never had much luck with anything over 2.5 gig on XP Pro, Windows Server 2003 handles 4 gig just fine.
If I have a 32-bit OS and 4GB of RAM, I am under the impression from reading this thread that only about 3.2GB of RAM will be usable, and the other 0.8GB will effectively be wasted. I also understand that a stick of RAM performs best when paired with other sticks of the same size and speed. I will soon be purchasing a laptop that has 2 RAM slots and comes with 1GB of RAM (2x512MB) but can handle a maximum of 4GB, and I will be looking to upgrade it. Taking this into account, do you guys think it would be better for me to get 2 identical 2GB sticks and suffer the 0.8GB (or whatever) of wastage? Or would it be better to have one 2GB stick and one 1GB stick that will probably not be running optimally?
Sven Groot
My name has 9 letters. Coincidence? I think not...

If you get 2x2GB you get dual channel mode. With 2GB+1GB you don't, so 2x2GB would be the faster option.

With RAM prices as they are, I'd go for 2x2GB. Plus if you ever decide to run a 64-bit OS (Windows or Linux, doesn't matter), chances are you'll be able to use the full 4GB if you have a reasonably modern motherboard.

The way I understand it is this:

If you have a 32 bit system you can only access 4 gig, the lower 2gig for process space and the upper two gig for system space.

If you have a 32 bit OS on a 64 bit system, then it’s the same story with XP. But with Server 2003, or Server 2008 (installed in 32 bit mode) your memory can be swapped to and from the lower 4 gig space with the memory above that, to implement up to 64 gig (I think that’s the limit in this mode). This is probably true with Vista Ultimate as well (not sure). So yes, the old swapping scenario is back, like the old days when we transitioned from 16 bit to 32 bit.

Finally, if you install Server 2008, or Vista Ultimate, in 64 bit mode, on a 64 bit system, then we finally blow past the 4 gig limit and have a flat memory model. But you can no longer run 16 bit applications directly in the OS. The only way to run a 16 bit app in this scenario is to create a 16 or 32 bit virtual server running behind the hypervisor. But beware: with Server 2008 your CPU must be a “VT” (Virtualization Technology) capable chip to run in virtual mode. But it will be much more efficient than the Server 2003 virtualization.

With the XP system mentioned above; my guess would be that the video memory pushes the system over the 4 gig limit. So, one of the memory chips must opt out for the system to run.

 
