Nice article. Definitely something to read:
http://www.joryanick.com/memcpy.htm
-
littleguru wrote: Nice article. Definitely something to read:
http://www.joryanick.com/memcpy.htm
A friend of mine wrote a version of memcpy() in assembler that was twenty (two zero) times faster than the stock Windows version, when he worked for a graphics company a couple of years ago. It conditionally used MMX, SSE, or SSE2.
-
Cairo wrote:

littleguru wrote: Nice article. Definitely something to read:
http://www.joryanick.com/memcpy.htm
A friend of mine wrote a version of memcpy() in assembler that was twenty (two zero) times faster than the stock Windows version, when he worked for a graphics company a couple of years ago. It conditionally used MMX, SSE, or SSE2.
Wouldn't it be even faster if he wrote one implementation specific to each CPU extension and then selected the specialized implementation once, at startup, to avoid the overhead of the conditional jumps on every call?
-
ASM optimisations in C/C++ code are evil...
Should this be implemented in the Windows Kernel? Will it be implemented for all x86-64 versions?
My opinion is that it should NOT be implemented for x86 CPUs at this time. Windows XP can, in theory, run on non-MMX hardware, and having it in the kernel would make that impossible.
However, it should be supported on x86-64, as all those processors currently in production support MMX.
Said optimisation should be unnecessary for 98% of programs you will need to write.
-
C:\Program Files\Microsoft Visual Studio 8\VC\crt\src\intel\memcpy.asm
Search for "; do fast SSE2 copy"
Also note that the linked article was last updated in 2003.
-
Manip wrote: My opinion is that it should NOT be implemented for x86 CPUs at this time. Windows XP can, in theory, run on non-MMX hardware, and having it in the kernel would make that impossible.
Selecting the algorithm based on CPUID is not exactly impossible.
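To make that concrete, here is a minimal sketch in C of choosing a copy variant from the CPUID feature bits. It assumes GCC or Clang on x86 (the `<cpuid.h>` header and its `__get_cpuid` helper); the names `copy_level`, `level_from_edx` and `detect_copy_level` are made up for illustration and are not from the article or from Windows. Bits 23, 25 and 26 of EDX from CPUID leaf 1 report MMX, SSE and SSE2.

```c
/* Assumption: GCC/Clang on x86; elsewhere this falls back to COPY_PLAIN. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0
#endif

/* Hypothetical names for the copy variants discussed in the thread. */
enum copy_level { COPY_PLAIN = 0, COPY_MMX = 1, COPY_SSE = 2, COPY_SSE2 = 3 };

/* Map the EDX feature bits of CPUID leaf 1 to the best available variant. */
static enum copy_level level_from_edx(unsigned int edx)
{
    if (edx & (1u << 26)) return COPY_SSE2;  /* bit 26: SSE2 */
    if (edx & (1u << 25)) return COPY_SSE;   /* bit 25: SSE  */
    if (edx & (1u << 23)) return COPY_MMX;   /* bit 23: MMX  */
    return COPY_PLAIN;                       /* no SIMD: plain copy */
}

/* Query the running CPU; returns COPY_PLAIN if CPUID is unavailable. */
static enum copy_level detect_copy_level(void)
{
#if HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return level_from_edx(edx);
#endif
    return COPY_PLAIN;
}
```

A machine without MMX simply ends up at `COPY_PLAIN`, so nothing here forces MMX onto older hardware.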
-
Cairo wrote:
A friend of mine wrote... conditionally used MMX, SSE, or SSE2.
The best approach is to have the run-time startup code determine the CPU capability and select the correct routine once, at startup, by storing the address of the preferred routine(s). (Yes, there are other routines that benefit from memory move/init optimizations.)
Regarding MMX, SSE, SSE2... these would only be used if the CPU supported them. The FPU can also be used to move/init data faster.
This definitely should be in the managed runtimes (Jxxx and .NET CLR/CLI).
During a research project I did on this topic, judicious use of PREFETCH accelerated the memory routines further.
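A rough sketch of that one-time selection in C, under some assumptions: GCC or Clang (for `__builtin_cpu_init` and `__builtin_cpu_supports`), and a build where `__SSE2__` is defined so the intrinsics compile (the default on x86-64). The names `copy_fn`, `fast_copy`, `fast_copy_init`, `copy_generic` and `copy_sse2` are invented for illustration; this is not how any particular runtime does it.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: one byte at a time. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

#if defined(__SSE2__)
/* SSE2 variant: 16 bytes per iteration using unaligned loads and stores. */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 16; n -= 16, d += 16, s += 16)
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    copy_generic(d, s, n);  /* copy the remaining 0..15 bytes */
    return dst;
}
#endif

/* The routine address is written once; callers never branch on CPU type. */
static copy_fn fast_copy = copy_generic;

/* Call once from startup code, before the first use of fast_copy. */
void fast_copy_init(void)
{
#if defined(__SSE2__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        fast_copy = copy_sse2;
#endif
}
```

After `fast_copy_init()` has run, every call is just `fast_copy(dst, src, n)`: one indirect call, with no per-call feature tests. The same pointer-table idea would extend to the other move/init routines.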
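As an illustration only (not the code from that project), a copy loop with software prefetch might look like the following in C. It assumes GCC or Clang for `__builtin_prefetch`; the 64-byte line size and the 256-byte prefetch distance are guesses that would need tuning by measurement, and `copy_with_prefetch` is a made-up name.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE     64   /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 256  /* assumed prefetch distance; tune by measuring */

/* Copy n bytes one cache line at a time, hinting at data needed soon. */
static void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CACHE_LINE) {
#if defined(__GNUC__)
        /* Only prefetch addresses that are still inside the source buffer.
           Arguments: address, 0 = read access, 0 = no temporal locality. */
        if (n > PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
#endif
        memcpy(d, s, CACHE_LINE);
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }
    memcpy(d, s, n);  /* remaining tail, possibly zero bytes */
    return dst;
}
```

Whether this helps depends heavily on the CPU and on the copy size; hardware prefetchers already handle simple sequential access well, so it is worth benchmarking before keeping it.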