Tech Off Thread

7 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

Memcpy: the fast way! Should be included in the Windows kernel

  • littleguru

    Nice article. Definitely something to read:
    http://www.joryanick.com/memcpy.htm

  • Cairo

    littleguru wrote:
    Nice article. Definitely something to read:
    http://www.joryanick.com/memcpy.htm


    A friend of mine wrote a version of memcpy() in assembler that was twenty (two zero) times faster than the stock Windows version, back when he worked for a graphics company a couple of years ago. It conditionally used MMX, SSE, or SSE2.

  • W3bbo

    Cairo wrote:


    A friend of mine wrote a version of memcpy() in assembler that was twenty (two zero) times faster than the stock Windows version, back when he worked for a graphics company a couple of years ago. It conditionally used MMX, SSE, or SSE2.


    Wouldn't it be even faster to write a separate implementation for each CPU extension and pick the right one once, at startup, to avoid the overhead of all the conditional jumps?
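    The suggestion of choosing the specialized routine once, instead of testing the CPU on every call, can be sketched in C with a function pointer that patches itself on the first call. This is a minimal sketch, not code from the thread: `copy_bytewise`, `copy_library` and `cpu_has_fast_path` are hypothetical stand-ins for the MMX/SSE/SSE2 variants and a real CPU check, and the first call is assumed to happen before any other thread uses the pointer.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stand-ins for per-extension routines (MMX/SSE/SSE2 in the
       thread). Both are portable C so the sketch runs anywhere. */
    static void *copy_bytewise(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }

    static void *copy_library(void *dst, const void *src, size_t n) {
        return memcpy(dst, src, n);
    }

    /* Hypothetical CPU check; a real one would query CPUID. */
    static int cpu_has_fast_path(void) { return 1; }

    static void *resolve_copy(void *dst, const void *src, size_t n);

    /* All callers go through this pointer; it initially points at the resolver. */
    void *(*fast_copy)(void *dst, const void *src, size_t n) = resolve_copy;

    /* Runs on the first call only: choose a routine, overwrite the pointer,
       then forward this call. Later calls jump straight to the chosen routine.
       Not thread-safe: assumes the first call happens before other threads
       use fast_copy. */
    static void *resolve_copy(void *dst, const void *src, size_t n) {
        fast_copy = cpu_has_fast_path() ? copy_library : copy_bytewise;
        return fast_copy(dst, src, n);
    }
    ```

    After the first call, every copy is a single indirect call with no feature test in front of it. If two threads could make the first call concurrently, the pointer update would need to be atomic or done under one-time initialization.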

  • Manip

    ASM optimisations in C/C++ code are evil...

    Should this be implemented in the Windows kernel? Will it be implemented for all x86-64 versions?

    My opinion is that it should NOT be implemented for x86 CPUs at this time. Windows XP can, in theory, run on non-MMX hardware, and having it in the kernel would make that impossible.

    However, it should be supported on x86-64, since every x86-64 processor supports MMX (and SSE2, which is part of the x86-64 baseline).

    In any case, this optimisation should be unnecessary for 98% of the programs you will write.

  • DCMonkey

    C:\Program Files\Microsoft Visual Studio 8\VC\crt\src\intel\memcpy.asm

    Search for "; do fast SSE2 copy"

    Also note that the linked article was last updated in 2003.
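    The idea behind an SSE2 copy path is to move 16 bytes per load/store pair instead of one byte at a time. The sketch below uses SSE2 intrinsics from `<emmintrin.h>`; it illustrates the technique and is not the CRT's assembler, and the name `sse2_copy` is hypothetical. It assumes a compiler that defines `__SSE2__` when SSE2 code generation is enabled (GCC and Clang do, and it is on by default for x86-64) and falls back to a plain byte loop otherwise.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    #if defined(__SSE2__)
    #include <emmintrin.h>  /* SSE2 intrinsics */
    #endif

    /* Copies n bytes from src to dst (non-overlapping, like memcpy).
       With SSE2 available, moves 16 bytes per unaligned load/store pair;
       the remaining 0-15 bytes (or everything, without SSE2) go byte by byte. */
    void *sse2_copy(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
    #if defined(__SSE2__)
        while (n >= 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)s);
            _mm_storeu_si128((__m128i *)d, chunk);
            s += 16;
            d += 16;
            n -= 16;
        }
    #endif
        while (n--)
            *d++ = *s++;
        return dst;
    }
    ```

    Unaligned loads and stores (`_mm_loadu_si128`, `_mm_storeu_si128`) keep the sketch short and correct for any pointer alignment; a tuned routine would also deal with alignment explicitly.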

  • Tom Servo

    Manip wrote:
    My opinion is that it should NOT be implemented for x86 CPUs at this time. Windows XP can, in theory, run on non-MMX hardware, and having it in the kernel would make that impossible.

    Selecting the copy routine at run time based on CPUID is hardly impossible.
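    The run-time check mentioned here can be illustrated with CPUID leaf 1, which reports MMX, SSE and SSE2 in EDX bits 23, 25 and 26. The sketch below uses GCC/Clang's `<cpuid.h>` (`__get_cpuid` and the `bit_MMX`, `bit_SSE`, `bit_SSE2` masks); `enum copy_level` and `detect_copy_level` are hypothetical names. On hardware without MMX it simply reports the scalar level, so the same binary still runs there.

    ```c
    #include <assert.h>
    #if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    #include <cpuid.h>
    #define HAVE_CPUID_H 1
    #endif

    /* Hypothetical names for the copy strategies a runtime could choose from. */
    enum copy_level { COPY_SCALAR, COPY_MMX, COPY_SSE, COPY_SSE2 };

    /* Queries CPUID leaf 1 and returns the best level the CPU reports
       (EDX bit 26 = SSE2, bit 25 = SSE, bit 23 = MMX). Returns COPY_SCALAR
       if CPUID is unavailable or reports none of them, so non-MMX hardware
       still gets a working scalar routine. */
    enum copy_level detect_copy_level(void) {
    #ifdef HAVE_CPUID_H
        unsigned int eax, ebx, ecx, edx;
        if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            if (edx & bit_SSE2) return COPY_SSE2;
            if (edx & bit_SSE)  return COPY_SSE;
            if (edx & bit_MMX)  return COPY_MMX;
        }
    #endif
        return COPY_SCALAR;
    }
    ```

    A kernel or CRT would call something like this once and use the result to decide which copy routine to install.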

  • Bugslayer

    Cairo wrote:
    A friend of mine wrote... conditionally used MMX, SSE, or SSE2.

    The best approach is to have the run-time startup code determine the CPU's capabilities and select the correct routine once, at startup, then store the address of the preferred routine(s) in a pointer that callers go through (and yes, there are other routines that benefit from memory move/init optimizations).
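    Selecting every memory routine once at startup and storing the chosen addresses can be sketched as a table of function pointers filled in by a GCC/Clang `constructor` function, which runs before `main()`. Everything here is hypothetical: the `mem_routines` table, the `*_generic`/`*_fast` routines (the fast ones just call the C library as placeholders for MMX/SSE/SSE2 versions) and `cpu_supports_fast_routines`. MSVC has no `constructor` attribute, so a Windows build would need a different startup hook.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* One table covering both move (copy) and init (fill) routines. */
    struct mem_routines {
        void *(*copy)(void *dst, const void *src, size_t n);
        void *(*fill)(void *dst, int value, size_t n);
    };

    static void *copy_generic(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }

    static void *fill_generic(void *dst, int value, size_t n) {
        unsigned char *d = dst;
        while (n--)
            *d++ = (unsigned char)value;
        return dst;
    }

    /* Placeholders for the CPU-specific versions. */
    static void *copy_fast(void *dst, const void *src, size_t n) { return memcpy(dst, src, n); }
    static void *fill_fast(void *dst, int value, size_t n) { return memset(dst, value, n); }

    /* Starts with the generic routines, so calls are safe even before selection. */
    struct mem_routines mem = { copy_generic, fill_generic };

    /* Hypothetical capability check; a real one would query CPUID. */
    static int cpu_supports_fast_routines(void) { return 1; }

    /* GCC/Clang run constructor functions before main(): the one-time
       startup selection. */
    __attribute__((constructor))
    static void select_mem_routines(void) {
        if (cpu_supports_fast_routines()) {
            mem.copy = copy_fast;
            mem.fill = fill_fast;
        }
    }
    ```

    Callers then write `mem.copy(dst, src, n)` or `mem.fill(buf, 0, n)` and never repeat the capability check.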

    As for MMX, SSE and SSE2, these would only be used if the CPU supports them. The FPU can also be used to move or initialize data faster.

    This should definitely be in the managed runtimes too (Jxxx, and the .NET CLR/CLI).

    During a research project I did on this topic, I found that judicious use of PREFETCH can accelerate these memory routines further.
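    A rough sketch of the PREFETCH idea: while copying one cache line, ask the CPU to start fetching source data a few hundred bytes ahead. It uses GCC/Clang's `__builtin_prefetch(addr, rw, locality)`, which compiles to a prefetch instruction where the target has one (on x86 with locality 0, typically `PREFETCHNTA`) and to nothing otherwise. The 64-byte line size, the 256-byte prefetch distance and the name `copy_with_prefetch` are assumptions; the right distance depends on the hardware and has to be measured.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Assumed values: a 64-byte cache line and a 256-byte prefetch distance.
       The best distance is hardware-specific and must be measured. */
    #define LINE 64
    #define PREFETCH_AHEAD 256

    /* Copies n bytes one LINE at a time, hinting the CPU to start loading
       the source PREFETCH_AHEAD bytes ahead of the current position. */
    void *copy_with_prefetch(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n >= LINE) {
    #if defined(__GNUC__)
            /* Only prefetch addresses inside the source buffer.
               Arguments: address, 0 = read, 0 = no temporal locality. */
            if (n > PREFETCH_AHEAD)
                __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0);
    #endif
            for (size_t i = 0; i < LINE; i++)
                d[i] = s[i];
            s += LINE;
            d += LINE;
            n -= LINE;
        }
        while (n--)
            *d++ = *s++;
        return dst;
    }
    ```

    The prefetch is only a hint: removing it does not change the result, only (potentially) how long the copy takes on large buffers.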
