Can anybody recommend a good multi-threaded file copy utility? (Besides vxCopy and ScriptLogic's SecureCopy)
-
-
Uhh, is this some kind of special environment involving many storage volumes?
I honestly can't imagine you ever capping the CPU on an even remotely modern machine. HDDs are so slow that the CPU is likely looking at porn waiting for the fragment to get written before it can send the next one. You'd likely see no improvement using parallelism.
The only reason it is so slow on Vista is that Vista doesn't want to monopolise the entire HDD and make the whole system very unresponsive, the way older versions of Windows did. The only real issue with this is that they never asked users whether they wanted file copies to run in this reduced-priority state. -
ManipUni said: *snip*
I don't know if this would actually have an effect, because I haven't been in a situation to test it, but would going into Task Manager and setting the process to real-time priority help overcome the resource cap you mentioned?
-
You could roll your own using the TPL (Task Parallel Library)... You could also do it using the CCR (Concurrency and Coordination Runtime).
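The TPL and CCR are .NET libraries, so purely as a language-neutral sketch of the "roll your own" idea, here is a Python version that uses a thread pool (`concurrent.futures.ThreadPoolExecutor`) as a stand-in for the TPL. The function name `copy_tree_parallel` and the default of four workers are my own assumptions, not anything taken from either library.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_tree_parallel(src_dir, dst_dir, max_workers=4):
    """Copy every file under src_dir into dst_dir using a pool of worker threads.

    The directory layout is recreated first (single-threaded), then the file
    copies are spread across up to max_workers threads. Returns the file count.
    """
    jobs = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            jobs.append((os.path.join(root, name), os.path.join(target_root, name)))

    def copy_one(job):
        src, dst = job
        shutil.copy2(src, dst)  # copies the data plus timestamps/permission bits

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so an exception in any worker is re-raised here.
        list(pool.map(copy_one, jobs))
    return len(jobs)
```

Whether `max_workers` above 1 helps at all depends on the storage underneath; on a single spinning disk it may well be slower, which is exactly what the rest of this thread argues about.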
-
I tried this a long time ago to test threads. Multiple threads were actually slower because of thread-switching overhead; I could never do better than a single thread going full speed. As others have said, this is due to copying being a disk-bound operation, not a CPU-bound one. I'm not saying you couldn't get something that goes a few ms faster, but I'm not sure it's worth the effort.
-
It'll be dramatically slower to run it in multithreaded mode.
Disk IO is one of the slowest things your computer does, and copying files requires reading all of the data on disk that belongs to the file and putting it back to the disk somewhere else.
Consequently, copying any (non-trivial) file will max out your computer's disk IO long before the load becomes noticeable on the memory bus, in ALU usage, in thread overhead or anywhere else.
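One rough way to check the "IO-bound, not CPU-bound" claim on your own machine is to compare the wall-clock time of a copy with the CPU time the process actually used. This is a minimal Python sketch; the function name `measure_copy` is hypothetical. One caveat: if the source file is already in the OS file cache, the copy can look far more CPU-heavy than a real cold-disk copy, so use a large file or a cold cache for a meaningful number.

```python
import shutil
import time


def measure_copy(src, dst):
    """Copy src to dst and compare wall-clock time with the CPU time this process used.

    A cpu_fraction well below 1.0 means most of the elapsed time was spent
    waiting (typically on the disk), i.e. the copy was I/O-bound.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    shutil.copyfile(src, dst)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return {"wall_s": wall, "cpu_s": cpu, "cpu_fraction": cpu / wall if wall > 0 else 0.0}


# Example (file names are hypothetical):
# print(measure_copy("big_source.iso", "copy_of_source.iso"))
```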
What will start to make a difference though is in the case where two files are being copied at once.
Let us assume (which is usually correct) that most files are stored in a small number of fragments that are physically close to each other on the disk. Now, if I copy file A, near the beginning of the disk, and file B, near the end of the disk (on the same cylinder), where each is three blocks long, then I can do the following:
read block A.0
read block A.1
read block A.2
write block A'.0
write block A'.1
write block A'.2
write metadata for A'
read block B.0
read block B.1
read block B.2
write block B'.0
write block B'.1
write block B'.2
write metadata for B'
Now, if we run them in parallel, the sequence looks roughly like this (whichever core each thread runs on doesn't matter):
read block A.0
read block B.0
read block A.1
read block B.1
read block A.2
read block B.2
write block A'.0
write block B'.0
write block A'.1
write block B'.1
write block A'.2
write block B'.2
write metadata for A'
write metadata for B'
However, notice that we now need to move the head between every operation. Moving from reading A.0 to reading A.1 requires no head movement if A.1 is stored after A.0, but reading B.0 after A.0 means the head has to move. This is made worse by the (handy) optimisation most disks perform of pre-reading parts of the disk into their own internal memory to speed up subsequent sequential reads: if you read A.0, the disk reads A.0 and the rest of the track into its cache, so your next request for any of those blocks is served quickly. The next read that isn't in the cache requires a flush of this buffer.
So the queue of operations effectively becomes:
read block A.0
move head A.0 -> B.0
read block B.0
move head B.0 -> A.1
read block A.1
...
Ultimately, this makes your parallel implementation slower than a sequential one.
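To put rough numbers on the head-movement argument, here is a toy model in Python. It assumes a one-dimensional disk where each block has a linear address, the head starts at block 0, and rotational latency and the read-ahead cache are ignored; the block addresses are made up for illustration.

```python
def head_travel(positions, start=0):
    """Total distance (in blocks) the head moves to visit `positions` in order."""
    travel, current = 0, start
    for pos in positions:
        travel += abs(pos - current)
        current = pos
    return travel


def long_seeks(positions, start=0):
    """Number of moves that are not to the same or the next block (i.e. real seeks)."""
    seeks, current = 0, start
    for pos in positions:
        if pos - current not in (0, 1):
            seeks += 1
        current = pos
    return seeks


# Hypothetical layout: file A in blocks 0-2, file B in blocks 1000-1002.
A = [0, 1, 2]
B = [1000, 1001, 1002]

sequential = A + B                                         # A.0 A.1 A.2 B.0 B.1 B.2
interleaved = [blk for pair in zip(A, B) for blk in pair]  # A.0 B.0 A.1 B.1 A.2 B.2

print(head_travel(sequential), long_seeks(sequential))    # 1002 1
print(head_travel(interleaved), long_seeks(interleaved))  # 4998 5
```

In this model the interleaved order needs five long seeks instead of one and roughly five times the head travel. Real drives with command queuing (NCQ) or an OS I/O scheduler can reorder queued requests, which may narrow that gap in practice.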
Parallelism might be wonderful and shiny and everything, but remember that you only get a speed-up because you have a parallel CPU (many threads on a single-core machine slow it down). You don't have a parallel disk unless you have a RAID array, so don't expect parallelism to speed up copying. -
I was curious about the same thing myself a while back.
So I stood on the shoulders of others and wired this thing together to see if any performance could be gained by threading out a copy task.
I found in my very limited testing that the point of diminishing returns for my box at work was around 5 threads. After 5 threads, I saw no gains in speed.
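The VS2008 solution itself isn't reproduced here, but a comparable experiment can be sketched as a small harness that times the same batch of copies at several thread counts, so you can look for the point of diminishing returns on your own hardware. This is a hypothetical Python version; the function names, the 500-file count and the 10 KB file size are assumptions chosen for illustration. Every run reads the same source files, so later runs may benefit from the OS cache; for a fairer comparison, use fresh source files or a cold cache per run.

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_small_files(directory, count, size):
    """Create `count` files of `size` random bytes each; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def time_copy(paths, dst_dir, workers):
    """Copy all paths into dst_dir using `workers` threads; return elapsed seconds."""
    os.makedirs(dst_dir, exist_ok=True)

    def copy_one(path):
        shutil.copyfile(path, os.path.join(dst_dir, os.path.basename(path)))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, paths))  # re-raises any worker exception
    return time.perf_counter() - start


def sweep(paths, scratch_dir, thread_counts=(1, 2, 3, 4, 5, 6, 8)):
    """Time the same copy job at several thread counts; returns {threads: seconds}."""
    results = {}
    for n in thread_counts:
        results[n] = time_copy(paths, os.path.join(scratch_dir, f"out_{n}"), n)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as scratch:
        files = make_small_files(src, count=500, size=10 * 1024)
        for n, secs in sweep(files, scratch).items():
            print(f"{n} threads: {secs:.3f} s ({secs / len(files) * 1000:.3f} ms/file)")
```

To test against real storage such as a SAN share, you would point `src` and `scratch` at folders on that storage instead of the local temp directory.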
The link points to a zip file with a VS2008 solution.
I got the original code for the multi-threaded application from CodeProject long ago, and hot-wired it for things like load-testing stored procedures and copying files.
Some settling may occur during shipment; sold by weight, not volume; your results may vary, etc., etc. -
evildictaitor said: *snip*
For a desktop, that's almost certainly true. If you're coding for a server environment where SAN storage is commonplace, the situation can be very different. Parallelism isn't a magic cure-all that guarantees more performance, so it's important to know the environment you're working with.
-
I think it's important to distinguish clearly between multi-threading as it's mostly done today and the Task Parallel Library (TPL). Multi-threading today (in .NET) is mostly about increasing the responsiveness of your application rather than its overall speed, especially by not blocking the main UI thread.
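As an illustration of the "responsiveness" style of threading described above: a single background thread does all the copying (so it is no faster than a plain loop), while the calling thread stays free to update a UI or progress display. A minimal Python sketch; the function name `copy_in_background` and the `progress` dictionary are hypothetical.

```python
import shutil
import threading


def copy_in_background(pairs):
    """Start copying (src, dst) pairs on one worker thread and return immediately.

    Returns (thread, progress). The worker is the only writer of `progress`;
    the calling (e.g. UI) thread just reads it while it gets on with other work.
    """
    progress = {"done": 0, "total": len(pairs), "error": None}

    def worker():
        try:
            for src, dst in pairs:
                shutil.copyfile(src, dst)
                progress["done"] += 1
        except OSError as exc:
            progress["error"] = exc  # surface the failure instead of losing it

    thread = threading.Thread(target=worker, name="copy-worker")
    thread.start()
    return thread, progress


# Typical use from a UI/main loop (paths are hypothetical):
# thread, progress = copy_in_background([("a.txt", "b.txt")])
# while thread.is_alive():
#     print(f"{progress['done']}/{progress['total']} copied")  # caller stays responsive
#     thread.join(timeout=0.1)
```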
The TPL is also about multi-threading, but its goal is to run the same job faster on multiple threads, leveraging the abundant processing power available nowadays. In some ways, components like the BackgroundWorker are primitive compared to the TPL. Most download programs, like uTorrent, Flashget, FreeDownloadManager, etc., are indeed multi-threaded, insofar as they run different downloads on different threads. -
Red5 said: *snip*
Using what metric to determine speed? What was happening on the system besides your application running? -
ManipUni said: Using what metric to determine speed? What was happening on the system besides your application running?
Using the sample application, I started with one thread and worked my way up from there.
All files copied were between 5 KB and 15 KB in size, and the copy was from one folder on the SAN to another folder on the SAN.
I had a timer on it and worked out the average copy time per file when it was all done... or the total time; it was a while ago.
I never actually tried it on my local hard drive.
As for what else was running on my box, I kept that to a minimum: probably Outlook as an application, plus the normal processes on a WinXP box. -
You can try McTool http://kd7lrj.googlepages.com/mctool
-
cdsto said: You can try McTool http://kd7lrj.googlepages.com/mctool
"5-10 times the performance of other copy tools"
That sounds like a made-up number, given that there is no evidence of what tests you ran, on what hardware, or how many trials. Having a 100% error bar is pretty bad too, at least where I come from.
Basically, copying is by far an IO-bound process, not a CPU-bound one, so multithreading doesn't speed things up. -
evildictaitor said: *snip*
I'm the developer of the program cdsto mentioned above. Your quoted comment needs a little context, so here's the whole paragraph from the website:
It's designed to copy multiple (relatively small) files from one location to another and may not work well for other purposes. In this situation, McTool usually gets 5-10 times the performance of other copy tools (Windows Explorer, batch file copy, xcopy, Robocopy, Xxcopy, Total Copy, etc.) depending on the hardware and files being processed. Of course, it also provides features that the other tools don't (GUI interface, e-mail reports, real-time statistics, etc.).
Those are not made-up numbers. In the logs I have now, it has copied 1,357,195,463 files (1.3 billion) totalling 317,736,870,194,708 bytes (289 TB). As you can see, the average file size is relatively small (228 KB).
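As a quick arithmetic check of those figures, reading KB and TB here as binary units (KiB and TiB), which is what makes them line up:

```latex
\begin{aligned}
\frac{317{,}736{,}870{,}194{,}708\ \text{bytes}}{1{,}357{,}195{,}463\ \text{files}}
  &\approx 234{,}113\ \text{bytes/file} \approx 228.6\ \text{KiB/file} \\
\frac{317{,}736{,}870{,}194{,}708\ \text{bytes}}{2^{40}\ \text{bytes/TiB}}
  &\approx 289.0\ \text{TiB} \quad (\approx 317.7\ \text{TB in decimal units})
\end{aligned}
```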
100% error bar? Are you referring to the progress bar? What was it doing wrong? It's certainly not perfect, but if I can make it better, I'd like to...
You are correct that a single drive without NCQ (Native Command Queuing) does not benefit much (if any!) from running multiple copy threads on "normal" files. It often does help, though, when copying small files using just a few (3-5) threads. In a networked environment with high-performance storage hardware like the Isilon gear I'm using, it makes a huge difference.
I added a note in the Tips and Tricks section of the web page (just yesterday) to let people know that it's worth experimenting with the number of threads to get the best performance in their particular situation. Unless you're alone on your machine and/or LAN, conditions will change depending on what else is happening at the time.
Someday, when I get more time, I'd like to add some real-time analysis and have the program auto-adjust the number of threads to get the best throughput. In a short copy that only takes an hour or two this may not be all that useful, but we often copy terabytes at a time and the program can run for several days. Adjusting for the changing operating conditions over that period may help.
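One simple way that kind of auto-adjustment could work is a hill-climbing search on measured throughput. This is a hypothetical Python sketch, not McTool's actual code: `measure` is an assumed caller-supplied function that copies a batch of files with `n` threads and returns the throughput achieved (e.g. bytes per second), and the default bounds are arbitrary.

```python
def tune_workers(measure, start=4, lo=1, hi=32, rounds=10):
    """Hill-climb the worker count using a throughput measurement.

    `measure(n)` (hypothetical, supplied by the caller) runs a batch of copies
    with n workers and returns the throughput achieved. Each round tries one
    thread fewer and one thread more, moves to whichever beats the current
    best, and stops when neither neighbour is an improvement.
    """
    best_n = start
    best_tp = measure(best_n)
    for _ in range(rounds):
        improved = False
        for candidate in (best_n - 1, best_n + 1):
            if lo <= candidate <= hi:
                tp = measure(candidate)
                if tp > best_tp:
                    best_n, best_tp = candidate, tp
                    improved = True
        if not improved:
            break
    return best_n, best_tp
```

For a copy that runs for days, you would rerun this periodically as conditions change, and because single throughput samples are noisy, averaging a few batches per setting before comparing them would be sensible.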