Coffeehouse Thread

21 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

How do you manage "many" files? ( harddrive structure discussion )

Back to Forum: Coffeehouse
  • turrican

    I got many files... and by many, I really mean MANY... close to 3 million files... might be even close to 4.

     

    How do you manage this many files and structure it in the drive in order to not lose performance? I split my drive into several smaller ones which I was thinking maybe helps... but I got no facts behind this. Recently, I have merged some of them into one bigger drive and performance is still OK.

     

    Say I got 2TB space... hm, would it be OK to have everything in one fat drive letter? ( fat as in obese, not FAT :] )

     

    What if my space grows, say to 4TB ( still speaking one physical drive ), would it still be OK to have just one drive letter?

  • W3bbo

    Of course.

     

    Partitioning was a big thing in the 1990s due to the 2GB partition size limit, then with FAT32 it went away. With modern laptops not having optical drives we see the resurgence of partitions for storing recovery data that should remain hidden from the user.

     

    Anyway, spreading files over partitions does not help performance.

  • turrican

    W3bbo said:

    Of course.

     

    Partitioning was a big thing in the 1990s due to the 2GB partition size limit, then with FAT32 it went away. With modern laptops not having optical drives we see the resurgence of partitions for storing recovery data that should remain hidden from the user.

     

    Anyway, spreading files over partitions does not help performance.

    I see, cool, because I'm kinda sick of partitioning right now. In the next PC setup, I'll probably do a small 30GB C: and one other partition for everything else.

     

    Thanks.

  • Sven Groot

    W3bbo said:

    Of course.

     

    Partitioning was a big thing in the 1990s due to the 2GB partition size limit, then with FAT32 it went away. With modern laptops not having optical drives we see the resurgence of partitions for storing recovery data that should remain hidden from the user.

     

    Anyway, spreading files over partitions does not help performance.

    we see the resurgence of partitions for storing recovery

    What do you mean, resurgence? Every single laptop I've owned since the late 90s has had this, regardless of whether they also included CDs or not. It's hardly a resurgence if it never went away. Smiley

     

    Personally, I still use at least two partitions: one for the OS and applications, and one for data. This is a habit I started when I first joined the Windows 2000 beta program, because it allows me to wipe and reinstall the OS without having to think about whether I have all my data. I never put anything that's not easily recoverable from some other source on the system partitions, so I always know it's safe to format that partition.

     

    I also don't like combining drives into one partition, so I currently have several drive letters for my three separate physical drives. This is more because I fear that if one of them fails I'd lose the data on the others too. If you do want to combine them, and it's performance you're after, I say go the whole nine yards and use RAID 0. Smiley

     

    Unlike with FAT, NTFS cluster size does not grow with partition size. There may be some performance impact from the growth in size of the MFT, but it's minimal. You should, however, keep the number of files in a single directory low; NTFS does get slow if that number gets too big.

  • turrican

    Sven Groot said:
    W3bbo said:
    *snip*

    What do you mean, resurgence? Every single laptop I've owned since the late 90s has had this, regardless of whether they also included CDs or not. It's hardly a resurgence if it never went away. Smiley

     

    Personally, I still use at least two partitions: one for the OS and applications, and one for data. This is a habit I started when I first joined the Windows 2000 beta program, because it allows me to wipe and reinstall the OS without having to think about whether I have all my data. I never put anything that's not easily recoverable from some other source on the system partitions, so I always know it's safe to format that partition.

     

    I also don't like combining drives into one partition, so I currently have several drive letters for my three separate physical drives. This is more because I fear that if one of them fails I'd lose the data on the others too. If you do want to combine them, and it's performance you're after, I say go the whole nine yards and use RAID 0. Smiley

     

    Unlike with FAT, NTFS cluster size does not grow with partition size. There may be some performance impact from the growth in size of the MFT, but it's minimal. You should, however, keep the number of files in a single directory low; NTFS does get slow if that number gets too big.

    "You should however keep the number of files in a single directory low, NTFS does get slow if that number gets too big."

     

    ...that was my next question. I see.

     

    Is there any theoretical "max" number of files one should have inside one folder to keep performance "good"? My guess is less than 100K, or would it be even less? Like less than 20K?

  • Sven Groot

    turrican said:
    Sven Groot said:
    *snip*

    "You should however keep the number of files in a single directory low, NTFS does get slow if that number gets too big."

     

    ...that was my next question. I see.

     

    Is there any theoretical "max" number of files one should have inside one folder to keep performance "good"? My guess is less than 100K, or would it be even less? Like less than 20K?

    I don't know what the exact number is. I'd try to keep it under 1,000 personally, if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does.

     

    I do know that disabling generation of short 8.3 filenames can improve performance if you have a lot of files in a single directory: http://support.microsoft.com/kb/121007
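
     

    For what it's worth, you can check what a machine is currently set to before touching anything. A minimal Python sketch (Windows only, using the standard winreg module; it only reads the registry value that KB article describes, it doesn't change it):

        import winreg

        KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"

        # Read the NtfsDisable8dot3NameCreation DWORD described in the KB article.
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _type = winreg.QueryValueEx(key, "NtfsDisable8dot3NameCreation")

        # 0 = short names created on all volumes, 1 = disabled on all volumes
        # (newer Windows versions also use 2 and 3 for per-volume behaviour).
        print("NtfsDisable8dot3NameCreation =", value)

    Actually changing the value needs admin rights, and files that already have short names keep them.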

  • W3bbo

    Sven Groot said:
    turrican said:
    *snip*

    I don't know what the exact number is. I'd try to keep it under 1,000 personally, if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does.

     

    I do know that disabling generation of short 8.3 filenames can improve performance if you have a lot of files in a single directory: http://support.microsoft.com/kb/121007

    This is odd.

     

    My NtfsDisable8dot3NameCreation value is set to "2", but TechNet says its values are either 0 or 1.

     

  • JoshRoss

    Sven Groot said:
    turrican said:
    *snip*

    I don't know what the exact number is. I'd try to keep it under 1,000 personally, if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does.

     

    I do know that disabling generation of short 8.3 filenames can improve performance if you have a lot of files in a single directory: http://support.microsoft.com/kb/121007

    If you had three million files, and you wanted to keep them in directories containing fewer than 1,000 files, you could arrange them by file hash. Create a directory called Hash or something like that, create hex-named directories from 00-ff, and for each of those create the same set. Hash the files and move them into the directories by the first two bytes of the hash. Or is this silly?
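
    A rough Python sketch of that layout (the paths are made up, and I haven't tried it anywhere near three-million-file scale):

        import hashlib
        import shutil
        from pathlib import Path

        def bucket_by_hash(src_dir: str, dest_root: str) -> None:
            """Move every file in src_dir into dest_root/Hash/xx/yy/ where
            xx and yy are the first two bytes of its SHA-1 digest."""
            for path in Path(src_dir).iterdir():
                if not path.is_file():
                    continue
                digest = hashlib.sha1()
                with path.open("rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                hex_digest = digest.hexdigest()
                # Two levels of 00-ff directories = 65,536 buckets,
                # so roughly 45 files per bucket for three million files.
                bucket = Path(dest_root) / "Hash" / hex_digest[:2] / hex_digest[2:4]
                bucket.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(bucket / path.name))

        # e.g. bucket_by_hash(r"D:\dump", r"D:\organised")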

     

    -Josh

  • W3bbo

    JoshRoss said:
    Sven Groot said:
    *snip*

    If you had three million files, and you wanted to keep them in directories containing fewer than 1,000 files, you could arrange them by file hash. Create a directory called Hash or something like that, create hex-named directories from 00-ff, and for each of those create the same set. Hash the files and move them into the directories by the first two bytes of the hash. Or is this silly?

     

    -Josh

    That's silly. If you had three million files, you'd organise them by content and subject matter. I doubt anyone has three million files of related data.

     

    Of note: why doesn't the Disk Usage-o-meter say how much space is taken up by the filesystem itself?

     

    ...and why can't the filesystem be held in-memory? That way filesystem traversal would be instantaneous.

  • rhm

    Sven Groot said:
    turrican said:
    *snip*

    I don't know what the exact number is. I'd try to keep it under 1,000 personally, if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does.

     

    I do know that disabling generation of short 8.3 filenames can improve performance if you have a lot of files in a single directory: http://support.microsoft.com/kb/121007

    NTFS uses B-trees for its directory structure - you should be able to put millions of files in a single directory without the time to open a single named file increasing significantly. Of course, Windows Explorer will become slow and use a lot of memory, and don't even think about sharing that many files over SMB. But NTFS itself is fine with it.
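
    If anyone wants to test that claim, here's a very rough Python sketch (made-up paths; a lookup right after creation will mostly hit the cache, so treat the numbers as ballpark only):

        import time
        from pathlib import Path

        def time_named_lookup(root: str, n_files: int) -> float:
            """Create n_files empty files in root, then time a stat() of one
            of them by name. Returns seconds for a single lookup."""
            d = Path(root)
            d.mkdir(parents=True, exist_ok=True)
            for i in range(n_files):
                (d / f"file_{i:07d}.dat").touch()
            target = d / f"file_{n_files // 2:07d}.dat"
            start = time.perf_counter()
            target.stat()
            return time.perf_counter() - start

        # Compare, say, time_named_lookup(r"D:\tmp_small", 1_000)
        # against     time_named_lookup(r"D:\tmp_big", 200_000)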

  • elmer

    rhm said:
    Sven Groot said:
    *snip*

    NTFS uses B-trees for its directory structure - you should be able to put millions of files in a single directory without the time to open a single named file increasing significantly. Of course, Windows Explorer will become slow and use a lot of memory, and don't even think about sharing that many files over SMB. But NTFS itself is fine with it.

    Yes, because of the way the MFT works, max files per volume and max files per folder are the same thing, and how you organise them should make little or no difference to the performance of NTFS.

     

    http://technet.microsoft.com/en-us/library/cc781134(WS.10).aspx

     

    To a file server handling requests, it's probably much of a muchness... but, of course, Windows Explorer viewing a folder with 2^32 files might be a different matter Wink

     

    Large volumes and/or folders can suffer from MFT/folder/file fragmentation, and so I find that it's often good practice to archive rarely used files to separate volumes, rather than mixing rarely accessed files with frequently accessed files... unless you want to use an automated defrag utility.

  • mstefan

    elmer said:
    rhm said:
    *snip*

    Yes, because of the way the MFT works, max files per volume and max files per folder are the same thing, and how you organise them should make little or no difference to the performance of NTFS.

     

    http://technet.microsoft.com/en-us/library/cc781134(WS.10).aspx

     

    To a file server handling requests, it's probably much of a muchness... but, of course, Windows Explorer viewing a folder with 2^32 files might be a different matter Wink

     

    Large volumes and/or folders can suffer from MFT/folder/file fragmentation, and so I find that it's often good practice to archive rarely used files to separate volumes, rather than mixing rarely accessed files with frequently accessed files... unless you want to use an automated defrag utility.

    I'm not sure about XP and earlier (don't recall offhand, and I'm too lazy to go boot XP), but Vista and Win7 will automagically run scheduled defrags in the background. IIRC, one caveat is that you can't defrag the MFT when the volume is in use, it has to be done at boot time (similar to a disk check/repair).

  • elmer

    mstefan said:
    elmer said:
    *snip*

    I'm not sure about XP and earlier (don't recall offhand, and I'm too lazy to go boot XP), but Vista and Win7 will automagically run scheduled defrags in the background. IIRC, one caveat is that you can't defrag the MFT when the volume is in use, it has to be done at boot time (similar to a disk check/repair).

    There are automated defrag utilities (Diskeeper, for example) that monitor the MFT and pagefile to defrag them while the volume is in use.

  • exoteric

    W3bbo said:
    JoshRoss said:
    *snip*

    That's silly. If you had three million files, you'd organise them by content and subject matter. I doubt anyone has three million files of related data.

     

    Of note: why doesn't the Disk Usage-o-meter say how much space is taken up by the filesystem itself?

     

    ...and why can't the filesystem be held in-memory? That way filesystem traversal would be instantaneous.

    One could persist the file system index on a system partition on a separate solid state disk to mitigate the issue. I imagine one could perhaps also use the cache of a hybrid disk to maintain the index.

     

    The problem with organizing files and with tree-structured file systems is that often files do not naturally fall into a single category, so what is needed is a graph-structured layout. On the other hand, few people want to maintain a graph-structured layout manually. It's just too much work.

     

    The new Semantic Engine that Microsoft presented at the last PDC looks like an attempt to solve this issue by having an engine that applies machine learning techniques to automatically index files - both textual and binary, such as images and audio. It'll be interesting to see how easily extensible it is. It could be one hell of a replacement for IFilters. There are so many interesting types of files that are not indexed currently.

  • exoteric

    rhm said:
    Sven Groot said:
    *snip*

    NTFS uses B-trees for its directory structure - you should be able to put millions of files in a single directory without the time to open a single named file increasing significantly. Of course, Windows Explorer will become slow and use a lot of memory, and don't even think about sharing that many files over SMB. But NTFS itself is fine with it.

    That Windows Explorer doesn't handle folders with a large number of ("1st generation") files sounds more like a design issue with Windows Explorer than an intrinsic issue with NTFS, as you say...

  • elmer

    exoteric said:
    W3bbo said:
    *snip*

    One could persist the file system index on a system partition on a separate solid state disk to mitigate the issue. I imagine one could perhaps also use the cache of a hybrid disk to maintain the index.

     

    The problem with organizing files and with tree-structured file systems is that often files do not naturally fall into a single category, so what is needed is a graph-structured layout. On the other hand, few people want to maintain a graph-structured layout manually. It's just too much work.

     

    The new Semantic Engine that Microsoft presented at the last PDC looks like an attempt to solve this issue by having an engine that applies machine learning techniques to automatically index files - both textual and binary, such as images and audio. It'll be interesting to see how easily extensible it is. It could be one hell of a replacement for IFilters. There are so many interesting types of files that are not indexed currently.

    I thought that WinFS was the attempt to manage this... essentially a relational view of the underlying NTFS attributes.

  • magicalclick

    My solution... don't get 3 million files. 3 million files is the real problem here. It simply doesn't make sense.

    Leaving WM on 5/2018 if no apps, no dedicated billboards where I drive, no Store name.
  • intelman

    magicalclick said:

    My solution... don't get 3 million files. 3 million files is the real problem here. It simply doesn't make sense.

    I dunno, someday soon it might. I do not think a million files is out of the question. If you keep compressed copies of images (to send to family and friends) in addition to the edited and original copies, and you take many exposures... over half a decade...

Conversation locked

This conversation has been locked by the site admins. No new comments can be made.