Coffeehouse Thread

85 posts

It is time... Move the filesystem off of disks

  • Bass

    Dexter said:
    Bass said:
    *snip*

    Seriously, do you really want/expect a database system to move gigabytes or terabytes of data around just to keep the kernel happy?

    Why not? Isn't that exactly what happens when you invoke a database optimization?

     

    Sometimes you have to move gigabytes or terabytes of data to have optimal performance. Look up the hash table data structure.

  • CreamFilling512

    Hints are like, I am going to read sequentially, or I am going to read randomly.  It's not like, here's a hint describing this complex internal data structure that you could write a 10,000-page book about.  If the OS were capable of such a level of heuristics it would be super slow; it's just optimized for general use.
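
    As a concrete illustration of that kind of hint, here is a minimal sketch assuming a POSIX system (the file name is made up for illustration; posix_fadvise is the standard call for exactly these "sequential" / "random" hints):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            /* Hypothetical data file, opened only to illustrate the hint calls. */
            int fd = open("table.dat", O_RDONLY);
            if (fd < 0) {
                perror("open");
                return 1;
            }

            /* "I am going to read sequentially" -- the kernel may read ahead more aggressively. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
            /* ... a sequential scan would go here ... */

            /* "I am going to read randomly" -- the kernel may shrink or disable readahead. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
            /* ... random point lookups would go here ... */

            close(fd);
            return 0;
        }

    Note that this is roughly the whole vocabulary: a handful of coarse flags, nothing like a description of the application's internal data structures.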

  • Dexter

    Bass said:
    Dexter said:
    *snip*

    Why not? Isn't that exactly what happens when you invoke a database optimization?

     

    Sometimes you have to move gigabytes or terabytes of data to have optimal performance. Look up the hash table data structure.

    Bad analogy. A hashtable data structure moves data around because there's nothing better it can do. A database can do many things to get the best performance and it does just that.

  • Bass

    CreamFilling512 said:

    Hints are like, I am going to read sequentially, or I am going to read randomly.  It's not like, here's a hint describing this complex internal data structure that you could write a 10,000-page book about.  If the OS were capable of such a level of heuristics it would be super slow; it's just optimized for general use.

    Listen, once it's in memory (the only thing a DB can actually do on its own) it's not automagically optimized either. You are going to have to structure your data structures in such a way that they make optimal use of the hardware's cache as well. You cannot avoid this.

     

    A lot of those really complex data structures (eg: the Judy array) are so complicated because they are designed around being cached. They cannot cache themselves, because x86 does not allow this. So they must structure their data to be cached by the system implicitly.
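
    For illustration, here is a minimal sketch of the kind of cache-conscious layout being described, assuming 64-byte cache lines (typical on x86, not guaranteed) and GCC/Clang attribute syntax. The node struct is hypothetical, not how Judy arrays actually store data; the point is only that the structure is sized and aligned so one lookup touches one cache line, since the code cannot place anything in the cache explicitly:

        #include <stdint.h>

        #define CACHE_LINE 64   /* assumed cache line size */

        /* A hypothetical search-tree node packed into exactly one cache line:
         * 7 keys plus 8 bytes of bookkeeping = 64 bytes. The CPU caches it
         * implicitly when it is read; all the programmer controls is the layout. */
        struct node {
            uint64_t keys[7];
            uint64_t child_and_count;
        } __attribute__((aligned(CACHE_LINE)));

        _Static_assert(sizeof(struct node) == CACHE_LINE,
                       "node should occupy exactly one cache line");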

  • CreamFilling512

    Bass said:
    CreamFilling512 said:
    *snip*

    Listen, once it's in memory (the only thing a DB can actually do on its own) it's not automagically optimized either. You are going to have to structure your data structures in such a way that they make optimal use of the hardware's cache as well. You cannot avoid this.

     

    A lot of those really complex data structures (eg: the Judy array) are so complicated because they are designed around being cached. They cannot cache themselves, because x86 does not allow this. So they must structure their data to be cached by the system implicitly.

    Normally if you are running a database you need to ensure that hard drive caching can be disabled.  It's not just about performance; you can't guarantee transactions with any caching going on outside the control of the server software.
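
    To make the transaction point concrete, a minimal sketch assuming a POSIX system: the write() alone may sit in the OS page cache or the drive's write cache, so the commit should only be reported after an explicit flush (the file name and record are made up):

        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
            /* Hypothetical write-ahead log. */
            int fd = open("txn.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
            if (fd < 0) {
                perror("open");
                return 1;
            }

            const char record[] = "COMMIT txn 42\n";
            if (write(fd, record, strlen(record)) < 0) {
                perror("write");
                return 1;
            }

            /* Without this, the record may still be sitting in a cache the
             * server does not control; fsync() asks the kernel (and ideally
             * the drive) to make it durable before the commit is acknowledged. */
            if (fsync(fd) < 0) {
                perror("fsync");
                return 1;
            }

            close(fd);
            return 0;
        }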

  • Bass

    Dexter said:
    Bass said:
    *snip*

    Bad analogy. A hashtable data structure moves data around because there's nothing better it can do. A database can do many things to get the best performance and it does just that.

    A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.

  • Bass

    CreamFilling512 said:
    Bass said:
    *snip*

    Normally if you are running a database you need to ensure that hard drive caching can be disabled.  It's not just about performance; you can't guarantee transactions with any caching going on outside the control of the server software.

    Well I don't know about Windows, but there are numerous parameters you can customize in Linux regarding the functionality of the file system or even the CPU scheduler. You can even swap out file systems and CPU schedulers completely; Linux is open source. Smiley


    That's actually what Google does: they use the O(1) scheduler with a modern kernel, while the default scheduler tends to be CFS (the "Completely Fair Scheduler"). This is on top of many other changes designed to make Linux perform really well for their specialized task.

     

    In Android, Google uses CFS. But they might be adopting the Brain F**k Scheduler (BFS), which is reported to be really f**king fast, and yet its algorithm is so simple that its existence is a giant brainf**k of an enigma. Kind of like some of Quake 3's rendering algorithms. Smiley
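
    As a small illustration of the kind of knob being described, here is a sketch that reads the active I/O scheduler for one disk on Linux; the device name "sda" is an assumption, and the CPU scheduler itself is chosen at kernel build/boot time rather than through a file like this:

        #include <stdio.h>

        int main(void)
        {
            char line[256];
            /* The active elevator is shown in brackets, e.g. "noop deadline [cfq]". */
            FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");
            if (!f) {
                perror("fopen");
                return 1;
            }
            if (fgets(line, sizeof line, f))
                printf("I/O scheduler: %s", line);
            fclose(f);
            return 0;
        }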

  • ScanIAm

    Bass said:
    CreamFilling512 said:
    *snip*

    Well I don't know about Windows, but there are numerous parameters you can customize in Linux regarding the functionality of the file system or even the CPU scheduler. You can even swap out file systems and CPU schedulers completely; Linux is open source. Smiley


    That's actually what Google does: they use the O(1) scheduler with a modern kernel, while the default scheduler tends to be CFS (the "Completely Fair Scheduler"). This is on top of many other changes designed to make Linux perform really well for their specialized task.

     

    In Android, Google uses CFS. But they might be adopting the Brain F**k Scheduler (BFS), which is reported to be really f**king fast, and yet its algorithm is so simple that its existence is a giant brainf**k of an enigma. Kind of like some of Quake 3's rendering algorithms. Smiley

    Wasn't IBM working on hybrid drives a few years ago?

     

    I am pretty far removed from my FileStructures course, but if putting the filesystem in 'memory' would speed things up, perhaps it would do to have the filesystem reside on a gig of SSD memory and still keep the files on actual spindles.

     

    I've got a pair of 60GB SSDs striped as my OS disk, and let me tell ya, getting the entire OS off of the hard drive seriously speeds things up.

  • turrican

    Sure, why not. I got 12GB. RAM is dirt cheap anyway. I can't believe there are still people (especially developers) with only 2GB of RAM. ...that's just weird, almost perverse. Tongue Out

  • CplCarrot

    Bass said:
    Dexter said:
    *snip*

    A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.

    I recall that major enterprise database engines like SQL Server have the ability, and for very high-end performance scenarios the desirability, to store their data on unformatted raw disks. However, the lack of OS support means that this is not recommended except where every ounce of performance is required. Otherwise it is easier and cheaper to throw a couple of GB of RAM at the problem.

     

  • Sven Groot

    Bass said:
    Dexter said:
    *snip*

    A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.

    A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible.

    Sorry, but I'm a PhD student who specializes in database engineering, and that statement goes contrary to everything I've been taught.

  • magicalclick

    Are we talking about DB or OS or DBOS? I am confused.

     

    Well, just thinking crazily, but what happens when the HDD has no OS? Like remote storage that is shared with many servers? Remote storage goes through fiber optics with high bandwidth, obviously. What I see is the Live Mesh case. You cache on your computer and you don't know what's going on with the actual remote storage, let alone the other synced devices. And I am using Live Mesh; it is hardly good, because synchronization is hard between devices.

     

    And what's the point of storing the FS when you still need to access the DB and get some data from a page or many pages? And if the HDD is not spinning, how do you know your FS on RAM is synced? And what about RAM for actual data calculations, which are RAM-intensive on numerous occasions? If RAM runs out, are you going to drop the FS from RAM? It just doesn't make any sense to me.

     

    Anyway, FS on RAM? It sounds really really unsafe to begin with.

     

  • figuerres

    magicalclick said:

    Are we talking about DB or OS or DBOS? I am confused.

     

    Well, just thinking crazily, but what happens when the HDD has no OS? Like remote storage that is shared with many servers? Remote storage goes through fiber optics with high bandwidth, obviously. What I see is the Live Mesh case. You cache on your computer and you don't know what's going on with the actual remote storage, let alone the other synced devices. And I am using Live Mesh; it is hardly good, because synchronization is hard between devices.

     

    And what's the point of storing the FS when you still need to access the DB and get some data from a page or many pages? And if the HDD is not spinning, how do you know your FS on RAM is synced? And what about RAM for actual data calculations, which are RAM-intensive on numerous occasions? If RAM runs out, are you going to drop the FS from RAM? It just doesn't make any sense to me.

     

    Anyway, FS on RAM? It sounds really really unsafe to begin with.

     

    magicalclick: yeah, this has gone back and forth and I can see why you are lost...

     

    the start of this was that the OP was talking about taking the NTFS "metadata" to a memory-based system of some kind.

    that is, he seemed to think that the raw data and the NTFS filesystem data could and should be handled differently.

    at least that's what I think the OP was saying.

     

    then as the topic went on, folks started talking about FS optimization and how a DBMS might use the FS or might not.

     

    so the term "database" has been used here in two ways: as a normal database, for say SQL Server, and as the special data that a file system needs to manage.

  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    Sorry, but I'm a PhD student who specializes in database engineering, and that statement goes contrary to everything I've been taught.

    You have been taught that the Drizzle team is trying to develop a heavy DB? Well, PhD student or not, that is patently wrong.

     

    Or you have been taught that the only good DB is a complicated DB? Well, that's wrong too. The only good DB is a relational DB? That's also a load of crap.

     

    There is a long history of people not agreeing with each other when it comes to what a good database design is. Maybe because there is no genuinely right way to do it.

  • Sven Groot

    Bass said:
    Sven Groot said:
    *snip*

    You have been taught that the Drizzle team is trying to develop a heavy DB? Well, PhD student or not, that is patently wrong.

     

    Or you have been taught that the only good DB is a complicated DB? Well, that's wrong too. The only good DB is a relational DB? That's also a load of crap.

     

    There is a long history of people not agreeing with each other when it comes to what a good database design is. Maybe because there is no genuinely right way to do it.

    You have been taught that the Drizzle team is trying to develop a heavy DB? Well, PhD student or not, that is patently wrong.

     

    Or you have been taught that the only good DB is a complicated DB? Well, that's wrong too. The only good DB is a relational DB? That's also a load of crap.

    I was challenging your assertion that a modern DB delegates to the kernel as much as possible (in the context of file I/O of this thread). I did not say any of the things you just accused me of saying.

  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    I was challenging your assertion that a modern DB delegates to the kernel as much as possible (in the context of file I/O of this thread). I did not say any of the things you just accused me of saying.

    If a kernel can do something that a DB does, it makes sense to remove that functionality and have the kernel do it. This reduces the amount of code the DB developer has to maintain. I don't see what your problem with this is at all.

  • Sven Groot

    Bass said:
    Sven Groot said:
    *snip*

    If a kernel can do something that a DB does, it makes sense to remove that functionality and have the kernel do it. This reduces the amount of code the DB developer has to maintain. I don't see what your problem with this is at all.

    The problem with that is that the kernel has no idea what your application is trying to do. The kernel has to serve every possible type of application, and will therefore need to do things so that they work well for all of them. When it comes to I/O, particularly things like caching and prefetching, there is no one strategy that works best in all scenarios. The kernel therefore uses a strategy that works pretty well for most scenarios, but probably isn't the optimal strategy for any of them.

     

    The DBMS has the advantage that it knows exactly what it's doing. It knows far more about its data access and usage patterns than the kernel ever will, and can therefore use a caching and prefetching strategy that is far better than what the kernel could do. So a truly high-performance DBMS will bypass the kernel in this instance and implement its own file I/O, because it knows it can do a better job than the kernel.

     

    This isn't because the code in the kernel is somehow worse than that of the DBMS, nor does it mean DBMS developers are smarter than kernel developers. The only reason they do this is because the kernel is too generalised to provide the best solution, and the DBMS has far more information about what it wants to do and can therefore make much better decisions.
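
    A minimal sketch of what "implement its own file I/O" can look like in practice, assuming Linux: opening with O_DIRECT bypasses the kernel's page cache, so the DBMS's own buffer pool makes the caching and prefetching decisions. The file name is made up, and the 4096-byte alignment is an assumption (the real requirement depends on the filesystem and device):

        #define _GNU_SOURCE            /* for O_DIRECT */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main(void)
        {
            /* Hypothetical database file; with O_DIRECT the page cache is
             * bypassed and residency decisions belong to the DBMS. */
            int fd = open("pages.db", O_RDONLY | O_DIRECT);
            if (fd < 0) {
                perror("open");
                return 1;
            }

            /* O_DIRECT typically requires aligned buffers, offsets and lengths. */
            void *buf;
            if (posix_memalign(&buf, 4096, 4096) != 0) {
                fprintf(stderr, "posix_memalign failed\n");
                return 1;
            }

            ssize_t n = pread(fd, buf, 4096, 0);   /* read page 0 around the cache */
            printf("read %zd bytes\n", n);

            free(buf);
            close(fd);
            return 0;
        }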

  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    The problem with that is that the kernel has no idea what your application is trying to do. The kernel has to serve every possible type of application, and will therefore need to do things so that they work well for all of them. When it comes to I/O, particularly things like caching and prefetching, there is no one strategy that works best in all scenarios. The kernel therefore uses a strategy that works pretty well for most scenarios, but probably isn't the optimal strategy for any of them.

     

    The DBMS has the advantage that it knows exactly what it's doing. It knows far more about its data access and usage patterns than the kernel ever will, and can therefore use a caching and prefetching strategy that is far better than what the kernel could do. So a truly high-performance DBMS will bypass the kernel in this instance and implement its own file I/O, because it knows it can do a better job than the kernel.

     

    This isn't because the code in the kernel is somehow worse than that of the DBMS, nor does it mean DBMS developers are smarter than kernel developers. The only reason they do this is because the kernel is too generalised to provide the best solution, and the DBMS has far more information about what it wants to do and can therefore make much better decisions.

    The same arguments have been made in the past regarding cooperative vs. preemptive multitasking. I assume you know who won in the end. The thing about any engineering project is that there are limited resources. If you choose to optimize one part of your DB (file I/O), you are probably missing out somewhere else (eg: the CPU cache). As I said before, the CPU cache is not programmable. So if you don't design your data structures with the CPU cache in mind, you are losing a ridiculously important optimization.
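
    To make that concrete, a small hypothetical sketch: summing one "column" touches far fewer cache lines when that column is stored contiguously than when every record is a wide struct that gets dragged through the cache whole. The struct and sizes are made up for illustration:

        #include <stdint.h>
        #include <stddef.h>
        #include <stdio.h>

        #define N_ROWS 1000000

        /* Row layout: summing one field pulls each whole 64-byte row into cache. */
        struct row {
            uint64_t id;
            uint64_t amount;
            char     payload[48];   /* unrelated columns */
        };
        static struct row rows[N_ROWS];

        /* Column layout: the scanned field is contiguous, so every fetched
         * cache line is full of useful data. */
        static uint64_t amounts[N_ROWS];

        static uint64_t sum_rows(void)
        {
            uint64_t s = 0;
            for (size_t i = 0; i < N_ROWS; i++)
                s += rows[i].amount;
            return s;
        }

        static uint64_t sum_column(void)
        {
            uint64_t s = 0;
            for (size_t i = 0; i < N_ROWS; i++)
                s += amounts[i];
            return s;
        }

        int main(void)
        {
            printf("%llu %llu\n",
                   (unsigned long long)sum_rows(),
                   (unsigned long long)sum_column());
            return 0;
        }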

     

    Another thing you conveniently leave out is that kernel developers and database developers often work for the same company. Eg: Oracle and IBM are both Linux kernel developers (and both, interestingly, have contributed file systems to the Linux kernel). There are contributions to the Linux kernel that were specifically designed around making databases faster. Sometimes this means tweaking the characteristics of the I/O scheduler and filesystem to improve their database performance, and not the other way around.

     

    Anyway, what is interesting about DB performance is that depending on which DB vendor you ask, their DB is the fastest. So you can argue about DB performance all day and which approach is better, but MySQL/Oracle/DB2/SQLServer/SQLite are all the fastest DB in existence anyway. Smiley
