Coffeehouse Thread

85 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

It is time... Move the filesystem off of disks

  • Sven Groot

    Bass said:
    Sven Groot said:
    *snip*

    But that just plainly isn't true. The kernel can only figure out what data was accessed most often in the past, and then hope that this will still be true in the future.

     

    If you design your data structures correctly, it SHOULD be true in the future.

     

    But there are many access patterns where this simply isn't true.

     

    And they don't belong in databases.

     

    The kernel cannot know if an application is going to have an access pattern for which this strategy doesn't work. Only the application can know that. Hence, the application can do better.

     

    I don't think that's actually true. For many data structures, you cannot reliably predict where an element will be located after a restructure. And if you can, you can ask the kernel to cache that part of the file for you anyway. See the readahead syscall, which is exactly how readahead daemons like preload work! They don't go around implementing their own file caching, that would be stupid. They let the kernel do it for them, as any good DB implementation would.
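    Concretely, the kind of hint preload gives looks like this. A minimal sketch using Python's os.posix_fadvise on Linux, a close cousin of the readahead call; the scratch file, names, and sizes here are invented for illustration:

```python
import os
import tempfile

def warm_and_read(path, length):
    """Ask the kernel to prefetch a range, then read it normally."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # POSIX_FADV_WILLNEED hints that this range will be needed soon;
        # the kernel starts readahead but stays in charge of its own cache.
        os.posix_fadvise(fd, 0, length, os.POSIX_FADV_WILLNEED)
        return os.pread(fd, length, 0)
    finally:
        os.close(fd)

# Scratch file standing in for data the application knows it will need.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 65536)
    name = f.name
data = warm_and_read(name, 65536)
os.unlink(name)
```

    The read itself is unchanged; the hint only warms the kernel's cache, and the kernel remains free to ignore it.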

     

    And CPU caching isn't relevant to this discussion for several reasons: 1. The timing gap between memory and disk is greater than that between CPU cache and memory. 2. CPU cache is a few MB at best, file cache is typically multiple GB. 3. The kernel does not manage the CPU cache, the CPU does that.

     

    1. It may be, but too many direct memory accesses will affect system performance in a very bad way.

    2. CPU cache is a few MB at best, so let's ignore it? That sounds like a plan, chief.

    3. The fact that the CPU manages the CPU cache is exactly why it's important. You cannot program a CPU cache, because that is a right reserved only for the immortals who design hardware, and you are but a mortal software developer. The best you can do is structure your data in a way that is conducive to CPU caching. If you are going to do this anyway, why not structure your data in a way that is conducive to kernel caching? Hmm? The kernel is like the CPU's prophet to software. Like most prophets, he is mortal software like all other software, but the immortals have bestowed certain miracles on him. You shouldn't ignore his powers, for he speaketh directly to the hardware and knows its demands. You do not.

    Look, I agree that in 99.9% of the cases, you do want to leave this to the kernel. I just believe that very high performance databases are one of the cases where you don't.

     

    Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.

     

    They let the kernel do it for them, as any good DB implementation would.

    So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?

  • magicalclick

    Sven Groot said:
    Bass said:
    *snip*

    So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?

    I think he is more referring to stuff like C#. The kernel takes care of the lower-level stuff and the DB people take care of the higher-level stuff. But that only works for lazy people like me. Pure DB people, I am sure, want to do everything themselves to get better performance than a general-purpose kernel.

    Leaving WM on 5/2018 if no apps, no dedicated billboards where I drive, no Store name.
  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?

    Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.

     

    What "clear, measurable" difference? Every DB vendor claims they are the fastest DB. Even SQLite claims it's the fastest DB. That's right, a barely 200 kB shared library DB is claiming victory. MySQL/Oracle have similar comparisons showing how they are the fastest. You can probably find them if you visit their websites; I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.

     

    If you want to argue with me on this, argue on a technical level. Explain exactly what a DB's file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms. You are a DB expert, you should know this. You can't just treat it as some kind of fact that should be assumed. CS doesn't work that way.

  • Sven Groot

    Bass said:
    Sven Groot said:
    *snip*

    Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.

     

    What "clear, measurable" difference? Every DB vendor claims they are the fastest DB. Even SQLite claims it's the fastest DB. That's right, a barely 200 kB shared library DB is claiming victory. MySQL/Oracle have similar comparisons showing how they are the fastest. You can probably find them if you visit their websites; I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.

     

    If you want to argue with me on this, argue on a technical level. Explain exactly what a DB's file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms. You are a DB expert, you should know this. You can't just treat it as some kind of fact that should be assumed. CS doesn't work that way.

    What "clear, measurable" difference?

    Well, let's look at the top ten results of the TPC-C. If SQLite is so great, why isn't it in there? Or the TPC-H, for that matter. Or check the huge amount of research being done on this topic. Apparently every single one of the papers was a waste of time because the kernel can do better anyway.

     

    I've asked repeatedly, but I've got no answers.

    I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

    Well, let's look at the top ten results of the TPC-C. If SQLite is so great, why isn't it in there?

     

    Maybe because they don't test it? Did you even bother looking before you pasted that link?

     

    I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    What exactly would a database know that could possibly aid it in caching? That's not exactly obvious, is it? So why make an assumption that you cannot even answer?


    But seriously, let's talk algorithms.

     

    You are building a DB, and you store your DB information in a file or a series of files, right?

    You have to somehow access those files, by using some kind of file API (e.g. fopen, fseek), or something like mmap.

    Now you want to cache something, what do you do? Explain the functions a typical DB would use to cache something. And then explain exactly how this is better than using readahead. Because I cannot explain this, and you apparently can.


    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

     

    First of all, only modern kernels (e.g. Linux 2.6) have the syscall that allows processes to suggest parts of a file be in the kernel's file cache. I don't think Windows has any similar syscall, or if it does, it's poorly documented, because I've looked hard. So you might be SOL on Windows support, and that's probably one of the many reasons Drizzle doesn't work on Windows. Secondly, there are applications that do use it (I even named one), so your entire premise is wrong. :) Thirdly, you wouldn't know if Oracle/DB2 et al. do it, because they are closed source. Fourthly, well, I think 1, 2, and 3 were enough in this case.

     

     

  • Sven Groot

    Bass said:
    Sven Groot said:
    *snip*

    Well, let's look at the top ten results of the TPC-C. If SQLite is so great, why isn't it in there?

     

    Maybe because they don't test it? Did you even bother looking before you pasted that link?

     

    I've answered repeatedly, but you keep igoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    What exactly would a database know that could possibly aid it in caching? That's not exactly obvious, is it? So why make an assumption that you cannot even answer?


    But seriously, let's talk algorithms.

     

    You are building a DB, and you store your DB information in a file or a series of files, right?

    You have to somehow access those files, by using some kind of file API (e.g. fopen, fseek), or something like mmap.

    Now you want to cache something, what do you do? Explain the functions a typical DB would use to cache something. And then explain exactly how this is better than using readahead. Because I cannot explain this, and you apparently can.


    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

     

    First of all, only modern kernels (e.g. Linux 2.6) have the syscall that allows processes to suggest parts of a file be in the kernel's file cache. I don't think Windows has any similar syscall, or if it does, it's poorly documented, because I've looked hard. So you might be SOL on Windows support, and that's probably one of the many reasons Drizzle doesn't work on Windows. Secondly, there are applications that do use it (I even named one), so your entire premise is wrong. :) Thirdly, you wouldn't know if Oracle/DB2 et al. do it, because they are closed source. Fourthly, well, I think 1, 2, and 3 were enough in this case.

     

     

    Now you want to cache something, what do you do?

    As I've said a million times, but you keep ignoring, it isn't about when you want to cache something. It's about when you want to remove something from the cache. Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.
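    The scenario above can be sketched as a toy buffer pool with pinning; the class, names, and page identifiers here are invented, not taken from any real engine:

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU buffer pool whose caller can pin pages it knows it needs."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> data, in LRU order
        self.pinned = set()

    def pin(self, page_id):
        self.pinned.add(page_id)

    def unpin(self, page_id):
        self.pinned.discard(page_id)

    def get(self, page_id, load):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            # Evict the least recently used page that is NOT pinned.
            victim = next(p for p in self.pages if p not in self.pinned)
            del self.pages[victim]
        data = load(page_id)
        self.pages[page_id] = data
        return data

pool = BufferPool(capacity=2)
load = lambda pid: "data:" + pid
pool.get("B", load)   # cache: [B]
pool.get("C", load)   # cache: [B, C], B is now least recently used
pool.pin("B")         # the plan will read B again right after A
pool.get("A", load)   # evicts C, not B, despite B being the LRU page
```

    A plain LRU would have evicted B here; the pin encodes exactly the plan knowledge the kernel doesn't have.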

     

    That's just one simple example. There is 40 years of research on this point, and no, I don't know it all, nor do I have the time to read all of it and create a summary for you.

     

    First of all, only modern kernels can do things like readahead. I don't think Windows has any similar functionality, or if it does, it's poorly documented, because I've looked hard.

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point.

     

    Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great. All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    These people don't implement their own caching schemes because they have nothing better to do. I'm sure they started out without explicit caching, but then they profiled it and found they were hitting the disc more often than was needed. That's how performance optimization works: you start with a simple solution, measure it, then try to improve. That's where these caching solutions come from. They didn't set out thinking "well I've heard that we have to cache things manually, so let's not measure it and do it regardless". That's not how these things work.

  • Bass

    Sven Groot said:
    Bass said:
    *snip*

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point.

     

    Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great. All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    These people don't implement their own caching schemes because they have nothing better to do. I'm sure they started out without explicit caching, but then they profiled it and found they were hitting the disc more often than was needed. That's how performance optimization works: you start with a simple solution, measure it, then try to improve. That's where these caching solutions come from. They didn't set out thinking "well I've heard that we have to cache things manually, so let's not measure it and do it regardless". That's not how these things work.

    Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.

     

    Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if another query, which the DB doesn't know about, deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be faster. Whoops!

     

    Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be "keep commonly accessed data in memory". Why? Because commonly accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the overall fastest. When you pull some commonly accessed data out of memory, chances are you're going to put it right back anyway. All you are doing is delaying the inevitable page fault, defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding what to fail on next.
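    This intuition is easy to simulate: on a skewed workload where a few hot keys dominate, a plain LRU cache already gets a high hit rate. A toy sketch (the workload, keys, and capacity are all invented):

```python
from collections import OrderedDict
import random

def lru_hit_rate(trace, capacity):
    """Hit rate of a plain LRU cache over an access trace."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # refresh recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[key] = True
    return hits / len(trace)

# Skewed workload: A is hot (50%), B warm (30%), C and D cold (10% each).
random.seed(0)
trace = [random.choice("AAAAABBBCD") for _ in range(10_000)]
rate = lru_hit_rate(trace, capacity=3)
```

    For this seed the hit rate comes out well above 70%, even though the cache cannot hold all the keys: the hot data simply stays resident.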

     

    To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put together plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.

     

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.

     

    Okay, this kind of functionality is new in that it's not 30 years old. :) But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly not something "in the future". Unless "the future is now". I'm sure Windows has it too; it's probably just one of those "undocumented APIs", because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all about this already, since you are a DB/caching expert. ;)

     

    All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is "archaic" enough that a fork is in progress to "modernize" it (in order to make it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. "LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!"

  • CreamFilling512

    Bass said:
    CreamFilling512 said:
    *snip*

    The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data. Quite frankly, to run efficiently on an x86 processor, it has no other choice. I don't know how many of you have ever done assembly programming. Although you can create a memory caching algorithm, you can't create your own processor caching algorithm on x86. That is hard-coded in the CPU by Intel/AMD. Performance degrades considerably on x86 if there are many cache misses, a cache miss being the cache equivalent of a page fault. So your data structures must take this into account in order to run efficiently on the architecture.

    No, it really doesn't matter at all. A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that? If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

  • Bass

    CreamFilling512 said:
    Bass said:
    *snip*

    No, it really doesn't matter at all. A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that? If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

    A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that?

     

    The fact that a database server is I/O limited is exactly why you want to minimize access time. Think for a moment about why that is so.

     

    If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

     

    The kernel doesn't even need to involve the CPU all that much to read I/O into memory. Come on, pay attention to the rest of the debate; I mentioned this before and Dexter even elaborated on the point.

  • CreamFilling512

    Bass said:
    CreamFilling512 said:
    *snip*

    A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that?

     

    The fact that a database server is I/O limited is exactly why you want to minimize access time. Think for a moment about why that is so.

     

    If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

     

    The kernel doesn't even need to involve the CPU all that much to read I/O into memory. Come on, pay attention to the rest of the debate; I mentioned this before and Dexter even elaborated on the point.

    Every time you issue a kernel-mode I/O request you are doing a system call, raising an interrupt, and causing a thread context switch to kernel mode, regardless of whether the kernel actually has to go to disk for the data or not. You are thrashing the CPU cache by doing this.

  • Dexter

    Bass said:
    Sven Groot said:
    *snip*

    Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.

     

    Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if another query, which the DB doesn't know about, deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be faster. Whoops!

     

    Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be "keep commonly accessed data in memory". Why? Because commonly accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the overall fastest. When you pull some commonly accessed data out of memory, chances are you're going to put it right back anyway. All you are doing is delaying the inevitable page fault, defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding what to fail on next.

     

    To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put together plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.

     

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.

     

    Okay, this kind of functionality is new in that it's not 30 years old. :) But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly not something "in the future". Unless "the future is now". I'm sure Windows has it too; it's probably just one of those "undocumented APIs", because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all about this already, since you are a DB/caching expert. ;)

     

    All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is "archaic" enough that a fork is in progress to "modernize" it (in order to make it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. "LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!"

    Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU).

     

    Since you seem incapable of accepting that user-mode caching can be done without memcpy, despite the fact that I explained how it can be done, would you be so kind as to explain to us why we should bother to provide more evidence?

     

    And when all you can say is "the kernel is awesome" and "the cache is intelligent", do you expect us to provide complete technical arguments? Since when is "awesomeness" a technical word? Or "brainf**k"?

     

    You bring a lot of stuff into the discussion but you fail to provide any decent example of how the said stuff might be used. readahead? Sure, why not. Let's see what this readahead is all about:

     

    ssize_t readahead(int fd, off64_t offset, size_t count);

    So, the database has to tell the kernel to bring a specific portion of the file into memory. Whoops. The database has to know that! The horror! The sheer stupidity of the database having to know what parts of the file it needs! How's that any different from the database doing non-cached read operations? And your fancy-pants readahead call turns out to be a synchronous call. Not the best way to achieve high performance.

     

    mmap? Sure, why not? To your advantage, servers have moved to 64 bits, so mapping huge files is less of an issue. It would be fun to see you using mmap with a 50 GB database on a 32-bit system. You'd have to keep calling mmap and munmap all day long, and you'd probably trigger a lot of page faults in the process. I'm not even bothering to ask how to deal with writes when mmap is involved. Feel free to figure that out for yourself if you think you're so cool and everyone else is an idiot.
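    The "keep calling mmap and munmap all day long" dance looks roughly like this: map a small aligned window around each range instead of the whole file. A sketch with Python's mmap module; the file contents, sizes, and offsets are invented for illustration:

```python
import mmap
import os
import tempfile

def read_via_window(path, offset, length):
    """Map only a small aligned window around [offset, offset+length)."""
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran          # mmap offsets must be aligned
    span = (offset - base) + length
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), span, offset=base,
                       access=mmap.ACCESS_READ) as m:
            lo = offset - base
            return m[lo:lo + length]

# 256 KiB scratch file standing in for a "huge" database file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(256)) * 1024)
    name = f.name
chunk = read_via_window(name, 70000, 16)    # bytes 70000..70015
os.unlink(name)
```

    Every call pays for a map and an unmap, and a cold window means page faults on first touch, which is exactly the overhead being mocked here.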

     

    The HDD's head position? What a good joke, probably from a sci-fi book. Yes, sure, there are disk I/O scheduling algorithms that take into account the position of the head. Except that:

    - what the kernel does is keep track of where on the drive the last operation was performed; it does not get this information from the drive. Since high-performance databases tend to use dedicated drives, they can keep track of this too. Not that they need to do so, because at the end of the day they don't bypass the disk driver. In any case, there's no "zomg" context switch required for this. In the worst case the database needs to retrieve the drive geometry from the kernel, but since that doesn't change over time it can be done just once, at startup.

    - even some "modern" consumer drives can do command queueing (ever heard of NCQ?). Good luck trying to keep track of the head's position with those drives. Though you won't need to do that: they do command queueing so the kernel doesn't need to do it.

    - should I bother to mention SAN storage? Those boxes with tens or hundreds of drives where a read/write request can be spread across multiple drives?

     

    Keeping commonly accessed data in memory? Sure, why not? But what exactly constitutes "commonly accessed data" in a database, given that it can process a wide range of queries and each query might require completely different data?

     

  • AndyC

    Bass said:
    Sven Groot said:
    *snip*

    Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.

     

    What "clear, measurable" difference? Every DB vendor claims they are the fastest DB. Even SQLite claims it's the fastest DB. That's right, a barely 200 kB shared library DB is claiming victory. MySQL/Oracle have similar comparisons showing how they are the fastest. You can probably find them if you visit their websites; I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.

     

    If you want to argue with me on this, argue on a technical level. Explain exactly what a DB's file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms. You are a DB expert, you should know this. You can't just treat it as some kind of fact that should be assumed. CS doesn't work that way.

    See a million-and-one benchmarks of SQL Server running in Threaded mode (kernel handles scheduling) vs Fiber mode (SQL Server manages scheduling). Prior to the enormous amount of work done in Server 2008 R2 (much of which was driven by SQL Server), fiber mode won absolutely hands down.

     

    You keep coming back to the argument that MRU is always the best caching policy. But there is lots of CS research out there that proves that simply isn't the case. And a basic assumption of all CS is that the more you know about a problem in advance, the easier it is to provide the optimal solution.  
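    That last point has a classic concrete form: Bélády's clairvoyant MIN policy, which knows the whole future access sequence, is provably optimal, so no online policy like LRU can beat it. A toy comparison on a cyclic trace, LRU's worst case (the trace and cache size are invented):

```python
def lru_misses(trace, capacity):
    """Miss count for plain LRU over an access trace."""
    cache, misses = [], 0
    for key in trace:
        if key in cache:
            cache.remove(key)
            cache.append(key)           # most recently used at the end
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)            # evict least recently used
            cache.append(key)
    return misses

def belady_misses(trace, capacity):
    """Belady's MIN: evict the page whose next use is farthest away.
    Needs the whole future trace, which only the application could know."""
    cache, misses = set(), 0
    for i, key in enumerate(trace):
        if key in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(k):
                try:
                    return trace.index(k, i + 1)
                except ValueError:
                    return len(trace)   # never used again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return misses

trace = list("ABCDABCDABCD")            # cyclic scan of 4 keys, cache of 3
lru = lru_misses(trace, 3)
opt = belady_misses(trace, 3)
```

    On this trace LRU misses every single access (12 of 12) while the clairvoyant policy misses only 6, which is the sense in which knowing the access pattern in advance makes the optimal solution easier.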

  • AndyC

    Bass said:
    Sven Groot said:
    *snip*

    Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.

     

    Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if another query, which the DB doesn't know about, deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be faster. Whoops!

     

    Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be "keep commonly accessed data in memory". Why? Because commonly accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the overall fastest. When you pull some commonly accessed data out of memory, chances are you're going to put it right back anyway. All you are doing is delaying the inevitable page fault, defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding what to fail on next.

     

    To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put together plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.

     

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.

     

    Okay, this kind of functionality is new in that it's not 30 years old. :) But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly not something "in the future". Unless "the future is now". I'm sure Windows has it too; it's probably just one of those "undocumented APIs", because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all about this already, since you are a DB/caching expert. ;)

     

    All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is "archaic" enough that a fork is in progress to "modernize" it (in order to make it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. "LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!"

    You appear to be under the assumption that memcpy has to be implemented by a big CPU loop copying data around in physical memory. Or that a database that was managing its own memory allocation and caching algorithms would work in different-sized units from memory pages.

     

    Challenge your assumptions.
