Discussions

Bass: Knows the way the wind is flowing.
  • It is time... Move the filesystem off of disks

    Dexter said:
    Bass said:
    *snip*

     

    Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?

    Are you insane? I've said that I don't think the database should move data around to keep the kernel happy. That's not the same thing as "localizing" important data so it fits the CPU cache (or the hard drive).

     

    And why not? The kind of caching algorithm the Linux kernel uses is not all that different from what Intel microcode uses. You kill two birds with one stone. PS: Ad hominem is evidence of a losing argument.

     

    Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?

     

    Well, a kernel can safely assume that important/commonly used data in memory will be near each other, because that's how optimizing compilers and high-performance databases tend to structure their data. True story.

     

  • It is time... Move the filesystem off of disks

    CreamFilling512 said:
    Bass said:
    *snip*

    Cooperative multitasking would probably make more sense on a database server.  Normally the entire machine is dedicated to running the database server, so there is really only one process at a time.  You would lose the overhead of preemptive multitasking, and getting context switched at a bad time.

    Well, with that argument, a database server should BE a kernel, right? :P Since it's going to do everything a kernel does! That is clearly the way to get the optimal performance.

  • It is time... Move the filesystem off of disks

    CreamFilling512 said:
    Dexter said:
    *snip*

    I also said the same thing, and he started talking about CPU caches for some reason.  Anyway this is basic engineering sense, it doesn't even really have anything to do with databases, you can almost always get better performance if you "roll it yourself".  Whether or not it is feasible to do this or worth the effort is what you need to decide as an engineer.  But when major commercial databases like Microsoft SQL,  that have massive budgets and huge teams of engineers, made a decision to roll their own caching scheme, maybe it's a good indication you're wrong about this?

    The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data. Quite frankly, to run efficiently on an x86 processor, it has no other choice. I don't know how many of you ever did assembly programming. Although you can create a memory caching algorithm, you can't create your own processor caching algorithm on x86. That is hard-coded on the CPU by Intel/AMD. Performance degrades considerably on x86 if there are many cache misses, which are the cache equivalent of a page fault. So your data structures must consider this in order to run efficiently on the architecture.
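To put rough numbers on the cache-miss point, here is a toy simulation of a direct-mapped cache. The cache geometry (64 lines of 64 bytes) and the 256x256 matrix of 8-byte values are assumptions chosen for illustration, not a model of any real x86 part; the sketch only shows how the same data, visited in a different order, can produce very different miss counts.

```python
def count_misses(addresses, line_size=64, num_lines=64):
    """Count misses in a toy direct-mapped cache (illustrative parameters)."""
    lines = [None] * num_lines          # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size       # memory block this address falls in
        index = block % num_lines       # cache line that block maps to
        if lines[index] != block:
            misses += 1
            lines[index] = block        # evict whatever was there before
    return misses


ROWS, COLS, ELEM = 256, 256, 8          # hypothetical matrix, stored row by row

# Byte addresses touched when walking the matrix row by row vs column by column.
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_major)
col_misses = count_misses(col_major)
print(row_misses, col_misses)
```

With these parameters the row-by-row walk misses once per cache line (8192 misses), while the column-by-column walk misses on every one of the 65536 accesses.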

  • It is time... Move the filesystem off of disks

    Minh said:
    Bass said:
    *snip*

    The same arguments have been made in the past regarding cooperative vs preemptive multitasking.

     

    Not really... If your claim is:

     

    OS file handling -> Generic (good)

    DBMS "file" handling -> Specific (unnecessary)

     

    then Preemptive vs. Cooperative doesn't fall on the same scale. Pre-emptive vs Cooperative fall more on hardware vs software. And I believe you know which is better.

     

    Look, DBMSes DO have their own internal "file" / page / disk memory management... That's fact. Why? Because that code is more specific to their domain. DBMS loves contiguous blocks of memory... whereas, HDs have 1 to 2 Dimensions (spiral platters) to their storage.

     

    PS. C9, fix the GD posting errors

    I don't think preemptive vs. cooperative is a different analogy at all. Cooperative multitasking allows a process to decide how much CPU time it needs. In preemptive multitasking, this is decided for the process (although in some systems like POSIX/Windows, a process can suggest to the kernel how it should be handled).
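For anyone who has not seen cooperative multitasking up close, a minimal sketch using Python generators follows. The task names (`db`, `backup`) and the round-robin policy are made up for the example; the point is only that each task decides for itself when to hand control back, and nothing ever interrupts it.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does one unit of work, then voluntarily yields the CPU."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative: the task itself decides when to give up control


def run_round_robin(tasks):
    """Run generator-based tasks until all finish; nothing is ever preempted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run until the task's next yield
            queue.append(task)  # still has work left: back of the line
        except StopIteration:
            pass                # task finished, drop it


log = []
run_round_robin([worker("db", 3, log), worker("backup", 2, log)])
print(log)
```

A task that never reaches a `yield` would hold the CPU indefinitely in this scheme, which is the weakness preemptive scheduling is meant to remove.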

  • It is time... Move the filesystem off of disks

    Dexter said:
    Bass said:
    *snip*

    Hrm, I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring arguments for things that nobody argued about.

     

    It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short why it can't and Sven provided more details. Instead of answering to that you went on a rant about CPU cache, schedulers, algorithms and whatnot.

    Hrm, I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring arguments for things that nobody argued about.

     

    It's all part of the same argument. You'll just have to keep up. :)

     

    It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short why it can't and Sven provided more details. Instead of answering to that you went on a rant about CPU cache, schedulers, algorithms and whatnot.

     

    The kernel can do caching better. You can claim that the software knows more about the characteristics of the data structures it uses, but the kernel knows more about the characteristics of the hardware it is running on.

  • Google founders to cash in $5.2 billion dollars worth of stock

    figuerres said:

    interesting... note that it's "over 5 years" so they could stop after some time and change the plan i guess?

     

    but also say that's 1 billion a year / 2  what are you going to do with that much cash? (well not really cash but funds)

    are they going to invest? start a VC ?  start a new business ?

     

    with that kind of "Croesus mode" you got to have some plan for it i would think....

     

    Robot army to conquer the world.

  • It is time... Move the filesystem off of disks

    Sven Groot said:
    Bass said:
    *snip*

    The problem with that is that the kernel has no idea what your application is trying to do. The kernel has to serve every possible type of application, and will therefore need to do things so that they work well for all of them. When it comes to I/O, particularly things like caching and prefetching, there is no one strategy that works best in all scenarios. The kernel therefore uses a strategy that works pretty well for most scenarios, but probably isn't the optimal strategy for any of them.

     

    The DBMS has the advantage that it knows exactly what it's doing. It knows far more about its data access and usage patterns than the kernel ever will, and can therefore use a caching and prefetching strategy that is far better than what the kernel could do. So a truly high performance DBMS will bypass the kernel in this instance and implement its own file I/O, because it knows it can do a better job than the kernel.

     

    This isn't because the code in the kernel is somehow worse than that of the DBMS, nor does it mean DBMS developers are smarter than kernel developers. The only reason they do this is because the kernel is too generalised to provide the best solution, and the DBMS has far more information about what it wants to do and can therefore make much better decisions.

    The same arguments have been made in the past regarding cooperative vs preemptive multitasking. I assume you know who won in the end. The thing about any engineering project is that there are limited resources. If you choose to optimize one part of your DB (file I/O), you are probably missing out somewhere else (e.g. CPU cache). As I said before, the CPU cache is not programmable. So if you don't design your data structures with the CPU cache in mind, you are losing a ridiculously important optimization.

     

    Another thing you conveniently leave out is that kernel developers and database developers often work for the same company. E.g. Oracle and IBM are both Linux kernel developers (both, interestingly, have contributed file systems to the Linux kernel). There are contributions to the Linux kernel that were specifically designed around making databases faster. Sometimes this means tweaking the characteristics of the I/O scheduler and filesystem to improve their database performance, and not the other way around.

     

    Anyway, what is interesting about DB performance is that, depending on which DB vendor you ask, their DB is the fastest. So you can argue about DB performance all day and which approach is better, but MySQL/Oracle/DB2/SQLServer/SQLite are all the fastest DB in existence anyway. :)
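The quoted claim that no single caching strategy is best for every workload can be illustrated with a small experiment. The sketch below uses plain LRU as a stand-in for a general-purpose policy (real kernels use more elaborate schemes) and MRU as a stand-in for a policy picked by an application that knows it is repeatedly scanning a table slightly larger than the cache. The page counts are arbitrary.

```python
from collections import OrderedDict


def hit_count(accesses, capacity, evict_most_recent=False):
    """Count cache hits under LRU (default) or MRU eviction."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)    # mark as most recently used
        else:
            if len(cache) >= capacity:
                # last=False drops the LRU entry, last=True drops the MRU entry
                cache.popitem(last=evict_most_recent)
            cache[page] = True
    return hits


scan = list(range(5)) * 4   # a 5-page table scanned front to back 4 times
lru_hits = hit_count(scan, capacity=4)
mru_hits = hit_count(scan, capacity=4, evict_most_recent=True)
print(lru_hits, mru_hits)
```

On this access pattern LRU gets no hits at all, because it always evicts the page that is needed next, while MRU hits on 12 of the 20 accesses. On a workload with strong temporal locality the ranking would flip, so the example shows workload dependence, not that one policy is better in general.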

  • Google founders to cash in $5.2 billion dollars worth of stock

    http://www.businessweek.com/news/2010-01-22/google-s-founders-file-to-sell-5-million-shares-each-update1-.html

     

    This move will remove their controlling stake in the company. Hence "do no evil" might not even be possible.

  • It is time... Move the filesystem off of disks

    Sven Groot said:
    Bass said:
    *snip*

    I was challenging your assertion that a modern DB delegates to the kernel as much as possible (in the context of file I/O of this thread). I did not say any of the things you just accused me of saying.

    If a kernel can do something that a DB does, it makes sense to remove that functionality and have the kernel do it. This reduces the amount of code the DB developer has to maintain. I don't see what your problem with this is at all.
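One way to leave caching to the kernel while still sharing what the application knows is to pass it an access-pattern hint. The sketch below uses `os.posix_fadvise`, which Python exposes on Unix only, so the call is guarded; the helper name `read_with_hint` and the temporary file are made up for the example, and the kernel is free to ignore the advice.

```python
import os
import tempfile


def read_with_hint(path, chunk_size=4096):
    """Read a file front to back, telling the kernel the access is sequential."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):   # Unix-only; skipped elsewhere
            # offset=0 with length=0 covers the whole file
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:                  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


# Hypothetical data file standing in for a table on disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"row" * 10000)
    demo_path = f.name

data = read_with_hint(demo_path)
os.remove(demo_path)
print(len(data))
```

The hint does not change what the program reads, only how the kernel may choose to prefetch and retain pages, so the application keeps a small codebase and the kernel keeps control of the cache.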

  • iBored

    Whenever it is, I'm getting two!