
Discussions

Bass: Bass Knows the way the wind is flowing.
  • Microsoft TuneTabletTelephone--Video

    raymond said:

    The wrong Steve introduces a new category, just what I was looking for:

    [two embedded videos]

    LOL, the Apple guy in the second video (0:35) basically said the same things I said in the other thread. Wow. Big Smile

  • !!@!THE APPLE TABLET EVENT!@!!

    <blink>

    It's better than I thought. BAM! This will revolutionize education. This will revolutionize business. This will revolutionize how you read books, how you watch movies, and how you listen to music. And it will revolutionize the world. Our world. This is the ultimate device. iPad.

    </blink>

     

  • Data space on hosts Vs. reality?

    ManipUni said:
    AndyC said:
    *snip*

    Do you think staffing, monitoring, and air conditioning go up by 75% of the initial value of the account if you add 2 GB to the account?

    Like W3bbo said, web hosts love overselling. If they offer like 500 GB for $10/mo, they are probably going on the premise that you will never use 500 GB anyway. Smiley Plus, as W3bbo also said, they'll find some other way to cap you if you're a heavy user.

  • Data space on hosts Vs. reality?

    Amazon S3 charges $0.15/GB/mo (bandwidth not included).

    Hell, you can get an entire virtual server with like 16 GB of storage for $20/mo. I'm sure W3bbo can hook you up with a better deal than $6/GB anyway. Wink
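
    For what it's worth, here is the back-of-the-envelope arithmetic on the per-GB prices quoted in these two posts, as a small Python sketch. The $6/GB figure is the add-on price being argued against, and all numbers are the circa-2010 figures from the thread, not current pricing:

    ```python
    # Rough per-GB-per-month storage costs, using only the figures quoted above.
    oversold_shared = 10 / 500    # 500 GB for $10/mo  -> $0.02/GB/mo
    s3              = 0.15        # Amazon S3: $0.15/GB/mo (bandwidth billed separately)
    small_vps       = 20 / 16     # 16 GB virtual server for $20/mo -> $1.25/GB/mo
    host_addon      = 6.00        # the add-on storage price being complained about

    for name, price in [("oversold shared host", oversold_shared),
                        ("Amazon S3", s3),
                        ("small VPS", small_vps),
                        ("host add-on space", host_addon)]:
        print(f"{name:>22}: ${price:.2f}/GB/mo")
    ```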

  • Revealed: the HP Slate

    The Apple tablet will be so revolutionary that future historians will label events in this world, our world, in BS or "Before Slate" years. Starting tomorrow, the era of the slate begins. A new era, a wonderful era, an era of pure happiness and awesome. We will celebrate tomorrow for years and years to come.

  • It is time... Move the filesystem off of disks

    CreamFilling512 said:
    Bass said:
    *snip*

    No, it really doesn't matter at all. A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that? If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

    A database server is going to be I/O limited, not CPU limited, so why would you optimize something like that?

     

    The fact that a database server is I/O limited is exactly why you want to reduce access time. Think for a moment about why that would be so.

     

    If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?

     

    The kernel doesn't even need to involve the CPU all that much to get I/O into memory; that's what DMA is for. Come on, pay attention to the rest of the debate; I mentioned this before and Dexter even elaborated on the point.
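
    A quick way to see the kernel-side caching both posters are talking about: read the same file twice through plain buffered I/O and compare timings. The second pass is normally served from the page cache instead of the disk. This is only an illustrative sketch; the file path is a placeholder for any large file.

    ```python
    import time

    PATH = "/var/tmp/bigfile.dat"  # placeholder: any large file on disk

    def timed_read(path):
        """Read the whole file with ordinary buffered read() calls; return seconds."""
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1 << 20):  # 1 MiB chunks
                pass
        return time.perf_counter() - start

    cold = timed_read(PATH)  # may actually hit the disk
    warm = timed_read(PATH)  # usually satisfied from the kernel's page cache
    print(f"cold: {cold:.3f}s  warm: {warm:.3f}s")
    ```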

  • So MS Ranks 51st Best Company To Work For

    CannotResolveSymbol said:

    Awww....  my university's not on the list anymore.  Sad

     

    IIRC, Microsoft used to be quite a bit more highly ranked (several years ago)...  maybe that was a different magazine, though.

    The sign of a maturing company? Smiley

     

    Google used to be #1 for a few years at least. Now they are #4.

     

    SAS really deserves their spot though. They take all the perks of Google and take them to a new level.

  • It is time... Move the filesystem off of disks

    Sven Groot said:
    Bass said:
    *snip*

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point.

     

    Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great. All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    These people don't implement their own caching schemes because they have nothing better to do. I'm sure they started out without explicit caching, but then they profiled it and found they were hitting the disc more often than was needed. That's how performance optimization works: you start with a simple solution, measure it, then try to improve. That's where these caching solutions come from. They didn't set out thinking "well I've heard that we have to cache things manually, so let's not measure it and do it regardless". That's not how these things work.

    Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.

     

    Okay, so you need to allocate some space for the entire table A, memcpy it over (poor CPU). Then you leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if a query the DB doesn't know about yet deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be faster. Whoops!

     

    Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be "keep commonly accessed data in memory". Why? Because commonly accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the overall fastest. When you pull some commonly accessed data out of memory, chances are you're going to put it right back anyway. All you are doing is delaying the inevitable page fault, thus defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding what to fail on next.

     

    To see a similar philosophy in action, compare the O(1) scheduler vs CFS vs BFS. You can come up with plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.

     

    So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.

     

    Okay, this kind of functionality is new in that it's not 30 years old. Smiley But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly not something "in the future". Unless "the future is now". I'm sure Windows has it too; it's probably just one of those "undocumented APIs", because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all about this already, since you are a DB/caching expert. Wink

     

    All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.

     

    I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is "archaic" enough that a fork is in progress to "modernize" it (in order to make it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. "LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!"
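
    To make the disagreement concrete, here is a minimal sketch of the A/B/C scenario from Sven Groot's post above: a plain LRU cache versus the same cache when the caller pins a table its execution plan is about to touch. The table names, the coarse table-level granularity, and the pin mechanism are invented purely for illustration; no real database API is implied.

    ```python
    from collections import OrderedDict

    class LRUCache:
        """Tiny LRU cache that can skip evicting 'pinned' keys the caller has
        declared it will need again soon (e.g. because its execution plan says so)."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.data = OrderedDict()   # oldest entry first
            self.pinned = set()

        def access(self, key, load):
            if key in self.data:
                self.data.move_to_end(key)     # cache hit: mark as most recently used
                return self.data[key]
            value = load(key)                  # cache miss: simulated disk read
            self.data[key] = value
            if len(self.data) > self.capacity:
                # evict the least recently used entry that is not pinned
                for candidate in self.data:
                    if candidate not in self.pinned:
                        del self.data[candidate]
                        break
            return value

    def load_from_disk(table):
        print(f"  disk read: table {table}")
        return f"<pages of {table}>"

    print("Plain LRU (plan reads A then B; B gets evicted and re-read):")
    cache = LRUCache(capacity=2)
    cache.access("B", load_from_disk)
    cache.access("C", load_from_disk)      # cache now holds B (LRU) and C (MRU)
    cache.access("A", load_from_disk)      # evicts B
    cache.access("B", load_from_disk)      # B has to come off the disk again

    print("Plan-aware LRU (B pinned because the plan reads it next):")
    cache = LRUCache(capacity=2)
    cache.access("B", load_from_disk)
    cache.access("C", load_from_disk)
    cache.pinned.add("B")                  # the DB knows its plan touches B next
    cache.access("A", load_from_disk)      # evicts C instead of B
    cache.access("B", load_from_disk)      # cache hit: no disk read
    ```

    Bass's counter in the reply above is that the pin can backfire when the next, unseen query needs the table that got evicted instead, which is exactly what the table C example in the post is about.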

  • It is time... Move the filesystem off of disks

    Sven Groot said:
    Bass said:
    *snip*

    I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

    Well, let's look at the top ten results of the TPC-C. If SQLite is so great, why isn't it in there?

     

    Maybe because they don't test it? Did you even bother looking before you pasted that link?

     

    I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.

     

    What exactly would a database know that could possibly aid it in caching? That's not exactly obvious, is it? Then why make an assumption that you can't even back up?


    But seriously, let's talk algorithms.

     

    You are building a DB, and you store your DB information in a file or a series of files, right?

    You have to somehow access those files, using some kind of file API (e.g. fopen, fseek) or something like mmap.

    Now you want to cache something: what do you do? Explain the functions a typical DB would use to cache something. And then explain exactly how this is better than using readahead. Because I cannot explain this, and you apparently can.


    And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?

     

    First of all, only modern kernels (e.g. Linux 2.6) have the syscall that allows processes to suggest parts of a file be kept in the kernel's file cache. I don't think Windows has any similar syscall, or if it does, it's poorly documented, because I've looked hard. So you might be SOL on Windows support, and that's probably one of many reasons Drizzle doesn't work on Windows. Secondly, there are applications that do use it (I even named one), so your entire premise is wrong. Smiley Thirdly, you wouldn't know if Oracle/DB2 et al. do it, because they are closed source. Fourthly, well, I think 1, 2, and 3 were enough in this case.
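
    For reference, the syscall being alluded to on Linux is posix_fadvise(2) (readahead(2) is the related blocking variant), and Python exposes the former as os.posix_fadvise on Unix. A minimal sketch of hinting the kernel about a region of a file, with the path and sizes made up for illustration:

    ```python
    import os

    PATH = "/var/db/table.dat"          # hypothetical data file
    REGION = 64 * 1024 * 1024           # first 64 MiB

    fd = os.open(PATH, os.O_RDONLY)
    try:
        # Hint that we will need this region soon, so the kernel can start
        # pulling those pages into its page cache in the background.
        os.posix_fadvise(fd, 0, REGION, os.POSIX_FADV_WILLNEED)

        # ... read and process the region ...

        # Hint that we're done with it, so the kernel may drop those pages.
        os.posix_fadvise(fd, 0, REGION, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    ```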

     

     

  • It is time... Move the filesystem off of disks

    Sven Groot said:
    Bass said:
    *snip*

    So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?

    Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.

     

    What "clear, measurable" difference? Every DB vendor claims they are the fastest DB. Even SQLite claims it's the fastest DB. That's right, a barely 200 kB shared-library DB is claiming victory. MySQL and Oracle have similar comparisons showing how they are the fastest; you can probably find them if you visit their websites, I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.

     

    If you want to argue with me on this, argue on a technical level. Explain exactly what this DB file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've gotten no answers. Speak to me with data structures and algorithms. You are a DB expert; you should know this. You guys can't just treat it as some kind of fact that should be assumed. CS doesn't work that way.