<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" media="screen" href="/styles/xslt/rss.xslt"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:c9="http://channel9.msdn.com">
<channel>
	<title>Channel 9 Forums - Coffeehouse - It is time... Move the filesystem off of disks</title>
	<atom:link rel="self" type="application/rss+xml" href="http://channel9.msdn.com/Forums/rss"></atom:link>
	<image>
		<url>http://mschnlnine.vo.llnwd.net/d1/Dev/App_Themes/C9/images/feedimage.png</url>
		<title>Channel 9 Forums - Coffeehouse - It is time... Move the filesystem off of disks</title>
		<link>http://channel9.msdn.com/Forums</link>
	</image>
	<description>Channel 9 keeps you up to date with the latest news and behind the scenes info from Microsoft that developers love to keep up with. From LINQ to SilverLight – Watch videos and hear about all the cool technologies coming and the people behind them.</description>
	<link>http://channel9.msdn.com/Forums</link>
	<language>en</language>
	<pubDate>Wed, 19 Jun 2013 02:41:05 GMT</pubDate>
	<lastBuildDate>Wed, 19 Jun 2013 02:41:05 GMT</lastBuildDate>
	<generator>Rev9</generator>
	<c9:totalResults>85</c9:totalResults>
	<c9:pageCount>-85</c9:pageCount>
	<c9:pageSize>-1</c9:pageSize>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Most new computers today are sold with between 4 and 6 GB of RAM, and that is set to rise. Yet most filesystems continue to store their databases on the physical drives, be it SSDs or hard disks. They have very clever caching, yet more often than not when a
 file's metadata is queried the drive has to be powered up and we have to wait several milliseconds.</p>
<p>&nbsp;</p>
<p>Yes, your NTFS database CAN reach up to 1 GB on a really large modern drive with ACL permissions set all over the place, but so what? When you have several GB of memory free kicking around would you not give that up for far more time on your laptop and quicker
 responses across the board? </p>
<p>&nbsp;</p>
<p>And before you say a word about data integrity, we already have transactional filesystems to solve that problem...
</p>
<p>&nbsp;</p>
<p>PS - Keep in mind *DATA* and the filesystem database are distinctly different things (with larger files the database actually gets smaller)</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/522547#522547</link>
		<pubDate>Fri, 22 Jan 2010 18:22:19 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/522547#522547</guid>
		<dc:creator>Manip</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/ManipUni/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteText">When you have several GB of memory free kicking around</div></blockquote></p>
<p>&nbsp;</p>
<p>But who says that I have several GB of memory free kicking around? Right now my Task Manager says I have 50 megabytes free (out of 2 GB). Are you perhaps running some stone-age operating system that's incapable of putting the available memory to good use?
 <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-4.gif' alt='Tongue Out' /> </p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/43ffbcaca31043c7a6479deb00d57f40#43ffbcaca31043c7a6479deb00d57f40</link>
		<pubDate>Fri, 22 Jan 2010 18:33:41 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/43ffbcaca31043c7a6479deb00d57f40#43ffbcaca31043c7a6479deb00d57f40</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Well, the first question is &quot;does the OS keep a copy in memory already?&quot; It seems like the OS might do that and we just do not see it.</p>
<p>&nbsp;</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7f158e9882fc400183ef9deb00d57f6a#7f158e9882fc400183ef9deb00d57f6a</link>
		<pubDate>Fri, 22 Jan 2010 18:42:57 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7f158e9882fc400183ef9deb00d57f6a#7f158e9882fc400183ef9deb00d57f6a</guid>
		<dc:creator>figuerres</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/figuerres/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Everything on NTFS is a file, including the file tables, which are metafiles, so they get cached just the same.</p>
<p>&nbsp;</p>
<p>Edit: Also it's indexed with B&#43; trees, so lookup is very fast.</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/30c4c3bbbf1a420390f89deb00d57f94#30c4c3bbbf1a420390f89deb00d57f94</link>
		<pubDate>Fri, 22 Jan 2010 19:13:17 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/30c4c3bbbf1a420390f89deb00d57f94#30c4c3bbbf1a420390f89deb00d57f94</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>This is already done at least on Linux. Actually ext4 even caches writes, to hilarious effect. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p>
<p>&nbsp;</p>
<p>When a feature called &quot;barriers&quot; is turned off, ext4 will detect when a file is constantly being read from and written to, and do that entirely in RAM. fsync() has no effect on this. This makes databases insanely fast, but at the cost of possible system integrity.
 Most desktop OSes turn barriers on, and server OSes tend to turn barriers off.</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/313847cd3d99477da83c9deb00d57fbf#313847cd3d99477da83c9deb00d57fbf</link>
		<pubDate>Fri, 22 Jan 2010 19:14:17 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/313847cd3d99477da83c9deb00d57fbf#313847cd3d99477da83c9deb00d57fbf</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<p>This is already done at least on Linux. Actually ext4 even caches writes, to hilarious effect.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
<p>&nbsp;</p>
<p>When a feature called &quot;barriers&quot; is turned off, ext4 will detect when a file is constantly being read from and written to, and do that entirely in RAM. fsync() has no effect on this. This makes databases insanely fast, but at the cost of possible system integrity.
 Most desktop OSes turn barriers on, and server OSes tend to turn barriers off.</p>
</div></blockquote>
<p>That's bizarre. I mean, database servers usually want to do their own caching and control disk flushing, since they know better than the OS.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ad01997003ed4c8396259deb00d57feb#ad01997003ed4c8396259deb00d57feb</link>
		<pubDate>Fri, 22 Jan 2010 19:22:09 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ad01997003ed4c8396259deb00d57feb#ad01997003ed4c8396259deb00d57feb</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>That's bizarre. I mean, database servers usually want to do their own caching and control disk flushing, since they know better than the OS.</p>
</div></blockquote>
<p>I don't think most actually do, or if they do, they don't do a particularly good job of it. Really with DBs you expect that if you do an INSERT it will actually happen. So a lot of databases (at least Postgres and SQLite) call fsync after the completion
 of a simple write operation. On a barrier-enabled FS, that call tends to block until the data is actually written out to disk, which is, of course, slow.</p>
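<p>A minimal sketch of the write-then-fsync pattern described above, in Python. The helper name <code>durable_append</code> and the file name in the usage note are made up for illustration; this is not how Postgres or SQLite are actually implemented, only the general shape of "write, then block until the kernel reports the data flushed".</p>

```python
import os


def durable_append(path, payload):
    """Append payload (bytes) to path, then ask the kernel to flush it."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written != len(payload):
            written += os.write(fd, payload[written:])
        # Blocks until the OS reports the file's data as flushed. Whether the
        # bytes truly reached stable storage still depends on the filesystem
        # settings and on the drive's own write cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

<p>Calling <code>durable_append("journal.log", b"row 1\n")</code> returns only after <code>os.fsync</code> has returned, and that wait is where the latency discussed here comes from.</p>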
<p>&nbsp;</p>
<p>Without barriers, the kernel decides when it is appropriate to actually write the data to disk. This means a lot of file operations (both reads and writes) happen entirely in RAM, and only when the disk is available and it doesn't hamper
 performance will the kernel persist the contents to disk. This can be as long as 60 seconds after the fsync request was made (or longer?).</p>
<p><br>
You can fine-tune performance vs. data security, but the more data security you want, the less performance you are going to get (and vice versa). Just a fact of life, I guess. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e5976ec8b32a420495189deb00d58019#e5976ec8b32a420495189deb00d58019</link>
		<pubDate>Fri, 22 Jan 2010 19:30:51 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e5976ec8b32a420495189deb00d58019#e5976ec8b32a420495189deb00d58019</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>I don't think most actually do, or if they do, they don't do a particularly good job of it. Really with DBs you expect that if you do an INSERT it will actually happen. So a lot of databases (at least Postgres and SQLite) call fsync after the completion
 of a simple write operation. On a barrier-enabled FS, that call tends to block until the data is actually written out to disk, which is, of course, slow.</p>
<p>&nbsp;</p>
<p>Without barriers, the kernel decides when it is appropriate to actually write the data to disk. This means a lot of file operations (both reads and writes) happen entirely in RAM, and only when the disk is available and it doesn't hamper
 performance will the kernel persist the contents to disk. This can be as long as 60 seconds after the fsync request was made (or longer?).</p>
<p><br>
You can fine-tune performance vs. data security, but the more data security you want, the less performance you are going to get (and vice versa). Just a fact of life, I guess.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>Well I'm talking about the commercial database servers where scaling is necessary. &nbsp;Like Microsoft SQL Server running Hotmail or something. &nbsp;And if you've ever run MSSQL you know that it will consume all memory on the machine with the out-of-box configuration,
 because it's doing its own disk caching.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4baaeb8d6bbc43afb89f9deb00d58048#4baaeb8d6bbc43afb89f9deb00d58048</link>
		<pubDate>Fri, 22 Jan 2010 19:44:02 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4baaeb8d6bbc43afb89f9deb00d58048#4baaeb8d6bbc43afb89f9deb00d58048</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Well I'm talking about the commercial database servers where scaling is necessary. &nbsp;Like Microsoft SQL Server running Hotmail or something. &nbsp;And if you've ever run MSSQL you know that it will consume all memory on the machine with the out-of-box configuration,
 because it's doing its own disk caching.</p>
</div></blockquote>
<p>Perhaps, but commercial databases tend to guarantee some kind of data integrity, which is impossible unless they persist the contents of a transaction. I don't think they would use write caching by default, as it is fundamentally dangerous to this objective.</p>
<p>&nbsp;</p>
<p>I don't think a well-designed DB would do extensive read caching either. It's something that is more readily done by a kernel. As everyone has been saying, read caching is what most filesystems (including NTFS) do for you for free.</p>
<p>&nbsp;</p>
<p>A DB can provide detailed information about their file I/O requirements through an mmap call anyway.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2b8215fb732e4980bbd89deb00d58075#2b8215fb732e4980bbd89deb00d58075</link>
		<pubDate>Fri, 22 Jan 2010 19:45:30 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2b8215fb732e4980bbd89deb00d58075#2b8215fb732e4980bbd89deb00d58075</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Perhaps, but commercial databases tend to guarantee some kind of data integrity, which is impossible unless they persist the contents of a transaction. I don't think they would use write caching by default, as it is fundamentally dangerous to this objective.</p>
<p>&nbsp;</p>
<p>I don't think a well-designed DB would do extensive read caching either. It's something that is more readily done by a kernel. As everyone has been saying, read caching is what most filesystems (including NTFS) do for you for free.</p>
<p>&nbsp;</p>
<p>A DB can provide detailed information about their file I/O requirements through an mmap call anyway.</p>
</div></blockquote>
<p>Indeed, transactional databases require non-cached writes (and it's not only about filesystem caching but also about hardware caching).
</p>
<p>&nbsp;</p>
<p>As for read caching: of course they cache reads, it would be insane not to do it. The filesystem has no magic orb to tell it what exactly to cache, read ahead, discard from cache&nbsp;etc.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/82adcc32783a44b1b3109deb00d580a3#82adcc32783a44b1b3109deb00d580a3</link>
		<pubDate>Fri, 22 Jan 2010 20:01:50 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/82adcc32783a44b1b3109deb00d580a3#82adcc32783a44b1b3109deb00d580a3</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>From what I understand of the Windows internals, any time you open a file, regardless of how you do it, it's implemented internally using memory-mapped files. The file gets mapped to some pages in virtual memory, then it gets brought in by on-demand paging
 and some heuristics that do basic read-ahead. I imagine the paging file and on-demand paging of EXEs/DLLs use the same mechanism.&nbsp; But I/O APIs like C's fread() never issue I/O requests directly; you only get an I/O request when the call touches some
 memory-mapped file and gets a page fault.</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e7df6d1e63014f42be199deb00d580cd#e7df6d1e63014f42be199deb00d580cd</link>
		<pubDate>Fri, 22 Jan 2010 20:21:03 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e7df6d1e63014f42be199deb00d580cd#e7df6d1e63014f42be199deb00d580cd</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<p>From what I understand of the Windows internals, any time you open a file, regardless of how you do it, it's implemented internally using memory-mapped files. The file gets mapped to some pages in virtual memory, then it gets brought in by on-demand paging
 and some heuristics that do basic read-ahead. I imagine the paging file and on-demand paging of EXEs/DLLs use the same mechanism.&nbsp; But I/O APIs like C's fread() never issue I/O requests directly; you only get an I/O request when the call touches some
 memory-mapped file and gets a page fault.</p>
</div></blockquote>
<p>Yeah Windows has a similar call to mmap, and I assume also does paging. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/af919bf015234140940b9deb00d580f9#af919bf015234140940b9deb00d580f9</link>
		<pubDate>Fri, 22 Jan 2010 20:24:42 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/af919bf015234140940b9deb00d580f9#af919bf015234140940b9deb00d580f9</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Indeed, transactional databases require non-cached writes (and it's not only about filesystem caching but also about hardware caching).
</p>
<p>&nbsp;</p>
<p>As for read caching: of course they cache reads, it would be insane not to do it. The filesystem has no magic orb to tell it what exactly to cache, read ahead, discard from cache&nbsp;etc.</p>
</div></blockquote>
<p>It doesn't, but an OS can accomplish a lot of the intelligent read caching with a paging algorithm that keeps track of the most commonly read parts of a file. This is easy to implement with the information the kernel gets from mmap and subsequent use of the
 mmapped space. Most of the performance heavy lifting can be done by the kernel, which knows more about the characteristics of the persistent store (e.g. the location of the R/W head).</p>
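<p>As a rough illustration of the mmap approach described above (a sketch only; the helper name <code>read_via_mmap</code> is hypothetical), the program maps the file and simply indexes into it, leaving it to the kernel's paging to decide what to fetch and what to keep in memory:</p>

```python
import mmap


def read_via_mmap(path, offset, length):
    """Return up to length bytes starting at offset, read through a mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches the mapped pages. The kernel faults them in on
            # demand and may keep them cached for later reads.
            return m[offset:offset + length]
```

<p>The program never issues an explicit read for those bytes; the page faults triggered by the slice are what drive the I/O.</p>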
<p>&nbsp;</p>
<p>It would be downright stupid for a DB to take all of this into its own hands, but I am sure there are some legacy DBs out there that still do all of this, because they were written at a time when MS-DOS was considered advanced. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/045e7c7ff5104cc38df79deb00d58127#045e7c7ff5104cc38df79deb00d58127</link>
		<pubDate>Fri, 22 Jan 2010 20:29:40 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/045e7c7ff5104cc38df79deb00d58127#045e7c7ff5104cc38df79deb00d58127</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>It doesn't, but an OS can accomplish a lot of the intelligent read caching with a paging algorithm that keeps track of the most commonly read parts of a file. This is easy to implement with the information the kernel gets from mmap and subsequent use of the
 mmapped space. Most of the performance heavy lifting can be done by the kernel, which knows more about the characteristics of the persistent store (e.g. the location of the R/W head).</p>
<p>&nbsp;</p>
<p>It would be downright stupid for a DB to take all of this into its own hands, but I am sure there are some legacy DBs out there that still do all of this, because they were written at a time when MS-DOS was considered advanced.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>No way man, the OS has no knowledge of the internal structure of the database.&nbsp; You can certainly get better performance by doing more work.&nbsp; Database servers optimize their layout on the physical disk.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/5bb9c8450b484ac8ab7c9deb00d58155#5bb9c8450b484ac8ab7c9deb00d58155</link>
		<pubDate>Fri, 22 Jan 2010 20:36:37 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/5bb9c8450b484ac8ab7c9deb00d58155#5bb9c8450b484ac8ab7c9deb00d58155</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>No way man, the OS has no knowledge of the internal structure of the database.&nbsp; You can certainly get better performance by doing more work.&nbsp; Database servers optimize their layout on the physical disk.</p>
</div></blockquote>
<p>You can optimize the internal structure of the database such that the OS will optimize reads to the fullest.
</p>
<p>&nbsp;</p>
<p>It's really no different than optimizing instructions: you have no control over branch prediction and cache usage on an x86 processor, but you can still optimize code for branch prediction and cache by modifying the structure of your program.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/205e5543893949ea948c9deb00d58181#205e5543893949ea948c9deb00d58181</link>
		<pubDate>Fri, 22 Jan 2010 20:41:00 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/205e5543893949ea948c9deb00d58181#205e5543893949ea948c9deb00d58181</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>It doesn't, but an OS can accomplish a lot of the intelligent read caching with a paging algorithm that keeps track of the most commonly read parts of a file. This is easy to implement with the information the kernel gets from mmap and subsequent use of the
 mmapped space. Most of the performance heavy lifting can be done by the kernel, which knows more about the characteristics of the persistent store (e.g. the location of the R/W head).</p>
<p>&nbsp;</p>
<p>It would be downright stupid for a DB to take all of this into its own hands, but I am sure there are some legacy DBs out there that still do all of this, because they were written at a time when MS-DOS was considered advanced.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>Any application (big enough and complex enough to be worth the effort) can do better than the kernel at caching, because an application will always know better than the kernel what data&nbsp;it&nbsp;needs. The kernel can at best observe the reads and the writes
 and some hints passed through the system calls and do some guesswork based on that. The kernel does not have a time machine to look into the future but the application might just have one.</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cc6b323177354e89b4589deb00d581b0#cc6b323177354e89b4589deb00d581b0</link>
		<pubDate>Fri, 22 Jan 2010 20:41:06 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cc6b323177354e89b4589deb00d581b0#cc6b323177354e89b4589deb00d581b0</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Any application (big enough and complex enough to be worth the effort) can do better than the kernel at caching, because an application will always know better than the kernel what data&nbsp;it&nbsp;needs. The kernel can at best observe the reads and the writes
 and some hints passed through the system calls and do some guesswork based on that. The kernel does not have a time machine to look into the future but the application might just have one.</p>
<p>&nbsp;</p>
</div></blockquote>
<p>So I don't really agree. I think a program can provide enough hints to a kernel to let the kernel do all the real work, for example by moving commonly read data to a certain part of a file. Again, this is similar to x86 optimization.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a34d596f78e94dc98ba89deb00d581dc#a34d596f78e94dc98ba89deb00d581dc</link>
		<pubDate>Fri, 22 Jan 2010 20:45:36 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a34d596f78e94dc98ba89deb00d581dc#a34d596f78e94dc98ba89deb00d581dc</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>So I don't really agree. I think a program can provide enough hints to a kernel to let the kernel do all the real work, for example by moving commonly read data to a certain part of a file. Again, this is similar to x86 optimization.</p>
</div></blockquote>
<p>Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c02b3a98fc3045d1a2f29deb00d58207#c02b3a98fc3045d1a2f29deb00d58207</link>
		<pubDate>Fri, 22 Jan 2010 20:47:06 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c02b3a98fc3045d1a2f29deb00d58207#c02b3a98fc3045d1a2f29deb00d58207</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</p>
</div></blockquote>
<p>Why not? Isn't that exactly what happens when you invoke a database optimization?
</p>
<p>&nbsp;</p>
<p>Sometimes you have to move gigabytes or terabytes of data to have optimal performance. Look up the hash table data structure.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/296c2d663e784aee90119deb00d58233#296c2d663e784aee90119deb00d58233</link>
		<pubDate>Fri, 22 Jan 2010 20:49:02 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/296c2d663e784aee90119deb00d58233#296c2d663e784aee90119deb00d58233</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Hints are like, I am going to read sequentially, or I am going to read randomly.&nbsp; It's not like, here's a hint describing this complex internal data structure that you could write a 10,000-page book about.&nbsp; If the OS were capable of such a level of heuristics
 it would be super slow; it's just optimized for general use.</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0b9b60ba68344b2ba0dc9deb00d5825d#0b9b60ba68344b2ba0dc9deb00d5825d</link>
		<pubDate>Fri, 22 Jan 2010 20:50:34 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0b9b60ba68344b2ba0dc9deb00d5825d#0b9b60ba68344b2ba0dc9deb00d5825d</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Why not? Isn't that exactly what happens when you invoke a database optimization?
</p>
<p>&nbsp;</p>
<p>Sometimes you have to move gigabytes or terabytes of data to have optimal performance. Look up the hash table data structure.</p>
</div></blockquote>
<p>Bad analogy. A hashtable data structure moves data around because there's nothing better it can do. A database can do many things to get the best performance and it does just that.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/1cd187a903004108b6829deb00d58289#1cd187a903004108b6829deb00d58289</link>
		<pubDate>Fri, 22 Jan 2010 20:53:34 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/1cd187a903004108b6829deb00d58289#1cd187a903004108b6829deb00d58289</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<p>Hints are like, I am going to read sequentially, or I am going to read randomly.&nbsp; It's not like, here's a hint describing this complex internal data structure that you could write a 10,000-page book about.&nbsp; If the OS were capable of such a level of heuristics
 it would be super slow; it's just optimized for general use.</p>
</div></blockquote>
<p>Listen, once it's in memory (the only thing a DB can actually do on its own) it's not automagically optimized either. You are going to have to structure your data structures in such a way that they make optimal use of the hardware's cache as well.<strong>
 You cannot avoid this.</strong>&nbsp; </p>
<p>&nbsp;</p>
<p>The reason a lot of those really complex data structures (eg: the Judy array) are so complicated is that they are designed around being cached. They cannot cache themselves, because x86 does not allow this. So they must structure their data to be cached by the
 system implicitly.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/55bddc89fbc74e0d98049deb00d582b6#55bddc89fbc74e0d98049deb00d582b6</link>
		<pubDate>Fri, 22 Jan 2010 20:55:21 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/55bddc89fbc74e0d98049deb00d582b6#55bddc89fbc74e0d98049deb00d582b6</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Listen, once it's in memory (the only thing a DB can actually do on its own) it's not automagically optimized either. You are going to have to structure your data structures in such a way that they make optimal use of the hardware's cache as well.<strong>
 You cannot avoid this.</strong>&nbsp; </p>
<p>&nbsp;</p>
<p>The reason a lot of those really complex data structures (eg: the Judy array) are so complicated is that they are designed around being cached. They cannot cache themselves, because x86 does not allow this. So they must structure their data to be cached by the
 system implicitly.</p>
</div></blockquote>
<p>Normally if you are running a database you need to ensure the hard drive caching can be disabled.&nbsp; It's not just about performance but you can't guarantee transactions with any caching going on outside the control of the server software.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/29fcf76f8cf84d8d97c79deb00d582e3#29fcf76f8cf84d8d97c79deb00d582e3</link>
		<pubDate>Fri, 22 Jan 2010 20:59:59 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/29fcf76f8cf84d8d97c79deb00d582e3#29fcf76f8cf84d8d97c79deb00d582e3</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Bad analogy. A hashtable data structure moves data around because there's nothing better it can do. A database can do many things to get the best performance and it does just that.</p>
</div></blockquote>
<p>A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/358b975aaaec480ba9fa9deb00d5830f#358b975aaaec480ba9fa9deb00d5830f</link>
		<pubDate>Fri, 22 Jan 2010 21:01:35 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/358b975aaaec480ba9fa9deb00d5830f#358b975aaaec480ba9fa9deb00d5830f</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Normally if you are running a database you need to ensure the hard drive caching can be disabled.&nbsp; It's not just about performance but you can't guarantee transactions with any caching going on outside the control of the server software.</p>
</div></blockquote>
<p>Well I don't know about Windows, but there are numerous parameters you can customize in Linux regarding the functionality of the file system or even the CPU scheduler. You can even swap out file systems and CPU schedulers completely; Linux is open source. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p>
<p><br>
That's actually what Google does: they use the O(1) scheduler with a modern kernel, while the default scheduler tends to be CFS (the &quot;Completely Fair Scheduler&quot;). This is on top of many other changes designed to make Linux perform really well for their specialized
 task. </p>
<p>&nbsp;</p>
<p>In Android, Google uses CFS. But they might be adopting the Brain F**k Scheduler (BFS), which is reported to be really f**king fast, and yet its algorithm is so simple that its existence is a giant brainf**k of an enigma. Kind of like some of Quake 3's
 rendering algorithms. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/38c3293d694542599a119deb00d5833e#38c3293d694542599a119deb00d5833e</link>
		<pubDate>Fri, 22 Jan 2010 21:05:48 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/38c3293d694542599a119deb00d5833e#38c3293d694542599a119deb00d5833e</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Well I don't know about Windows, but there are numerous parameters you can customize in Linux regarding the functionality of the file system or even the CPU scheduler. You can even swap out file systems and CPU schedulers completely; Linux is open source.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
<p><br>
That's actually what Google does: they use the O(1) scheduler with a modern kernel, while the default scheduler tends to be CFS (the &quot;Completely Fair Scheduler&quot;). This is on top of many other changes designed to make Linux perform really well for their specialized
 task. </p>
<p>&nbsp;</p>
<p>In Android, Google uses CFS. But they might be adopting the Brain F**k Scheduler (BFS), which is reported to be really f**king fast, and yet its algorithm is so simple that its existence is a giant brainf**k of an enigma. Kind of like some of Quake 3's
 rendering algorithms. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>Wasn't IBM working on hybrid drives a few years ago?</p>
<p>&nbsp;</p>
<p>I am pretty far removed from my FileStructures course, but if putting the filesystem in 'memory' would speed things up, perhaps it would do to have the filesystem reside on a gig of SSD memory and still keep the files on actual spindles.</p>
<p>&nbsp;</p>
<p>I've got a pair of 60 GB SSDs striped as my OS disk, and let me tell ya, getting the entire OS off of the hard drive seriously speeds things up.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/057e7159fd2947cfa3aa9deb00d5836d#057e7159fd2947cfa3aa9deb00d5836d</link>
		<pubDate>Fri, 22 Jan 2010 22:39:54 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/057e7159fd2947cfa3aa9deb00d5836d#057e7159fd2947cfa3aa9deb00d5836d</guid>
		<dc:creator>ScanIAm</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/ScanIAm/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Sure, why not. I got 12GB. RAM is dirt cheap anyway. I can't believe there are still people (especially developers) with only 2GB of RAM. ...that's just weird, almost perverse. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-4.gif' alt='Tongue Out' /></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/159891bad09743d391609deb00d583c9#159891bad09743d391609deb00d583c9</link>
		<pubDate>Sat, 23 Jan 2010 00:00:00 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/159891bad09743d391609deb00d583c9#159891bad09743d391609deb00d583c9</guid>
		<dc:creator>Turrican</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/turrican/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.</p>
</div></blockquote>
<p>I recall that major enterprise database engines like SQL Server have the ability, and for very high-end, specific performance needs the desire, to store their data on unformatted raw disks. However, the lack of OS support means that this is not recommended
 except where every ounce of performance is required. Otherwise it is easier and cheaper to throw a couple of GB of RAM at the problem.</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/afa1241cb0474b708de19deb00d583f5#afa1241cb0474b708de19deb00d583f5</link>
		<pubDate>Sat, 23 Jan 2010 00:24:28 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/afa1241cb0474b708de19deb00d583f5#afa1241cb0474b708de19deb00d583f5</guid>
		<dc:creator>CplCarrot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CplCarrot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible. This makes the code simpler, and thus easier to optimize. That's just how it is.</p>
</div></blockquote>
<p><blockquote><div class="quoteText">A modern DB (eg: Drizzle) is going to delegate as much responsibility to the kernel as possible.</div></blockquote></p>
<p>Sorry, but I'm a PhD student who specializes in database engineering, and that statement goes contrary to everything I've been taught.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d1f363d4baf24a7dba439deb00d58421#d1f363d4baf24a7dba439deb00d58421</link>
		<pubDate>Sat, 23 Jan 2010 03:53:25 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d1f363d4baf24a7dba439deb00d58421#d1f363d4baf24a7dba439deb00d58421</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p>Are we talking about DB or OS or DBOS? I am confused.</p>
<p>&nbsp;</p>
<p>Well, just think crazily, but what happens when the HDD has no OS? Like remote storage that is shared with many servers? Remote storage&nbsp;is going through fiber optic with high bandwidth obviously. What I see is the Live Mesh case. Cache on your computer and you
 don't know what's going on with the actual remote storage, let alone the other sync device.&nbsp;And I am using Live Mesh, it is hardly good because synchronization is hard between devices.</p>
<p>&nbsp;</p>
<p>And what's the point of storing the FS when you still need to access the DB and get some data from a page or many pages?&nbsp;And if the HDD is not spinning, how do you know your FS on RAM is&nbsp;synced? And what about RAM for actual data calculations, which are RAM intensive
 on numerous occasions? If RAM runs out, are you going to drop the FS on RAM? Just doesn't make any sense to me.</p>
<p>&nbsp;</p>
<p>Anyway, FS on RAM? It sounds really really unsafe to begin with.</p>
<p>&nbsp;</p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a95f042ac13f4926b7c29deb00d5844e#a95f042ac13f4926b7c29deb00d5844e</link>
		<pubDate>Sat, 23 Jan 2010 09:30:58 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a95f042ac13f4926b7c29deb00d5844e#a95f042ac13f4926b7c29deb00d5844e</guid>
		<dc:creator>magicalclick</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/magicalclick/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">magicalclick said:</div><div class="quoteText">
<p>Are we talking about DB or OS or DBOS? I am confused.</p>
<p>&nbsp;</p>
<p>Well, just think crazily, but what happens when the HDD has no OS? Like remote storage that is shared with many servers? Remote storage&nbsp;is going through fiber optic with high bandwidth obviously. What I see is the Live Mesh case. Cache on your computer and you
 don't know what's going on with the actual remote storage, let alone the other sync device.&nbsp;And I am using Live Mesh, it is hardly good because synchronization is hard between devices.</p>
<p>&nbsp;</p>
<p>And what's the point of storing the FS when you still need to access the DB and get some data from a page or many pages?&nbsp;And if the HDD is not spinning, how do you know your FS on RAM is&nbsp;synced? And what about RAM for actual data calculations, which are RAM intensive
 on numerous occasions? If RAM runs out, are you going to drop the FS on RAM? Just doesn't make any sense to me.</p>
<p>&nbsp;</p>
<p>Anyway, FS on RAM? It sounds really really unsafe to begin with.</p>
<p>&nbsp;</p>
</div></blockquote>
<p>magicalclick:&nbsp; yeah this has gone back and forth and i can see why you are lost....</p>
<p>&nbsp;</p>
<p>the start of this was that the OP was talking about taking the NTFS &quot;metadata&quot; to a memory based system of some kind.</p>
<p>that he seemed to think that the raw data and the NTFS filesystem data could and should be handled differently.</p>
<p>at least that's what i think the OP was saying.</p>
<p>&nbsp;</p>
<p>then as the topic went on folks started talking about FS optimization and how a DBMS might use the FS or might not.</p>
<p>&nbsp;</p>
<p>so the term database has been used here 2 ways: one as a normal database for say sql server, and one as the special data that a file system needs to manage.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7e2ba45fb97941ea99539deb00d5847e#7e2ba45fb97941ea99539deb00d5847e</link>
		<pubDate>Sat, 23 Jan 2010 10:36:38 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7e2ba45fb97941ea99539deb00d5847e#7e2ba45fb97941ea99539deb00d5847e</guid>
		<dc:creator>figuerres</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/figuerres/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>Sorry, but I'm a PhD student who specializes in database engineering, and that statement goes contrary to everything I've been taught.</p>
</div></blockquote>
<p>You have been taught that the Drizzle team is trying to develop a heavy DB? Well PhD student or not, that is patently wrong.</p>
<p>&nbsp;</p>
<p>Or you have been taught that the only good DB is a complicated DB? Well that's wrong too. The only good DB is a relational DB? That's also a load of crap.</p>
<p>&nbsp;</p>
<p>There is a long history of people not agreeing with each other when it comes to what a good database design is. Maybe because there is no genuinely right way to do it.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6400a757fa1f443b82379deb00d584ab#6400a757fa1f443b82379deb00d584ab</link>
		<pubDate>Sat, 23 Jan 2010 18:01:33 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6400a757fa1f443b82379deb00d584ab#6400a757fa1f443b82379deb00d584ab</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>You have been taught that the Drizzle team is trying to develop a heavy DB? Well PhD student or not, that is patently wrong.</p>
<p>&nbsp;</p>
<p>Or you have been taught that the only good DB is a complicated DB? Well that's wrong too. The only good DB is a relational DB? That's also a load of crap.</p>
<p>&nbsp;</p>
<p>There is a long history of people not agreeing with each other when it comes to what a good database design is. Maybe because there is no genuinely right way to do it.</p>
</div></blockquote>
<p><blockquote><div class="quoteText">You have been taught that the Drizzle team is trying to develop a heavy DB? Well PhD student or not, that is patently wrong.</p>
<p>&nbsp;</p>
<p>Or you have been taught that the only good DB is a complicated DB? Well that's wrong too. The only good DB is a relational DB? That's also a load of crap.</div></blockquote></p>
<p>I was challenging your assertion that a modern DB delegates to the kernel as much as possible (in the context of file I/O of this thread). I did not say any of the things you just accused me of saying.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/dd811a7c29f6482491549deb00d584d8#dd811a7c29f6482491549deb00d584d8</link>
		<pubDate>Sun, 24 Jan 2010 03:40:54 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/dd811a7c29f6482491549deb00d584d8#dd811a7c29f6482491549deb00d584d8</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>I was challenging your assertion that a modern DB delegates to the kernel as much as possible (in the context of file I/O of this thread). I did not say any of the things you just accused me of saying.</p>
</div></blockquote>
<p>If a kernel can do something that a DB does, it makes sense to remove that functionality and have the kernel do it. This reduces the amount of code the DB developer has to maintain. I don't see what your problem with this is at all.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/249493517107496bb71e9deb00d58504#249493517107496bb71e9deb00d58504</link>
		<pubDate>Sun, 24 Jan 2010 05:07:01 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/249493517107496bb71e9deb00d58504#249493517107496bb71e9deb00d58504</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>If a kernel can do something that a DB does, it makes sense to remove that functionality and have the kernel do it. This reduces the amount of code the DB developer has to maintain. I don't see what your problem with this is at all.</p>
</div></blockquote>
<p>The problem with that is that the kernel has no idea what your application is trying to do. The kernel has to serve every possible type of application, and will therefore need to do things so that they work well for all of them. When it comes to I/O, particularly
 things like caching and prefetching, there is no one strategy that works best in all scenarios. The kernel therefore uses a strategy that works pretty well for most scenarios, but probably isn't the optimal strategy for any of them.</p>
<p>&nbsp;</p>
<p>The DBMS has the advantage that it knows exactly what it's doing. It knows far more about its data access and usage patterns than the kernel ever will, and can therefore use a caching and prefetching strategy that is far better than what the kernel could
 do. So a truly high-performance DBMS will bypass the kernel in this instance and implement its own file I/O, because it knows it can do a better job than the kernel.</p>
<p>&nbsp;</p>
<p>This isn't because the code in the kernel is somehow worse than that of the DBMS, nor does it mean DBMS developers are smarter than kernel developers. The only reason they do this is because the kernel is too generalised to provide the best solution, and
 the DBMS has far more information about what it wants to do and can therefore make much better decisions.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c1fda03005e345219d5c9deb00d58534#c1fda03005e345219d5c9deb00d58534</link>
		<pubDate>Sun, 24 Jan 2010 05:45:24 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c1fda03005e345219d5c9deb00d58534#c1fda03005e345219d5c9deb00d58534</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The problem with that is that the kernel has no idea what your application is trying to do. The kernel has to serve every possible type of application, and will therefore need to do things so that they work well for all of them. When it comes to I/O, particularly
 things like caching and prefetching, there is no one strategy that works best in all scenarios. The kernel therefore uses a strategy that works pretty well for most scenarios, but probably isn't the optimal strategy for any of them.</p>
<p>&nbsp;</p>
<p>The DBMS has the advantage that it knows exactly what it's doing. It knows far more about its data access and usage patterns than the kernel ever will, and can therefore use a caching and prefetching strategy that is far better than what the kernel could
 do. So a truly high-performance DBMS will bypass the kernel in this instance and implement its own file I/O, because it knows it can do a better job than the kernel.</p>
<p>&nbsp;</p>
<p>This isn't because the code in the kernel is somehow worse than that of the DBMS, nor does it mean DBMS developers are smarter than kernel developers. The only reason they do this is because the kernel is too generalised to provide the best solution, and
 the DBMS has far more information about what it wants to do and can therefore make much better decisions.</p>
</div></blockquote>
<p>The same arguments have been made in the past regarding cooperative vs preemptive multitasking. I assume you know who won in the end. The thing about any engineering project is that there are limited resources. If you choose to optimize one part of your DB (file
 I/O), you probably are missing out somewhere else (eg: CPU cache). As I said before, the CPU cache is not programmable. So if you don't design your data structures with the CPU cache in mind, you are losing a ridiculously important optimization.
</p>
<p>&nbsp;</p>
<p>Another thing you conveniently leave out is that kernel developers and database developers often work for the same company. Eg: Oracle and IBM are both Linux kernel developers (both interestingly have contributed file systems to the Linux kernel). There
 are contributions to the Linux kernel that were specifically designed around making databases faster. Sometimes this means tweaking the characteristics of the I/O scheduler and filesystem to improve their database performance, and not the other way around.</p>
<p>&nbsp;</p>
<p>Anyway what is interesting about DB performance is depending on what DB vendor you ask, their DB is the fastest. So you can argue about DB performance all day and what approach is better, but MySQL/Oracle/DB2/SQLServer/SQLite are all the fastest DB in existence
<em>anyway</em>. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cb49c67105c0447ebf7f9deb00d58568#cb49c67105c0447ebf7f9deb00d58568</link>
		<pubDate>Sun, 24 Jan 2010 18:22:18 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cb49c67105c0447ebf7f9deb00d58568#cb49c67105c0447ebf7f9deb00d58568</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The same arguments have been made in the past regarding cooperative vs preemptive multitasking. I assume you know who won in the end. The thing about any engineering project is that resources are limited. If you choose to optimize one part of your DB (file
 I/O), you are probably missing out somewhere else (e.g. CPU cache). As I said before, the CPU cache is not programmable. So if you don't design your data structures with the CPU cache in mind, you are losing a ridiculously important optimization.
</p>
<p>&nbsp;</p>
<p>Another thing you conveniently leave out is that kernel developers and database developers often work for the same company. E.g. Oracle and IBM are both Linux kernel developers (both, interestingly, have contributed file systems to the Linux kernel). There
 are contributions to the Linux kernel that were specifically designed around making databases faster. Sometimes this means tweaking the characteristics of the I/O scheduler and filesystem to improve their database performance, and not the other way around.</p>
<p>&nbsp;</p>
<p>Anyway, what is interesting about DB performance is that, depending on which DB vendor you ask, their DB is the fastest. So you can argue about DB performance all day and what approach is better, but MySQL/Oracle/DB2/SQLServer/SQLite are all the fastest DB in existence
<em>anyway</em>. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p><em><strong>The same arguments have been made in the past regarding cooperative vs preemptive multitasking.</strong></em></p>
<p>&nbsp;</p>
<p>Not really... If your claim is:</p>
<p>&nbsp;</p>
<p>OS file handling -&gt; Generic (good)</p>
<p>DBMS &quot;file&quot; handling -&gt; Specific (unnecessary)</p>
<p>&nbsp;</p>
<p>then preemptive vs. cooperative doesn't fall on the same scale. Preemptive vs. cooperative falls more on hardware vs. software. And I believe you know which is better.</p>
<p>&nbsp;</p>
<p>Look, DBMSes DO have their own internal &quot;file&quot; / page / disk memory management... That's a fact. Why? Because that code is more specific to their domain. A DBMS loves contiguous blocks of memory... whereas HDs have 1 to 2 dimensions (spiral platters) to their
 storage.</p>
<p>&nbsp;</p>
<p>PS. C9, fix the GD posting errors</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/fc6479fdc217457796fb9deb00d5859a#fc6479fdc217457796fb9deb00d5859a</link>
		<pubDate>Sun, 24 Jan 2010 19:47:20 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/fc6479fdc217457796fb9deb00d5859a#fc6479fdc217457796fb9deb00d5859a</guid>
		<dc:creator>Minh</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Minh/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The same arguments have been made in the past regarding cooperative vs preemptive multitasking. I assume you know who won in the end. The thing about any engineering project is that resources are limited. If you choose to optimize one part of your DB (file
 I/O), you are probably missing out somewhere else (e.g. CPU cache). As I said before, the CPU cache is not programmable. So if you don't design your data structures with the CPU cache in mind, you are losing a ridiculously important optimization.
</p>
<p>&nbsp;</p>
<p>Another thing you conveniently leave out is that kernel developers and database developers often work for the same company. E.g. Oracle and IBM are both Linux kernel developers (both, interestingly, have contributed file systems to the Linux kernel). There
 are contributions to the Linux kernel that were specifically designed around making databases faster. Sometimes this means tweaking the characteristics of the I/O scheduler and filesystem to improve their database performance, and not the other way around.</p>
<p>&nbsp;</p>
<p>Anyway, what is interesting about DB performance is that, depending on which DB vendor you ask, their DB is the fastest. So you can argue about DB performance all day and what approach is better, but MySQL/Oracle/DB2/SQLServer/SQLite are all the fastest DB in existence
<em>anyway</em>. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>Hrm,&nbsp;I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring&nbsp;arguments&nbsp;for things that nobody argued about.</p>
<p>&nbsp;</p>
<p>It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short why it
 can't and Sven provided more details. Instead of answering to that you&nbsp;went on a rant about CPU cache,&nbsp;schedulers, algorithms and whatnot.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/da54937bdf91455aab789deb00d585cf#da54937bdf91455aab789deb00d585cf</link>
		<pubDate>Sun, 24 Jan 2010 20:23:32 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/da54937bdf91455aab789deb00d585cf#da54937bdf91455aab789deb00d585cf</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Hrm,&nbsp;I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring&nbsp;arguments&nbsp;for things that nobody argued about.</p>
<p>&nbsp;</p>
<p>It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short why it
 can't and Sven provided more details. Instead of answering to that you&nbsp;went on a rant about CPU cache,&nbsp;schedulers, algorithms and whatnot.</p>
</div></blockquote>
<p><strong>Hrm,&nbsp;I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring&nbsp;arguments&nbsp;for things that nobody argued about.</strong></p>
<p>&nbsp;</p>
<p>It's all part of the same argument. You'll just have to keep up. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p>
<p>&nbsp;</p>
<p><strong>It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short
 why it can't and Sven provided more details. Instead of answering to that you&nbsp;went on a rant about CPU cache,&nbsp;schedulers, algorithms and whatnot.</strong></p>
<p>&nbsp;</p>
<p>The kernel can do caching better. You can claim that the software knows more about the characteristics of the data structures it uses, but the kernel knows more about the characteristics of the hardware it is running on.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3068886aa42249d9a15c9deb00d585fe#3068886aa42249d9a15c9deb00d585fe</link>
		<pubDate>Sun, 24 Jan 2010 21:12:42 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3068886aa42249d9a15c9deb00d585fe#3068886aa42249d9a15c9deb00d585fe</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Hrm,&nbsp;I think I lost count of how many times you switched the subject in this thread. Not to mention that you bring&nbsp;arguments&nbsp;for things that nobody argued about.</p>
<p>&nbsp;</p>
<p>It's perfectly fine for a database to leave the caching to the filesystem if you want a lightweight codebase or if you don't have the resources to do it. The problem starts when you claim that the kernel can do caching better. I mentioned in short why it
 can't and Sven provided more details. Instead of answering to that you&nbsp;went on a rant about CPU cache,&nbsp;schedulers, algorithms and whatnot.</p>
</div></blockquote>
<p>I also said the same thing, and he started talking about CPU caches for some reason.&nbsp; Anyway, this is basic engineering sense and doesn't really have anything to do with databases: you can almost always get better performance if you &quot;roll it yourself&quot;.&nbsp;
 Whether or not it is feasible to do this or worth the effort is what you need to decide as an engineer.&nbsp; But when major commercial databases like Microsoft SQL Server, which have massive budgets and huge teams of engineers, make a decision to roll their own caching
 scheme, maybe it's a good indication you're wrong about this?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/26a95adcf14b49d19ab59deb00d5862e#26a95adcf14b49d19ab59deb00d5862e</link>
		<pubDate>Sun, 24 Jan 2010 21:14:55 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/26a95adcf14b49d19ab59deb00d5862e#26a95adcf14b49d19ab59deb00d5862e</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Minh said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><em><strong>The same arguments have been made in the past regarding cooperative vs preemptive multitasking.</strong></em></p>
<p>&nbsp;</p>
<p>Not really... If your claim is:</p>
<p>&nbsp;</p>
<p>OS file handling -&gt; Generic (good)</p>
<p>DBMS &quot;file&quot; handling -&gt; Specific (unnecessary)</p>
<p>&nbsp;</p>
<p>then preemptive vs. cooperative doesn't fall on the same scale. Preemptive vs. cooperative falls more on hardware vs. software. And I believe you know which is better.</p>
<p>&nbsp;</p>
<p>Look, DBMSes DO have their own internal &quot;file&quot; / page / disk memory management... That's a fact. Why? Because that code is more specific to their domain. A DBMS loves contiguous blocks of memory... whereas HDs have 1 to 2 dimensions (spiral platters) to their
 storage.</p>
<p>&nbsp;</p>
<p>PS. C9, fix the GD posting errors</p>
</div></blockquote>
<p>I don't think preemptive vs. cooperative is a different analogy at all. Cooperative multitasking allows a process to decide how much CPU time it needs. In preemptive multitasking, this is decided for the process (although in some systems like POSIX/Windows,
 a process can suggest to the kernel how it should be handled).</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/53fed0b95a604a15a4729deb00d5865c#53fed0b95a604a15a4729deb00d5865c</link>
		<pubDate>Sun, 24 Jan 2010 21:17:54 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/53fed0b95a604a15a4729deb00d5865c#53fed0b95a604a15a4729deb00d5865c</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Minh said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>I don't think preemptive vs. cooperative is a different analogy at all. Cooperative multitasking allows a process to decide how much CPU time it needs. In preemptive multitasking, this is decided for the process (although in some systems like POSIX/Windows,
 a process can suggest to the kernel how it should be handled).</p>
</div></blockquote>
<p>Cooperative multitasking would probably make more sense on a database server.&nbsp; Normally the entire machine is dedicated to running the database server, so there is really only one process at a time.&nbsp; You would avoid the overhead of preemptive multitasking,
 and the risk of getting context switched at a bad time.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/adfcd64ee20b445eb0f09deb00d58688#adfcd64ee20b445eb0f09deb00d58688</link>
		<pubDate>Sun, 24 Jan 2010 21:21:04 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/adfcd64ee20b445eb0f09deb00d58688#adfcd64ee20b445eb0f09deb00d58688</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>I also said the same thing, and he started talking about CPU caches for some reason.&nbsp; Anyway, this is basic engineering sense and doesn't really have anything to do with databases: you can almost always get better performance if you &quot;roll it yourself&quot;.&nbsp;
 Whether or not it is feasible to do this or worth the effort is what you need to decide as an engineer.&nbsp; But when major commercial databases like Microsoft SQL Server, which have massive budgets and huge teams of engineers, make a decision to roll their own caching
 scheme, maybe it's a good indication you're wrong about this?</p>
</div></blockquote>
<p>The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data. Quite frankly, to run efficiently on an x86 processor, it has no other choice. I don't know how many of you ever did assembly programming.
 Although you can create a memory caching algorithm, you can't create your own processor caching algorithm on x86. That is hard-coded on the CPU by Intel/AMD. Performance degrades considerably on x86 if there are many cache misses, which is the cache equivalent
 of a page fault. So your data structures must consider this in order to efficiently run on the architecture.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/08b8221c8c704f9e86829deb00d586b7#08b8221c8c704f9e86829deb00d586b7</link>
		<pubDate>Sun, 24 Jan 2010 21:21:16 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/08b8221c8c704f9e86829deb00d586b7#08b8221c8c704f9e86829deb00d586b7</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Cooperative multitasking would probably make more sense on a database server.&nbsp; Normally the entire machine is dedicated to running the database server, so there is really only one process at a time.&nbsp; You would avoid the overhead of preemptive multitasking,
 and the risk of getting context switched at a bad time.</p>
</div></blockquote>
<p>Well, by that argument, a database server should BE a kernel, right? <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-4.gif' alt='Tongue Out' /> Since it's going to do everything a kernel does! That is clearly the way to get the
<em>optimal performance</em>. </p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d8b284d2cc014961966e9deb00d586e3#d8b284d2cc014961966e9deb00d586e3</link>
		<pubDate>Sun, 24 Jan 2010 21:23:30 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d8b284d2cc014961966e9deb00d586e3#d8b284d2cc014961966e9deb00d586e3</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data. Quite frankly, to run efficiently on an x86 processor, it has no other choice. I don't know how many of you ever did assembly programming.
 Although you can create a memory caching algorithm, you can't create your own processor caching algorithm on x86. That is hard-coded on the CPU by Intel/AMD. Performance degrades considerably on x86 if there are many cache misses, which is the cache equivalent
 of a page fault. So your data structures must consider this in order to efficiently run on the architecture.</p>
</div></blockquote>
<p><blockquote><div class="quoteText">The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data</div></blockquote>.</p>
<p>&nbsp;</p>
<p>Are you insane?&nbsp; I've said that I don't think the database should move data around to keep the kernel happy. That's not the same thing as &quot;localizing&quot; important data so it fits the CPU cache (or the harddrive).</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">The kernel can do caching better. You can claim that the software knows more about the characteristics of the data structures it uses, but the kernel knows more about the characteristics of the hardware it is running on.</div></blockquote></p>
<p>&nbsp;</p>
<p>Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3379a45e63cd4f168b0a9deb00d58713#3379a45e63cd4f168b0a9deb00d58713</link>
		<pubDate>Sun, 24 Jan 2010 21:29:06 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3379a45e63cd4f168b0a9deb00d58713#3379a45e63cd4f168b0a9deb00d58713</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?</p>
</div></blockquote>
<p><strong>Are you insane?&nbsp; I've said that I don't think the database should move data around to keep the kernel happy. That's not the same thing as &quot;localizing&quot; important data so it fits the CPU cache (or the harddrive).</strong></p>
<p>&nbsp;</p>
<p>And why not? The kind of caching algorithm the Linux kernel uses is not all that different from what Intel microcode uses. You kill two birds with one stone. PS: Ad hominem is evidence of a losing argument.</p>
<p>&nbsp;</p>
<p><strong>Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?</strong></p>
<p>&nbsp;</p>
<p>Well a kernel can safely assume that important/commonly used data in memory will be near each other<strong>,
</strong>because that's how optimizing compilers and high-performance databases tend to structure their data. True story.</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/102a19a273cb46d3a0019deb00d58742#102a19a273cb46d3a0019deb00d58742</link>
		<pubDate>Sun, 24 Jan 2010 21:34:12 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/102a19a273cb46d3a0019deb00d58742#102a19a273cb46d3a0019deb00d58742</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Are you insane?&nbsp; I've said that I don't think the database should move data around to keep the kernel happy. That's not the same thing as &quot;localizing&quot; important data so it fits the CPU cache (or the harddrive).</strong></p>
<p>&nbsp;</p>
<p>And why not? The kind of caching algorithm the Linux kernel uses is not all that different from what Intel microcode uses. You kill two birds with one stone. PS: Ad hominem is evidence of a losing argument.</p>
<p>&nbsp;</p>
<p><strong>Except the characteristics of the hardware are likely to be simpler than the characteristics of the data structures and access patterns. Which set of characteristics do you think will be easier to communicate to the other party?</strong></p>
<p>&nbsp;</p>
<p>Well a kernel can safely assume that important/commonly used data in memory will be near each other<strong>,
</strong>because that's how optimizing compilers and high-performance databases tend to structure their data. True story.</p>
<p>&nbsp;</p>
</div></blockquote>
<p><blockquote><div class="quoteText">Ad hominem is evidence of a losing argument.</div></blockquote></p>
<p>&nbsp;</p>
<p>Ha ha, look who's talking. You're twisting my words yet you claim &quot;losing argument&quot;.</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">Well a kernel can safely assume that important/commonly used data in memory will be near each other<strong>,
</strong>because that's how optimizing compilers and high-performance databases tend to structure their data. True story.</div></blockquote></p>
<p>&nbsp;</p>
<p>And how exactly does this relate to what I said? Or more generally, how exactly does this relate to caching?</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d48eb4d9e3d546da85aa9deb00d58772#d48eb4d9e3d546da85aa9deb00d58772</link>
		<pubDate>Sun, 24 Jan 2010 21:48:15 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d48eb4d9e3d546da85aa9deb00d58772#d48eb4d9e3d546da85aa9deb00d58772</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>And how exactly does this relate to what I said? Or more generally, how exactly does this relate to caching?</p>
<p>&nbsp;</p>
</div></blockquote>
<p><strong>Ha ha, look who's talking. You're twisting my words yet you claim &quot;losing argument&quot;.</strong></p>
<p>&nbsp;</p>
<p>Well I see you want to be on an equal footing with me on this. So I'll call you an insane person also. Insane person.
</p>
<p>&nbsp;</p>
<p><strong>And how exactly does this relate to what I said? Or more generally, how exactly does this relate to caching?</strong></p>
<p>&nbsp;</p>
<p>This is the original point I am addressing:</p>
<p><strong>Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</strong></p>
<p>&nbsp;</p>
<p>Okay I'm going to say this again. Please no more &quot;NUH UHs&quot; on this. I'm quite obviously right.</p>
<p>&nbsp;</p>
<p>You cannot write any explicit caching code to efficiently cache things on an x86 processor. This algorithm is hardcoded into the control unit of the actual processor. The only way to get important data into cache (which is of course, what you want) is to
 &quot;suggest&quot; it to the processor by your data structures. This means working with the quirks of the branch predictor, and also localizing commonly accessed data.</p>
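<p>&nbsp;</p>
<p>A toy sketch of why localizing data matters, in Python. This is not how a real x86 cache works (those are set-associative with their own replacement policies); the 64-byte line size, the 8-line cache and the <code>count_misses</code> helper are all assumptions made up for illustration.</p>
<pre>
```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed; a common x86 value)
NUM_LINES = 8    # tiny hypothetical cache so the effect is easy to see


def count_misses(addresses):
    """Count misses in a simplified, fully associative LRU cache."""
    cache = OrderedDict()   # cache line number mapped to None, oldest first
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) == NUM_LINES + 1:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# 1024 accesses to 4-byte items packed next to each other.
packed = [i * 4 for i in range(1024)]
# The same number of accesses, but every item sits on its own cache line.
scattered = [i * LINE_SIZE for i in range(1024)]

packed_misses = count_misses(packed)
scattered_misses = count_misses(scattered)
print(packed_misses, scattered_misses)
```
</pre>
<p>Under these assumptions the packed layout only misses once per cache line, while the scattered layout misses on every single access.</p>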
<p>&nbsp;</p>
<p>Of course this is to keep the CPU happy and you are probably going to be like &quot;well I didn't say CPU, so I am somehow correct, and you are wrong&quot;. Well buddy, you have to keep the CPU happy. An unhappy CPU is a cache-missing CPU, and you MUST avoid this to
 have any reasonable performance. </p>
<p>&nbsp;</p>
<p>And interestingly, by keeping the CPU happy, you also tend to make the kernel happy. Because the kernel isn't using some magical caching algorithm that Intel doesn't know about.
</p>
<p>&nbsp;</p>
<p>Kapeesh? </p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e61fb8a8a304435db6f09deb00d587a2#e61fb8a8a304435db6f09deb00d587a2</link>
		<pubDate>Sun, 24 Jan 2010 21:59:38 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e61fb8a8a304435db6f09deb00d587a2#e61fb8a8a304435db6f09deb00d587a2</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Ha ha, look who's talking. You're twisting my words yet you claim &quot;losing argument&quot;.</strong></p>
<p>&nbsp;</p>
<p>Well I see you want to be on an equal footing with me on this. So I'll call you an insane person also. Insane person.
</p>
<p>&nbsp;</p>
<p><strong>And how exactly does this relate to what I said? Or more generally, how exactly does this relate to caching?</strong></p>
<p>&nbsp;</p>
<p>This is the original point I am addressing:</p>
<p><strong>Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</strong></p>
<p>&nbsp;</p>
<p>Okay I'm going to say this again. Please no more &quot;NUH UHs&quot; on this. I'm quite obviously right.</p>
<p>&nbsp;</p>
<p>You cannot write any explicit caching code to efficiently cache things on an x86 processor. This algorithm is hardcoded into the control unit of the actual processor. The only way to get important data into cache (which is of course, what you want) is to
 &quot;suggest&quot; it to the processor by your data structures. This means working with the quirks of the branch predictor, and also localizing commonly accessed data.</p>
<p>&nbsp;</p>
<p>Of course this is to keep the CPU happy and you are probably going to be like &quot;well I didn't say CPU, so I am somehow correct, and you are wrong&quot;. Well buddy, you have to keep the CPU happy. An unhappy CPU is a cache-missing CPU, and you MUST avoid this to
 have any reasonable performance. </p>
<p>&nbsp;</p>
<p>And interestingly, by keeping the CPU happy, you also tend to make the kernel happy. Because the kernel isn't using some magical caching algorithm that Intel doesn't know about.
</p>
<p>&nbsp;</p>
<p>Kapeesh? </p>
</div></blockquote>
<p><blockquote><div class="quoteText">I'm quite obviously right</div></blockquote></p>
<p>&nbsp;</p>
<p>Yeah, you're right... about a completely different and unrelated problem. What problem? Ah, it was branch prediction. Ah no, sorry, that was about CPU cache. Ah no, wrong again it was about optimizing compilers. Oops, I've missed again. No, I bet it was about
 42.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/936f87a24ecf4f16bf149deb00d587d3#936f87a24ecf4f16bf149deb00d587d3</link>
		<pubDate>Sun, 24 Jan 2010 22:17:45 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/936f87a24ecf4f16bf149deb00d587d3#936f87a24ecf4f16bf149deb00d587d3</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>Yeah, you're right... about a completely different and unrelated problem. What problem? Ah, it was branch prediction. Ah no, sorry, that was about CPU cache. Ah no, wrong again, it was about optimizing compilers. Oops, I've missed again. No, I bet it was about
 42.</p>
</div></blockquote>
<p>A system memory cache is just a much slower and much bigger CPU cache.</p>
<p>&nbsp;</p>
<p>In fact this is how the CPU sees the world:</p>
<p>&nbsp;</p>
<p>Page fault: Geological time</p>
<p>Cache miss: Snail time</p>
<p>In cache: Ferrari Enzo</p>
<p>&nbsp;</p>
<p>You want your CPU to be riding that Ferrari as much as possible. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /> That means NO DIRECT MEMORY ACCESS IF POSSIBLE. A big optimization thing (at least as big as autovectorization) is to try to keep your working data in the cache, and never make the CPU explicitly
 access a piece of memory. Does that sound familiar? Isn't that <em>exactly</em> what you want to do with memory caching in general?</p>
<p>&nbsp;</p>
<p>Oh wait, I can even make a fill in the blank:</p>
<p>The point of X caching is to keep as much working data in a fast memory store (in this case: Y) as possible.</p>
<p>&nbsp;</p>
<p>I'm going to guess the reply: &quot;irrelevant&quot;. LOLWUT?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6257fe2abc90483fab4f9deb00d58802#6257fe2abc90483fab4f9deb00d58802</link>
		<pubDate>Sun, 24 Jan 2010 22:18:33 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6257fe2abc90483fab4f9deb00d58802#6257fe2abc90483fab4f9deb00d58802</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>A system memory cache is just a much slower and much bigger CPU cache.</p>
<p>&nbsp;</p>
<p>In fact this is how the CPU sees the world:</p>
<p>&nbsp;</p>
<p>Page fault: Geological time</p>
<p>Cache miss: Snail time</p>
<p>In cache: Ferrari Enzo</p>
<p>&nbsp;</p>
<p>You want your CPU to be riding that Ferrari as much as possible. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> That means NO DIRECT MEMORY ACCESS IF POSSIBLE. A big optimization thing (at least as big as autovectorization)
 is to try to keep your working data in the cache, and never make the CPU explicitly access a piece of memory. Does that sound familiar? Isn't that
<em>exactly</em> what you want to do with memory caching in general?</p>
<p>&nbsp;</p>
<p>Oh wait, I can even make a fill in the blank:</p>
<p>The point of X caching is to keep as much working data in a fast memory store (in this case: Y) as possible.</p>
<p>&nbsp;</p>
<p>I'm going to guess the reply: &quot;irrelevant&quot;. LOLWUT?</p>
</div></blockquote>
<p>You have to write your database caching algorithms with the kernel and CPU/hardware behaviour in mind. You certainly don't just &quot;leave it to the kernel&quot; unless you're expecting performance not to be important. That's true of any bit of coding.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cba1464b10d5427190f19deb00d58831#cba1464b10d5427190f19deb00d58831</link>
		<pubDate>Sun, 24 Jan 2010 22:25:37 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cba1464b10d5427190f19deb00d58831#cba1464b10d5427190f19deb00d58831</guid>
		<dc:creator>AndyC</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/AndyC/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">AndyC said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>You have to write your database caching algorithms with the kernel and CPU/hardware behaviour in mind. You certainly don't just &quot;leave it to the kernel&quot; unless you're expecting performance not to be important. That's true of any bit of coding.</p>
</div></blockquote>
<p>You want to leave &quot;as much as possible&quot; to the kernel though. Notice that's exactly what I said. Heh heh heh.</p>
<p>&nbsp;</p>
<p>You guys are a riot.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/389b6e7e76584aaa9d339deb00d5885c#389b6e7e76584aaa9d339deb00d5885c</link>
		<pubDate>Sun, 24 Jan 2010 22:28:19 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/389b6e7e76584aaa9d339deb00d5885c#389b6e7e76584aaa9d339deb00d5885c</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>A system memory cache is just a much slower and much bigger CPU cache.</p>
<p>&nbsp;</p>
<p>In fact this is how the CPU sees the world:</p>
<p>&nbsp;</p>
<p>Page fault: Geological time</p>
<p>Cache miss: Snail time</p>
<p>In cache: Ferrari Enzo</p>
<p>&nbsp;</p>
<p>You want your CPU to be riding that Ferrari as much as possible. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> That means NO DIRECT MEMORY ACCESS IF POSSIBLE. A big optimization thing (at least as big as autovectorization)
 is to try to keep your working data in the cache, and never make the CPU explicitly access a piece of memory. Does that sound familiar? Isn't that
<em>exactly</em> what you want to do with memory caching in general?</p>
<p>&nbsp;</p>
<p>Oh wait, I can even make a fill in the blank:</p>
<p>The point of X caching is to keep as much working data in a fast memory store (in this case: Y) as possible.</p>
<p>&nbsp;</p>
<p>I'm going to guess the reply: &quot;irrelevant&quot;. LOLWUT?</p>
</div></blockquote>
<p>Feel free to continue rambling. I think I'll go drink my pan galactic gargle blaster now, thank you.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d1df7326617c44c0917b9deb00d5888a#d1df7326617c44c0917b9deb00d5888a</link>
		<pubDate>Sun, 24 Jan 2010 22:30:13 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d1df7326617c44c0917b9deb00d5888a#d1df7326617c44c0917b9deb00d5888a</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Feel free to continue rambling. I think I'll go drink my pan galactic gargle blaster now, thank you.</p>
</div></blockquote>
<p>You just don't like admitting you are wrong huh? I noticed that in Tech Off too. You have to try to &quot;correct&quot; everyone. You can't correct what is already correct though, buddy. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/b1a4ee5a932449b0bcd59deb00d588b6#b1a4ee5a932449b0bcd59deb00d588b6</link>
		<pubDate>Sun, 24 Jan 2010 22:30:57 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/b1a4ee5a932449b0bcd59deb00d588b6#b1a4ee5a932449b0bcd59deb00d588b6</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Ha ha, look who's talking. You're twisting my words yet you claim &quot;losing argument&quot;.</strong></p>
<p>&nbsp;</p>
<p>Well I see you want to be on an equal footing with me on this. So I'll call you an insane person also. Insane person.
</p>
<p>&nbsp;</p>
<p><strong>And how exactly does this relate to what I said? Or more generally, how exactly does this relate to caching?</strong></p>
<p>&nbsp;</p>
<p>This is the original point I am addressing:</p>
<p><strong>Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</strong></p>
<p>&nbsp;</p>
<p>Okay I'm going to say this again. Please no more &quot;NUH UHs&quot; on this. I'm quite obviously right.</p>
<p>&nbsp;</p>
<p>You cannot write any explicit caching code to efficiently cache things on an x86 processor. This algorithm is hardcoded into the control unit of the actual processor. The only way to get important data into cache (which is, of course, what you want) is to
 &quot;suggest&quot; it to the processor through your data structures. This means working with the quirks of the branch predictor, and also localizing commonly accessed data.</p>
<p>&nbsp;</p>
<p>Of course this is to keep the CPU happy and you are probably going to be like &quot;well I didn't say CPU, so I am somehow correct, and you are wrong&quot;. Well buddy, you have to keep the CPU happy. An unhappy CPU is a cache-missing CPU, and you MUST avoid this to
 have any reasonable performance. </p>
<p>&nbsp;</p>
<p>And interestingly, by keeping the CPU happy, you also tend to make the kernel happy. Because the kernel isn't using some magical caching algorithm that Intel doesn't know about.
</p>
<p>&nbsp;</p>
<p>Capiche? </p>
</div></blockquote>
<p>See --- this long post by Bass right here.</p>
<p>&nbsp;</p>
<p>This is why I want Channel 9 forums to switch to something with a little bit more maturity like SMF or phpBB or something. The delays in bringing up posts, the times when you can't get to page 2 of a thread, and the over-AJAXification of the current Channel
 9 forums interrupt and distract from a steady stream of posts like this one, which are both highly entertaining and educational.&nbsp;</p>
<p>&nbsp;</p>
<p>If I had succumbed to the urge to just not visit the site a few minutes ago (an urge fueled mainly by the usability of these forums), I would not have read this post.</p>
<p>&nbsp;</p>
<p>Thanks.</p>
<p>&nbsp;</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0342567caba842ea94389deb00d588ea#0342567caba842ea94389deb00d588ea</link>
		<pubDate>Mon, 25 Jan 2010 01:08:53 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0342567caba842ea94389deb00d588ea#0342567caba842ea94389deb00d588ea</guid>
		<dc:creator>fknight</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/fknight/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>A system memory cache is just a much slower and much bigger CPU cache.</p>
<p>&nbsp;</p>
<p>In fact this is how the CPU sees the world:</p>
<p>&nbsp;</p>
<p>Page fault: Geological time</p>
<p>Cache miss: Snail time</p>
<p>In cache: Ferrari Enzo</p>
<p>&nbsp;</p>
<p>You want your CPU to be riding that Ferrari as much as possible. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> That means NO DIRECT MEMORY ACCESS IF POSSIBLE. A big optimization thing (at least as big as autovectorization)
 is to try to keep your working data in the cache, and never make the CPU explicitly access a piece of memory. Does that sound familiar? Isn't that
<em>exactly</em> what you want to do with memory caching in general?</p>
<p>&nbsp;</p>
<p>Oh wait, I can even make a fill in the blank:</p>
<p>The point of X caching is to keep as much working data in a fast memory store (in this case: Y) as possible.</p>
<p>&nbsp;</p>
<p>I'm going to guess the reply: &quot;irrelevant&quot;. LOLWUT?</p>
</div></blockquote>
<p>So you have information cached in memory. Unfortunately, you can't have everything cached in memory, because you just don't have that much memory. So at some point, the DBMS is going to have to read something from disk, and, most importantly, something else
 is going to have to be removed from the cache to make room for the new information. Deciding what piece of information to remove from the cache when space runs out is probably the hardest and most&nbsp;performance-critical&nbsp;part of cache algorithms.</p>
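<p>The eviction decision described above can be sketched with a toy least-recently-used (LRU) page cache. This is only an illustration: LRU is one common policy picked for the example, and the names <code>LRUPageCache</code> and <code>read_page_from_disk</code> are hypothetical, not taken from any real DBMS or kernel.</p>
<pre>
```python
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache that evicts the least recently used page when full."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical loader
        self.pages = OrderedDict()  # page id to data, oldest use first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = self.read_page_from_disk(page_id)
        if len(self.pages) == self.capacity:
            self.pages.popitem(last=False)  # the eviction decision
        self.pages[page_id] = data
        return data


# Stand-in for a disk read; a real system would do I/O here.
cache = LRUPageCache(capacity=2, read_page_from_disk=lambda pid: f"page-{pid}")
for pid in [1, 2, 1, 3, 2]:
    cache.get(pid)
print(cache.hits, cache.misses, list(cache.pages))
```
</pre>
<p>The single <code>popitem</code> line is where all the policy lives: whoever writes that line decides which page gets thrown away.</p>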
<p>&nbsp;</p>
<p>Now ask yourself: who has more information by which to decide what should be dropped from the cache? The OS kernel, or the DBMS itself?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2e42c2ec5ed1487181b19deb00d58919#2e42c2ec5ed1487181b19deb00d58919</link>
		<pubDate>Mon, 25 Jan 2010 03:50:36 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2e42c2ec5ed1487181b19deb00d58919#2e42c2ec5ed1487181b19deb00d58919</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>So you have information cached in memory. Unfortunately, you can't have everything cached in memory, because you just don't have that much memory. So at some point, the DBMS is going to have to read something from disk, and, most importantly, something else
 is going to have to be removed from the cache to make room for the new information. Deciding what piece of information to remove from the cache when space runs out is probably the hardest and most&nbsp;performance-critical&nbsp;part of cache algorithms.</p>
<p>&nbsp;</p>
<p>Now ask yourself: who has more information by which to decide what should be dropped from the cache? The OS kernel, or the DBMS itself?</p>
</div></blockquote>
<p>The OS kernel has a lot of information here.</p>
<p><br>
(1) It can figure out where the most commonly read information is, and store those pages in memory. I know this may seem silly but the most commonly read information is wait for it... exactly what you
<em>want </em>to be in memory. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /> Thus the kernel does a pretty good job of finding exactly what needs to be in memory. The DB's data structures MUST be written to be automatically cached
<em>anyway</em>. You cannot use unpredictably spread-out data structures (see why heap sort performs poorly on x86); you are quite limited in how you structure your data for performance due to the CPU cache. But seriously, in all honesty, I cannot see what
 a DB's domain-specific caching algorithm could do better. Since you are the so-called DB expert, perhaps you could enlighten me?</p>
<p>&nbsp;</p>
<p>(2) It has access to a lot of information about the persistent store (e.g. the position of the head). So it knows exactly when it can pull some piece of data off the disk (often measurable in nanoseconds), while at best the DB could query the kernel for
 this information (zomg cache miss / context switch), but more likely simply guesses and then has blocking stuff all nicely queued for unpredictable times on the kernel's I/O scheduler.</p>
<p>&nbsp;</p>
<p>(3) It has access to an array of modern caching technologies like DMA. Remember what I said about cache misses and all that? Well you are passing data in and out of the CPU like an idiot and that's BAD BAD BAD for performance. Unfortunately, the only
 realistic way to implement a user-mode file cache is to pass data in and out of the CPU like an idiot. You really want to do that? Be my guest. I'll take DMA over that any day.</p>
<p>&nbsp;</p>
<p>At best, if you want to implement a cache in a DB you should be subservient to the kernel, passing it the right information so it can do its job via its sheer awesome kernel powers. As I said before, there are ways to ask the kernel to cache something for
 you and there are ways to structure your data that are conducive to caching. But one thing you should never do is the kernel's job for it, because you will never do it as well.
</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c10175f328d54d25be3f9deb00d589b0#c10175f328d54d25be3f9deb00d589b0</link>
		<pubDate>Mon, 25 Jan 2010 04:27:57 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c10175f328d54d25be3f9deb00d589b0#c10175f328d54d25be3f9deb00d589b0</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">fknight said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>See --- this long post by Bass right here.</p>
<p>&nbsp;</p>
<p>This is why I want Channel 9 forums to switch to something with a little bit more maturity like SMF or phpBB or something. The delays in bringing up posts, the times when you can't get to page 2 of a thread, and the over-AJAXification of the current Channel
 9 forums interrupt and distract from a steady stream of posts like this one, which are both highly entertaining and educational.&nbsp;</p>
<p>&nbsp;</p>
<p>If I had succumbed to the urge to just not visit the site a few minutes ago (an urge fueled mainly by the usability of these forums), I would not have read this post.</p>
<p>&nbsp;</p>
<p>Thanks.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
</div></blockquote>
<p><strong>This is why I want Channel 9 forums to switch to something with a little bit more maturity like SMF or phpBB</strong></p>
<p>&nbsp;</p>
<p>An open-source PHP BBS hosting Channel 9... somehow, I don’t think so.</p>
<p>&nbsp;</p>
<p>In the “old days” I would have suggested Community Server, but these days, I can’t even figure out what the hell Telligent are trying to sell, let alone figure out the status of the product... it seems to be a very confused company with an even more confusing
 website.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2fa845213aaf4d97aefc9deb00d589e9#2fa845213aaf4d97aefc9deb00d589e9</link>
		<pubDate>Mon, 25 Jan 2010 04:42:11 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2fa845213aaf4d97aefc9deb00d589e9#2fa845213aaf4d97aefc9deb00d589e9</guid>
		<dc:creator>Elmer</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/elmer/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">fknight said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>See --- this long post by Bass right here.</p>
<p>&nbsp;</p>
<p>This is why I want Channel 9 forums to switch to something with a little bit more maturity like SMF or phpBB or something. The delays in bringing up posts, the times when you can't get to page 2 of a thread, and the over-AJAXification of the current Channel
 9 forums interrupt and distract from a steady stream of posts like this one, which are both highly entertaining and educational.&nbsp;</p>
<p>&nbsp;</p>
<p>If I had succumbed to the urge to just not visit the site a few minutes ago (an urge fueled mainly by the usability of these forums), I would not have read this post.</p>
<p>&nbsp;</p>
<p>Thanks.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
</div></blockquote>
<p>Thank you for the kind words. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/067fa33720994543b3e99deb00d58a15#067fa33720994543b3e99deb00d58a15</link>
		<pubDate>Mon, 25 Jan 2010 04:49:02 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/067fa33720994543b3e99deb00d58a15#067fa33720994543b3e99deb00d58a15</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The OS kernel has a lot of information here.</p>
<p><br>
(1) It can figure out where the most commonly read information is, and store those pages in memory. I know this may seem silly but the most commonly read information is wait for it... exactly what you
<em>want </em>to be in memory. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> Thus the kernel does a pretty good job of finding exactly what needs to be in memory. The DB's data structures MUST be written to
 be automatically cached <em>anyway</em>. You cannot use unpredictably spread-out data structures (see why heap sort performs poorly on x86); you are quite limited in how you structure your data for performance due to the CPU cache. But seriously, in all
 honesty, I cannot see what a DB's domain-specific caching algorithm could do better. Since you are the so-called DB expert, perhaps you could enlighten me?</p>
<p>&nbsp;</p>
<p>(2) It has access to a lot of information about the persistent store (e.g. the position of the head). So it knows exactly when it can pull some piece of data off the disk (often measurable in nanoseconds), while at best the DB could query the kernel for
 this information (zomg cache miss / context switch), but more likely simply guesses and then has blocking stuff all nicely queued for unpredictable times on the kernel's I/O scheduler.</p>
<p>&nbsp;</p>
<p>(3) It has access to an array of modern caching technologies like DMA. Remember what I said about cache misses and all that? Well you are passing data in and out of the CPU like an idiot and that's BAD BAD BAD for performance. Unfortunately, the only
 realistic way to implement a user-mode file cache is to pass data in and out of the CPU like an idiot. You really want to do that? Be my guest. I'll take DMA over that any day.</p>
<p>&nbsp;</p>
<p>At best, if you want to implement a cache in a DB you should be subservient to the kernel, passing it the right information so it can do its job via its sheer awesome kernel powers. As I said before, there are ways to ask the kernel to cache something for
 you and there are ways to structure your data that are conducive to caching. But one thing you should never do is the kernel's job for it, because you will never do it as well.
</p>
</div></blockquote>
<p><blockquote><div class="quoteText">It can figure out where the most commonly read information is, and store those pages in memory.</div></blockquote></p>
<p>But that just plainly isn't true. The kernel can figure out only what was the most common information in the past, and then hope that this will still be true in the future. But there are many access patterns where this simply isn't true. The kernel cannot
 know if an application is going to have an access pattern for which this strategy doesn't work. Only the application can know that. Hence, the application can do better.</p>
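<p>A minimal sketch of one such access pattern, assuming a repeated sequential scan over a table that is one page larger than the cache. The function <code>count_hits</code> and the workload are hypothetical; evicting the most recently used page (MRU) stands in here for "the application knows it is scanning" and is not a claim about what any particular DBMS does.</p>
<pre>
```python
from collections import OrderedDict


def count_hits(trace, capacity, evict_most_recent):
    """Replay page accesses; return how many were served from the cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
            continue
        if len(cache) == capacity:
            # last=True drops the most recently used page (MRU),
            # last=False drops the least recently used page (LRU).
            cache.popitem(last=evict_most_recent)
        cache[page] = True
    return hits


# Hypothetical workload: scan pages 0..4 three times with room for four pages.
scan = list(range(5)) * 3
lru_hits = count_hits(scan, capacity=4, evict_most_recent=False)
mru_hits = count_hits(scan, capacity=4, evict_most_recent=True)
print(lru_hits, mru_hits)
```
</pre>
<p>On this trace the recency-based policy never hits, because each page is evicted just before it is needed again, while the scan-aware policy keeps most of the table resident. A cache that only sees past accesses has no way to tell it is in this situation.</p>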
<p>&nbsp;</p>
<p>And CPU caching isn't relevant to this discussion for&nbsp;several reasons: 1. The timing gap between memory and disk is greater than that between CPU cache and memory. 2. CPU cache is a few MB at best, file cache is typically multiple GB. 3. The kernel does
 not&nbsp;manage the CPU cache, the CPU does that.</p>
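<p>To put rough numbers on the first point, here is a small calculation. The latencies are assumed order-of-magnitude figures for illustration only (they are not measurements and vary widely by hardware), and the disk figure assumes a spinning disk.</p>
<pre>
```python
# Assumed order-of-magnitude latencies in nanoseconds (not measurements).
CPU_CACHE_NS = 1            # hit in a fast CPU cache level
MAIN_MEMORY_NS = 100        # access that misses the CPU cache
DISK_SEEK_NS = 10_000_000   # about 10 ms on a spinning disk

memory_vs_cache = MAIN_MEMORY_NS // CPU_CACHE_NS
disk_vs_memory = DISK_SEEK_NS // MAIN_MEMORY_NS
print(f"memory is roughly {memory_vs_cache}x slower than CPU cache")
print(f"disk is roughly {disk_vs_memory}x slower than memory")
```
</pre>
<p>Under these assumptions the penalty for missing the file cache is about a thousand times larger, in relative terms, than the penalty for missing the CPU cache.</p>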
<p>&nbsp;</p>
<p>And the simple fact is that SQL Server and Oracle and other high-performance DB systems do in fact do their own caching. Are you saying the people who develop those databases are all wrong?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/8de2942371724cf887ee9deb00d58a4c#8de2942371724cf887ee9deb00d58a4c</link>
		<pubDate>Mon, 25 Jan 2010 05:17:30 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/8de2942371724cf887ee9deb00d58a4c#8de2942371724cf887ee9deb00d58a4c</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">figuerres said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">magicalclick said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>magicalclick:&nbsp; yeah this has gone back and forth and i can see why you are lost....</p>
<p>&nbsp;</p>
<p>the start of this was that the OP was talking about taking the NTFS &quot;metadata&quot; to a memory based system of some kind.</p>
<p>that he seemed to think that the raw data and the NTFS filesystem data could and should be handled differently.</p>
<p>at least that's what i think the OP was saying.</p>
<p>&nbsp;</p>
<p>then as the topic went on folks started talking about FS optimization and how a DBMS might use the FS or might not.</p>
<p>&nbsp;</p>
<p>so the term database has been used here 2 ways: one as a normal database for say sql server, and one as the special data that a file system needs to manage.</p>
</div></blockquote>
<p>Figuerres, thanks for clearing that up.</p>
<p>&nbsp;</p>
<p>To TP: it would be cool if you can make a prototype and demonstrate how good it is.</p>
<p>To Bass: It is called NTFS. I guess you need something new called DBFS.</p>
<p>&nbsp;</p>
<p>For FS, I think I will list out some of the things we want in the future:</p>
<p>1) DB in mind.</p>
<p>2) Faster directory lookup to reduce latency. Ideally directory lookup in RAM.</p>
<p>3) Multi-user support. (need good sync to all users when directory lookup is in RAM)</p>
<p>4) Security. No external driver, or FS, or whatever can access/modify its content without the FS.</p>
<p>&nbsp;</p>
<p>I am not really keen on this topic, so please don't laugh, hehe.</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4ce2aa44431e4e0f965b9deb00d58a7b#4ce2aa44431e4e0f965b9deb00d58a7b</link>
		<pubDate>Mon, 25 Jan 2010 05:36:29 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4ce2aa44431e4e0f965b9deb00d58a7b#4ce2aa44431e4e0f965b9deb00d58a7b</guid>
		<dc:creator>magicalclick</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/magicalclick/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">magicalclick said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">figuerres said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Figuerres, thanks for clearing that up.</p>
<p>&nbsp;</p>
<p>To TP: it would be cool if you can make a prototype and demonstrate how good it is.</p>
<p>To Bass: It is called NTFS. I guess you need something new called DBFS.</p>
<p>&nbsp;</p>
<p>For FS, I think I will list out some of the things we want in the future:</p>
<p>1) DB in mind.</p>
<p>2) Faster directory lookup to reduce latency. Ideally directory lookup in RAM.</p>
<p>3) Multi-user support. (need good sync to all users when directory lookup is in RAM)</p>
<p>4) Security. No external driver, or FS, or whatever can access/modify its content without the FS.</p>
<p>&nbsp;</p>
<p>I am not really keen on this topic, so please don't laugh, hehe.</p>
<p>&nbsp;</p>
</div></blockquote>
<p>1) Depends what you mean by DB. You can consider NTFS as being a domain-specific DB. It's not relational and it doesn't process SQL queries, but it stores and retrieves data, a lot of data. Incidentally, NTFS uses B-Trees, which are also commonly used in databases.</p>
<p>2) Already done. NTFS (and probably most other filesystems) caches file metadata in RAM. Go enumerate a directory containing a large number of files. You'll likely see a ton of disk activity. Do that again. No disk activity this time.</p>
<p>3) Already done. File metadata caching occurs in kernel, not in the application. This means that there's only one instance of that data so there's no need for some sort of &quot;good sync&quot;.</p>
<p>4) Already done. Unless you're an administrator you cannot access the disk directly, you must go through the filesystem. Also if you're not an administrator you cannot load drivers. In general this has less to do with the filesystem, it's a generic kernel
 problem.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ae8a903988ff416fb97a9deb00d58aac#ae8a903988ff416fb97a9deb00d58aac</link>
		<pubDate>Mon, 25 Jan 2010 09:58:38 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ae8a903988ff416fb97a9deb00d58aac#ae8a903988ff416fb97a9deb00d58aac</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The OS kernel has a lot of information here.</p>
<p><br>
(1) It can figure out where the most commonly read information is, and store those pages in memory. I know this may seem silly but the most commonly read information is wait for it... exactly what you
<em>want </em>to be in memory. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> Thus the kernel does a pretty good job of finding exactly what needs to be in memory. The DB's data structures MUST be written to
 be automatically cached <em>anyway</em>. You can not use unpredictably spread out data structures (see why heap sort performs poorly on x86), you are quite limited in how you structure your data for performance due to the CPU cache. But seriously, I in all
 honesty cannot see what a DB's domain-specific caching algorithm could do better. Since you are the so-called DB expert, perhaps you could enlighten me?</p>
<p>&nbsp;</p>
<p>(2) It has access to a lot of information about the persistent store (e.g. the position of the head). So it knows exactly when it can pull some piece of data off the disk (often measurable in nanoseconds), while at best the DB could query the kernel for
 this information (zomg cache miss / context switch), but more likely simply guesses and then has blocking stuff all nicely queued for unpredictable times on the kernel's I/O scheduler.</p>
<p>&nbsp;</p>
<p>(3) It has access to an array of modern caching technologies like DMA. Remember what I said about cache misses and all that? Well you are passing data in and out of the CPU like an idiot and that's BAD BAD BAD for performance. Unfortunately this is the only
 realistic way to implement a user-mode file cache is to pass data in and out of the CPU like an idiot. You really want to do that? Be my guest. I'll take DMA over that any day.</p>
<p>&nbsp;</p>
<p>At best, if you want to implement a cache in a DB you should be subservient to the kernel, passing it the right information so it can do its job via its sheer awesome kernel powers. As I said before, there are ways to ask the kernel to cache something for
 you and there are ways to structure your data that are conducive to caching. But one thing you should never do is the kernel's job for it, because you will never do it as well.
</p>
</div></blockquote>
<p>Eh, more crap. You're calling a 30-year-old (or maybe even older) technology like DMA a modern thing. Sure, 30 years ago there was no CPU cache but it's not like CPU cache is extremely new either.
</p>
<p>&nbsp;</p>
<p>And you have no clue about how non cached file I/O works and you claim that it's not possible to implement user mode caching because of lack of DMA?
</p>
<p>&nbsp;</p>
<p>And you call people idiots for moving data in and out of the CPU cache. Do you perhaps have a wonder CPU with 1 terabyte of cache so people can stop acting like idiots while moving their database in and out of the CPU cache?</p>
<p>&nbsp;</p>
<p>You might as well throw in the kitchen sink. It will probably weigh more than these flawed arguments.</p>
<p>&nbsp;</p>
<p>PS:</p>
<p>Technical notes for people unfamiliar with x86 hardware and file caching (at least on Windows):</p>
<p>&nbsp;</p>
<p>PCs have included DMA since the early days. Back then it was used to transfer data from the floppy disk to memory and for a couple of other things. AFAIR it was even possible to program it to transfer from memory to memory.</p>
<p>&nbsp;</p>
<p>It is certainly possible to use DMA to transfer data directly from disk to application memory. There are some restrictions on the size, memory address and file offset (basically they need to be multiples of the disk sector size) but that's hardly a problem for
 databases because they organize data in pages and the page size is a multiple of the sector size (at least on SQL Server).
</p>
<p>&nbsp;</p>
<p>It should also be noted that kernel file caching does not always come for free. For example if you're reading a file by using a &quot;read&quot; style call then the kernel has no choice but to copy the file data from its cache to the application provided buffer. Not
 extremely slow but certainly not free either. </p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ff9f9d54a10848e795db9deb00d58ae4#ff9f9d54a10848e795db9deb00d58ae4</link>
		<pubDate>Mon, 25 Jan 2010 10:01:36 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/ff9f9d54a10848e795db9deb00d58ae4#ff9f9d54a10848e795db9deb00d58ae4</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>But that just plainly isn't true. The kernel can figure out only what was the most common information in the past, and then hope that this will still be true in the future. But there are many access patterns where this simply isn't true. The kernel cannot
 know if an application is going to have an access pattern for which this strategy doesn't work. Only the application can know that. Hence, the application can do better.</p>
<p>&nbsp;</p>
<p>And CPU caching isn't relevant to this discussion for&nbsp;several reasons: 1. The timing gap between memory and disk is greater than that between CPU cache and memory. 2. CPU cache is a few MB at best, file cache is typically multiple GB. 3. The kernel does
 not&nbsp;manage the CPU cache, the CPU does that.</p>
<p>&nbsp;</p>
<p>And the simple fact is that SQL Server and Oracle and other high-performance DB systems do in fact do their own caching. Are you saying the people who develop those databases are all wrong?</p>
</div></blockquote>
<p><strong>But that just plainly isn't true. The kernel can figure out only what was the most common information in the past, and then hope that this will still be true in the future.</strong></p>
<p>&nbsp;</p>
<p>If you design your data structures correctly, it SHOULD be true in the future.</p>
<p>&nbsp;</p>
<p><strong>But there are many access patterns where this simply isn't true. </strong>
</p>
<p>&nbsp;</p>
<p>And they don't belong in databases.</p>
<p>&nbsp;</p>
<p><strong>The kernel cannot know if an application is going to have an access pattern for which this strategy doesn't work. Only the application can know that. Hence, the application can do better.</strong></p>
<p>&nbsp;</p>
<p>I don't think that's actually true. For many data structures, you can not&nbsp;reliably&nbsp;predict where an element will be located on a restructure. If you can, you can ask the kernel to cache that part of the file for you anyway. See the
<em>readahead</em> syscall. Which is exactly how readahead&nbsp;daemons&nbsp;like preload work! They don't go around implementing their own file caching, that would be stupid. They let the kernel do it for them, as any good DB implementation would.</p>
<p>&nbsp;</p>
<p><strong>And CPU caching isn't relevant to this discussion for&nbsp;several reasons: 1. The timing gap between memory and disk is greater than that between CPU cache and memory. 2. CPU cache is a few MB at best, file cache is typically multiple GB. 3. The kernel
 does not&nbsp;manage the CPU cache, the CPU does that.</strong></p>
<p>&nbsp;</p>
<p>1. It may be, but having too many direct memory accesses will affect system&nbsp;performance&nbsp;in a very bad way.</p>
<p>2. CPU Cache is a few MB at best, so let's ignore it? That sounds like a plan, chief.</p>
<p>3. The fact that the CPU manages the CPU cache is exactly why it's important. You can not program a CPU cache, because that is a right reserved only for the immortals who design hardware, and you are but a mortal software developer. The best you could do
 is structure your data in a way that is conducive to CPU caching. If you are going to do this anyway, why not structure your data in a way that is conducive to the kernel caching? Hmm?&nbsp;The kernel is like the CPU's prophet to software. Like most prophets,
 he is mortal, software like all other software, but is bestowed by the immortals with certain&nbsp;miracles. But you shouldn't ignore his powers, for he speaketh directly to the hardware, and knows its demands. You do not.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2b802454f74b434097269deb00d58b1b#2b802454f74b434097269deb00d58b1b</link>
		<pubDate>Mon, 25 Jan 2010 18:50:31 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/2b802454f74b434097269deb00d58b1b#2b802454f74b434097269deb00d58b1b</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Eh, more crap. You're calling a 30-year-old (or maybe even older) technology like DMA a modern thing. Sure, 30 years ago there was no CPU cache but it's not like CPU cache is extremely new either.
</p>
<p>&nbsp;</p>
<p>And you have no clue about how non cached file I/O works and you claim that it's not possible to implement user mode caching because of lack of DMA?
</p>
<p>&nbsp;</p>
<p>And you call people idiots for moving data in and out of the CPU cache. Do you perhaps have a wonder CPU with 1 terabyte of cache so people can stop acting like idiots while moving their database in and out of the CPU cache?</p>
<p>&nbsp;</p>
<p>You might as well throw in the kitchen sink. It will probably weigh more than these flawed arguments.</p>
<p>&nbsp;</p>
<p>PS:</p>
<p>Technical notes for people unfamiliar with x86 hardware and file caching (at least on Windows):</p>
<p>&nbsp;</p>
<p>PCs have included DMA since the early days. Back then it was used to transfer data from the floppy disk to memory and for a couple of other things. AFAIR it was even possible to program it to transfer from memory to memory.</p>
<p>&nbsp;</p>
<p>It is certainly possible to use DMA to transfer data directly from disk to application memory. There are some restrictions on the size, memory address and file offset (basically they need to be multiples of the disk sector size) but that's hardly a problem for
 databases because they organize data in pages and the page size is a multiple of the sector size (at least on SQL Server).
</p>
<p>&nbsp;</p>
<p>It should also be noted that kernel file caching does not always come for free. For example if you're reading a file by using a &quot;read&quot; style call then the kernel has no choice but to copy the file data from its cache to the application provided buffer. Not
 extremely slow but certainly not free either. </p>
</div></blockquote>
<p>Welcome back Dexter!</p>
<p>&nbsp;</p>
<p>I didn't say it was impossible to implement caching without DMA, I just said it was incredibly stupid. Since your entire argument is based on a false premise, there is little more I can say. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /></p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cc887b5ff8cd4abb97089deb00d58b4b#cc887b5ff8cd4abb97089deb00d58b4b</link>
		<pubDate>Mon, 25 Jan 2010 19:02:16 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/cc887b5ff8cd4abb97089deb00d58b4b#cc887b5ff8cd4abb97089deb00d58b4b</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>Welcome back Dexter!</p>
<p>&nbsp;</p>
<p>I didn't say it was impossible to implement caching without DMA, I just said it was incredibly stupid. Since your entire argument is based on a false premise, there is little more I can say.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"></p>
</div></blockquote>
<p>Well, it seems that you failed to read the &quot;technical notes&quot; part of the post. I guess you consider yourself familiar with inner kernel/hardware details. In that case you should also know that it is possible to use DMA even if caching is not done in kernel.
</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">Unfortunately this is the only realistic way to implement a user-mode file cache is to pass data in and out of the CPU like an idiot.</div></blockquote></p>
<p>&nbsp;</p>
<p>Did you say &quot;false premises&quot;?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0c9609894d294aa9994a9deb00d58b79#0c9609894d294aa9994a9deb00d58b79</link>
		<pubDate>Mon, 25 Jan 2010 19:59:03 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/0c9609894d294aa9994a9deb00d58b79#0c9609894d294aa9994a9deb00d58b79</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>Did you say &quot;false premises&quot;?</p>
</div></blockquote>
<p>You give a reasonable explanation of how DMA works, and its history. I'm not aware of any user-mode way of doing DMA, that's something that seems to be reserved to kernel space (as virtually any hardware control). If you know a way feel free to enlighten
 me.</p>
<p>&nbsp;</p>
<p>But insofar as it makes any difference to the&nbsp;argument, using DMA is far better than not. If you start taking memory management into your own hands, if you are implementing your own cache by memcpy'ing things around you are&nbsp;effectively&nbsp;not using DMA so any
 data you write to this cache has to go through the CPU. If you let the kernel handle this file caching for you, this is not a problem.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/1d60f3b3806a42cb8a179deb00d58ba6#1d60f3b3806a42cb8a179deb00d58ba6</link>
		<pubDate>Mon, 25 Jan 2010 20:44:44 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/1d60f3b3806a42cb8a179deb00d58ba6#1d60f3b3806a42cb8a179deb00d58ba6</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>You give a reasonable explanation of how DMA works, and its history. I'm not aware of any user-mode way of doing DMA, that's something that seems to be reserved to kernel space (as virtually any hardware control). If you know a way feel free to enlighten
 me.</p>
<p>&nbsp;</p>
<p>But insofar as it makes any difference to the&nbsp;argument, using DMA is far better than not. If you start taking memory management into your own hands, if you are implementing your own cache by memcpy'ing things around you are&nbsp;effectively&nbsp;not using DMA so any
 data you write to this cache has to go through the CPU. If you let the kernel handle this file caching for you, this is not a problem.</p>
</div></blockquote>
<p><blockquote><div class="quoteText"><strong>seems</strong> to be reserved to kernel space</div></blockquote></p>
<p>&nbsp;</p>
<p>So you make a bunch of claims but you actually don't have a clue how this works?&nbsp;You have&nbsp;some nerve.</p>
<p>&nbsp;</p>
<p>Of course one's going to use DMA. Not only would using PIO mode be terribly inefficient, but AFAIK it's not even possible with&nbsp;adapters like AHCI.&nbsp;Or SCSI.</p>
<p>&nbsp;</p>
<p>And there's no reason why DMA cannot be used to transfer data to user mode space. Note that I'm talking about data transfer and not about DMA programming. All the application has to do is to issue a properly aligned, non-cached, asynchronous read request.
 The kernel&nbsp;will take care of programming the&nbsp;DMA&nbsp;(or the disk adapter which in turn will program the DMA) with the memory address and size specified by the application and that's all there is to it. There's no memcpy, no CPU cache trashing and no requirement
 for the application to have low level access to the hardware.</p>
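<p>A minimal sketch of such a request, for illustration only. It assumes Linux, where <code>O_DIRECT</code> plays roughly the role that <code>FILE_FLAG_NO_BUFFERING</code> plays on Windows. The helper name <code>read_block_uncached</code> and the 4096-byte <code>BLOCK</code> size are made up for the example, and the read is synchronous rather than asynchronous to keep it short. Whether the device actually uses DMA is decided by the kernel and the driver, not by this code.</p>

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment unit; real code should query the device


def read_block_uncached(path, block_index=0):
    """Read one aligned block, bypassing the kernel file cache when possible.

    Returns (data, used_direct_io). Hypothetical helper, Linux-oriented.
    """
    offset = block_index * BLOCK  # file offset is a multiple of BLOCK
    # Anonymous mmap memory is page-aligned, which satisfies the buffer
    # alignment requirement of non-cached reads.
    buf = mmap.mmap(-1, BLOCK)
    try:
        direct_flag = getattr(os, "O_DIRECT", 0)  # not defined on every OS
        if direct_flag:
            try:
                fd = os.open(path, os.O_RDONLY | direct_flag)
                try:
                    # The kernel fills the application's buffer directly.
                    n = os.preadv(fd, [buf], offset)
                finally:
                    os.close(fd)
                return buf[:n], True
            except OSError:
                pass  # some filesystems reject O_DIRECT; fall through
        # Fallback: ordinary buffered read through the kernel cache.
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(BLOCK), False
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(2 * BLOCK))
    data, direct = read_block_uncached(tmp.name, block_index=1)
    print(len(data), "bytes read, O_DIRECT used:", direct)
    os.unlink(tmp.name)
```

<p>The application only supplies an aligned buffer, an aligned offset and an aligned length; it never touches the disk adapter itself.</p>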
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/9d62dddc54f4426295119deb00d58bd6#9d62dddc54f4426295119deb00d58bd6</link>
		<pubDate>Mon, 25 Jan 2010 21:18:08 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/9d62dddc54f4426295119deb00d58bd6#9d62dddc54f4426295119deb00d58bd6</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>So you make a bunch of claims but you actually don't have a clue how this works?&nbsp;You have&nbsp;some nerve.</p>
<p>&nbsp;</p>
<p>Of course one's going to use DMA. Not only would using PIO mode be terribly inefficient, but AFAIK it's not even possible with&nbsp;adapters like AHCI.&nbsp;Or SCSI.</p>
<p>&nbsp;</p>
<p>And there's no reason why DMA cannot be used to transfer data to user mode space. Note that I'm talking about data transfer and not about DMA programming. All the application has to do is to issue a properly aligned, non-cached, asynchronous read request.
 The kernel&nbsp;will take care of programming the&nbsp;DMA&nbsp;(or the disk adapter which in turn will program the DMA) with the memory address and size specified by the application and that's all there is to it. There's no memcpy, no CPU cache trashing and no requirement
 for the application to have low level access to the hardware.</p>
<p>&nbsp;</p>
</div></blockquote>
<p><strong>So you make a bunch of claims but you actually don't have a clue how this works?&nbsp;You have&nbsp;some nerve.<br>
</strong></p>
<p>Well I am not like you Dexter. I don't act like I know everything, and I actually value input from others. And this I apologize for.&nbsp;</p>
<p>&nbsp;</p>
<p><strong>And there's no reason why DMA cannot be used to transfer data to user mode space. Note that I'm talking about data transfer and not about DMA programming.</strong></p>
<p>&nbsp;</p>
<p>But what you seem (oops I said &quot;seem&quot;, I have some nerve!) to be saying is that you indeed need to make a syscall to do DMA, which is what I
<em>assumed<span> </span><span><span>(&quot;assumed&quot;? you must be raging by now)</span></span>.&nbsp;</em>Thank you for proving yourself wrong once again.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a951c07d334d4c929ea69deb00d58c06#a951c07d334d4c929ea69deb00d58c06</link>
		<pubDate>Mon, 25 Jan 2010 21:30:52 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a951c07d334d4c929ea69deb00d58c06#a951c07d334d4c929ea69deb00d58c06</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Dexter said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>So you make a bunch of claims but you actually don't have a clue how this works?&nbsp;You have&nbsp;some nerve.<br>
</strong></p>
<p>Well I am not like you Dexter. I don't act like I know everything, and I actually value input from others. And this I apologize for.&nbsp;</p>
<p>&nbsp;</p>
<p><strong>And there's no reason why DMA cannot be used to transfer data to user mode space. Note that I'm talking about data transfer and not about DMA programming.</strong></p>
<p>&nbsp;</p>
<p>But what you seem (oops I said &quot;seem&quot;, I have some nerve!) to be saying is that you indeed need to make a syscall to do DMA, which is what I
<em>assumed<span> </span><span><span>(&quot;assumed&quot;? you must be raging by now)</span></span>.&nbsp;</em>Thank you for proving yourself wrong once again.</p>
</div></blockquote>
<p><blockquote><div class="quoteText"></p>
<p>Well I am not like you Dexter. &nbsp;I don't act like I know everything, and actually value input from others. And this I apologize for.&nbsp;</p>
<p></div></blockquote></p>
<p>&nbsp;</p>
<p>I don't know everything either. But I keep my mouth shut when I have no clue about what people are talking about.</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText"></p>
<p>But what you seem to be saying is that you indeed need to make a syscall to do DMA, which is what I assumed. Thank you for proving yourself wrong once again.</div></blockquote></p>
<p>&nbsp;</p>
<p>And what exactly is wrong? The whole thing was about an application bypassing the kernel cache, not about an application bypassing the whole kernel.</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/381af8af6d0a4114972c9deb00d58c35#381af8af6d0a4114972c9deb00d58c35</link>
		<pubDate>Mon, 25 Jan 2010 21:37:36 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/381af8af6d0a4114972c9deb00d58c35#381af8af6d0a4114972c9deb00d58c35</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>&nbsp;</p>
<p>And what exactly is wrong? The whole thing was about an application bypassing the kernel cache, not about an application bypassing the whole kernel.</p>
<p>&nbsp;</p>
</div></blockquote>
<p>No Dexter,</p>
<p>&nbsp;</p>
<p>Here are my primary&nbsp;arguments:</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<ul>
<li>If you design a DB, you want to delegate as much responsibility to the kernel as possible (Sven was almost insulted by this.)
</li><li>An OS can accomplish a lot of the intelligent read caching by a paging algorithm, which keeps track of the most commonly read parts of a file.
</li><li>In a modern kernel, you can control the functionality of the paging algorithm in many many ways (eg:
<em>mmap</em> and&nbsp;<em>readahead</em>&nbsp;in Linux), to the point where I am not sure what this magical &quot;domain-specific&quot; DB cache that neither of you explained can do better.
</li><li>It would be downright stupid for a DB to take all of this into its own hands, but I am sure there are some legacy DBs out there that still do all of this, because they were written during a time when MS-DOS was considered advanced.
</li><li>Implementing a file cache entirely in userspace (eg. malloc &#43; memcpy) is downright idiotic.
</li></ul>
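<p>As a rough sketch of the kind of hints meant above, assuming Linux and Python's standard <code>os</code> and <code>mmap</code> modules: <code>posix_fadvise</code> with <code>POSIX_FADV_WILLNEED</code> stands in here for the <em>readahead</em> syscall, which Python does not expose directly, and the helper name <code>hint_and_map</code> is hypothetical.</p>

```python
import mmap
import os
import tempfile


def hint_and_map(path, offset=0, length=0):
    """Ask the kernel to prefetch a file range, then map the file read-only.

    length=0 means "up to the end of the file" for posix_fadvise.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Advisory only: the kernel may start reading this range ahead.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
        # Map the whole file; pages are served from the kernel page cache,
        # so the application keeps no file cache of its own.
        mapped = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor closes
    if hasattr(mapped, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        # Same hint, expressed on the mapping instead of the descriptor.
        mapped.madvise(mmap.MADV_WILLNEED)
    return mapped


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"record" * 10000)
    view = hint_and_map(tmp.name)
    print(len(view), "bytes mapped; first record:", view[:6])
    view.close()
    os.unlink(tmp.name)
```

<p>Both calls are hints, not guarantees: the kernel remains free to evict or ignore them under memory pressure.</p>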
<div>If you don't believe me, reread all my posts and see that's exactly what I've been saying all along.</div>
<div>Meanwhile, your most absurd&nbsp;argument thus far&nbsp;is:</div>
<div>In response to &quot;Moving commonly read data to a certain part of a file.&quot;</div>
<div>You say:&nbsp;Seriously, do you really want/expect&nbsp;a database system to move gigabytes or terabytes of data around just to keep the kernel happy?</div>
<div>With a statement like that, you make the&nbsp;implicit&nbsp;assumptions that (1) you cannot design a data structure that automatically localizes commonly used data, which is a naive assertion; (2) moving gigabytes or terabytes of data is completely off the table
 in a DBMS, which is also naive; and (3) a DB implementer has another choice if he wants to optimize his application, which is not true because of the CPU cache you both seem to hate so much.</div>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a887e39fa86f42029ec29deb00d58c67#a887e39fa86f42029ec29deb00d58c67</link>
		<pubDate>Mon, 25 Jan 2010 21:56:45 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a887e39fa86f42029ec29deb00d58c67#a887e39fa86f42029ec29deb00d58c67</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Dexter said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">magicalclick said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>1) Depends what you mean by DB. You can consider NTFS as being a domain-specific DB. It's not relational, it doesn't process SQL queries but it stores and retrieves data, a lot of data. Incidentally, NTFS uses B-Trees which are also commonly used in databases.</p>
<p>2) Already done. NTFS (and probably most other filesystems) caches file metadata in RAM. Go enumerate a directory containing a large number of files. You'll likely see a ton of disk activity. Do that again. No disk activity this time.</p>
<p>3) Already done. File metadata caching occurs in kernel, not in the application. This means that there's only one instance of that data so there's no need for some sort of &quot;good sync&quot;.</p>
<p>4) Already done. Unless you're an administrator you cannot access the disk directly, you must go through the filesystem. Also if you're not an administrator you cannot load drivers. In general this has less to do with the filesystem, it's a generic kernel
 problem.</p>
</div></blockquote>
<p>Thanks for clearing that up. Now, I don't understand all the blah anymore. Well, better leave it to you guys.
</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/03a1903e1a1942569adb9deb00d58c96#03a1903e1a1942569adb9deb00d58c96</link>
		<pubDate>Tue, 26 Jan 2010 03:43:43 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/03a1903e1a1942569adb9deb00d58c96#03a1903e1a1942569adb9deb00d58c96</guid>
		<dc:creator>magicalclick</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/magicalclick/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>But that just plainly isn't true. The kernel can figure out only what was the most common information in the past, and then hope that this will still be true in the future.</strong></p>
<p>&nbsp;</p>
<p>If you design your data structures correctly, it SHOULD be true in the future.</p>
<p>&nbsp;</p>
<p><strong>But there are many access patterns where this simply isn't true. </strong>
</p>
<p>&nbsp;</p>
<p>And they don't belong in databases.</p>
<p>&nbsp;</p>
<p><strong>The kernel cannot know if an application is going to have an access pattern for which this strategy doesn't work. Only the application can know that. Hence, the application can do better.</strong></p>
<p>&nbsp;</p>
<p>I don't think that's actually true. For many data structures, you can not&nbsp;reliably&nbsp;predict where an element will be located on a restructure. If you can, you can ask the kernel to cache that part of the file for you anyway. See the
<em>readahead</em> syscall. Which is exactly how readahead&nbsp;daemons&nbsp;like preload work! They don't go around implementing their own file caching, that would be stupid. They let the kernel do it for them, as any good DB implementation would.</p>
<p>&nbsp;</p>
<p><strong>And CPU caching isn't relevant to this discussion for&nbsp;several reasons: 1. The timing gap between memory and disk is greater than that between CPU cache and memory. 2. CPU cache is a few MB at best, file cache is typically multiple GB. 3. The kernel
 does not&nbsp;manage the CPU cache, the CPU does that.</strong></p>
<p>&nbsp;</p>
<p>1. It may be, but having too many direct memory accesses will affect system&nbsp;performance&nbsp;in a very bad way.</p>
<p>2. CPU cache is a few MB at best, so let's ignore it? That sounds like a plan, chief.</p>
<p>3. The fact that the CPU manages the CPU cache is exactly why it's important. You cannot program a CPU cache, because that is a right reserved only for the immortals who design hardware, and you are but a mortal software developer. The best you could do
 is structure your data in a way that is conducive to CPU caching. If you are going to do this anyway, why not structure your data in a way that is conducive to the kernel caching? Hmm?&nbsp;The kernel is like the CPU's prophet to software. Like most prophets,
 he is mortal, software like all other software, but the immortals have bestowed certain&nbsp;miracles on him. You shouldn't ignore his powers, for he speaketh directly to the hardware, and knows its demands. You do not.</p>
</div></blockquote>
<p>Look, I agree that in 99.9% of the cases, you do want to leave this to the kernel. I just believe that very high performance databases are one of the cases where you don't.</p>
<p>&nbsp;</p>
<p>Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own scheduling
 and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">They let the kernel do it for them, as any good DB implementation would.</div></blockquote></p>
<p>So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a02e893d0ee9450b9a019deb00d58cce#a02e893d0ee9450b9a019deb00d58cce</link>
		<pubDate>Tue, 26 Jan 2010 03:47:24 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a02e893d0ee9450b9a019deb00d58cce#a02e893d0ee9450b9a019deb00d58cce</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?</p>
</div></blockquote>
<p>I think he is referring more to stuff like C#. The kernel takes care of the lower-level stuff and DB people take care of the higher-level stuff. But that only works for lazy people like me. Pure DB people, I am sure, want to do everything themselves to get
 better performance than a general-purpose kernel provides.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/5216b4deeb9a4f9b9eb79deb00d58cfa#5216b4deeb9a4f9b9eb79deb00d58cfa</link>
		<pubDate>Tue, 26 Jan 2010 03:53:36 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/5216b4deeb9a4f9b9eb79deb00d58cfa#5216b4deeb9a4f9b9eb79deb00d58cfa</guid>
		<dc:creator>magicalclick</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/magicalclick/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>So you contend that SQL Server, Oracle, IBM DB2 and other databases that dominate the TPC-C and TPC-H benchmarks are all in fact not any good because they do their own caching?</p>
</div></blockquote>
<p><strong>Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own
 scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.</strong></p>
<p>&nbsp;</p>
<p>What &quot;clear, measurable&quot; difference? Every DB vendor claims they are the fastest DB. Even SQLite
<a href="http://www.sqlite.org/speed.html">claims it's the fastest DB</a>. That's right, a barely 200kB shared library DB is claiming victory. MySQL/Oracle have similar comparisons, showing how they are the fastest. You can probably find them if you visit their
 websites, I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.</p>
<p>&nbsp;</p>
<p>If you want to argue with me on this, argue on a technical level. Explain exactly what this DB file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms.
 You are a DB expert, you should know this. You guys can't think it's just some kind of fact that should be assumed. CS doesn't work that way.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e23c1bf574b24892b68a9deb00d58d2a#e23c1bf574b24892b68a9deb00d58d2a</link>
		<pubDate>Tue, 26 Jan 2010 05:25:54 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/e23c1bf574b24892b68a9deb00d58d2a#e23c1bf574b24892b68a9deb00d58d2a</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own
 scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.</strong></p>
<p>&nbsp;</p>
<p>What &quot;clear, measurable&quot; difference? Every DB vendor claims they are the fastest DB. Even SQLite
<a href="http://www.sqlite.org/speed.html">claims it's the fastest DB</a>. That's right, a barely 200kB shared library DB is claiming victory. MySQL/Oracle have similar comparisons, showing how they are the fastest. You can probably find them if you visit their
 websites, I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.</p>
<p>&nbsp;</p>
<p>If you want to argue with me on this, argue on a technical level. Explain exactly what this DB file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms.
 You are a DB expert, you should know this. You guys can't think it's just some kind of fact that should be assumed. CS doesn't work that way.</p>
</div></blockquote>
<p><blockquote><div class="quoteText">What &quot;clear, measurable&quot; difference?</div></blockquote></p>
<p>Well, let's look at the <a href="http://www.tpc.org/tpcc/results/tpcc_perf_results.asp">
top ten results of the TPC-C</a>. If SQLite is so great, why isn't it in there? Or
<a href="http://www.tpc.org/tpch/results/tpch_perf_results.asp">the TPC-H</a>, for that matter. Or check the
<a href="http://scholar.google.com/scholar?q=database&#43;cache&amp;hl=en&amp;as_sdt=2001&amp;as_sdtp=on">
huge amount of research being done on this topic</a>. Apparently every single one of the papers was a waste of time because the kernel can do better anyway.</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">I've asked repeatedly, but I've got no answers.</div></blockquote></p>
<p>I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.</p>
<p>&nbsp;</p>
<p>And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a6715b1394f04464b3469deb00d58d5d#a6715b1394f04464b3469deb00d58d5d</link>
		<pubDate>Tue, 26 Jan 2010 05:41:47 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/a6715b1394f04464b3469deb00d58d5d#a6715b1394f04464b3469deb00d58d5d</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the kernel.</p>
<p>&nbsp;</p>
<p>And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?</p>
</div></blockquote>
<p><strong>Well, let's look at the <a href="http://www.tpc.org/tpcc/results/tpcc_perf_results.asp">
top ten results of the TPC-C</a>. If SQLite is so great, why isn't it in there?</strong></p>
<p>&nbsp;</p>
<p>Maybe because <a href="http://www.tpc.org/tpcc/results/tpcc_results.asp">they don't test it</a>? Did you even bother looking before you pasted that link?
</p>
<p>&nbsp;</p>
<p><strong>I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the
 kernel.</strong></p>
<p>&nbsp;</p>
<p>What exactly would a database know that could possibly aid it in caching? That's not exactly obvious is it? Then why make an assumption that you can not even answer?</p>
<p><br>
But seriously, let's talk algorithms. </p>
<p>&nbsp;</p>
<p>You are building a DB, and you store your DB information in a file or a series of files, right?
</p>
<p>You have to somehow access those files, by using some kind of file API (e.g. fopen, fseek), or something like mmap.</p>
<p>Now you want to cache something, what do you do? Explain the functions a typical DB would use to cache something. And then explain exactly how this is better than using readahead. Because I cannot explain this, and you apparently can.</p>
<p><strong><br>
</strong></p>
<p><strong>And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?</strong></p>
<p>&nbsp;</p>
<p>First of all, only modern kernels (e.g. Linux 2.6) have the syscall that allows processes to suggest parts of a file be in the kernel's file cache. I don't think Windows has any similar syscall, or if it does, it's poorly documented, because I've looked hard.
 So you might be SOL on Windows support, and that's probably one of many reasons Drizzle doesn't work on Windows.&nbsp; Secondly, there are applications that do use it (I even named one), so your entire premise is wrong. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /> Thirdly, you wouldn't know if Oracle/DB2
 et al. do it, because they are closed source. Fourthly, well, I think 1, 2, and 3 were enough in this case.
</p>
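<p>For illustration, here is a minimal sketch of what such a hint can look like, written in Python rather than C. It is only an assumption about how one might do it: it uses <em>os.posix_fadvise</em> with <em>POSIX_FADV_WILLNEED</em>, which is a related call and not the readahead syscall itself, and the function name <em>hint_will_need</em> is made up for the example.</p>

```python
import os


def hint_will_need(path, offset, length):
    """Ask the kernel to start pulling a byte range of a file into its page cache.

    Returns True if the hint was issued, False on platforms where
    os.posix_fadvise is not available (for example Windows).
    The hint is advisory: the kernel is free to ignore it.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # WILLNEED: "I expect to read this range soon, please cache it."
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True
```

<p>The application does no caching of its own here. It only tells the kernel which range it expects to touch and leaves the rest to the kernel's file cache.</p>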
<p>&nbsp;</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/31cee9fda4d745a5ac439deb00d58d92#31cee9fda4d745a5ac439deb00d58d92</link>
		<pubDate>Tue, 26 Jan 2010 06:04:41 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/31cee9fda4d745a5ac439deb00d58d92#31cee9fda4d745a5ac439deb00d58d92</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Well, let's look at the <a href="http://www.tpc.org/tpcc/results/tpcc_perf_results.asp">
top ten results of the TPC-C</a>. If SQLite is so great, why isn't it in there?</strong></p>
<p>&nbsp;</p>
<p>Maybe because <a href="http://www.tpc.org/tpcc/results/tpcc_results.asp">they don't test it</a>? Did you even bother looking before you pasted that link?
</p>
<p>&nbsp;</p>
<p><strong>I've answered repeatedly, but you keep ignoring it. It isn't a matter of algorithms, it's a matter of information. The database knows its own usage patterns, and can therefore make better caching decisions even when using the same algorithms as the
 kernel.</strong></p>
<p>&nbsp;</p>
<p>What exactly would a database know that could possibly aid it in caching? That's not exactly obvious is it? Then why make an assumption that you can not even answer?</p>
<p><br>
But seriously, let's talk algorithms. </p>
<p>&nbsp;</p>
<p>You are building a DB, and you store your DB information in a file or a series of files, right?
</p>
<p>You have to somehow access those files, by using some kind of file API (e.g. fopen, fseek), or something like mmap.</p>
<p>Now you want to cache something, what do you do? Explain the functions a typical DB would use to cache something. And then explain exactly how this is better than using readahead. Because I cannot explain this, and you apparently can.</p>
<p><strong><br>
</strong></p>
<p><strong>And yes, it would be great if the applications could feed sufficiently detailed information to the kernel so that it could do better. But if that's so easy, then why aren't more applications doing it?</strong></p>
<p>&nbsp;</p>
<p>First of all, only modern kernels (e.g. Linux 2.6) have the syscall that allows processes to suggest parts of a file be in the kernel's file cache. I don't think Windows has any similar syscall, or if it does, it's poorly documented, because I've looked hard.
 So you might be SOL on Windows support, and that's probably one of many reasons Drizzle doesn't work on Windows.&nbsp; Secondly, there are applications that do use it (I even named one), so your entire premise is wrong.
<img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> Thirdly, you wouldn't know if Oracle/DB2 et al. do it, because they are closed source. Fourthly, well, I think 1, 2, and 3 were enough in this case.
</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
</div></blockquote>
<p><blockquote><div class="quoteText">Now you want to cache something, what do you do?</div></blockquote></p>
<p>As I've said a million times, but you keep ignoring, it isn't about when you want to cache something. It's about when you want to remove something from the cache. Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately,
 table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it
 knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.</p>
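<p>A minimal sketch of that example, in Python. This is a toy, not how any real database does it: the class name <em>PlanAwareCache</em> and the <em>upcoming</em> parameter are invented for illustration, and whole tables are treated as single cache entries.</p>

```python
from collections import OrderedDict


class PlanAwareCache:
    """LRU cache that avoids evicting entries the current plan still needs.

    Assumes capacity is at least 1.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry comes first

    def access(self, table, upcoming=()):
        """Touch `table`; `upcoming` names tables the plan will read next.

        Returns the name of the evicted table, or None if nothing was evicted.
        """
        if table in self.entries:
            self.entries.move_to_end(table)  # mark as most recently used
            return None
        evicted = None
        if len(self.entries) == self.capacity:
            # Plain LRU would take the first key. Here we skip anything the
            # plan is about to read, and fall back to plain LRU only if
            # every cached entry is needed.
            victims = [t for t in self.entries if t not in upcoming]
            evicted = victims[0] if victims else next(iter(self.entries))
            del self.entries[evicted]
        self.entries[table] = True
        return evicted
```

<p>With a capacity of 2 and B then C already cached, calling access("A", upcoming=("B",)) evicts C and keeps B, while a plain access("A") with no hint evicts B, which is the regular LRU behaviour. The only difference between the two cases is the information passed in, not the algorithm.</p>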
<p>&nbsp;</p>
<p>That's just one simple example. There is 40 years of research on this point, and no, I don't know it all, nor do I have the time to read all of it and create a summary for you.</p>
<p>&nbsp;</p>
<p><blockquote><div class="quoteText">First of all, only modern kernels can do things like readahead. I don't think Windows has any similar functionality, or if it does, it's poorly documented, because I've looked hard.</div></blockquote></p>
<p>So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point.</p>
<p>&nbsp;</p>
<p>Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great. All I'm saying is that
 in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.</p>
<p>&nbsp;</p>
<p>These people don't implement their own caching schemes because they have nothing better to do. I'm sure they started out without explicit caching, but then they profiled it and found they were hitting the disc more often than was needed. That's how performance
 optimization works: you start with a simple solution, measure it, then try to improve. That's where these caching solutions come from. They didn't set out thinking &quot;well I've heard that we have to cache things manually, so let's not measure it and do it regardless&quot;.
 That's not how these things work.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/32c6462e28f74a97b4329deb00d58dcd#32c6462e28f74a97b4329deb00d58dcd</link>
		<pubDate>Tue, 26 Jan 2010 06:14:15 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/32c6462e28f74a97b4329deb00d58dcd#32c6462e28f74a97b4329deb00d58dcd</guid>
		<dc:creator>Sven Groot</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Sven Groot/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Sven Groot said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p></p>
<p>So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point.</p>
<p>&nbsp;</p>
<p>Note that I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great. All I'm saying is that
 in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.</p>
<p>&nbsp;</p>
<p>These people don't implement their own caching schemes because they have nothing better to do. I'm sure they started out without explicit caching, but then they profiled it and found they were hitting the disc more often than was needed. That's how performance
 optimization works: you start with a simple solution, measure it, then try to improve. That's where these caching solutions come from. They didn't set out thinking &quot;well I've heard that we have to cache things manually, so let's not measure it and do it regardless&quot;.
 That's not how these things work.</p>
</div></blockquote>
<p><strong>Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache
 and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.</strong></p>
<p>&nbsp;</p>
<p>Okay, so you need to allocate some space for the entire A and memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if the next execution, which the DB doesn't know about yet,
 deals with table C? In your contrived example, if that was the case, having no user-mode cache would actually be
<em>faster</em>. Whoops!</p>
<p>&nbsp;</p>
<p>Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be &quot;keep commonly accessed data in memory&quot;. Why? Because commonly
 accessed data is likely to be .. accessed. It's a brainf**k I know, but the simple solution in this case might be the overall fastest. When you pull some commonly accessed data out of memory, chances are you are going to put it right back anyway. All you are doing
 is delaying the inevitable page fault. Thus defeating the purpose of a complex heuristic algorithm that wastes clock cycles thinking about what decision to fail on next.</p>
<p>&nbsp;</p>
<p>To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.</p>
<p>&nbsp;</p>
<p><strong>So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that
 I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.
</strong></p>
<p>&nbsp;</p>
<p>Okay, this kind of functionality is new in that it's not 30 years old.&nbsp; <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif' alt='Smiley' /> But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly not something &quot;in the future&quot;. Unless &quot;the future is now&quot;. I'm sure Windows has it too,
 it's probably just one of those &quot;undocumented APIs&quot;, because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all about this already, since you are a DB/caching expert. <img src='http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-5.gif' alt='Wink' /></p>
<p>&nbsp;</p>
<p><strong>All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.</strong></p>
<p>&nbsp;</p>
<p>I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is &quot;archaic&quot; enough that a fork is in progress to &quot;modernize&quot; it (in order to make
 it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. &quot;LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!&quot;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d6dd1f5c11ec4b82a1cd9deb00d58e0e#d6dd1f5c11ec4b82a1cd9deb00d58e0e</link>
		<pubDate>Tue, 26 Jan 2010 06:28:52 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/d6dd1f5c11ec4b82a1cd9deb00d58e0e#d6dd1f5c11ec4b82a1cd9deb00d58e0e</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>The CPU cache thing was in regard to Dexter's assertion that a DB shouldn't have to localize its important data. Quite frankly, to run efficiently on an x86 processor, it has no other choice. I don't know how many of you ever did assembly programming.
 Although you can create a memory caching algorithm, you can't create your own processor caching algorithm on x86. That is hard-coded on the CPU by Intel/AMD. Performance degrades considerably on x86 if there are many cache misses, which is the cache equivalent
 of a page fault. So your data structures must consider this in order to efficiently run on the architecture.</p>
</div></blockquote>
<p>No, it really doesn't matter at all.&nbsp; A database server is going to be I/O limited, not CPU limited, why would you optimize something like that?&nbsp; If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole
 cache anyway, so why do we care about CPU cache again?</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4348c6df14c74462a4069deb00d58e3c#4348c6df14c74462a4069deb00d58e3c</link>
		<pubDate>Tue, 26 Jan 2010 07:18:55 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/4348c6df14c74462a4069deb00d58e3c#4348c6df14c74462a4069deb00d58e3c</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">CreamFilling512 said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Bass said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p>No, it really doesn't matter at all.&nbsp; A database server is going to be I/O limited, not CPU limited, why would you optimize something like that?&nbsp; If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole
 cache anyway, so why do we care about CPU cache again?</p>
</div></blockquote>
<p><strong>A database server is going to be I/O limited, not CPU limited, why would you optimize something like that?</strong></p>
<p>&nbsp;</p>
<p>The fact that a database server is I/O limited is <strong><em>exactly </em></strong>why you want to optimize access time. Think for a moment about why that would be so.
</p>
<p>&nbsp;</p>
<p><strong>If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?</strong></p>
<p>&nbsp;</p>
<p>The kernel doesn't even need to involve the CPU all that much to read I/O to memory. Come on, pay attention to the rest of the debate, I mentioned this before and Dexter even elaborated on the point.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6961509cc7b14df790c99deb00d58e6a#6961509cc7b14df790c99deb00d58e6a</link>
		<pubDate>Tue, 26 Jan 2010 07:22:17 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/6961509cc7b14df790c99deb00d58e6a#6961509cc7b14df790c99deb00d58e6a</guid>
		<dc:creator>Bass</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Bass/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">CreamFilling512 said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>A database server is going to be I/O limited, not CPU limited, why would you optimize something like that?</strong></p>
<p>&nbsp;</p>
<p>The fact that a database server is I/O limited is <strong><em>exactly </em></strong>why you want to optimize access time. Think for a moment about why that would be so.
</p>
<p>&nbsp;</p>
<p><strong>If the server needs to read I/O by making a system call, it's going to switch context and there goes your whole cache anyway, so why do we care about CPU cache again?</strong></p>
<p>&nbsp;</p>
<p>The kernel doesn't even need to involve the CPU all that much to read I/O to memory. Come on, pay attention to the rest of the debate, I mentioned this before and Dexter even elaborated on the point.</p>
</div></blockquote>
<p>Every time you issue a kernel-mode I/O request you are doing a system call, throwing up an interrupt, and causing a thread context switch to kernel-mode, regardless of whether the kernel actually has to go to disk for the data or not.&nbsp; You are thrashing the CPU
 cache by doing this.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3e06bfa806264509a5629deb00d58e98#3e06bfa806264509a5629deb00d58e98</link>
		<pubDate>Tue, 26 Jan 2010 07:32:47 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/3e06bfa806264509a5629deb00d58e98#3e06bfa806264509a5629deb00d58e98</guid>
		<dc:creator>CreamFilling512</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/CreamFilling512/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache
 and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.</strong></p>
<p>&nbsp;</p>
<p>Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if the next execution plan, which the DB doesn't know about yet,
 deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be
<em>faster</em>. Whoops!</p>
<p>&nbsp;</p>
<p>Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be &quot;keep commonly accessed data in memory&quot;. Why? Because commonly
 accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the fastest overall. When you pull some commonly accessed data out of memory, chances are you are going to put it right back anyway. All you are doing
 is delaying the inevitable page fault, thus defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding which decision to fail on next.</p>
<p>&nbsp;</p>
<p>To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.</p>
<p>&nbsp;</p>
<p><strong>So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that
 I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.
</strong></p>
<p>&nbsp;</p>
<p>Okay, this kind of functionality is new in that it's not 30 years old.&nbsp; <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly
 not something &quot;in the future&quot;. Unless &quot;the future is now&quot;. I'm sure Windows has it too, it's probably just one of those &quot;undocumented APIs&quot;, because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all
 about this already, since you are a DB/caching expert. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-5.gif" alt="Wink"></p>
<p>&nbsp;</p>
<p><strong>All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.</strong></p>
<p>&nbsp;</p>
<p>I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is &quot;archaic&quot; enough that a fork is in progress to &quot;modernize&quot; it (in order to make
 it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. &quot;LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!&quot;</p>
</div></blockquote>
<p><blockquote><div class="quoteText">Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU).</div></blockquote></p>
<p>&nbsp;</p>
<p>Since you seem incapable of accepting that user-mode caching can be done without memcpy, despite the fact that I explained how it can be done, would you be so kind as to explain to us why we should bother to provide more evidence?</p>
<p>&nbsp;</p>
<p>And when all you can say is &quot;the kernel is awesome&quot; and &quot;the cache is intelligent&quot;, do you expect us to provide complete technical arguments? Since when is &quot;awesomeness&quot; a technical word? Or &quot;brainf**k&quot;?</p>
<p>&nbsp;</p>
<p>You bring a lot of stuff into the discussion but you fail to provide any decent example of how said stuff might be used. readahead? Sure, why not. Let's see what this readahead is all about:</p>
<p>&nbsp;</p>
<p><strong>ssize_t</strong> <strong>readahead(int</strong> <em>fd</em><strong>,</strong>
<strong>off64_t</strong> <em>offset</em><strong>,</strong> <strong>size_t</strong>
<em>count</em><strong>);</strong></p>
<p>So, the database has to tell the kernel to bring a specific portion of the file into memory. Whoops. The database has to know that! The horror! The sheer stupidity of the database having to know what parts of the file it needs! How's that any different from
 the database doing non-cached read operations? And your fancy-pants readahead call turns out to be a synchronous call. Not the best way to achieve high performance.</p>
<p>&nbsp;</p>
<p>mmap? Sure, why not? To your advantage, servers have moved to 64 bits and mapping huge files is less of an issue. It would be fun to see you using mmap with a 50 GB database on a 32-bit system. You'll have to keep calling mmap and munmap all day long. And you'll
 probably trigger a lot of page faults in the process. I'm not even bothering to ask how to deal with writes when mmap is involved. Feel free to figure that out for yourself if you think you're so cool and everyone else is an idiot.</p>
<p>&nbsp;</p>
<p>HDD's head position? What a good joke, probably from a Sci Fi book. Yes, sure, there are disk I/O scheduling algorithms that take into account the position of the head. Except that:</p>
<p>- what the kernel does is keep track of where on the drive the last operation was performed; it does not get this information from the drive. Since high-performance databases tend to use dedicated drives, they can keep track of this too. Not that they need
 to do so, because at the end of the day they don't bypass the disk driver. In any case, there's no &quot;zomg&quot; context switch required for this. In the worst case the database needs to retrieve the drive geometry from the kernel, but since that doesn't change
 over time it can be done only once, at startup.</p>
<p>- even some &quot;modern&quot; consumer drives can do command queueing (ever heard of NCQ?). Good luck trying to keep track of the head's position with those drives. Though you won't need to do that: they do command queueing so the kernel doesn't need to do it.</p>
<p>- should I bother to mention SAN storage? Those boxes with tens or hundreds of drives where a read/write request can be spread to multiple drives?</p>
<p>&nbsp;</p>
<p>Keeping commonly accessed data in memory? Sure, why not? But what exactly constitutes &quot;commonly accessed data&quot; in a database, given that it can process a wide range of queries and each query might require completely different data?</p>
<p>&nbsp;</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7cfccb3075ed424eb8869deb00d58edc#7cfccb3075ed424eb8869deb00d58edc</link>
		<pubDate>Tue, 26 Jan 2010 09:00:27 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/7cfccb3075ed424eb8869deb00d58edc#7cfccb3075ed424eb8869deb00d58edc</guid>
		<dc:creator>Dexter</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/Dexter/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Of course you want to delegate as much as possible, but experience has shown that you can in fact get better performance, in many cases much better, by doing it yourself. And it's not just caching. High-performance applications tend to do their own
 scheduling and memory management as well. Yes, it's an enormous amount of work, so the only reason people do this is because it makes a clear, measurable difference.</strong></p>
<p>&nbsp;</p>
<p>What &quot;clear, measurable&quot; difference? Every DB vendor claims they are the fastest DB. Even SQLite
<a href="http://www.sqlite.org/speed.html">claims it's the fastest DB</a>. That's right, a barely 200kB shared library DB is claiming victory. MySQL/Oracle has similar comparisons, showing how they are the fastest. You can probably find that if you visit their
 websites, I've seen them before. Every DB vendor is going to claim it's the fastest. It's a sales tactic. There is no objectivity here.</p>
<p>&nbsp;</p>
<p>If you want to argue with me on this, argue on a technical level. Explain exactly what this DB file caching algorithm can do that the kernel cannot. I'd like to know. I've asked repeatedly, but I've got no answers. Speak to me with data structures and algorithms.
 You are a DB expert, you should know this. You guys can't think it's just some kind of fact that should be assumed. CS doesn't work that way.</p>
</div></blockquote>
<p>See a million-and-one benchmarks of SQL Server running in Threaded mode (kernel handles scheduling) vs Fiber mode (SQL Server manages scheduling). Prior to the enormous amount of work done in&nbsp;Server 2008 R2 (much of which was driven by SQL Server), fiber
 mode won absolutely hands down.</p>
<p>&nbsp;</p>
<p>You keep coming back to the argument that keeping the most recently used data is always the best caching policy. But there is lots of CS research out there showing that simply isn't the case. And a basic assumption of all CS is that the more you know about a problem in advance, the
 easier it is to provide the optimal solution.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c9985fd5e9f4437398e39deb00d58f0e#c9985fd5e9f4437398e39deb00d58f0e</link>
		<pubDate>Tue, 26 Jan 2010 09:17:37 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/c9985fd5e9f4437398e39deb00d58f0e#c9985fd5e9f4437398e39deb00d58f0e</guid>
		<dc:creator>AndyC</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/AndyC/Discussions/RSS</wfw:commentRss>
	</item>
	<item>
		<title>Coffeehouse - It is time... Move the filesystem off of disks</title>
		<description><![CDATA[<p><blockquote><div class="quoteUser">Bass said:</div><div class="quoteText">
<blockquote>
<div class="quoteUser">Sven Groot said:</div>
<div class="quoteText">*snip*</div>
</blockquote>
<p><strong>Let's say the DB is executing an execution plan that accesses two tables A and B. Unfortunately, table A isn't cached, and B is currently the least recently used table in the cache. Based on regular LRU policy, reading A would place it in the cache
 and push B out, so then B would subsequently be read from disc. But the DB knows the execution plan, it knows it's going to read B next, and so it knows removing B from the cache at this point is a bad decision. The kernel doesn't know that.</strong></p>
<p>&nbsp;</p>
<p>Okay, so you need to allocate some space for the entire A, memcpy it over (poor CPU). Then leave table B in memory (it's already there, as you said), but you have to drop something, right? So drop table C? What if the next execution plan, which the DB doesn't know about yet,
 deals with table C? In your contrived example, if that were the case, having no user-mode cache would actually be
<em>faster</em>. Whoops!</p>
<p>&nbsp;</p>
<p>Even if your cache implementation gained awesome kernel powers, i.e. caching was actually efficient thanks to ring0 technologies, I still think the BEST cache algorithm would simply be &quot;keep commonly accessed data in memory&quot;. Why? Because commonly
 accessed data is likely to be... accessed. It's a brainf**k, I know, but the simple solution in this case might be the fastest overall. When you pull some commonly accessed data out of memory, chances are you are going to put it right back anyway. All you are doing
 is delaying the inevitable page fault, thus defeating the purpose of a complex heuristic algorithm that wastes clock cycles deciding which decision to fail on next.</p>
<p>&nbsp;</p>
<p>To see a similar philosophy in action, see O(1) vs CFS vs BFS. You can put plenty of contrived examples that show BFS should be the slowest piece of sh!t on the planet. But overall, it hauls *.</p>
<p>&nbsp;</p>
<p><strong>So basically what you're saying is that you can't delegate this functionality to the kernel, because most kernels in use in production systems today don't expose the functionality that allows you to do this. Thanks for proving my point. Note that
 I have never claimed that this is a universal truth that will never change. Perhaps in the future you can give enough information to the kernel so you can rely solely on the caching it provides. That would be great.
</strong></p>
<p>&nbsp;</p>
<p>Okay, this kind of functionality is new in that it's not 30 years old.&nbsp; <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-1.gif" alt="Smiley"> But it's been around for something like 6-7 years, as far as Linux is concerned. Certainly
 not something &quot;in the future&quot;. Unless &quot;the future is now&quot;. I'm sure Windows has it too, it's probably just one of those &quot;undocumented APIs&quot;, because Prefetch could not realistically work without having some control over the kernel cache. But I am sure you know all
 about this already, since you are a DB/caching expert. <img src="http://ecn.channel9.msdn.com/o9/content/images/emoticons/emotion-5.gif" alt="Wink"></p>
<p>&nbsp;</p>
<p><strong>All I'm saying is that in current production databases, this isn't the case. And this isn't because the developers of those databases were idiots, as you seem to be implying.</strong></p>
<p>&nbsp;</p>
<p>I said it was idiotic to do this now. When a lot of these DBs were first designed, MS-DOS was considered advanced. Even MySQL, despite being a more modern DB than the rest, is &quot;archaic&quot; enough that a fork is in progress to &quot;modernize&quot; it (in order to make
 it faster, interestingly enough), and a big part of that is removing stuff like user-mode caching. &quot;LOLWUT? Removing code is improvement??? Blasphemy! I MEASURE PROGRESS IN SLOCS!&quot;</p>
</div></blockquote>
<p>You appear to be under the assumption that memcpy <em>has</em> to be implemented by a big CPU loop copying data around in physical memory. Or that a database that was managing its own memory allocation and caching algorithms would work in units of a different size than
 memory pages.</p>
<p>&nbsp;</p>
<p>Challenge your assumptions.</p></p>]]></description>
		<link>http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/536c82d4e86a43f998669deb00d58f47#536c82d4e86a43f998669deb00d58f47</link>
		<pubDate>Tue, 26 Jan 2010 09:21:14 GMT</pubDate>
		<guid isPermaLink="false">http://channel9.msdn.com/Forums/Coffeehouse/522547-It-is-time-Move-the-filesystem-off-of-disks/536c82d4e86a43f998669deb00d58f47#536c82d4e86a43f998669deb00d58f47</guid>
		<dc:creator>AndyC</dc:creator>
		<slash:comments>85</slash:comments>
		<wfw:commentRss>http://channel9.msdn.com/Niners/AndyC/Discussions/RSS</wfw:commentRss>
	</item>
</channel>
</rss>