Sometimes it's the small things, but not this time. This is Biggy!
- Posted: Mar 05, 2014 at 6:00 AM
Today's project isn't our usual fare; it's different and yet kind of cool. We've heard of document databases, NoSQL and all that. Well, from the fertile mind of Rob Conery comes another look at that. What? A little file-based document store for .NET. Say...
I've been using NeDB (a file-based document store for Node) for a few projects and I utterly love it. Such a simple idea, so fast, so elegant and many times just what I need! I had assumed that such a thing must be around for .NET because there are about 100 different kinds of lists in C#... someone must have made one with a persistent backing store!
But I looked around and couldn't find it, so I made it (as I'll need this in the coming months).
The idea is basically this: I want to use LINQ, I like Dynamics, and I like speed. So that's it, and here's Biggy:
Events and Callbacks
Gotta do it :). There are 5 or so events you can use when working with your data (which I hope seem obvious):
Reading and Writing
So, by now you should be wondering why this is useful. The simple answer is that if you have a high-read application (like a blog, CMS, etc) then something like Biggy could speed things up.
Whenever you instantiate a new BiggyList it tries to read its data from disk - this is good, and it's bad. It's good because from that point on whenever you try to query your data (using LINQ) it's an in-memory operation and you can't get much faster than that.
It's bad because this means you probably want to have a single DB instance around for the life of your app. This might be easy for some, might be repulsive to others. I'm used to doing this kind of thing with Node (all modules in Node are cached which means you always hit the same module instance).
For a blog engine, this might be a very fun thing to have - no database installs, superfast, easy to use. For a Twitter clone... not so much.
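To make the load-once, query-in-memory idea concrete, here is a minimal sketch. It is not Biggy's implementation: JsonFileList<T>, Post and the file name are hypothetical names made up for illustration, and System.Text.Json stands in for whatever serializer the real library uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

var path = Path.Combine(Path.GetTempPath(), "posts-demo.json");
File.Delete(path); // start clean for the demo (no error if the file is missing)

var posts = new JsonFileList<Post>(path);
posts.Add(new Post("hello-world", "Hello World"));
posts.Add(new Post("biggy", "A File-based Document Store"));

// A fresh instance reads the file once, in its constructor...
var reloaded = new JsonFileList<Post>(path);
// ...and every query after that is plain LINQ to Objects, no disk access.
var hit = reloaded.Items.Single(p => p.Slug == "biggy");
Console.WriteLine(hit.Title);

public record Post(string Slug, string Title);

// Hypothetical stand-in for the idea, NOT Biggy's actual class.
public class JsonFileList<T>
{
    private readonly string _path;
    private readonly List<T> _items;

    public JsonFileList(string path)
    {
        _path = path;
        // Read everything from disk up front: the "good and bad" trade-off.
        _items = File.Exists(path)
            ? JsonSerializer.Deserialize<List<T>>(File.ReadAllText(path)) ?? new List<T>()
            : new List<T>();
    }

    public IReadOnlyList<T> Items => _items;

    public void Add(T item)
    {
        _items.Add(item);
        // Naive write-through: rewrite the whole file on every change.
        File.WriteAllText(_path, JsonSerializer.Serialize(_items));
    }
}
```

The constructor is the only place that touches the disk for reads, which is why you would want one long-lived instance rather than a new one per request.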
Just an Idea For Now
I'll be goofing around with this for a few months to see if it has legs but I thought it would be fun to share it with others - even if it's to tell me it sucks and to make it go away.
Getting closer to pushing a first release so I thought I'd try to expand on what Biggy is and why I made it.
I believe viewing all data as the same is a bit of a mistake. Some data needs to be available all the time, quickly - other data might sit inside your database for years and get pulled out very rarely. To put things in concrete terms, let's consider an online store:
- Product and Customer information need to be at the ready for logging in and catalog display
- Order information is displayed occasionally - much less than Catalog/Customer information
- Log data is examined rarely
Our store is a simple process - the "input" data (Products and Customers) generate "output", or "record" data. The input data changes fairly often - Customers logging in and changing things, store owners changing prices, etc.
The output data doesn't change much - in analytical terms this is called "slowly changing over time" - you might go in and tweak an order here and there, but mostly it's a matter of historical record and should never be changed.
To me the "input" stuff (Products/Customers) is perfect for a looser, document structure. The output stuff should not only be in a relational structure - it should be denormalized, ready for analytical export to CSV or some other reporting system.
This is why I made Biggy: I wanted the best of both worlds. I want speed, I want fast writes, I want LINQ and the ability to store things in the simplest manner possible.
What Biggy Does
This is lightning fast and there's no SQL translation to deal with - it just pulls that record out of memory. More on this in a bit.
There are currently 3 ways to store documents:
- On disk in a JSON file (one per record type T)
- In a Postgres database using the built-in JSON data type
- In SQL Server using regular text storage
We're working on more storage options currently, such as MongoDB and AzureTableStorage.
Let's say I've loaded up 10,000 products and now I want users to be able to search. Seems like it might be hard - but remember you're querying in memory so you don't have to worry about your DBA kicking your butt for doing this:
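As a rough sketch of that kind of search, assuming the products are already sitting in an ordinary in-memory list: the Product record and the generated data below are made up for the demo, and the query uses nothing beyond standard LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build 10,000 in-memory products (made-up data for the demo).
List<Product> products = Enumerable.Range(1, 10_000)
    .Select(i => new Product(
        $"SKU-{i}",
        i % 100 == 0 ? $"Deluxe Widget {i}" : $"Widget {i}",
        5m + i % 50))
    .ToList();

// A case-insensitive "contains" search: the kind of query that would be
// a LIKE '%...%' table scan if it were sent to a relational database.
var results = products
    .Where(p => p.Name.Contains("deluxe", StringComparison.OrdinalIgnoreCase))
    .OrderBy(p => p.Price)
    .Take(10)
    .ToList();

foreach (var p in results)
    Console.WriteLine($"{p.Sku}: {p.Name} ({p.Price})");

public record Product(string Sku, string Name, decimal Price);
```

Since the whole list is in memory, the scan is just a loop over objects; no SQL is generated and no round trip is made.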
I'm crudely "benchmarking" our reads and writes - keep in mind this stuff varies by machine and so on. Here's what I have so far:
Azure web sites (free tier) give you 1 GB of RAM to play with - you'll have a pretty long time before you run out of RAM. All of Tekpub's data (customers, orders, logs, etc) capped out at 6.5MB...
There's More, And I'll Keep Writing
I'm exhausted. I haven't gone on a tear like this in years but to see this come together has been extraordinarily fun. There's a lot more to touch on here, including:
- Non-document Lists. We have em, and if you want to use a regular relational structure with Biggy, you can. SqlServerList will do just that.
- Non in-memory stuff - we have that too. Under the covers it's all "just Massive" - so you can use SqlServerTable to read and write (with dynamics) as you need to.
I like working with Document databases, and I like working with relational ones. I like LINQ, I like Postgres, and sometimes I just want to store data on disk in a JSON file: so I made Biggy.
This project started life as an implementation of ICollection that persisted itself to a file using JSON serialization. That quickly evolved into using Postgres as a JSON store, and then SQL Server. What we ended up with is the fastest data tool you can use.
Data is loaded into memory when your application starts, and you query it with LINQ. That's it. It loads incredibly fast (100,000 records in about 1 second) and from there will sync your in-memory list with whatever store you choose.
Biggy supports both SQL Server and Postgres - but we develop with Postgres first so there are a few more bells and whistles for this amazing database (specifically: Full Text search over documents).
To define a Document Store, create an instance of DBDocumentList:
If you don't want to install a database engine, you don't have to. Biggy can load and write to disk easily:
You can move from the file store over to the relational store by a single type change (as well as moving data over). This makes Biggy attractive for greenfield projects and just trying stuff out.
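One way to picture the "single type change" is a shared interface with interchangeable stores behind it. The sketch below is an assumption about the shape of such a design, not Biggy's actual types: IDocumentStore<T>, JsonFileStore<T> and InMemoryStore<T> are hypothetical names, and the in-memory class merely stands in for a database-backed store so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

var path = Path.Combine(Path.GetTempPath(), "customers-demo.json");
File.Delete(path); // start clean for the demo

// Changing stores means changing only the type constructed on this line.
IDocumentStore<Customer> customers = new JsonFileStore<Customer>(path);
// IDocumentStore<Customer> customers = new InMemoryStore<Customer>();

customers.Add(new Customer("rob@example.com", "Rob"));

// Calling code is identical whichever store sits behind the interface.
var match = customers.All().First(c => c.Email == "rob@example.com");
Console.WriteLine(match.Name);

public record Customer(string Email, string Name);

// Hypothetical abstraction; Biggy's own types may be shaped differently.
public interface IDocumentStore<T>
{
    void Add(T item);
    IEnumerable<T> All();
}

public class JsonFileStore<T> : IDocumentStore<T>
{
    private readonly string _path;
    private readonly List<T> _items;

    public JsonFileStore(string path)
    {
        _path = path;
        _items = File.Exists(path)
            ? JsonSerializer.Deserialize<List<T>>(File.ReadAllText(path)) ?? new List<T>()
            : new List<T>();
    }

    public void Add(T item)
    {
        _items.Add(item);
        File.WriteAllText(_path, JsonSerializer.Serialize(_items));
    }

    public IEnumerable<T> All() => _items;
}

// Stand-in for a database-backed store, kept in memory so the sketch runs anywhere.
public class InMemoryStore<T> : IDocumentStore<T>
{
    private readonly List<T> _items = new();
    public void Add(T item) => _items.Add(item);
    public IEnumerable<T> All() => _items;
}
```

Moving the existing data across would still be a separate step: read everything from the old store and add it to the new one.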
The engine behind Biggy is a newer version of Massive that has some type-driven love built into it. If you want to run queries and do things like you always have, go for it:
SQLServerTable and PGTable are very, very close to Massive with a few things removed - specifically the dynamic query builder and the validation stuff. You can add that in as you see fit using the hooks we've always had:
A document-centric, "NoSQL"-style of development is great for high-read, quickly changing things. Products, Customers, Promotions and Coupons - these things get read from the database continually and it's sort of silly. Querying in-memory makes perfect sense for this use case. For these you could use one of the document storage ideas above.
A relational, write-oriented transactional situation is great for "slowly changing over time" records - like Orders, Invoices, SecurityLogs, etc. For this you could use a regular relational table using the PGTable or SQLServerTable as you see fit.
You only want to read the InMemoryList<T> stuff off disk once - and this should be when your app starts up. This is pretty straightforward if you're using a Console or Forms-based app, but if you're using a web app this gets more difficult.
Fortunately, you have a few nice choices.
The first is to use your IoC container to instantiate Biggy for you. For this, create a wrapper class just like you would with EF:
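A minimal sketch of such a wrapper follows. StoreDb and Product are hypothetical names, and the list is filled with a placeholder instead of a real Biggy list. To keep the example free of any particular container, Lazy<T> plays the role that a singleton registration would play in your IoC container of choice.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Every caller gets the same instance, so the expensive load happens once.
var a = StoreDb.Instance;
var b = StoreDb.Instance;
Console.WriteLine(ReferenceEquals(a, b));
Console.WriteLine(StoreDb.LoadCount);

public record Product(string Sku, string Name);

// Hypothetical wrapper, similar in spirit to an EF DbContext subclass.
public sealed class StoreDb
{
    private static int _loadCount;
    public static int LoadCount => _loadCount;

    // Lazy<T> defaults to thread-safe, run-once initialization.
    private static readonly Lazy<StoreDb> _instance = new(() => new StoreDb());
    public static StoreDb Instance => _instance.Value;

    public List<Product> Products { get; }

    private StoreDb()
    {
        Interlocked.Increment(ref _loadCount);
        // Placeholder for reading the list from disk or a database.
        Products = new List<Product> { new("SKU-1", "Widget") };
    }
}
```

With a real container you would drop the static Instance property, make the constructor public, and register the class with a singleton lifetime so the container hands the same object to every request.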
Some applications have a ton of data and for that, Biggy might not be the best fit if you need to read from that ton of data consistently. We've focused on prying apart data into two camps: High Read, and High Write.
We're still solidifying our benchmarks, but in-memory read is about as fast as you can get. Our writes are getting there too - currently we can drop 100,000 documents to disk in about 2 seconds - which isn't so bad. We can write 10,000 records to Postgres and SQL Server in about 500ms - again not bad.
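If you want to get a feel for numbers like these on your own machine, a crude timing harness is enough. The sketch below is not the project's benchmark suite; the Doc record, the file name and the use of System.Text.Json are assumptions for illustration, and results will vary by hardware and disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.Json;

var docs = Enumerable.Range(1, 100_000)
    .Select(i => new Doc(i, $"Document {i}"))
    .ToList();
var path = Path.Combine(Path.GetTempPath(), "bench-docs.json");

// Time one bulk write of the whole list to disk.
var sw = Stopwatch.StartNew();
File.WriteAllText(path, JsonSerializer.Serialize(docs));
sw.Stop();
Console.WriteLine($"write: {sw.ElapsedMilliseconds} ms");

// Time an in-memory LINQ lookup for comparison.
sw.Restart();
var found = docs.First(d => d.Id == 99_999);
sw.Stop();
Console.WriteLine($"read:  {sw.Elapsed.TotalMilliseconds} ms ({found.Title})");

File.Delete(path); // clean up the demo file

public record Doc(int Id, string Title);
```

A single run like this is noisy; repeating each measurement a few times and discarding the first (which includes JIT warm-up) gives a steadier picture.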
So if you want to log with Biggy - go for it! Just understand that if you use a DBList<T>, it assumes you want to read too so it will store the contents in memory as well as on disk. If you don't need this, just use a DBTable<T> (Postgres or SQLServer) and write your heart out.
You might also wonder about memory use. Since you're storing everything in memory - for a small web app this might be a concern. Currently the smallest, free sites on Azure allow you 1G RAM. Is this enough space for your data? Borrowing from Karl Seguin:
I do feel that some developers have lost touch with how little space data can take. The Complete Works of William Shakespeare takes roughly 5.5MB of storage
The entire customer, catalog, logging, and sales history of Tekpub was around 6MB. If you're bumping up against your data limit - just move from an in-memory list to a regular table object (as shown above) and you're good to go.
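To check whether your own data would fit, you can serialize a representative sample and extrapolate. This is only a back-of-the-envelope sketch: the Customer record and the projected record count are made-up assumptions, and serialized JSON size is a rough proxy, since live .NET objects usually take more room than their JSON (strings are UTF-16 and every object carries a header).

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Serialize a representative sample and measure it.
var sample = Enumerable.Range(1, 1_000)
    .Select(i => new Customer(i, $"customer{i}@example.com", $"Customer {i}"))
    .ToList();

long sampleBytes = JsonSerializer.SerializeToUtf8Bytes(sample).LongLength;
double bytesPerRecord = (double)sampleBytes / sample.Count;

// Assumption: replace with your own projected record count.
long expectedRecords = 250_000;
double estimatedMb = bytesPerRecord * expectedRecords / (1024.0 * 1024.0);

Console.WriteLine($"~{bytesPerRecord:F0} bytes/record, ~{estimatedMb:F1} MB for {expectedRecords} records");

public record Customer(int Id, string Email, string Name);
```

If the estimate lands anywhere near your hosting limit, that is the signal to keep the bulky data in a regular table and hold only the hot, high-read data in memory.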
Please do! Here's what we ask of you:
If you've found a bug, please log it in the Issue list.
If you want to fork and fix (thanks!) - please fork then open a branch on your fork specifically for this issue. Give it a nice name.
Make the fix and then in your final commit message please use the GitHub magic syntax ("Closes #X" or Fixes etc) so we can tie your PR to you and your issue.
Please please please verify your bug or issue with a test (we use XUnit and it's simple to get going)
Now that you've seen this, you can get started using it, helping with it, or just check out how he does his magic...