Adding full text indexing to your app is a hOOt!
- Posted: Sep 05, 2011 at 6:00 AM
Search. Everyone uses it all day, every day. The web's had it forever in Net years, and now we've come to expect it just about everywhere else.
So you're a dev and you want to build full text indexing and searching into your app. But you don't want to take a dependency on something that has to be locally installed and configured, like Windows Search. You've searched the web and found some other libraries, but they seem a little much for what you need.
Plus, being the coder you are, you'd also like to understand just how full text indexing and searching works (when you have a few spare cycles to dig into that, anyway). So you'd like an article with some details, a sample app, an engine you could embed in your app, and the source to it all.
Wouldn't all that just be a...
hOOt is an extremely small and fast embedded full text search engine for .net, built from scratch using an inverted WAH bitmap index. Most people are familiar with an Apache project by the name of Lucene.net, which is a port of the original Java version. Many people have complained in the past that the .net version of lucene is not maintained, and many unsupported ports of the original exist. To circumvent this I have created this project, which does the same job and is smaller, simpler and faster.
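To make "inverted bitmap index" a little more concrete, here is a minimal sketch in Python. It is not hOOt's C# code, and it skips the WAH compression step entirely: it simply assumes every document gets a number and every word maps to a bitmap in which bit n is set when document n contains that word. The names `build_index` and `docs_for` are made up for this illustration.

```python
import re

def build_index(documents):
    """Map each word to an int used as a bitmap: bit n set => document n contains the word."""
    index = {}
    for doc_id, text in enumerate(documents):
        # set() so each word is recorded once per document
        for word in set(re.findall(r"\w+", text.lower())):
            index[word] = index.get(word, 0) | (1 << doc_id)
    return index

def docs_for(bitmap):
    """Expand a bitmap back into a sorted list of document numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]

documents = [
    "hOOt is a small full text search engine",
    "Lucene is a full text search library",
    "RaptorDB is a document store",
]
index = build_index(documents)

# Requiring two words at once is a bitwise AND of their bitmaps.
both = index["full"] & index["search"]
print(docs_for(both))            # [0, 1]
print(docs_for(index["hoot"]))   # [0]
```

The appeal of the bitmap form is that combining terms becomes plain bitwise arithmetic, and long runs of identical bits are what a compression scheme like WAH is designed to shrink.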
hOOt is part of my upcoming RaptorDB document store database, and was so successful that I decided to release it as a separate entity in the meantime.
hOOt uses the following articles:
- WAH compressed BitArray found here (WAHBitArray.aspx)
- mini Log4net replacement found here (http://www.codeproject.com/KB/miscctrl/minilog4net.aspx)
- MurMur2 hash index and storage file from RaptorDB found here (RaptorDB.aspx)
- fastJSON serializer found here (http://www.codeproject.com/KB/IP/fastJSON.aspx)
- IFilter without COM by Eyal Post found here (http://www.codeproject.com/KB/cs/IFilter.aspx) for the sample application
Based on the response and reaction of users to this project, I will upgrade and enhance hOOt to full feature compatibility with lucene.net, so show your love.
Why Another Full Text Indexer?
I was always fascinated by how Google searches in general, and by lucene's indexing technique and internal algorithms, but the code was just too difficult to follow; anyone who has worked with lucene.net will attest that it is a complicated and convoluted piece of code. While some people are trying to create a more .net optimized version, the fact of the matter is that it is not easy to do with that code base. What amazes me is that nobody has rewritten it from scratch.
hOOt is much simpler, smaller and faster than lucene.net.
One of the reasons for creating hOOt was for implementing full text search on string columns in RaptorDB - the document store version. Hopefully more people will be able to use and extend hOOt instead of lucene.net as it is much easier to understand and change.
hOOt has been built with the following features in mind:
- Blazing fast operating speed (see performance test section)
- Incredibly small code size.
- Uses WAH compressed BitArrays to store information.
- Multi-threaded implementation meaning you can query while indexing.
- Tiny size: the DLL is only 38kb (lucene.net is ~300kb).
- Highly optimized storage, typically ~60% smaller than lucene.net (the more in the index the greater the difference).
- Query strings are parsed on spaces with the AND operator (i.e. all words must exist).
- Wildcard characters are supported (*,?) in queries.
- OR operations are done by default (like lucene).
- AND operations require a (+) prefix (like lucene).
- NOT operations require a (-) prefix (like lucene).
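The prefix rules in that list can be sketched in a few lines of Python. This is only one plausible reading of them, not hOOt's actual query code: it assumes plain terms are optional (OR), that when any `+` terms are present a document must contain all of them, and that `-` terms are removed at the end. Plain sets stand in for the bitmaps, and the `search` function and the tiny `index` are hypothetical names invented for the example.

```python
from fnmatch import fnmatchcase

def search(index, query):
    """index maps each lowercase word to the set of document ids containing it."""
    def matching_docs(pattern):
        # Union the documents of every indexed word matching the pattern (* and ? allowed).
        docs = set()
        for word, ids in index.items():
            if fnmatchcase(word, pattern):
                docs |= ids
        return docs

    optional, required, excluded = set(), [], set()
    for term in query.lower().split():
        if term.startswith("+"):        # AND: the document must contain this term
            required.append(matching_docs(term[1:]))
        elif term.startswith("-"):      # NOT: the document must not contain this term
            excluded |= matching_docs(term[1:])
        else:                           # OR: the default for plain terms
            optional |= matching_docs(term)

    # Assumption: required terms, when present, decide the candidates on their own.
    result = set.intersection(*required) if required else optional
    return sorted(result - excluded)

index = {
    "full": {0, 1},
    "text": {0, 1},
    "search": {0, 1, 2},
    "searching": {3},
    "store": {2, 3},
}

print(search(index, "full store"))      # [0, 1, 2, 3]
print(search(index, "+full +search"))   # [0, 1]
print(search(index, "search -store"))   # [0, 1]
print(search(index, "sea*"))            # [0, 1, 2, 3]
print(search(index, "s?ore"))           # [2, 3]
```

With real bitmaps the three cases map onto bitwise OR, AND, and AND NOT, which is why this style of index makes such queries cheap.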
The article goes on to cover just how to use hOOt in your app.
And best of all, the article goes into some depth on just how it works, how the indexing goes, how the results are saved and searched.
So you'd expect this to be some kind of code beast, right? A massively sized project?
The zip with the source and sample app is 57k.
Here's the Solution (as in this is it);
The project works just as expected. Being meta, I used hOOt to index hOOt...
So if you've ever thought that your app would benefit from full text indexing and searching, but the existing libraries put you off and you didn't want to take a dependency on a third party solution, then hOOt could be the thing you've been hoping for...