Tech Off Thread

3 posts

File Cache/Storage dilemma

  • clint_hill

    I have an application I am working on that is about to implement some file storage. The choices I have are to store the files in SQL Server (image type) and retrieve them accordingly, or to store the files on a network share and keep the meta info in the SQL database. So my question isn't which method is easier, but which method has the most advantages. My key points of consideration are availability, performance, and re-usability, since this will probably end up being SOA. I have a rough assumption of average file size (1 MB), typically PDF, with files being accessed hundreds of times a day if not more. My major concern is building a performant download utility for accessing these files.

    Any good suggestions? 

  • spod

    That's an interesting problem. I'd probably be tempted to go the file system route given the size of the docs. 

    There are definitely cases where having a two-phase process will give you better scalability (the web service returns the metadata and URL of the file, and the file download happens in the background, etc.).
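A sketch of that two-phase shape, with hypothetical type, property, and URL names (none of these come from a real API), might look like the following. Phase 1 is the web method below; phase 2 is a plain HTTP GET against the returned URL.

```csharp
// Phase 1: the service returns only metadata -- no file bytes cross the wire.
// All names here are illustrative assumptions, not an existing API.
public class DocumentInfo
{
    public string FileName;
    public long SizeBytes;
    public string ContentType;
    public string Url;        // where phase 2 downloads the actual bytes from
}

[WebMethod]
public DocumentInfo GetDocumentInfo(int documentId)
{
    // In a real service this would be a SQL lookup keyed on documentId;
    // stubbed here with fixed values.
    DocumentInfo info = new DocumentInfo();
    info.FileName = "report.pdf";
    info.SizeBytes = 1048576;
    info.ContentType = "application/pdf";
    info.Url = "http://fileserver/docs/report.pdf";
    return info;
}
```

The client (or a smart download component) then fetches `info.Url` in the background, so the SOAP/web-service channel is never tied up streaming the file itself.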
     
    The main problem I've seen with systems that store large blobs in SQL is that you tend to end up with very long connection times for slow clients. This might not be relevant for your situation, but a 1 MB doc over a 56k link takes 3 minutes or so, which is a long time to tie up a server's resources (it has to manage the output buffer etc.).

    Also, when storing in SQL you tend to copy a lot of memory around (database -> web server -> through the middle tier -> HTTP output buffer etc.) when passing these docs out to the user, and you have to do a fair bit of processing (base64 encoding etc.), compared to a simple file download.


    There are a lot of factors to consider here though. Backup and restore are interesting in the filesystem case, and you have two sets of permissions to manage etc.

    If you are only talking hundreds of requests a day, with docs not much bigger than 1 MB, and a reasonable client topology (they aren't all 2000 miles away on 56k modems etc. :) ), I think both ways will work fine for you. Otherwise I'd err towards the filesystem / metadata-in-SQL solution.

    All just an opinion though...interested in what others think...

    ========================================

    Here are a couple of examples of doing it both ways...

    http://terraserver.microsoft.com/
      stores all its (terabytes of) images in SQL Server. These are fairly small tiles (2-4 KB, I think) though, so they chunk naturally over slow connections etc.

    Windows Update uses the metadata-in-SQL route, followed by a background download of the actual data managed by a smart client. These can be pretty big "documents" though...



  • clint_hill

    Thanks for the reply. I agree the FS/meta-in-SQL approach is probably the best, for the reasons you outline, specifically not tying up the SQL resource for the download. But I find that really isn't as much of a problem as the stream to the client. Performance aside, I am finding that PDF files actually "request" twice. Here is the handler I built to accommodate file extensions; for PDF and SWF files I am seeing two hits to this handler:

    public void ProcessRequest(HttpContext context)
    {
        HttpResponse response = context.Response;
        string fileName = context.Request.QueryString["FileName"];
        // Pick the content type from the file extension
        string ext = System.IO.Path.GetExtension(fileName).ToLower();
        if (ext == ".doc") response.ContentType = "application/msword";
        if (ext == ".pdf") response.ContentType = "application/pdf";
        if (ext == ".jpg") response.ContentType = "image/jpeg";
        if (ext == ".gif") response.ContentType = "image/gif";
        if (ext == ".swf") response.ContentType = "application/x-shockwave-flash";
        if (ext == ".xls") response.ContentType = "application/vnd.ms-excel";
        DateTime start = DateTime.Now;
        long bytesize;
        // using guarantees the stream is closed even if the write throws
        using (System.IO.FileStream file = new System.IO.FileStream(
            fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read))
        {
            bytesize = file.Length;
            byte[] buffer = new byte[(int)bytesize];
            file.Read(buffer, 0, (int)bytesize);
            response.BinaryWrite(buffer);
        }
        response.Flush();
        DateTime end = DateTime.Now;
        FileCaching.FileCache fileCache = new FileCaching.FileCache();
        fileCache.Log(fileName, bytesize, end.Subtract(start).ToString(), FileCaching.WorkType.SelectFromFolder);
    }

    So now I have two hits to the FS? Is there something I am missing here? (Disregard the FileCaching; it's only stubbed out right now.)
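    For comparison, here is a sketch (not a fix for the double request itself) of letting ASP.NET stream the file instead of buffering the whole thing into a byte array, using `Response.WriteFile`:

```csharp
public void ProcessRequest(HttpContext context)
{
    HttpResponse response = context.Response;
    string fileName = context.Request.QueryString["FileName"];
    response.ContentType = "application/pdf"; // pick per extension as in the handler above
    // Announce the size so clients can show progress; then let ASP.NET
    // read and send the file itself -- no full in-memory copy.
    response.AddHeader("Content-Length",
        new System.IO.FileInfo(fileName).Length.ToString());
    response.WriteFile(fileName);
}
```

    For a 1 MB average file served hundreds of times a day this mostly saves working-set memory on the web server rather than wall-clock time, but it avoids allocating a large buffer per request.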
