Coffeehouse Thread

21 posts

Create an Open Source Cloud

  • vesuvius

    I would like to create my own private cloud, hosted on my servers, which can be Windows or Linux, but I would like to use .NET/WCF in the service layer. I don't need a fancy/schmancy UI like Azure and I don't want to use Amazon either. The data will be large, i.e. images, video and lossless audio.

    I know Hadoop/MapReduce is a central component to creating a cloud OS, what else is required?

    Look, I know I cannot compete with Azure or Amazon, but I need a far smaller similar version of those services, how would you go about it?

  • Dr Herbie

    @vesuvius: If I'm honest, I would start searching for open source implementations and start from one of those; I'm far too old and cynical to start a project like this from scratch.

    Herbie

  • Bas

    I'd have to agree with Dr Herbie, even massively scaled down that still sounds like quite a task to create yourself. I hope you do manage it and let us know how, though.

  • JohnAskew

    OpenStack: deploy Hadoop

    http://www.ibm.com/developerworks/cloud/library/cl-openstack-deployhadoop/index.html

    I don't know anything about it... EDIT: but I can imagine a lot of whiteboard time can be culled from these diagrams and documents...

  • vesuvius

    @JohnAskew: that is pretty informative, not finished reading through it but just what the doctor ordered

  • vesuvius

    The only thing I see with that is an even bigger license fee than SQL. I just don't see the whole world running on Azure and Amazon, and it is going to get even more expensive.

  • PaoloM

    vesuvius wrote:

    I would like to create my own private cloud, hosted on my servers, which can be Windows or Linux, but I would like to use .NET/WCF in the service layer. I don't need a fancy/schmancy UI like Azure and I don't want to use Amazon either. The data will be large, i.e. images, video and lossless audio.

    I know Hadoop/MapReduce is a central component to creating a cloud OS, what else is required?

    Look, I know I cannot compete with Azure or Amazon, but I need a far smaller similar version of those services, how would you go about it?

    Define "private cloud".

    What do you need to do with it? What are the requirements? What makes you think that distributed massive storage components are required in your case?

  • vesuvius

    @PaoloM: I need to be able to store large amounts of data. Let's take http://soundcloud.com/ for example (or even channel 9, but on a far smaller scale), where they have a CDN.

    The problem is one of storage, as I don't think all the videos here on channel 9 are saved in SQL. NoSQL seems to be the option, but I need load balancing and multitenancy, the ability to process volume shadow copy and so on, and pretty good reliability and performance.

    WCF is pretty powerful in allowing you to get the content onto the server, but I need a clean way of managing it all. I don't need a service bus architecture, as at present it will be C++ or, preferably, .NET based.

  • PaoloM

    vesuvius wrote:

    @PaoloM: I need to be able to store large amounts of data. Let's take http://soundcloud.com/ for example (or even channel 9, but on a far smaller scale), where they have a CDN.

    I don't understand, do you have access to a CDN or are you saying that soundcloud and ch9 have a CDN? :)

    The problem is one of storage, as I don't think all the videos here on channel 9 are saved in SQL.

    I don't think they're stored in a database at all. If I were to create such an architecture, I would use an RDBMS for storing metadata and just a file system to store the actual large files.

    NoSQL seems to be the option, but I need load balancing and multitenancy, the ability to process volume shadow copy and so on, and pretty good reliability and performance.

    My understanding of NoSQL systems is that they're used mainly where you have to process massive (upwards of millions of items) amounts of loosely typed data, not where you have a small amount of large files...
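
    A minimal sketch of that metadata/file split, for illustration only. It assumes SQLite as a stand-in for the RDBMS and a local directory as the file store; the `MediaStore` class, the `media` table and its columns, and the hash-prefix directory layout are all hypothetical names made up for this example, and Python is used only for brevity (the same shape should carry over to a .NET service).

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


class MediaStore:
    """Hypothetical store: metadata rows in a database, file bytes on disk."""

    def __init__(self, root, db_path=":memory:"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS media ("
            " id INTEGER PRIMARY KEY,"
            " tenant TEXT NOT NULL,"
            " name TEXT NOT NULL,"
            " sha256 TEXT NOT NULL,"
            " size_bytes INTEGER NOT NULL,"
            " path TEXT NOT NULL)"
        )

    def put(self, tenant, name, source):
        # Hash in 1 MiB chunks so large media files are never fully in memory.
        digest = hashlib.sha256()
        size = 0
        with open(source, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
                size += len(chunk)
        sha = digest.hexdigest()
        # Fan out by hash prefix so no single directory grows too large.
        dest = self.root / sha[:2] / sha[2:4] / sha
        dest.parent.mkdir(parents=True, exist_ok=True)
        if not dest.exists():
            shutil.copyfile(source, dest)
        cur = self.db.execute(
            "INSERT INTO media (tenant, name, sha256, size_bytes, path)"
            " VALUES (?, ?, ?, ?, ?)",
            (tenant, name, sha, size, str(dest)),
        )
        self.db.commit()
        return cur.lastrowid

    def locate(self, media_id):
        # Only the path lives in the database; the bytes are read from disk.
        row = self.db.execute(
            "SELECT path FROM media WHERE id = ?", (media_id,)
        ).fetchone()
        if row is None:
            raise KeyError(media_id)
        return Path(row[0])


def demo():
    # Round trip: store a small fake audio file, then read it back by id.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "clip.wav"
        src.write_bytes(b"fake audio bytes")
        store = MediaStore(Path(tmp) / "blobs")
        media_id = store.put("tenant-a", "clip.wav", src)
        data = store.locate(media_id).read_bytes()
        store.db.close()
        return data
```

    Calling `demo()` stores one file and reads it back through the metadata row. The database only ever holds small rows, so backup, load balancing and replication of the large files can be handled at the file-system level.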

  • JohnAskew

    Soundcloud is a nice offering, I use it quite a lot, but the W7P client should have their playlist operations run in a background thread so it doesn't interrupt playback!

    They do/should have a CDN...

  • vesuvius

    PaoloM wrote:

    I don't understand, do you have access to a CDN or are you saying that soundcloud and ch9 have a CDN? :)

    I think I saw a post somewhere saying Channel 9 uses or has moved to a CDN on Azure. I think it was when many of us were experiencing déjà vu, seeing posts from a few months back.

    I don't think they're stored in a database at all. If I were to create such an architecture, I would use an RDBMS for storing metadata and just a file system to store the actual large files.

    I guess for me it would have to be a tried and tested metadata system. That is the part that I am most concerned about. It's not that I am scared of innovating, but this is something I don't want to have to write and test from scratch. Is there a system in wide use?

    My understanding of NoSQL systems is that they're used mainly where you have to process massive (upwards of millions of items) amounts of loosely typed data, not where you have a small amount of large files...

    I have only looked into this briefly, but I have more questions than answers at the moment.

  • Dr Herbie

    @vesuvius: I just bought "NoSQL Distilled" from Amazon (awaiting arrival next week). I'll post a review once I get around to reading it.

    Herbie

    http://www.amazon.co.uk/gp/product/0321826620/ref=oh_details_o03_s00_i00

  • Bass

    How much GNU/Linux experience do you have?

  • MasterPi

    Are you just basically looking to create a distributed file system to be able to scale for large data? There's MogileFS, but you could also easily code one up to suit your needs.

    Unless you wanted something specific in terms of a "cloud"...

  • vesuvius

    @Bass: Not a huge amount but I can hire a developer if the solution is robust

  • vesuvius

    @MasterPi: will have a look at that later at work

  • Bass

    GlusterFS is another option. Hadoop's FS (HDFS) is also obviously another option, esp. if you intend to use other components of Hadoop. OpenStack Swift also. Ceph also. And more...

    Most if not all of these can be interacted with via FUSE modules or REST over HTTP, so writing a service in .NET should be possible (not even considering Mono).
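
    A rough sketch of what "REST over HTTP" looks like from the client side: PUT to upload an object, GET to fetch it back. Everything here is an assumption for illustration. `FakeObjectStore` is a hypothetical in-memory stand-in so the example can run without a real cluster; it is not Swift, Ceph or GlusterFS, and the `/container/name` URL layout and the `put_object`/`get_object` helpers are made up. Real stores add authentication and their own URL schemes. Python is used only for brevity; a .NET client would issue the same two HTTP verbs.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Bypass any proxy settings so requests go straight to the local demo server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class FakeObjectStore(BaseHTTPRequestHandler):
    """Hypothetical in-memory object store, for demonstration only."""

    objects = {}  # URL path -> bytes, shared by all requests

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.objects[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = self.objects.get(self.path)
        if body is None:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def put_object(base_url, container, name, data):
    # Upload: the object's bytes are simply the body of an HTTP PUT.
    req = urllib.request.Request(
        f"{base_url}/{container}/{name}", data=data, method="PUT"
    )
    with _opener.open(req) as resp:
        return resp.status


def get_object(base_url, container, name):
    # Download: a plain HTTP GET on the same URL returns the bytes.
    with _opener.open(f"{base_url}/{container}/{name}") as resp:
        return resp.read()


def demo():
    # Port 0 lets the OS pick a free port for the stand-in server.
    server = ThreadingHTTPServer(("127.0.0.1", 0), FakeObjectStore)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        status = put_object(base_url, "audio", "track.flac", b"fake flac bytes")
        return status, get_object(base_url, "audio", "track.flac")
    finally:
        server.shutdown()
        server.server_close()
```

    `demo()` uploads one object to the stand-in server and fetches it back. Because the interface is just HTTP, the service layer does not need a native library for the storage system, only an HTTP client.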
