IoT Analytics Architecture Whiteboard with David Crook

Description

When working with IoT, one of the more common questions we get is: "What is the typical architecture in IoT scenarios?" In this video, David Crook uses a whiteboard to diagram and discuss a very common architecture for dealing with IoT devices, and then addresses some questions the audience had at the end of the talk.

The Discussion

  • Lars

    Great explanation. What is your suggested datastore? Would Azure Table storage be OK, or how should I store the files to be able to use them in Hadoop (or Spark)?

  • DrCrook

    @Lars: Blob storage is HDFS-compliant via wasb://. However, for new products I would suggest Azure Data Lake, as it is also HDFS-compliant, has effectively unlimited capacity, and will support U-SQL and Azure Data Lake Analytics packages.
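    For instance, reading those files back into Spark is just a matter of pointing at the wasb:// path. Here is a minimal PySpark sketch (the storage account and container names are hypothetical, and it assumes the cluster has the hadoop-azure driver and storage credentials configured):

        from pyspark.sql import SparkSession

        # Point Spark at the blob container holding the device telemetry.
        spark = SparkSession.builder.appName("iot-telemetry").getOrCreate()
        df = spark.read.json("wasb://telemetry@mystorage.blob.core.windows.net/devices/")
        df.printSchema()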

  • larsoleruben

    OK, thanks a lot for your feedback. I really like this architecture. Do you have any customers doing data validation in the Stream Analytics part before saving data to storage? I mean building a model of the devices the data comes from and then using machine learning to mark questionable data? Would that be possible for very large amounts of data?

  • DrCrook

    I don't see why not. I'm interested to hear the use case. One thing to note: just because I can do something doesn't mean I necessarily will. For example, generating an ML model on the fly in a stream means you only have access to a windowed snapshot of the data, which is likely not very much data. You could theoretically bring in the historical stores as well, but in my opinion that defeats the purpose of Stream Analytics.

    I would generate an ML model from my historical stores first, then dynamically pull that model up from the stream and compare incoming objects to it. I also do normalization of windowed objects (if necessary) in the stream. You have to architect your ML algorithm fairly intelligently to use it in Stream Analytics, since updating the query itself requires recycling the stream job. You could theoretically stand up a second job, then shut down the first. I haven't tried it, but it should work.
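    To make that split concrete, here is a rough Python sketch of the pattern (the anomaly model and feature data are placeholders; in Stream Analytics you would typically expose the trained model through an Azure ML endpoint rather than run Python inside the job itself):

        import pickle
        import numpy as np
        from sklearn.ensemble import IsolationForest

        # Offline: train an anomaly model on the historical store.
        # historical_readings stands in for features pulled from storage.
        historical_readings = np.random.randn(10_000, 4)
        model = IsolationForest(random_state=0).fit(historical_readings)
        with open("device_model.pkl", "wb") as f:
            pickle.dump(model, f)

        # Streaming path: load the pre-trained model once, then score
        # each windowed batch of incoming readings against it.
        with open("device_model.pkl", "rb") as f:
            scorer = pickle.load(f)

        def score_window(window):
            # Returns -1 for questionable readings, 1 for normal ones.
            return scorer.predict(window)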

    As for the quantity of data this will handle, you get up to 16 channels per hub, and here is the page for input limits: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-availability-and-support-faq/

    You can then have different stream jobs listening to one or many channels, and if necessary nest them by having one job's output feed into another's input.
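    Assuming the "channels" here map to Event Hubs consumer groups, a minimal sketch of a reader pinned to one group, using the azure-eventhub Python SDK (the connection string and names are placeholders):

        from azure.eventhub import EventHubConsumerClient

        # Each downstream job reads through its own consumer group,
        # so several jobs can consume the same hub independently.
        client = EventHubConsumerClient.from_connection_string(
            conn_str="<EVENT_HUB_CONNECTION_STRING>",
            consumer_group="analytics-job-1",
            eventhub_name="devices",
        )

        def on_event(partition_context, event):
            print(partition_context.partition_id, event.body_as_str())

        with client:
            # "-1" starts from the earliest available event per partition.
            client.receive(on_event=on_event, starting_position="-1")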

    Sounds like a great session topic :)

  • meeran

    ok

  • Giuseppe Mascarella

    Great job in making it so simple and easy to remember.
