@Deepak: I think where you calculate it depends on your use case. You can calculate it in the Data Lake on historical data and reuse the result wherever you like, or you can calculate it in the data stream on a rolling window of values. It depends on what you are calculating that deviation for, as well as where, when, and why you want to use it.
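As a minimal sketch of the rolling-window option (plain Python rather than a Stream Analytics query, and the window size of 5 is just an illustrative choice):

```python
from collections import deque
import math

def rolling_stddev(values, window=5):
    """Yield one standard deviation per input value, computed over a
    sliding window of the most recent `window` values."""
    buf = deque(maxlen=window)  # old values fall off automatically
    out = []
    for v in values:
        buf.append(v)
        mean = sum(buf) / len(buf)
        var = sum((x - mean) ** 2 for x in buf) / len(buf)
        out.append(math.sqrt(var))
    return out
```

In an actual stream job the same idea is expressed with a hopping or sliding window over the event stream instead of an in-memory deque.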
Performance, as well as the structure of the data streams, is also important to consider. For example, if the solution has multiple hubs for the same devices/data, you may have to do it in the data lake, or come up with a stream-aggregation methodology to merge everything properly. Remember, there are ways to reduce data at the first processor and still get the same answers at the aggregation processor, if speed/size are issues for you.
I suppose the short answer is "it depends greatly," as there are so many different ways to solve the problem.
I don't see why not. I'm interested to hear the use case. One thing to note: just because I can do something doesn't mean I necessarily should. For example, generating an ML model on the fly in a stream means you basically only have access to a windowed snapshot of the data, which is likely not very much. You could theoretically bring in the historical stores as well, but then, in my opinion, you are defeating the purpose of Stream Analytics.
I would generate an ML model from my historical stores first, then dynamically pull that model into the stream and compare incoming objects against it. I also do normalization of windowed objects (if necessary) in the stream. You have to architect your ML algorithm fairly intelligently to use it in Stream Analytics, because to update the query itself you need to recycle the stream job. You could theoretically stand up a second job, then shut down the first. I haven't tried it, but it should work.
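To make the "train offline, score in-stream" split concrete, here is a minimal sketch. The model here is hypothetical (just per-field mean/std learned from historical data, serialized as JSON somewhere the stream job can read it), and the 3-deviation threshold is an illustrative choice, not a recommendation:

```python
import json

# Hypothetical model produced offline from the historical stores:
# per-field mean and standard deviation, serialized for the stream job.
MODEL_JSON = '{"temp": {"mean": 20.0, "std": 2.0}}'

def load_model(raw=MODEL_JSON):
    # In a real job this would be fetched from blob/data-lake storage.
    return json.loads(raw)

def is_anomalous(model, event, threshold=3.0):
    """Normalize each incoming field against the historical baseline
    and flag the event if any field is over `threshold` deviations out."""
    for field, stats in model.items():
        if field in event:
            z = abs(event[field] - stats["mean"]) / stats["std"]
            if z > threshold:
                return True
    return False
```

Swapping in a new model then amounts to re-running the offline training and republishing the serialized artifact, without touching the stream query itself.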
@Lars: wasb:// Blob storage is HDFS-compliant; however, for new products I would suggest Azure Data Lake, as it is also HDFS-compliant, has effectively unlimited storage capacity, and supports U-SQL and Azure Data Lake Analytics packages.