@Rowland: U-SQL is based on SCOPE (http://academic.research.microsoft.com/Paper/4439987.aspx). It shares some concepts with Pig, such as global expression composition and thus the ability to optimize a script globally. I wouldn't call it a knock-off of Pig, though. The language is otherwise much more SQL- and C#-based.
@pserranne: I am glad you like it! Please let us know if you want us to drill deeper into any specific aspects.
@sokhaty: User-defined aggregators are available now (that part of the documentation still needs to be written, thanks for reminding me :)). However, they cannot be used in the context of windowing expressions. Please go to
to file a request. Do you need user-defined aggregators with windowing expressions? User-defined ranking or user-defined analytic functions? Some of them are on our backlog, but at a lower priority right now.
We are currently working on a local development experience inside Visual Studio (for now) that will provide local execution of U-SQL, local debugging of server failures, and more. However, I would not call it an emulator (an emulator implies a heavyweight installation requirement that we want to avoid) :).
@srikalyan: There is not really an ontology behind it (yet). We are extracting keyphrases or tags (in this case only 1-grams, although the algorithms could also handle n-grams). That information is made accessible in relational form through the table-valued functions. Generating and supporting ontologies is definitely something we are looking into for the future. How close that future is depends on many things, though. What do you mean by access privileges and authentication? This feature is integrated into SQL Server with Full-Text Search, so you have the standard SQL Server access controls.
@BLinden: All the SQL Server Full-Text iFilters are supported. So in the case of XML or HTML, we will not index the markup tags or attribute names. If you want PDF support, you will need to install a third-party PDF iFilter such as Adobe's or Foxit's.
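To illustrate what "not indexing the markup" means in practice, here is a minimal Python sketch that keeps only the text content of an HTML fragment. It is not how the SQL Server iFilters are implemented; the class `TextOnly`, the function `indexable_text`, and the sample fragment are made up for illustration, and this simplified version drops attribute values as well as tag and attribute names.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collects character data and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags, never for tag or attribute names.
        text = data.strip()
        if text:
            self.chunks.append(text)


def indexable_text(markup):
    """Return the text a word breaker would see once markup is stripped."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


sample = '<p class="intro">Semantic <b>search</b> demo</p>'
print(indexable_text(sample))
```

Only the words between the tags survive, so a search for "class" or "intro" would not match this fragment.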
"Is at least the fact that all of this is done with strong typing something new?"
This depends. XQuery and SQL both provide static and strong typing for declarative expressions. However, the integration into the programming language's type system is well done (and thus could be considered "novel").
Regarding the killer use cases for XML: I try to address this somewhat in the non-demo part. If your XML basically describes relational data that you want to repurpose and slice and dice in different ways, shredding it into relational form and normalizing it
is a good idea. If the XML OTOH represents a markup document (such as a WordML or XHTML document), or a logical unit (aka object) that you want to store and retrieve as efficiently as possible while still being able to query
into its components, storing the data as XML is easier and often preferable.
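As a rough sketch of the "shredding" case, the following Python example takes a small XML document that really describes relational data and loads it into two normalized tables, after which it can be sliced with plain SQL. Everything here is an assumption made for illustration: the `orders`/`item` document shape, the table names, and the use of SQLite and `xml.etree` in place of SQL Server's own XML features.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: XML that is essentially relational.
ORDERS_XML = """
<orders>
  <order id="1" customer="alice">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="2" customer="bob">
    <item sku="A1" qty="5"/>
  </order>
</orders>
"""


def shred_orders(xml_text, conn):
    """Shred the XML into two normalized tables: orders and order_items."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)"
    )
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        order_id = int(order.get("id"))
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order_id, order.get("customer"))
        )
        for item in order.findall("item"):
            conn.execute(
                "INSERT INTO order_items VALUES (?, ?, ?)",
                (order_id, item.get("sku"), int(item.get("qty"))),
            )


def quantity_by_sku(conn):
    """One of many possible 'slices' once the data is relational."""
    return conn.execute(
        "SELECT sku, SUM(qty) FROM order_items GROUP BY sku ORDER BY sku"
    ).fetchall()


conn = sqlite3.connect(":memory:")
shred_orders(ORDERS_XML, conn)
print(quantity_by_sku(conn))  # [('A1', 7), ('B7', 1)]
```

For the other case (a markup document or a self-contained object), you would keep the XML in one piece and query into it rather than splitting it across tables like this.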