Loading user information from Channel 9

Something went wrong getting user information from Channel 9

Latest Achievement:

Loading user information from MSDN

Something went wrong getting user information from MSDN

Visual Studio Achievements

Latest Achievement:

Loading Visual Studio Achievements

Something went wrong getting the Visual Studio Achievements


Bart De Smet bdesmet Bart De Smet ​[MSFT::SQL::​Cloud​Programmabi​lity::Rx]
  • Bart De Smet: MinLINQ - The Essence of LINQ

    @Gregory81: Well, you're definitely welcome to do this stuff in Haskell. Let me say you won't be the first (hint Wink). Stay tuned.

    You're right that IE<T> and IO<T> are a bit more complex than what we've discussed here. I conveniently swept errors and disposing under the carpet. The former can be deal with by treating it as a value that can be hidden behind the T (i.e. Materialize approach, T is Notification<T>). The latter (disposable aspect) pops up in the types for sure; disposables have a very imperative nature to it and can be represented as IO () in Haskell.

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @Gregory81: Let's be clear, I'm talking here about Cata as the folding (for LINQ to Objects people, think Aggregate) operator for sequences of data, which leaves the IEnumerable<T> world in favor of a value of type R, outside the monad. In this interpretation, it's like unsafePerformIO in Haskell:

    • Ana (and return) gets you in.  A -> (A -> T) -> (A -> A) -> M<T>
    • Bind (and >>=) keeps you in.  M<T> -> (T -> M<R>) -> M<R>
    • Cata (and unsafePerformIO) get you out. M<T> -> C -> (C -> T -> C) -> C

    In our world of Rx and Ix, the monad's "aspect" is about lazy acquisition of data (which could go with lazy computation of the elements in the stream of course). The typing point of view is shown above too. If you make Cata lazy, it will be not the "streaming multiple values" kind of laziness that results; instead, you're deferring the catastrophy that's about to happen to the sequence being folded into a single value. (Exercise: There is a more mild form of aggregation in Rx though, but I wouldn't call the a Cata anymore. Go and find it.)

    All the side-effects that are due to iteration of the sequence (till the very end, provided there's even an end to it) will be surfaced at the point this kind of Cata is executed. It's like building LINQ to Objects on IEnumerable<T> by having operators turn their input sequence into a List<T> first (which is a Cata using Add calls), apply things like a filter or a map on the entire in-memory list, and then unfolding it again into a sequence (which is an Ana using the enumerator as the state, MoveNext as a terminating condition and Current as the selector). Well, if the user only wanted the projection of the first element, we definitely did too much work now (and we could get stuck if the sequence is infinite). Laziness was dropped inside the implementation, and the result of that leaks to the users (side-effects, bottoming out on infinite sequences, etc.). In that sense, using a Cata is harmful when used in inappropriate places.

    Have a look at Erik's paper titled "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" for some interesting insights (warning: Greek letters). Tip: there's more than ana and cata...

    A few practical things related to this discussion:

    • LINQ to SQL, and IQueryable<T> in general, leaves the monad early for aggregation operators. For example, products.Sum(p => p.Price) returns a decimal on the spot. Laziness is gone. The IEnumerable<T> monad retains laziness in the face of composition of such sequences. Because of this, you can't do something like dividing Sum and Count and have the whole thing executed remotely. Instead, you'll have two roundtrips. This is like "forgetting" about the M in M<T>, hence the unsafePerformIO analogy.
    • In Rx, we leave aggregates in the monad. Only First, Last, and Single are permitted to leave the monad (ignoring non-operators such as Subscribe). This way you retain the composition over such sequences, which is where monadic computation shines. For example, you could apply the Timeout operator to computation of an aggregate and unsubscribe before it actually comes with the result.

    For what GroupBy is concerned, you're absolutely right about it adding another layer. Keep thinking Smiley. Buffer is a different beast indeed - did you already look at the changes we made in the last Rx release? That may give some insightful clues... (Btw, a Channel 9 video on the latest Rx video is still coming up.)

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @Sebastian Gregor: IEnumerable<T> is a monad. It has a Return (new[] { value }) and a Bind (from x in xs from y in ys(x) select f(x, y) turns into xs.SelectMany(x => ys(x), (x, y) => f(x, y))) which is sufficient for our conclusion.

    What's extra though is the sequence part of an IEnumerable<T>, or any other sequence type we're talking about (I know at least three more in the Rx/Ix puzzle, in fact I do know even more Wink). This is where the Ana abstraction can be seen as a sequence-centric Return, in that many values can be computed to be turned into an object that's inside the monad.

    To see the layer map, check out the www.codeplex.com/LINQSQO project and use Visual Studio 2010 architecture explorer facilities to see the dependency graph of the methods in MinLinq. This will clearly show how all of the operators I've implemented there find their roots in Ana, Bind, and Cata.

    For your requested tip: Zip is a nice one you bring up. I'm emphasizing nice a bit sarcastically. What Zip does is take a pair of sequences and turn it into a sequence of pairs (generalized using a projection function):  Tuple<IEnumerable<A>, IEnumerable<B>> --> IEnumerable<Tuple<A, B>>. To achieve this (lazily and whatnot), you got to obtain elements from both sequence at the "same pace". Now have a look whether you got such a primitive hanging in there somewhere Smiley. Notice though a Zip can be used to build the fancy Where and Select overloads that take a natural index number in their function parameter (Zip with the infinite Ana of natural numbers)...

    Using Cata is rarely useful in operators itself, and in fact harmful as it breaks laziness. One exception is where you got to see the whole sequence anyway before you can produce any output. Examples include OrderBy and Reverse (exercise: write those using Ana and Cata).

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @exoteric: Don't stare too blind at the syntax here, it's merely a way to show the IE/IO duality in the simplest way possible by just reversing arrows. Other than that, we're simply working based on the Func<Func<T>> type.

    We fully embrace side-effects in this setting, as C# is good at. The arrows are not purist functional arrows, but rather fat arrows that can contain what you refer to as "magic". I believe I mentioned one way to turn it into a functional Haskell-style setting is to decorate the arrows (or rather their return type) with the IO monad. All we're doing here is bringing laziness to the table in one suitable way, but we're completely honest about side-effects induced by the cursor-based nature of enumerators.

    A head/tail decomposition at the function level, as done by your Stream<T> sample, is another possibility, introducing recursion at a type level. Whether it's strictly better is subject to debate though. There's only one value of type Void, hence a purist Streamer can only return the same Stream<T> every time the function is called. In fact, () -> T is a lazily computed constant; when a side-effect is desired, one would simply write IO T in Haskell (even the () -> is redundant due to the lazy nature of Haskell's evaluation mechanism).

    This gets into a chicken-or-egg IO problem (somehow I like to associate IO with chickens and eggs) for realistic scenarios where you're querying data from some store such as a database, webservice, and whatnot. There you'll have to have an IO somewhere. After all, the contents of the store can change and you'll want a new enumerator that represents a cursor on the world at that point in time (time is another way to think about implicit side-effects, where an impure function is implicitly parameterized on time). Related to the enumerator is some state that reflects server-side state too, moving over the result set as it was obtained at the point of the (initial?) request (providing a consistent - or not, depending on the guarantees provided by the store - view of the world at that time).

    For in-memory lazily computed streams, such as prime numbers to use an academic sample, we can do better. There all the properties of a purist functional and lazily evaluated language such as Haskell shine.

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @corvuscorax: I'm Belgian, from the Flemish part, near Ghent, where we speak Flemish, a variant of Dutch. I mostly think in English when I'm in the US or outside my home country, yes (though I curse in Flemish occasionally).

    Met vriendelijke groeten,

  • Bart De Smet: MinLINQ - The Essence of LINQ

    Dear Niners,

    You can actually find an implementation of FEnumerable and a bunch of operators on www.codeplex.com/linqsqo. To show it truly works, I've provided a port of Luke Hoban's LINQ-based Ray Tracer as well, as part of the code. More information on conversion operators between classic IEnumerable and purist FEnumerable can be found in my blog post, mentioned in the description of the video above. I also describe how to get query expression syntax working on top of the FEnumerable implementation.


  • Bart De Smet: MinLINQ - The Essence of LINQ

    @Gregory81: Perfect answer. Materialize - Where (or SelectMany of course...) - Dematerialize is a roundtrip scheme that can be used to implement exception handling for sequences. T is turned into Notifcation<T> and back.

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @AdamSpeight2008: To define First, think of an Aggregate whose accumulation object tracks the first element and whether it has seen it already. However, keep in mind this is not the best definition of First by any means, because Aggregate will drain the entire sequence before it can produce the result. This doesn't work for side-effecting iterators (in LINQ to Objects, you won't see the effects for anything but enumeration to the first element), and infinite sequences. Notice how First is one of the only operators in Rx that leaves the monad; aggregates in the Ix world are a bit of a cata* (fill in the blanks yourself) from a monadic point of view. In Rx, aggregates return single values inside the monad, which allows for cancellation, and a Cata (in the sense used in the interview) has less value in this world.

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @ShinNoNoir: Thanks for watching in 2011 too! The Amb roundtrip is indeed a beautiful one.

    There are plenty of other ones, so here's one exercise for the viewers: assume your team has only the budget to use one try...catch block (assume each block costs as $1,000 and the licensing model of the C# compiler is to pay per try...catch used) in the entire codebase of your LINQ to Objects implementation of the Standard Query Operators (i.e. IEnumerable<T> based). How'd you spend it (*) such that you can easily implement error-handling operators in the future? Examples include operators such as Catch :: IE<T> -> (Exception -> IE<T>) -> IE<T>, but also Retry, etc.

    Tip: you want to realize Catch, Retry, etc. using a roundtrip scheme of IE<T> -> ??<T> -> (wow, error handling got so much easier here thanks to the $1,000 well-spent) -> ??<T> -> IE<T>. Try to find what I've highlighted in bold and you may be onto something (nested tip: we got this in Rx and Ix).

    (*) Constraint: use the try...catch inside a proper operator/combinator that deals with sequences. I.e. simply wrapping try...catch in a generic helper method such as the one shown below is not an option:

    static void Catch<E>(Action action, Action<E> handler) where E : Exception
    catch (E e)

  • Bart De Smet: MinLINQ - The Essence of LINQ

    @exoteric: Sure, there's real-world significance, ranging from setting a design philosophy for a query language or processor with proper layering of operators and their semantics, where compositionality rules the world, onto implementations itself. I won't yet comment on the latter category at this stage, rest assured we can keep ourselves more than busy on our team.

    For now, it should be clear that the MinLINQ implementation is truly meant as an academic exercise, in particular when it comes to using the implementation for real-world scenarios, due to the various shortcuts that have been taken (no disposable nature, error handling omitted, etc.).

    Exercise for the viewers: what'd be a good generalization of the set of GroupBy operators? How about other kinds of partitioning of an input sequence in chunks (e.g. Buffer* operators in Rx)?