Going Deep

Bart De Smet: MinLINQ - The Essence of LINQ

Download this episode

Download Video


As you must know by now, Erk Meijer and team spend time thinking about and discovering the Essence in things. One year ago today, Bart De Smet blogged about the notion of a core set of LINQ operators, MinLINQ, the essence of LINQ.

"Hey Bart, what is MinLINQ, exactly?"

"MinLINQ is an implementation of the LINQ to Objects Standard Query Operators using a function-oriented layered approach. Based on three essential operators called Ana, Bind and Cata, others are implemented. While the current implementation focuses on IEnumerable<T> exclusively, the same layering can be used to a dual IObservable<T> implementation."

Sit back, relax and jump into the layered functional rabbit hole where we meet some interesting characters named Ana, Bind and Cata. It's all about the essence, Alice.

Happy New Year!





Available formats for this video:

Actual format may change based on video formats available and browser capability.

    The Discussion

    • conard

      Can't wait to see what Bart has for us next.  Big Smile

    • felix9

      Ana, Bind, Cata

      Shape, Wrap, Roll

      CAR, CDR, CONS

      balabing, balabang, balaboom

    • exoteric
      Great! @Bart you mention this as an academic exercise. Don't you see real-world significance for this kind of simplification or is it perhaps at odds with things like optimal translation of reified expressions into SQL? @Felix don't forget hylo-, synchro-, etc.
    • jeroenverhu​lst

      Mindblowing, as usual Smiley

      Great work Bart!


    • ShinNoNoir

      Great way to start the year (currently half way through the video). Smiley

      @Bart: I'm amazed by your simple IE->IO->Amb->IE roundtrip example. It is concise yet clear, demonstrating the importance of orthogonality.

    • bdesmet

      @exoteric: Sure, there's real-world significance, ranging from setting a design philosophy for a query language or processor with proper layering of operators and their semantics, where compositionality rules the world, onto implementations itself. I won't yet comment on the latter category at this stage, rest assured we can keep ourselves more than busy on our team.

      For now, it should be clear that the MinLINQ implementation is truly meant as an academic exercise, in particular when it comes to using the implementation for real-world scenarios, due to the various shortcuts that have been taken (no disposable nature, error handling omitted, etc.).

      Exercise for the viewers: what'd be a good generalization of the set of GroupBy operators? How about other kinds of partitioning of an input sequence in chunks (e.g. Buffer* operators in Rx)?

    • bdesmet

      @ShinNoNoir: Thanks for watching in 2011 too! The Amb roundtrip is indeed a beautiful one.

      There are plenty of other ones, so here's one exercise for the viewers: assume your team has only the budget to use one try...catch block (assume each block costs as $1,000 and the licensing model of the C# compiler is to pay per try...catch used) in the entire codebase of your LINQ to Objects implementation of the Standard Query Operators (i.e. IEnumerable<T> based). How'd you spend it (*) such that you can easily implement error-handling operators in the future? Examples include operators such as Catch :: IE<T> -> (Exception -> IE<T>) -> IE<T>, but also Retry, etc.

      Tip: you want to realize Catch, Retry, etc. using a roundtrip scheme of IE<T> -> ??<T> -> (wow, error handling got so much easier here thanks to the $1,000 well-spent) -> ??<T> -> IE<T>. Try to find what I've highlighted in bold and you may be onto something (nested tip: we got this in Rx and Ix).

      (*) Constraint: use the try...catch inside a proper operator/combinator that deals with sequences. I.e. simply wrapping try...catch in a generic helper method such as the one shown below is not an option:

      static void Catch<E>(Action action, Action<E> handler) where E : Exception
      catch (E e)

    • CKurt

      Great video! Mindblowing and somehow back to basics.

    • Adam​Speight2008

      Great interview and video to start the new year off with.

      Max and Min in terms of using Aggregate could be written like so.

       Function Max(Of T As IComparable)(ByVal c As IEnumerable(Of T)) As T
          Return Enumerable.Aggregate(c,
                                      Function(a, e) (
                                        Function(_x As T, _y As T) {_y, _x, _x}(Math.Sign(_x.CompareTo(_y)) + 1)
                                          )(a, e)
        End Function
        Function Min(Of T As IComparable)(ByVal c As IEnumerable(Of T)) As T
          Return Enumerable.Aggregate(c,
                                      Function(a, e) (
                                        Function(_x As T, _y As T) {_x, _x, _y}(Math.Sign(_x.CompareTo(_y)) + 1)
                                          )(a, e)
        End Function

      I'm using 

      Math.Sign(_x.CompareTo(_y)) + 1

      because the IComparable interface doesn't require the return value to limited to the values -1, 0 and +1, so I have to normalise it to this range. 


      Note that the above implementation need the extension method .First. So this requires a function to exist which returns the first value of the Enumerable. According to my understanding this function would be a CATA, so it is based off an Aggregate.

      Now I think that be challenge. How do write First in terms of Aggregate. (Can it be done without the indexing overload of SelectMany?)

      Thinking Headwear on. 

    • Adam​Speight2008

      Done First but it used the Indexing overload.

        Function First(Of T)(ByVal src As IEnumerable(Of T)) As T
          Return Enumerable.Aggregate(Of T)(
            Enumerable.SelectMany(Of T, T)(src,
                                           Function(curr As T, index As Integer) As IEnumerable(Of T)
                                             Return If(index = 0, Enumerable.Repeat(curr, 1), Enumerable.Empty(Of T))
                                           End Function), Function(accum, curr) (curr))
        End Function

    • Adam​Speight2008

      How would Sum work?

      Using Aggregate it would simply be.


        Function Sum(Of T )(ByVal src As IEnumerable(Of T)) As Integer    Return Enumerable.Aggregate(src, Function(accum, curr) accum + curr)
        End Function

      But the problem is the addition, it would work if the Generic Type could constrained to only type that implement an addition operator.

      Would this suggest that .Sum() isn't generic and is an Extension method with multiple overloads, one for each arithmetical type. Eg Long, Integer, Double etc? 

    • ShinNoNoir


      The problem is that operators are static in C# (Shared in VB), so you are pretty much forced to multiple overloads.

      Here's something I found about generic operators written by Marc Gravell and Jon Skeet: http://www.yoda.arachsys.com/csharp/genericoperators.html


    • Gregory81

      , bdesmet wrote   

       assume your team has only the budget to use one try...catch block in the entire codebase of your LINQ to Objects implementation of the Standard Query Operators (i.e. IEnumerable<T> based). How'd you spend it (*) such that you can easily implement error-handling operators in the future?

      That can be done using operator analogous to Materialize of Ix. It makes transition from implicit exception-handling mode to explicit Error monad.






    • ShinNoNoir

      So I've watched the video and been pondering about something. Bart, you use Ana instead of Unit (or Return). As you explained in the video, you can define Unit in terms of Ana. So far's everything fine.

      The thing I've been pondering about, can you always define Ana for each monad M?

    • bdesmet

      @AdamSpeight2008: To define First, think of an Aggregate whose accumulation object tracks the first element and whether it has seen it already. However, keep in mind this is not the best definition of First by any means, because Aggregate will drain the entire sequence before it can produce the result. This doesn't work for side-effecting iterators (in LINQ to Objects, you won't see the effects for anything but enumeration to the first element), and infinite sequences. Notice how First is one of the only operators in Rx that leaves the monad; aggregates in the Ix world are a bit of a cata* (fill in the blanks yourself) from a monadic point of view. In Rx, aggregates return single values inside the monad, which allows for cancellation, and a Cata (in the sense used in the interview) has less value in this world.

    • bdesmet

      @Gregory81: Perfect answer. Materialize - Where (or SelectMany of course...) - Dematerialize is a roundtrip scheme that can be used to implement exception handling for sequences. T is turned into Notifcation<T> and back.

    • bdesmet

      Dear Niners,

      You can actually find an implementation of FEnumerable and a bunch of operators on www.codeplex.com/linqsqo. To show it truly works, I've provided a port of Luke Hoban's LINQ-based Ray Tracer as well, as part of the code. More information on conversion operators between classic IEnumerable and purist FEnumerable can be found in my blog post, mentioned in the description of the video above. I also describe how to get query expression syntax working on top of the FEnumerable implementation.


    • corvuscorax

      Hey Bart,

      Completely tangential and just out of curiosity ... you think in English? Not in French or Flemish? (You're from Belgium*, right?)

      be well

      *Pardon me

    • bdesmet

      @corvuscorax: I'm Belgian, from the Flemish part, near Ghent, where we speak Flemish, a variant of Dutch. I mostly think in English when I'm in the US or outside my home country, yes (though I curse in Flemish occasionally).

      Met vriendelijke groeten,

    • cammcad

      Amazing work Bart! Looking forward to more lectures. Keep the functional speak coming.

    • Jules

      Interesting. Just another idea to get early stopping, you could instead of passing in the aggregate so far, pass in a function that when called computes the rest. Then you can choose to call this function or not.
      Instead of this:
      let fold xs init f =     match xs with    | [] -> init    | x::r -> f x (fold r init f)
      You do this:
      let fold xs init f =     match xs with    | [] -> init    | x::r -> f x (fun () -> (fold r init f))
      If we assume that the && operator is short circuiting, then the following is a short circuiting All function:
      let all p xs = fold (fun x r -> p x && r()) true xs
      i.e. as soon as the predicate returns false, the loop stops and returns false.

    • exoteric

      I see this pseudo-syntactic definition

      () -> (() -> T?)

      but raise you this haxe-syntactic definition (If I recall syntax correctly)

      alias Streamer = Void -> Stream<T>

      enum Stream<T>
          some(x: T, xs: Void -> Stream<T>);

      This is a more purely functional style as a Streamer would normally always have the same result, every time you call it and you have to deconstruct the stream to get to successive elements, whereas with FEnumerable, somehow magically, the same enumerator function will return different results every time you call it.

      It just happens there's no natural way to model this kind of structure in C#.

    • bdesmet

      @exoteric: Don't stare too blind at the syntax here, it's merely a way to show the IE/IO duality in the simplest way possible by just reversing arrows. Other than that, we're simply working based on the Func<Func<T>> type.

      We fully embrace side-effects in this setting, as C# is good at. The arrows are not purist functional arrows, but rather fat arrows that can contain what you refer to as "magic". I believe I mentioned one way to turn it into a functional Haskell-style setting is to decorate the arrows (or rather their return type) with the IO monad. All we're doing here is bringing laziness to the table in one suitable way, but we're completely honest about side-effects induced by the cursor-based nature of enumerators.

      A head/tail decomposition at the function level, as done by your Stream<T> sample, is another possibility, introducing recursion at a type level. Whether it's strictly better is subject to debate though. There's only one value of type Void, hence a purist Streamer can only return the same Stream<T> every time the function is called. In fact, () -> T is a lazily computed constant; when a side-effect is desired, one would simply write IO T in Haskell (even the () -> is redundant due to the lazy nature of Haskell's evaluation mechanism).

      This gets into a chicken-or-egg IO problem (somehow I like to associate IO with chickens and eggs) for realistic scenarios where you're querying data from some store such as a database, webservice, and whatnot. There you'll have to have an IO somewhere. After all, the contents of the store can change and you'll want a new enumerator that represents a cursor on the world at that point in time (time is another way to think about implicit side-effects, where an impure function is implicitly parameterized on time). Related to the enumerator is some state that reflects server-side state too, moving over the result set as it was obtained at the point of the (initial?) request (providing a consistent - or not, depending on the guarantees provided by the store - view of the world at that time).

      For in-memory lazily computed streams, such as prime numbers to use an academic sample, we can do better. There all the properties of a purist functional and lazily evaluated language such as Haskell shine.

    • Sebastian Gregor

      Thanks so much for your explanations bart!
      After some introduction to monads i once tried to understand linq exactly in that primitive building block manner but gave up in the middle. so this was really helping!
      However i would like to conclude IEnumerable<T> is a monad. But isn't it more also - at least when you count in its extension methods? To understand if it is exactly a monad or can be viewed as a construct based upon a monad, it still would be great to have a list of all operators that are based upon the 3 primitives.
      Or maybe even better: It would help to know all other paradigms (or their primitives) of those operators not covered by the monad aspect of IEnumerable<T> / IObservable<T>.
      And please give me a hint: Is it possible to write an elegant short version of "zip" based on ana, bind and cata? I would like to try that...

    • bdesmet

      @Sebastian Gregor: IEnumerable<T> is a monad. It has a Return (new[] { value }) and a Bind (from x in xs from y in ys(x) select f(x, y) turns into xs.SelectMany(x => ys(x), (x, y) => f(x, y))) which is sufficient for our conclusion.

      What's extra though is the sequence part of an IEnumerable<T>, or any other sequence type we're talking about (I know at least three more in the Rx/Ix puzzle, in fact I do know even more Wink). This is where the Ana abstraction can be seen as a sequence-centric Return, in that many values can be computed to be turned into an object that's inside the monad.

      To see the layer map, check out the www.codeplex.com/LINQSQO project and use Visual Studio 2010 architecture explorer facilities to see the dependency graph of the methods in MinLinq. This will clearly show how all of the operators I've implemented there find their roots in Ana, Bind, and Cata.

      For your requested tip: Zip is a nice one you bring up. I'm emphasizing nice a bit sarcastically. What Zip does is take a pair of sequences and turn it into a sequence of pairs (generalized using a projection function):  Tuple<IEnumerable<A>, IEnumerable<B>> --> IEnumerable<Tuple<A, B>>. To achieve this (lazily and whatnot), you got to obtain elements from both sequence at the "same pace". Now have a look whether you got such a primitive hanging in there somewhere Smiley. Notice though a Zip can be used to build the fancy Where and Select overloads that take a natural index number in their function parameter (Zip with the infinite Ana of natural numbers)...

      Using Cata is rarely useful in operators itself, and in fact harmful as it breaks laziness. One exception is where you got to see the whole sequence anyway before you can produce any output. Examples include OrderBy and Reverse (exercise: write those using Ana and Cata).

    • exoteric

      @Bart yes, Streamer should be something like S -> Stream<T> or it will be pretty boring, but that should do it, I think Smiley

    • Gregory81

      @bdesmet: On difference between GroupBy and Buffer* methods. GroupBy adds new "monadic layer", so it's signature (not taking Keys into account) should be M<T> -> T -> TKey -> M<M<T>>. It looks like specialized version of Ana applied to already existing monadic value as a seed.

      Buffer is a different beast as it adds takes M<T> and produces M<List<T>>.

      , bdesmet wrote

      Using Cata is rarely useful in operators itself, and in fact harmful as it breaks laziness. One exception is where you got to see the whole sequence anyway before you can produce any output. 


      Didn't get that part. If Cata itself is lazy there should not be any problems. Or you mean that lazy invocations of Cata are usefull and direct invocations are harmfull?

      Correct me if I'm wrong, but I thought that for every monad there exist Ana-Cata pair and every monadic operation may be written in terms of Ana and Cata. However in general there's no easy way to find Ana and Cata for a given monad.

    • bdesmet

      @Gregory81: Let's be clear, I'm talking here about Cata as the folding (for LINQ to Objects people, think Aggregate) operator for sequences of data, which leaves the IEnumerable<T> world in favor of a value of type R, outside the monad. In this interpretation, it's like unsafePerformIO in Haskell:

      • Ana (and return) gets you in.  A -> (A -> T) -> (A -> A) -> M<T>
      • Bind (and >>=) keeps you in.  M<T> -> (T -> M<R>) -> M<R>
      • Cata (and unsafePerformIO) get you out. M<T> -> C -> (C -> T -> C) -> C

      In our world of Rx and Ix, the monad's "aspect" is about lazy acquisition of data (which could go with lazy computation of the elements in the stream of course). The typing point of view is shown above too. If you make Cata lazy, it will be not the "streaming multiple values" kind of laziness that results; instead, you're deferring the catastrophy that's about to happen to the sequence being folded into a single value. (Exercise: There is a more mild form of aggregation in Rx though, but I wouldn't call the a Cata anymore. Go and find it.)

      All the side-effects that are due to iteration of the sequence (till the very end, provided there's even an end to it) will be surfaced at the point this kind of Cata is executed. It's like building LINQ to Objects on IEnumerable<T> by having operators turn their input sequence into a List<T> first (which is a Cata using Add calls), apply things like a filter or a map on the entire in-memory list, and then unfolding it again into a sequence (which is an Ana using the enumerator as the state, MoveNext as a terminating condition and Current as the selector). Well, if the user only wanted the projection of the first element, we definitely did too much work now (and we could get stuck if the sequence is infinite). Laziness was dropped inside the implementation, and the result of that leaks to the users (side-effects, bottoming out on infinite sequences, etc.). In that sense, using a Cata is harmful when used in inappropriate places.

      Have a look at Erik's paper titled "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" for some interesting insights (warning: Greek letters). Tip: there's more than ana and cata...

      A few practical things related to this discussion:

      • LINQ to SQL, and IQueryable<T> in general, leaves the monad early for aggregation operators. For example, products.Sum(p => p.Price) returns a decimal on the spot. Laziness is gone. The IEnumerable<T> monad retains laziness in the face of composition of such sequences. Because of this, you can't do something like dividing Sum and Count and have the whole thing executed remotely. Instead, you'll have two roundtrips. This is like "forgetting" about the M in M<T>, hence the unsafePerformIO analogy.
      • In Rx, we leave aggregates in the monad. Only First, Last, and Single are permitted to leave the monad (ignoring non-operators such as Subscribe). This way you retain the composition over such sequences, which is where monadic computation shines. For example, you could apply the Timeout operator to computation of an aggregate and unsubscribe before it actually comes with the result.

      For what GroupBy is concerned, you're absolutely right about it adding another layer. Keep thinking Smiley. Buffer is a different beast indeed - did you already look at the changes we made in the last Rx release? That may give some insightful clues... (Btw, a Channel 9 video on the latest Rx video is still coming up.)

    • Gregory81


      First of all the easy part: not so catastrophic version of Aggregate is Scan.

      On lazyness as the main monadic aspect of IEnumerable. The problem is that both IEnumerable and IObservable are quite complex creations. There are at least three (maybe more) aspects in IEnumerable (like returning multiple values, handling exceptions and deferring side-effects) and at least four (or even five) in IObservable. Had we used haskell (not that I know it), instead of IEnumerable we'd written something like List<Error<IO<T>>> (or sometimes Set<Error<IO<T>>>). In Ix there are a lot of operators that leave one monadic aspect or the other. Like Catch leaves Error monad, or SumEnumerable leaves List monad (but not IO monad) - in some sense these are catamorphisms. Of course, if we're talking of IO as a main part of IEnumerable monad then indeed Cata would equal to unsafePerformIO.

      What I was trying to say in previous post, given that the only way to deconstuct a monadic value is Cata, you'd had to use it somehow in the implementation of Bind.

      By the way I think it can be good exercise to write approximate signatures of Ix and Rx operators in haskell, taking side-effects and exceptions into account.

      Erik's paper is my reading list (already quite some time). I yet had to get used to all hylomorphisms and co.


    • luquor

      Like Catch leaves Error monad, or SumEnumerable leaves List monad (but not IO monad) - in some sense these are catamorphisms. 

    • Heavens​Revenge

      Have you considered "bottom" to express null equivalency in your syntax for your core implementation? As a singular binding bottom function application equal to null might do the trick.


      Im no LINQ or C# pro, do you think it would be a suitable solution for you Bart?

    • bdesmet

      @Gregory81: Well, you're definitely welcome to do this stuff in Haskell. Let me say you won't be the first (hint Wink). Stay tuned.

      You're right that IE<T> and IO<T> are a bit more complex than what we've discussed here. I conveniently swept errors and disposing under the carpet. The former can be deal with by treating it as a value that can be hidden behind the T (i.e. Materialize approach, T is Notification<T>). The latter (disposable aspect) pops up in the types for sure; disposables have a very imperative nature to it and can be represented as IO () in Haskell.

    • Gregory81

      @bdesmet:About GroupBy and Buffer methods. That's the best what I can come up with:

      The elements of IE<T> and IO<T> can be associated with additional values: index and time (for IO<T>). These values arise from specific properties of Bind operator of each monad.

      For IE<T> Bind preserves internal ordering of elements within containers. It is possible to write non-preserving version too (and Px has that version AFAIR). Even inside standart LINQ not all operators respect the order (for example, Union and other set-related operators but that's because they should belong to different monad). However basic operations do preserve the order.

      For IO<T> as events have global order (they can be ordered by time when an event happened) Bind preserves that global order.

      Buffer operators do partitioning based on these additional values and preserve the ordering while GroupBy family does not.

    • tomkirbygre​en

      Can we please get Bart back on soon! His stuff is always by turns enlightening and inspiring.

    • exoteric

      @dot_tom:I'm most interested in the "LINQ 2 MSIL" that Bart has alluded to previously, but expect that if anything comes out of that idea, we'll see it when it's ready.

    • giddy2

      thanks so much Bart, was an extremely en lightning video, I always like to go down the layers of abstraction sometime for fun, but sometimes the process can be painful, you did a great job simplifying everything and made it so exciting.

      Finally have the courage to read up and follow Jon Skeet's Reimplementation of LINQ on his blog.

    • Sebastian Gregor

      @bart: zip is special and can't be implemented by a combination of ana and bind, right? maybe already the term monad (mono) should make it obvious that combining two monads into one is not possible by its atomic operations. And ana / generate is just a more generative version of the atomic monad operation "return". since there is no atom that can handle two M<T> the only ways of building such a zip function would either force us to make use of cata and ana (breaking lazyness) or - better - make use of the special sequence centric additions to Enumerable<T> or Observable<T>. we are forced to manually iterate at same pace over the sequences and since i don't see any way to do that with one single foreach loop (it only takes one IEnumerable<T>), i assume i would need to implement IEnumerable by manually working with the Enumerators of the incoming enumerables. as far i can see yield return could help me with constructing the new enumerable so that i don't need to implement an own enumerator.
      is that right?

    • Sebastian Gregor

      wouldn't it be nice to have a foreach loop in c# that can iterate over several enumerables at the same pace? i know that the term for"each" is misleading then because it doesn't express which permutations of the single sequence elements are included... but it would feel natural that this no cross operation like nested foreach loops, but more a single 1D loop cutting through all dimensions at the same pace. so the only semantic question would be: iterate until the smallest or the largest sequence is finished iterating. but i would guess that 95% would be happy with smallest sequence...

    • Charles

      @dot_tom: There will be a lot of Bart on 9 in 2011. You can count on that.

    • bdesmet

      @Sebastian Gregor: Your analysis is right. Concerning language syntax, one question is whether you want one particular control flow primitive (foreach) to know about some kind of multi-value simultaneous treatment (zip), or you want to shift that into its own construct. I'm more in the latter camp where first-class tuples (with pattern matching) allow one to combine all the needed building blocks. Here's what Zip looks like in hypothetical syntax where * is a suffix type constructor for sequence and (,) is tupling syntax both in the type world and the value world:

      (T1, T2)* Zip<T1, T2>(this (T1*, T2*) source);


      IEnumerable<Tuple<T1, T2>> Zip<T1, T2>(this Tuple<IEnumerable<T1>, IEnumerable<T2>> source);

      Notice the beautiful symmetry of the lightweight signature shown above. Zip takes a tuple of stars to a star of a tuples. Assume the following use:

      foreach (var (i, c) in (1..10, 'a'..'j').Zip())
          f(i, c);

      == (ignore expansion of start..end range expressions)

      foreach (var __t in Tuple.Create(new[] { 1, 2, ..., 10 }, new[] { 'a', 'b', ..., 'j' }).Zip())
          var i = __t.Item1;
          var c = __t.Item2;
          f(i, c);

      Now the question is whether or not you want the GetEnumerator pattern to work on a "tuple of stars". If so, the Zip() call becomes redundant (if "foreach" were to resolve against extension methods, you could even do this right now). However, there are other ways of combining two sequences, such as Rx's CombineLatest. So this lightweight explicitness could be considered a good thing.

    Comments closed

    Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.