posted by conard

]]>Shape, Wrap, Roll

CAR, CDR, CONS

balabing, balabang, balaboom

posted by felix9

]]>Great work Bart!

posted by jeroenverhulst

]]>@Bart: I'm amazed by your simple IE->IO->Amb->IE roundtrip example. It is concise yet clear, demonstrating the importance of orthogonality.

posted by ShinNoNoir

]]>For now, it should be clear that the MinLINQ implementation is truly meant as an academic exercise, in particular when it comes to using the implementation for real-world scenarios, due to the various shortcuts that have been taken (no disposable nature, error handling omitted, etc.).

Exercise for the viewers: what'd be a good generalization of the set of GroupBy operators? How about other kinds of partitioning of an input sequence in chunks (e.g. Buffer* operators in Rx)?

posted by bdesmet

]]>There are plenty of other ones, so here's one exercise for the viewers: assume your team has only the budget to use one try...catch block (assume each block costs as $1,000 and the licensing model of the C# compiler is to pay per try...catch used) in the entire codebase of your LINQ to Objects implementation of the Standard Query Operators (i.e. IEnumerable<T> based). How'd you spend it (*) such that you can easily implement error-handling operators in the future? Examples include operators such as Catch :: IE<T> -> (Exception -> IE<T>) -> IE<T>, but also Retry, etc.

Tip: you want to realize Catch, Retry, etc. using a roundtrip scheme of IE<T> **-> ??<T>** -> (wow, error handling got so much easier here thanks to the $1,000 well-spent) -> **??<T> ->** IE<T>. Try to find what I've highlighted in bold and you may be onto something (nested tip: we got this in Rx and Ix).

(*) Constraint: use the try...catch inside a proper operator/combinator that deals with sequences. I.e. simply wrapping try...catch in a generic helper method such as the one shown below is ** not** an option:

static void Catch<E>(Action action, Action<E> handler) where E : Exception { try { action(); } catch (E e) { handler(e); } }

posted by bdesmet

]]>posted by CKurt

]]>Max and Min in terms of using Aggregate could be written like so.

Function Max(Of T As IComparable)(ByVal c As IEnumerable(Of T)) As T Return Enumerable.Aggregate(c, c.First, Function(a, e) ( Function(_x As T, _y As T) {_y, _x, _x}(Math.Sign(_x.CompareTo(_y)) + 1) )(a, e) ) End Function Function Min(Of T As IComparable)(ByVal c As IEnumerable(Of T)) As T Return Enumerable.Aggregate(c, c.First, Function(a, e) ( Function(_x As T, _y As T) {_x, _x, _y}(Math.Sign(_x.CompareTo(_y)) + 1) )(a, e) ) End Function

I'm using

Math.Sign(_x.CompareTo(_y)) + 1

because the IComparable interface doesn't require the return value to limited to the values **-1**, **0** and **+1**, so I have to normalise it to this range.

Note that the above implementation need the extension method **.First. **So this requires a function to exist which returns the first value of the Enumerable. According to my understanding this function would be a CATA, so it is based off an Aggregate.

Now I think that be challenge. How do write First in terms of Aggregate. (Can it be done without the indexing overload of SelectMany?)

Thinking Headwear on.

posted by AdamSpeight2008

]]>Function First(Of T)(ByVal src As IEnumerable(Of T)) As T Return Enumerable.Aggregate(Of T)( Enumerable.SelectMany(Of T, T)(src, Function(curr As T, index As Integer) As IEnumerable(Of T) Return If(index = 0, Enumerable.Repeat(curr, 1), Enumerable.Empty(Of T)) End Function), Function(accum, curr) (curr)) End Function

posted by AdamSpeight2008

]]>Using Aggregate it would simply be.

Function Sum(Of T )(ByVal src As IEnumerable(Of T)) As Integer Return Enumerable.Aggregate(src, Function(accum, curr) accum + curr) End Function

But the problem is the addition, it would work if the Generic Type could constrained to only type that implement an addition operator.

Would this suggest that .Sum() isn't generic and is an Extension method with multiple overloads, one for each arithmetical type. Eg Long, Integer, Double etc?

posted by AdamSpeight2008

]]>The problem is that operators are static in C# (Shared in VB), so you are pretty much forced to multiple overloads.

Here's something I found about generic operators written by Marc Gravell and Jon Skeet: http://www.yoda.arachsys.com/csharp/genericoperators.html

posted by ShinNoNoir

]]>3 hours ago, bdesmet wrote

assume your team has only the budget to use one try...catch block in the entire codebase of your LINQ to Objects implementation of the Standard Query Operators (i.e. IEnumerable<T> based). How'd you spend it (*) such that you can easily implement error-handling operators in the future?

That can be done using operator analogous to Materialize of Ix. It makes transition from implicit exception-handling mode to explicit Error monad.

posted by Gregory81

]]>The thing I've been pondering about, can you always define Ana for each monad M?

posted by ShinNoNoir

]]>posted by bdesmet

]]>posted by bdesmet

]]>You can actually find an implementation of FEnumerable and a bunch of operators on www.codeplex.com/linqsqo. To show it truly works, I've provided a port of Luke Hoban's LINQ-based Ray Tracer as well, as part of the code. More information on conversion operators between classic IEnumerable and purist FEnumerable can be found in my blog post, mentioned in the description of the video above. I also describe how to get query expression syntax working on top of the FEnumerable implementation.

Cheers,

-Bart

posted by bdesmet

]]>Completely tangential and just out of curiosity ... you think in English? Not in French or Flemish? (You're from Belgium*, right?)

be well

-h-

posted by corvuscorax

]]>Met vriendelijke groeten,

-Bart

posted by bdesmet

]]>posted by cammcad

]]>let fold xs init f = match xs with | [] -> init | x::r -> f x (fold r init f)

You do this:

let fold xs init f = match xs with | [] -> init | x::r -> f x (fun () -> (fold r init f))

If we assume that the && operator is short circuiting, then the following is a short circuiting All function:

let all p xs = fold (fun x r -> p x && r()) true xs

i.e. as soon as the predicate returns false, the loop stops and returns false.

posted by Jules

]]>() -> (() -> T?)

but raise you this haxe-syntactic definition (If I recall syntax correctly)

alias Streamer = Void -> Stream<T>

enum Stream<T>

{

none;

some(x: T, xs: Void -> Stream<T>);

}

This is a more purely functional style as a Streamer would normally always have the same result, every time you call it and you have to deconstruct the stream to get to successive elements, whereas with FEnumerable, somehow magically, the same enumerator function will return different results every time you call it.

It just happens there's no *natural* way to model this kind of structure in C#.

posted by exoteric

]]>We fully embrace side-effects in this setting, as C# is good at. The arrows are not purist functional arrows, but rather fat arrows that can contain what you refer to as "magic". I believe I mentioned one way to turn it into a functional Haskell-style setting is to decorate the arrows (or rather their return type) with the IO monad. All we're doing here is bringing laziness to the table in one suitable way, but we're completely honest about side-effects induced by the cursor-based nature of enumerators.

A head/tail decomposition at the function level, as done by your Stream<T> sample, is another possibility, introducing recursion at a type level. Whether it's strictly better is subject to debate though. There's only one value of type Void, hence a purist Streamer can only return the same Stream<T> every time the function is called. In fact, () -> T is a lazily computed constant; when a side-effect is desired, one would simply write IO T in Haskell (even the () -> is redundant due to the lazy nature of Haskell's evaluation mechanism).

This gets into a chicken-or-egg IO problem (somehow I like to associate IO with chickens and eggs) for realistic scenarios where you're querying data from some store such as a database, webservice, and whatnot. There you'll have to have an IO somewhere. After all, the contents of the store can change and you'll want a new enumerator that represents a cursor on the world at that point in time (time is another way to think about implicit side-effects, where an impure function is implicitly parameterized on time). Related to the enumerator is some state that reflects server-side state too, moving over the result set as it was obtained at the point of the (initial?) request (providing a consistent - or not, depending on the guarantees provided by the store - view of the world at that time).

For in-memory lazily computed streams, such as prime numbers to use an academic sample, we can do better. There all the properties of a purist functional and lazily evaluated language such as Haskell shine.

posted by bdesmet

]]>However i would like to conclude IEnumerable<T> is a monad. But isn't it more also - at least when you count in its extension methods? To understand if it is exactly a monad or can be viewed as a construct based upon a monad, it still would be great to have a list of all operators that are based upon the 3 primitives.

Or maybe even better: It would help to know all other paradigms (or their primitives) of those operators not covered by the monad aspect of IEnumerable<T> / IObservable<T>.

And please give me a hint: Is it possible to write an elegant short version of "zip" based on ana, bind and cata? I would like to try that...

posted by Sebastian Gregor

]]>What's extra though is the *sequence* part of an IEnumerable<T>, or any other sequence type we're talking about (I know at least three more in the Rx/Ix puzzle, in fact I do know even more ). This is where the Ana abstraction can be seen as a sequence-centric Return, in that many values can be computed to be turned into an object that's inside the monad.

To see the layer map, check out the www.codeplex.com/LINQSQO project and use Visual Studio 2010 architecture explorer facilities to see the dependency graph of the methods in MinLinq. This will clearly show how all of the operators I've implemented there find their roots in Ana, Bind, and Cata.

For your requested tip: Zip is a *nice* one you bring up. I'm emphasizing *nice* a bit sarcastically. What Zip does is take a pair of sequences and turn it into a sequence of pairs (generalized using a projection function): Tuple<IEnumerable<A>, IEnumerable<B>> --> IEnumerable<Tuple<A, B>>. To achieve this (lazily and whatnot), you got to obtain elements from both sequence at the "same pace". Now have a look whether you got such a primitive hanging in there somewhere . Notice though a Zip can be used to build the fancy Where and Select overloads that take a natural index number in their function parameter (Zip with the infinite Ana of natural numbers)...

Using Cata is rarely useful in operators itself, and in fact harmful as it breaks laziness. One exception is where you got to see the whole sequence anyway before you can produce any output. Examples include OrderBy and Reverse (exercise: write those using Ana and Cata).

posted by bdesmet

]]>posted by exoteric

]]>Buffer is a different beast as it adds takes M<T> and produces M<List<T>>.

2 hours ago, bdesmet wrote

Using Cata is rarely useful in operators itself, and in fact harmful as it breaks laziness. One exception is where you got to see the whole sequence anyway before you can produce any output.

Didn't get that part. If Cata itself is lazy there should not be any problems. Or you mean that lazy invocations of Cata are usefull and direct invocations are harmfull?

Correct me if I'm wrong, but I thought that for every monad there exist Ana-Cata pair and every monadic operation may be written in terms of Ana and Cata. However in general there's no easy way to find Ana and Cata for a given monad.

posted by Gregory81

]]>**Ana**(and return) gets you in.**A**-> (A -> T) -> (A -> A) ->**M<T>****Bind**(and >>=) keeps you in.**M<T>**-> (T -> M<R>) ->**M<R>****Cata**(and unsafePerformIO) get you out.**M<T>**-> C -> (C -> T -> C) ->**C**

In our world of Rx and Ix, the monad's "aspect" is about lazy acquisition of data (which could go with lazy computation of the elements in the stream of course). The typing point of view is shown above too. If you make Cata lazy, it will be not the "streaming multiple values" kind of laziness that results; instead, you're deferring the *catastrophy* that's about to happen to the sequence being folded into a single value. (Exercise: There is a more mild form of aggregation in Rx though, but I wouldn't call the a Cata anymore. Go and find it.)

All the side-effects that are due to iteration of the sequence (till the very end, provided there's even an end to it) will be surfaced at the point this kind of Cata is executed. It's like building LINQ to Objects on IEnumerable<T> by having operators turn their input sequence into a List<T> first (which is a Cata using Add calls), apply things like a filter or a map on the entire in-memory list, and then unfolding it again into a sequence (which is an Ana using the enumerator as the state, MoveNext as a terminating condition and Current as the selector). Well, if the user only wanted the projection of the first element, we definitely did too much work now (and we could get stuck if the sequence is infinite). Laziness was dropped inside the implementation, and the result of that leaks to the users (side-effects, bottoming out on infinite sequences, etc.). In that sense, using a Cata is harmful when used in inappropriate places.

Have a look at Erik's paper titled "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" for some interesting insights (warning: Greek letters). Tip: there's more than ana and cata...

A few practical things related to this discussion:

- LINQ to SQL, and IQueryable<T> in general, leaves the monad early for aggregation operators. For example, products.Sum(p => p.Price) returns a decimal on the spot. Laziness is gone. The IEnumerable<T> monad retains laziness in the face of composition of such sequences. Because of this, you can't do something like dividing Sum and Count and have the whole thing executed remotely. Instead, you'll have two roundtrips. This is like "forgetting" about the M in M<T>, hence the unsafePerformIO analogy.
- In Rx, we leave aggregates in the monad. Only First, Last, and Single are permitted to leave the monad (ignoring non-operators such as Subscribe). This way you retain the composition over such sequences, which is where monadic computation shines. For example, you could apply the Timeout operator to computation of an aggregate and unsubscribe before it actually comes with the result.

For what **GroupBy **is concerned, you're absolutely right about it adding another layer. Keep thinking . **Buffer **is a different beast indeed - did you already look at the changes we made in the last Rx release? That may give some insightful clues... (Btw, a Channel 9 video on the latest Rx video is still coming up.)

posted by bdesmet

]]>First of all the easy part: not so catastrophic version of Aggregate is Scan.

On lazyness as the main monadic aspect of IEnumerable. The problem is that both IEnumerable and IObservable are quite complex creations. There are at least three (maybe more) aspects in IEnumerable (like returning multiple values, handling exceptions and deferring side-effects) and at least four (or even five) in IObservable. Had we used haskell (not that I know it), instead of IEnumerable we'd written something like List<Error<IO<T>>> (or sometimes Set<Error<IO<T>>>). In Ix there are a lot of operators that leave one monadic aspect or the other. Like Catch leaves Error monad, or SumEnumerable leaves List monad (but not IO monad) - in some sense these are catamorphisms. Of course, if we're talking of IO as a main part of IEnumerable monad then indeed Cata would equal to unsafePerformIO.

What I was trying to say in previous post, given that the only way to deconstuct a monadic value is Cata, you'd had to use it somehow in the implementation of Bind.

By the way I think it can be good exercise to write approximate signatures of Ix and Rx operators in haskell, taking side-effects and exceptions into account.

Erik's paper is my reading list (already quite some time). I yet had to get used to all hylomorphisms and co.

posted by Gregory81

]]>http://en.wikipedia.org/wiki/Bottom_type

Im no LINQ or C# pro, do you think it would be a suitable solution for you Bart?

posted by HeavensRevenge

]]>You're right that IE<T> and IO<T> are a bit more complex than what we've discussed here. I conveniently swept errors and disposing under the carpet. The former can be deal with by treating it as a value that can be hidden behind the T (i.e. Materialize approach, T is Notification<T>). The latter (disposable aspect) pops up in the types for sure; disposables have a very imperative nature to it and can be represented as IO () in Haskell.

posted by bdesmet

]]>The elements of IE<T> and IO<T> can be associated with additional values: index and time (for IO<T>). These values arise from specific properties of Bind operator of each monad.

For IE<T> Bind preserves internal ordering of elements within containers. It is possible to write non-preserving version too (and Px has that version AFAIR). Even inside standart LINQ not all operators respect the order (for example, Union and other set-related operators but that's because they should belong to different monad). However basic operations do preserve the order.

For IO<T> as events have global order (they can be ordered by time when an event happened) Bind preserves that global order.

Buffer operators do partitioning based on these additional values and preserve the ordering while GroupBy family does not.

posted by Gregory81

]]>posted by tomkirbygreen

]]>posted by exoteric

]]>Finally have the courage to read up and follow Jon Skeet's Reimplementation of LINQ on his blog.

posted by giddy2

]]>posted by Sebastian Gregor

]]>C

posted by Charles

]]>(T1, T2)* Zip<T1, T2>(this (T1*, T2*) source);

==

IEnumerable<Tuple<T1, T2>> Zip<T1, T2>(this Tuple<IEnumerable<T1>, IEnumerable<T2>> source);

Notice the beautiful symmetry of the lightweight signature shown above. Zip takes a tuple of stars to a star of a tuples. Assume the following use:

foreach (var (i, c) in (1..10, 'a'..'j').Zip())

f(i, c);

== (ignore expansion of start..end range expressions)

foreach (var __t in Tuple.Create(new[] { 1, 2, ..., 10 }, new[] { 'a', 'b', ..., 'j' }).Zip())

{

var i = __t.Item1;

var c = __t.Item2;

f(i, c);

}

Now the question is whether or not you want the GetEnumerator pattern to work on a "tuple of stars". If so, the Zip() call becomes redundant (if "foreach" were to resolve against extension methods, you could even do this right now). However, there are other ways of combining two sequences, such as Rx's CombineLatest. So this lightweight explicitness could be considered a good thing.

posted by bdesmet

]]>