@Gregory81: Let's be clear, I'm talking here about Cata as the folding (for LINQ to Objects people, think Aggregate) operator for sequences of data, which leaves the IEnumerable<T> world in favor of a value of type R, outside the monad. In this interpretation, it's like unsafePerformIO in Haskell:
- Ana (and return) gets you in. A -> (A -> T) -> (A -> A) -> M<T>
- Bind (and >>=) keeps you in. M<T> -> (T -> M<R>) -> M<R>
- Cata (and unsafePerformIO) get you out. M<T> -> C -> (C -> T -> C) -> C
In our world of Rx and Ix, the monad's "aspect" is about lazy acquisition of data (which could go with lazy computation of the elements in the stream of course). The typing point of view is shown above too. If you make Cata lazy, it will be not the "streaming multiple values" kind of laziness that results; instead, you're deferring the catastrophy that's about to happen to the sequence being folded into a single value. (Exercise: There is a more mild form of aggregation in Rx though, but I wouldn't call the a Cata anymore. Go and find it.)
All the side-effects that are due to iteration of the sequence (till the very end, provided there's even an end to it) will be surfaced at the point this kind of Cata is executed. It's like building LINQ to Objects on IEnumerable<T> by having operators turn their input sequence into a List<T> first (which is a Cata using Add calls), apply things like a filter or a map on the entire in-memory list, and then unfolding it again into a sequence (which is an Ana using the enumerator as the state, MoveNext as a terminating condition and Current as the selector). Well, if the user only wanted the projection of the first element, we definitely did too much work now (and we could get stuck if the sequence is infinite). Laziness was dropped inside the implementation, and the result of that leaks to the users (side-effects, bottoming out on infinite sequences, etc.). In that sense, using a Cata is harmful when used in inappropriate places.
Have a look at Erik's paper titled "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" for some interesting insights (warning: Greek letters). Tip: there's more than ana and cata...
A few practical things related to this discussion:
- LINQ to SQL, and IQueryable<T> in general, leaves the monad early for aggregation operators. For example, products.Sum(p => p.Price) returns a decimal on the spot. Laziness is gone. The IEnumerable<T> monad retains laziness in the face of composition of such sequences. Because of this, you can't do something like dividing Sum and Count and have the whole thing executed remotely. Instead, you'll have two roundtrips. This is like "forgetting" about the M in M<T>, hence the unsafePerformIO analogy.
- In Rx, we leave aggregates in the monad. Only First, Last, and Single are permitted to leave the monad (ignoring non-operators such as Subscribe). This way you retain the composition over such sequences, which is where monadic computation shines. For example, you could apply the Timeout operator to computation of an aggregate and unsubscribe before it actually comes with the result.
For what GroupBy is concerned, you're absolutely right about it adding another layer. Keep thinking . Buffer is a different beast indeed - did you already look at the changes we made in the last Rx release? That may give some insightful clues... (Btw, a Channel 9 video on the latest Rx video is still coming up.)