Coffeehouse Thread

26 posts

Immutable collections for .NET arrive

Back to Forum: Coffeehouse
  • User profile image
    exoteric
  • User profile image
    MasterPi

    Yay! Glad there was some feedback on your connect request, too.

  • User profile image
    Bass

    Now you just have to convince people to use these instead. Smiley

  • User profile image
    felix9

    Yep, this is very nice stuff. Hopefully even better with language level support from Joe Duffy's team. Smiley

  • User profile image
    Charles

    Will be interviewing these guys tomorrow...

    Have questions? Now's the time to ask. Ask here!!


    C

  • User profile image
    exoteric

    Looking forward to it!

    Won't have time to study the new types in time for the interview but of course the weighed design and engineering considerations, interface support and that sort is of interest, as well as internal uses by Microsoft.

  • User profile image
    Charles

    Sorry for the short notice... Just got back from vacation. Happy New Year! C

  • User profile image
    evildictait​or

    Why is List.Add O(lg n) for immutable? You can abuse the fact that any subset of an immutable list is immutable to make it O(1) surely:

    class ImmutableList<T> 
    {
      int cachedPrefixLen = -1;
      Immutable<T> prefix;
      T[] item;
      ImmutableList<T>(Immutable<T> prefix, params T[] nextItems)
      {
        prefix = prefix:
        item = nextItems;

      }
      public ImmutableList<T> Add(T item)
      {
        // O(1)
        return new ImmutableList(this, item);
      }
      public ImmutableList<T> AddRange(T[] items)
      {
        // O(1)
        return new ImmutableList(this, items);
      }
      private int PrefixLen { get { if(cachedPrefixLen == -1) cachedPrefixLen = prefix == null? 0 : prefix.Count; return cachedPrefixLen; }}
      public int Count { get { return PrefixLen + item.Length; } }
      public void ToArray(T[] dst, int offset, int count)
      {
        int a = Math.Min(dst.Length, count);
        int b = count - a;
        prefix.ToArray(dst, offset, a);
        Array.Copy(dst, offset + a, b);
      }
      public bool Contains(T t)
      {
        return prefix.Contains(item) || item.Contains(t);
      }

      public T this[int index] 
      {
        get { if(index < PrefixLen) return prefix[index]; else return items[PrefixLen - index];
      }

    // etc
    }

  • User profile image
    Auxon0

    At what point did someone finally say, "this is important" and why?  Have things changed so much in the past 11 years, that immutable collections became obviously important, or was there technical reasons for not creating these libraries?  Was it a matter of expedience, or was the theory still fresh out of the ivory tower, and just not recognized as important by anyone but the academics?

  • User profile image
    Sven Groot

    @evildictaitor: Your implementation means that both Count and getting an item are no longer O(1), and that the items are not stored in contiguous memory (which is what is usually expected from a List<T>).

    In the worst case, with a list constructed by adding items one by one, these operations now have a worst case complexity of O(n).

    EDIT: It sounds like the immutable collections actually have a lookup time of O(log n), and don't use contiguous storage anyway.

  • User profile image
    evildictait​or

    , FuncOfT wrote

    At what point did someone finally say, "this is important" and why?  Have things changed so much in the past 11 years, that immutable collections became obviously important, or was there technical reasons for not creating these libraries?  Was it a matter of expedience, or was the theory still fresh out of the ivory tower, and just not recognized as important by anyone but the academics?

    The main reason is that if you know something is immutable then you know certain classes of actions on that collection can be done without doing something bad.

    For example, if foo is a mutable collection, you can prove that DoSomething has no IO side-effects, and you see the code

    for(int i = 0; i < foo.Count; i++) DoSomething(foo[i])

    Can the compiler automatically convert that into the following?

    Parallel.For(0, foo.Count, (i) => DoSomething(foo[i]));

    Or even more crazy:

    DistributeWorkOverDifferentAzureProcessors.For(0, foo.Count, (i) => DoSomething(foo[i]))

    Immutability is much more powerful than just being a cheap way of easily preventing naughty callers from changing the private collections owned by your class - they are a property of the collection that allows either a human or a machine to make fundamental changes to the code such as making the collection more suited to O(1) storage and retrieval from some of MSR's experimental databases, automatic parallelization of code not just to other threads within a machine, but to other processors that are completely removed from your machine.

    Although we've had compute clusters and parallelization for a while now, we're only just starting to get to the point where languages are taking the brunt of the complexity, and one of the ways of getting to where we need to be is to have more descriptive and better annotated types in the language, such as being able to mark objects as immutable.

    For that reason, Immutable collections are only just becoming important enough to warrant space in the base libraries of .NET. Back when immutability was only for protecting private members you could just have written a wrapper MutableList<T> : List<T> { override void Add() { throw; } }, so it was less important to have fast and inbuilt types to manage the immutability.

  • User profile image
    evildictait​or

    , Sven Groot wrote

    @evildictaitor: Your implementation means that both Count and getting an item are no longer O(1), and that the items are not stored in contiguous memory (which is what is usually expected from a List<T>).

    In the worst case, with a list constructed by adding items one by one, these operations now have a worst case complexity of O(n).

    I was optimising for the benchmark Smiley

    But more seriously, Count can be cached, so in a loop for(int i = 0; i < Count; i++) { Something(i); } you get an amortized cost of O(1). Indexing is more of a problem because you can't have both the indexer and the adder at O(1), but what you can do is you can realize that the constant in O(cn) for indexing is the reciprocal of the depth of the prefixes. By collapsing up to adjacent small prefixes' "item" you incur an O(P) operation every singular Adds - i.e. an amortized O(1) for Adding. Your indexer remains O(n), but the constant here is small if you make P large

    If you get caught up in trying to reduce the big Oh for indexes (and miss the point that the constant parameter dwarfs until you get really big lists, and most lists are small), you could make P grow with nmaking the indexer O(lg n), but you'd then incur a really expensive aggregation operation on some Adds, putting your Add back to O(lg n) as well.

    Hopefully the lists are heavily benchmarked Smiley

  • User profile image
    Auxon0

    @evildictaitor: Bah, I should have said @Charles, he asked for questions for the team. Tongue Out  I still want their answers. Wink

  • User profile image
    Charles

    @FuncOfT: I'll ask the historical questions (I normally do...). Thanks for asking!

    C

  • User profile image
    AndrewArnott

    @evildictaitor: hopefully we'll have time to discuss the O(log n) time for adding and accessing ImmutableList in the video. But here's a quick explanation. Your whipped up example of how to achieve O(1) time breaks down quickly for certain mutations. For example, if I have a 1000 element list (that's backed by an array) and I add one more element to it, yes, I can stitch together a sort of segmented linked list by having a "1 + 1000" ImmutableList wrapper. But what if you want to remove one element from a segment? Now you have to copy the entire segment, which gives you O(n) time. Also, if you're building up your list one item at a time, the array segments will only be 1 item long, and all your accesses will be O(n) anyway because you'll effectively have a linked list.

    Now if all you have is a read-only list with no intention to efficiently handle mutation by allocation, then a single array, wrapped in a ReadOnlyCollection<T> is perfect. It will have the O(1) time you want for lookups. But the immutable collection strives both for an immutability guarantee and efficient mutation as well. In order to guarantee consistent and reasonably high performance for most scenarios we implemented the list not as a linked list of arrays but as a binary tree. Now any mutation or lookup takes O(log n) guaranteed, regardless of how the caller builds up the list in the first place. Again, this isn't right for everybody, but it's good for most applications. And you can still enumerate the list in O(n) time, just like an array.

    I hope that helps.

  • User profile image
    Charles

    This is now happening on Friday (and someone special will be joining us!! Smiley
    C

  • User profile image
    blowdart

    @Charles I will buy you lunch if your first question is

     

    "Immutable collections - does this change everything?"

  • User profile image
    MasterPi

    @blowdart: Yes! Charles, please let this be the first question. Big Smile

Comments closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.