Tech Off Thread

5 posts

Split and Join

  • Bas

    Maybe I've always just missed the obvious, but is it just me who finds it extremely annoying that string.Split() and string.Join() expect different datatypes for the separator? string.Join() expects a string as the separator, while string.Split() expects either a char (via params), an array of chars or an array of strings. I'd have guessed that these two methods often operate on each other's output, so defining a single const separator value would be useful, but apparently not. The simplest way I can think of is defining the separator as a const char and then calling .ToString() on that char for the Join() method, but that's still a pain.

    What am I missing?
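
    A minimal sketch of the const char workaround described above. The `Separator` constant and the sample input are made up for illustration; on .NET Framework this is roughly the shortest way to share one char between both calls.

```csharp
using System;

// Hypothetical separator constant shared by Split and Join.
const char Separator = ',';

string input = "red,green,blue";

// Split accepts a single char: through its params char[] overload on
// .NET Framework, or a dedicated Split(char, ...) overload on newer runtimes.
string[] parts = input.Split(Separator);

// On .NET Framework, Join only accepts a string separator,
// so the char has to be converted first.
string roundTrip = string.Join(Separator.ToString(), parts);

Console.WriteLine(parts.Length); // 3
Console.WriteLine(roundTrip);    // red,green,blue
```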

  • JohnAskew

    For Split(), I often resort to:

    ",".ToCharArray()   // just comma delimiter

    " ,.;".ToCharArray()   // space, comma, period, semi-colon delimiters

    but this helps you naught...
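
    A short sketch of the ToCharArray() approach with several delimiters, using a made-up input string. It also shows why StringSplitOptions.RemoveEmptyEntries is often paired with it: two adjacent delimiters (", ") leave an empty entry between them.

```csharp
using System;

// Every character in this string is treated as a separate delimiter.
char[] delimiters = " ,.;".ToCharArray();

string text = "one, two;three.four";

// Without options, the comma followed by a space yields an empty entry.
string[] raw = text.Split(delimiters);

// RemoveEmptyEntries drops those empty strings.
string[] words = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(raw.Length);              // 5
Console.WriteLine(string.Join("|", words)); // one|two|three|four
```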

  • spivonious

    @Bas: I'd imagine it's because Join is concatenating strings and Split is going through character by character looking for the separator. Still, it probably would have made sense to put overloads in there.
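
    To illustrate that reasoning, here is a simplified pair of hypothetical helpers, `SplitOnChars` and `JoinWith`. They are not how the BCL actually implements either method. Splitting naturally scans the input one character at a time and compares each against the separator set, while joining only ever writes the separator string between items.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: walks the input one character at a time and cuts
// whenever it meets any of the separator characters.
static List<string> SplitOnChars(string input, char[] separators)
{
    var result = new List<string>();
    int start = 0;
    for (int i = 0; i < input.Length; i++)
    {
        if (Array.IndexOf(separators, input[i]) >= 0)
        {
            result.Add(input.Substring(start, i - start));
            start = i + 1;
        }
    }
    result.Add(input.Substring(start)); // text after the last separator
    return result;
}

// Hypothetical helper: no scanning needed, the separator is simply
// appended between consecutive items.
static string JoinWith(string separator, IList<string> items)
{
    var sb = new StringBuilder();
    for (int i = 0; i < items.Count; i++)
    {
        if (i > 0) sb.Append(separator);
        sb.Append(items[i]);
    }
    return sb.ToString();
}

List<string> pieces = SplitOnChars("a,b;c", new[] { ',', ';' });
Console.WriteLine(JoinWith(" + ", pieces)); // a + b + c
```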

  • evildictaitor

    If I remember correctly (this decision was a loooong time ago), the reason for this was that either .NET 1.0 or a pre-release of .NET had

    string[] String::Split(char ch);
    string[] String::Split(params char[] chs);
    string[] String::Split(IEnumerable<char> chs);

    The third overload was removed from .NET because string is IEnumerable<char>, and this led to confusion:

    foreach(var x in "Hello World".Split("el")) Console.WriteLine(x);

    would print

    H
    <empty line>
    <empty line>
    o<space>Wor
    d

    rather than

    H
    lo<space>World

    as most people would expect.

    Therefore the decision was that the overload string[] String::Split(IEnumerable<char> chs) should be removed.
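
    The two interpretations of "el" in the example above can be reproduced with overloads that exist today. This is a sketch: the character-set call stands in for the removed IEnumerable<char> overload, and the string-array call gives the "substring" meaning most people expect.

```csharp
using System;

string text = "Hello World";

// "el" as a set of characters: split on every 'e' and every 'l'.
string[] byChars = text.Split("el".ToCharArray());

// "el" as a single substring separator.
string[] bySubstring = text.Split(new[] { "el" }, StringSplitOptions.None);

Console.WriteLine(string.Join("|", byChars));     // H|||o Wor|d
Console.WriteLine(string.Join("|", bySubstring)); // H|lo World
```

    For what it's worth, .NET Core 2.0 and later did eventually add a Split(string, StringSplitOptions) overload, so on those runtimes "Hello World".Split("el") compiles and behaves like the substring version.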

    Unfortunately, you can't then add String::Split(string chs), because that would silently change what "Foo".Split("Bar") means. It used to mean split by 'B', 'a' and 'r' (a string is a collection of chars, so it matched IEnumerable<char>); once there is an overload taking a string, the compiler picks that one instead, and the same call now means split by the substring "Bar".

    So, long story short, a lot of this nastiness is there for frankly pretty old reasons. Adding overloads in future probably isn't a bad idea, but since most people have been coping (x.Split(new string[] { y }, StringSplitOptions.None) does what most people expect), I think this has been pretty low down the list of priorities for .NET. You need a good reason to change the base library once it's used by millions of customers, and I'm not sure this is a good enough reason to change it.
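
    A sketch of the string-array overload used as a workaround, assuming a made-up `Separator` constant. Because Join takes a string and Split accepts a string[], a single const string can drive both calls, including multi-character separators.

```csharp
using System;

// Hypothetical multi-character separator shared by both calls.
const string Separator = ", ";

string[] items = { "alpha", "beta", "gamma" };

// Join takes the string separator as-is.
string line = string.Join(Separator, items);

// On .NET Framework, Split has no single-string overload, so the same
// string is wrapped in an array and an options argument is supplied.
string[] back = line.Split(new[] { Separator }, StringSplitOptions.None);

Console.WriteLine(line);        // alpha, beta, gamma
Console.WriteLine(back.Length); // 3
```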

  • Bas

    That makes sense. I knew there must've been a reason for it, I just couldn't figure it out. I hadn't thought about the string/character array thing.

    I wasn't really expecting a change, it just struck me as some annoying holdover and I wanted to know why it was there. Now I know.

    What strikes me, though: why couldn't they have simply added an overload to Join that takes a char? At least that way I could use the same value for both methods. It feels... cleaner. Ah well.

    Thanks for enlightening me!
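
    Later runtimes did add the overload asked for here: .NET Core 2.0 and later (and .NET Standard 2.1) include string.Join(char, params string[]) and string.Split(char, StringSplitOptions). A minimal sketch, assuming one of those runtimes (it will not compile against .NET Framework); the separator and input are made up for illustration.

```csharp
using System;

// Requires .NET Core 2.0+ / .NET Standard 2.1; these overloads
// do not exist on .NET Framework.
const char Separator = ';';

string[] parts = "x;y;z".Split(Separator);     // Split(char, StringSplitOptions = None)
string joined = string.Join(Separator, parts); // Join(char, params string[])

Console.WriteLine(joined); // x;y;z
```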
