Tech Off Thread

14 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

Why are string types immutable in C#?

  • jsrfc58

    Okay, I am trying to teach myself some C#.  So I was using the lessons found at:

    http://www.softsteel.co.uk/tutorials/cSharp/lesson4.html

    ...and eventually I will move on to some books. From the tutorial:

    This tutorial wrote:
    This code illustrates how changing a property of an object using a particular reference to it is reflected in all other references to it. Note, however, that although strings are reference types, they work rather more like value types. When one string is set to the value of another, e.g.

    string s1 = "hello";
    string s2 = s1;
    Then s2 does at this point reference the same string object as s1. However, when the value of s1 is changed, for instance with

    s1 = "goodbye";
    what happens is that a new string object is created for s1 to point to. Hence, following this piece of code, s1 equals "goodbye", whereas s2 still equals "hello".

    The reason for this behaviour is that string objects are 'immutable'. That is, the properties of these objects can't themselves change. So in order to change what a string variable references, a new string object must be created.
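    A minimal sketch of the tutorial's example as a C# 9+ top-level program (the `sharedBefore`/`sharedAfter` names are only for illustration). Note that the reassignment `s1 = "goodbye"` would leave `s2` alone for any reference type, because it only rebinds the variable; what immutability adds is that no String method can change the "hello" object itself.

```csharp
using System;

string s1 = "hello";
string s2 = s1;

// At this point both variables refer to the same string object.
bool sharedBefore = object.ReferenceEquals(s1, s2);

// Assigning to s1 makes it refer to a different object; "hello" is untouched.
s1 = "goodbye";
bool sharedAfter = object.ReferenceEquals(s1, s2);

Console.WriteLine($"{sharedBefore} {sharedAfter}"); // True False
Console.WriteLine($"s1 = {s1}, s2 = {s2}");         // s1 = goodbye, s2 = hello
```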



    I understand the immutability part and what happens, but I am wondering what the design reason behind it was. I could write it off as some kind of "backwards compatibility with C/C++" mentality, but at what point do you say "let's make strings changeable on the fly"? I remember working with a version of BASIC over a decade ago that had no problem letting the user change the value of a string after it was created. I can understand from a stack/heap standpoint how it probably works, but my question is... why not make them changeable? Maybe I am missing something; again, I am new to C#, although I have looked at some Java code in the past and wondered exactly the same thing.

  • Tom Servo

    BTW, it's a .NET thing, not just C#.

  • Manip

    I think what you are not considering is how memory in a modern computer works.

    When you ask the OS to give you some memory to store data (like a string) that space can be anywhere in the memory address space...

    So, let's assume you asked the OS to store a 10-character string (10 * 2 bytes = 20 bytes), so you have your 20 bytes and you place your string data in there.

    Now the programmer thinks: let's place another character at the end of that 10-character string... So you ask the OS for another 2 bytes (a single character), but find that those two extra bytes are located at some arbitrary location in memory... Problem: you can't have 10 characters in one place and the extra 1 in another. So, to solve this, each time you increase the size of a string, a block of the new total length is requested from the memory pool, the old characters are copied into it, and the old string is destroyed.

    BASIC also does this; it is just transparent. Your strings *are* destroyed; you just don't have memory pointers in BASIC, so you don't notice.
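    The reallocation described above can be sketched in C# with a hypothetical `AppendChar` helper over a plain `char[]`. In .NET the new block comes from the garbage-collected heap rather than a direct OS request, but the copy-into-a-new-block idea is the same. String concatenation shows the same visible behaviour: a new object, with the original left as it was.

```csharp
using System;

// Hypothetical helper: "append" one char to a fixed-size buffer by requesting a
// new contiguous block of the full new length and copying the old contents in.
char[] AppendChar(char[] source, char c)
{
    char[] bigger = new char[source.Length + 1];  // block for the total length
    Array.Copy(source, bigger, source.Length);    // copy the existing characters
    bigger[source.Length] = c;                    // add the new one at the end
    return bigger;                                // old block is reclaimed by the GC once unreferenced
}

char[] chars = "0123456789".ToCharArray();        // 10 chars, 20 bytes of UTF-16
char[] regrown = AppendChar(chars, 'X');

// Concatenation likewise produces a new 11-char string object.
string original = "0123456789";
string grown = original + "X";

Console.WriteLine(new string(regrown));                     // 0123456789X
Console.WriteLine(object.ReferenceEquals(original, grown)); // False
Console.WriteLine(original.Length);                         // 10
```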


    Here is an analogy to help (best I could think of).

    You and some friends go to the TicketMaster website to try and buy 5 tickets to the latest concert; you are given seats numbered 523, 524, 525, 526 and 527. Then another friend says they want to come too... So you go back to the TicketMaster website hoping to buy seat 528 or 522 so you can all sit next to one another... But the site won't let you pick where you want to sit.

    Which leaves you two choices: either cancel all of your friends' seats and re-book all six, or seat the last person somewhere else... But if you seated them somewhere else, you might have trouble finding them afterwards... So you pick the first option and cancel and re-book the full number.

     

     

  • MikeGoatly

    Good example, Manip!  I'll remember that one in case I'm asked sometime!

  • jsrfc58

    Manip wrote:
    Now the programmer thinks: let's place another character at the end of that 10-character string... So you ask the OS for another 2 bytes (a single character), but find that those two extra bytes are located at some arbitrary location in memory... Problem: you can't have 10 characters in one place and the extra 1 in another. So, to solve this, each time you increase the size of a string, a block of the new total length is requested from the memory pool, the old characters are copied into it, and the old string is destroyed.

    BASIC also does this; it is just transparent. Your strings *are* destroyed; you just don't have memory pointers in BASIC, so you don't notice.


    After seeing your reply and looking back over the original tutorial's code, I realized I misread part of the tutorial.  I think it was the way it was worded...

    This tutorial wrote:
    The reason for this behaviour is that string objects are 'immutable'. That is, the properties of these objects can't themselves change. So in order to change what a string variable references, a new string object must be created.

    And, I wouldn't expect "s2" in the example above to take on the value of s1 when s1 changes.  Poor example, and a case of me trying to do too many things at once. Thanks for the replies, though!

    Good analogy by the way...

    Manip wrote:
    Here is an analogy to help (best I could think of).

    You and some friends go to the TicketMaster website to try and buy 5 tickets to the latest concert; you are given seats numbered 523, 524, 525, 526 and 527. Then another friend says they want to come too... So you go back to the TicketMaster website hoping to buy seat 528 or 522 so you can all sit next to one another... But the site won't let you pick where you want to sit.

    Which leaves you two choices: either cancel all of your friends' seats and re-book all six, or seat the last person somewhere else... But if you seated them somewhere else, you might have trouble finding them afterwards... So you pick the first option and cancel and re-book the full number.

  • figuerres

    Dang, I don't have time right now, but...

    All of you are close, but you're missing a few things:

    1)  In C, strings lead to buffer overruns that create very ugly bugs.

    2)  Allocating memory on the fly from the OS slows performance.

    And a few other details, like the ones mentioned...

    But with the managed model, problems 1 and 2 are the big ones.

    #1 can't happen in managed code. (Well, as far as I know it can't.)

    #2 is greatly reduced.

    That's why, in a nutshell.
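    Point #1 can be illustrated with a small C# sketch (the variable names are illustrative): an out-of-range write to a managed array is stopped by the runtime's bounds check instead of silently overwriting adjacent memory, and a string cannot be written through its indexer at all. The "as far as I know" caveat is fair: code compiled with `unsafe` that uses raw pointers gets no such check.

```csharp
using System;

char[] buffer = new char[5];
string text = "hello";
bool caught = false;

try
{
    // One element past the end: in C this kind of write could silently
    // overwrite adjacent memory; the CLR checks the index and throws instead.
    buffer[5] = '!';
}
catch (IndexOutOfRangeException)
{
    caught = true;
}

// A string's indexer is get-only, so `text[0] = 'j';` would not even compile;
// "changing" a character means building a new string.
string changed = "j" + text.Substring(1);

Console.WriteLine(caught);  // True
Console.WriteLine(changed); // jello
Console.WriteLine(text);    // hello
```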

  • Sven Groot

    Manip, although you are correct, this has nothing to do with why the String object is immutable. If I append a few characters to a StringBuilder, the same move/copy/free thing is done, but the StringBuilder class is not immutable.
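    A quick way to see this, as a hedged C# sketch: a StringBuilder created with room for exactly 10 characters has to obtain more memory when an 11th is appended (how it does that internally is an implementation detail), yet the caller still holds one and the same mutable object.

```csharp
using System;
using System.Text;

// Start with a builder whose capacity exactly fits 10 characters.
var sb = new StringBuilder("0123456789", 10);
int capacityBefore = sb.Capacity;

// Appending past the capacity forces the builder to get more memory
// internally, but Append mutates and returns the very same instance.
StringBuilder returned = sb.Append('X');

Console.WriteLine(object.ReferenceEquals(sb, returned)); // True
Console.WriteLine(sb.ToString());                        // 0123456789X
Console.WriteLine(sb.Capacity > capacityBefore);         // True
```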

    Let's take a look at an example with a method that both the StringBuilder and String classes have:

    using System;
    using System.Text;

    String x = "hello";
    String y = x.Replace('e', 'u');
    StringBuilder a = new StringBuilder("hello");
    StringBuilder b = a.Replace('e', 'u');
    Console.WriteLine("x: " + x);
    Console.WriteLine("y: " + y);
    Console.WriteLine("a: " + a.ToString());
    Console.WriteLine("b: " + b.ToString());

    The output of this program is:
    x: hello
    y: hullo
    a: hullo
    b: hullo

    As you can see, the Replace call did not change the String class instance, but it did change the StringBuilder instance (the only reason why StringBuilder.Replace also returns an instance is to make it possible to chain calls. In fact, it doesn't return a new instance, but the same one, so you'll find that Object.ReferenceEquals(a, b) == true).

    Now as to why. I don't know all the reasons, but here is one of them. In .NET, String is a reference type, so the string object itself is never copied; only a reference to it is passed around. Compare this to the C++ std::string object (which is not immutable), which is passed by value. This means that if you want to use a String as a key in a Hashtable, you're fine in C++, because C++ will copy the string to store the key in the hashtable (actually std::hash_map, but still) for later comparison. So even if you later modify the std::string instance, you're fine.

    But in .NET, when you use a String in a Hashtable, it will store a reference to that instance. Now assume for a moment that strings aren't immutable, and see what happens:
    1. Somebody inserts a value x with key "hello" into a Hashtable.
    2. The Hashtable computes the hash value for the String, and places a reference to the string and the value x in the appropriate bucket.
    3. The user modifies the String instance to be "bye".
    4. Now somebody wants the value in the hashtable associated with "hello". It ends up looking in the correct bucket, but when comparing the strings it says "bye"!="hello", so no value is returned.
    5. Maybe somebody wants the value "bye"? "bye" probably has a different hash, so the hashtable would look in a different bucket. No "bye" keys in that bucket, so our entry still isn't found.

    Making strings immutable means that step 3 is impossible. If somebody modifies the string he's creating a new string object, leaving the old one alone. Which means the key in the hashtable is still "hello", and thus still correct.

    So, probably among other things, immutable strings are a way to enable strings that are passed by reference to be used as keys in a hashtable or similar dictionary object.
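    Steps 1-4 can be reproduced with a stand-in for a mutable string. In this C# sketch, a `char[]` plays the part of the hypothetical mutable string, and `StructuralComparisons.StructuralEqualityComparer` makes the Hashtable hash and compare arrays by their contents, the way it does for strings. Step 5 is not asserted, since its outcome depends on the specific hash values.

```csharp
using System;
using System.Collections;

// A char[] stands in for a hypothetical mutable string, hashed and compared by content.
var table = new Hashtable(StructuralComparisons.StructuralEqualityComparer);

char[] mutableKey = "hello".ToCharArray();
table[mutableKey] = "x";                 // steps 1-2: stored under the hash of "hello"

mutableKey[0] = 'j';                     // step 3: the key becomes "jello" in place

// Step 4: the lookup reaches the right bucket, but the stored key no longer matches.
bool foundHello = table.ContainsKey("hello".ToCharArray());
Console.WriteLine(foundHello);           // False
Console.WriteLine(table.Count);          // 1: the entry is still there, just unreachable

// With real (immutable) strings, "modifying" the key creates a new object instead.
var safeTable = new Hashtable();
string key = "hello";
safeTable[key] = "x";
key = key.Replace('h', 'j');             // key now refers to a new "jello" string
Console.WriteLine(safeTable.ContainsKey("hello")); // True
```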

  • cjberg

    With an immutable string class you don't have to worry about ownership issues, so you can pass a reference any way you like without having to worry about external clients modifying your data.
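    A small C# sketch of the ownership point, using two hypothetical helpers: a method that receives a `char[]` can rewrite the caller's data in place, while one that receives a `string` can only hand back a new string.

```csharp
using System;

// Hypothetical helper: receives the caller's array and mutates it in place.
void ShoutIntoBuffer(char[] chars)
{
    for (int i = 0; i < chars.Length; i++)
        chars[i] = char.ToUpperInvariant(chars[i]);
}

// Hypothetical helper: receives the caller's string but can only return a new one.
string Shout(string s) => s.ToUpperInvariant();

char[] myChars = "hello".ToCharArray();
string myString = "hello";

ShoutIntoBuffer(myChars);
string result = Shout(myString);

Console.WriteLine(new string(myChars)); // HELLO: the callee changed our data
Console.WriteLine(myString);            // hello: our string is untouched
Console.WriteLine(result);              // HELLO
```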

    You should get yourself a copy of “.NET Framework Standard Library Annotated Reference, Volume 1: Base Class Library and Extended Numerics Library”. It has lots of information on subjects like this.

  • amotif

    BTW, immutable types are inherently callee-safe (you can pass them to a method without fear they'll be changed) and thread-safe (no methods on the type change its state, so there's nothing to synchronize).
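    The thread-safety point as a hedged C# sketch: several tasks read the same string concurrently with no lock, and all of them compute the same result, because nothing can change the shared object underneath them. (The variable itself must not be reassigned concurrently; immutability protects the object, not the variable.)

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// One string shared by all tasks, read without any synchronization.
string shared = "immutable strings are thread-safe";

// Each task derives new strings from the shared one; none can alter it.
Task<string>[] readers = Enumerable.Range(0, 8)
    .Select(_ => Task.Run(() => shared.ToUpperInvariant() + ":" + shared.Length))
    .ToArray();

string[] results = readers.Select(t => t.Result).ToArray(); // waits for each task
string expected = "IMMUTABLE STRINGS ARE THREAD-SAFE:" + shared.Length;

Console.WriteLine(results.All(r => r == expected)); // True
Console.WriteLine(shared);                          // immutable strings are thread-safe
```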

    Sven Groot wrote:
    Compare this to the C++ std::string object (which is not immutable), which is passed by value. This means that if you want to use a String as a key in a Hashtable, you're fine in C++, because C++ will copy the string to store the key in the hashtable (actually std::hash_map, but still) for later comparison.


    Is that the case or does std::string implement copy-on-write like MFC does? (I'm a little concerned that I no longer remember...)

  • littleguru

    I can only repeat what has been mentioned before: if the allocation of new string objects were hidden, the .NET runtime would always have to create new memory chunks to store the strings.

    That's how current systems are built. It's best to give the user the choice of when to create a new instance of a string... It's always best not to take all the control away from the user.

  • Mike Dimmick

    amotif wrote:
    Is that the case or does std::string implement copy-on-write like MFC does? (I'm a little concerned that I no longer remember...)


    The C++ standard does not mandate any reference-counting or string-sharing behaviour for std::string. As far as I can tell the version shipped with VS.NET 2003 does not do it.

    One of the problems with refcounted strings is making them threadsafe. This normally leads to a lot of overhead for the more common case of a string that's never accessed concurrently from different threads.

  • rhm

    I'll point out the obvious, since no one else has yet: C# and .NET have immutable strings because Java has them. I suspect Java got the idea from Lisp.

  • Maurits

    I like Java's immutable strings - as a pointerless language, it's nice to have consistency in the way all the basic variable types are passed (yes, I'm counting String as a variable type)

  • Frank Hileman

    Sven Groot hit the nail on the head. There is another consideration, an internal optimization that you can use in your code as well: interned strings. An interned string is one that has been added to a big hashtable internally in the CLR (they call it a "pool"). This is used automatically for literal strings, so all literal strings with the same characters refer to the same memory location.

    You can also explicitly intern a dynamically created string by calling String.Intern. To compare two interned strings for equality, you only have to compare their addresses, not the contained characters.

    The concept of an interned string would be destroyed by mutable strings -- or at least, they could not be interned.

    Interning first became popular in Lisp systems. 
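    A short C# sketch of interning with `String.Intern`. The first comparison reflects what the CLR does in practice for identical literals; a string built at run time has the same characters but is a separate object until it is interned, after which a reference comparison is enough.

```csharp
using System;

string literalA = "hello";
string literalB = "hello";

// Built at run time from characters: equal contents, but a separate object.
string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

// String.Intern returns the pooled instance for these characters.
string interned = string.Intern(built);

Console.WriteLine(object.ReferenceEquals(literalA, literalB)); // True: literals are pooled
Console.WriteLine(object.ReferenceEquals(literalA, built));    // False: different objects
Console.WriteLine(literalA == built);                          // True: same characters
Console.WriteLine(object.ReferenceEquals(literalA, interned)); // True: same pooled instance
```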
