Tech Off Thread

C#'s \n escape

  • W3bbo

    I heard some people say that C#'s "\n" escape character matches \r\n, \n, and \r.

    ...is this true?

  • Maurits

    Depends.

  • W3bbo

    Maurits wrote:
    Depends.


    ...depends?

  • Maurits

    Yes.

    In a string literal, \n always means 0xA.

    On the other hand, Console.WriteLine() appends Environment.NewLine, which is \r\n on Windows.

    In regular expressions, I'm not sure whether \n matches \n, \r\n, \r, or all of the above.
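
    A quick way to settle the regex question is to count matches directly. The sketch below assumes .NET's Regex class, where \n in a pattern should match only the line feed character (0xA); EscapeDemo and its method names are made up for illustration.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical helper class, not part of any library.
    static class EscapeDemo
    {
        // True when the "\n" literal is exactly one char with code 0xA.
        public static bool LiteralIsLineFeed()
        {
            string s = "\n";
            return s.Length == 1 && s[0] == (char)0x0A;
        }

        // Counts how many times the regex pattern \n matches in the input.
        public static int CountRegexNewlines(string input)
        {
            return Regex.Matches(input, "\n").Count;
        }
    }
    ```

    For the input "a\r\nb\rc\nd", CountRegexNewlines returns 2: the line feed inside \r\n and the lone \n. The bare \r is not matched.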

  • Eric Falsken

    Use Environment.NewLine instead

  • Maurits

    This is not unique to C#, BTW.  Many C-ish languages have magical \r-\n-\r\n Monte Carlo stuff happening behind the scenes.  Some have different behavior depending on the platform.

  • SlackmasterK

    I can tell you from experience that, at least in MessageBox.Show(), you need to write both explicitly: \r\n.

  • W3bbo

    Eric Falsken wrote:
    Use Environment.NewLine instead


    That doesn't work for ASP.NET

    Someone might submit a textarea that uses only \n for newlines rather than \r\n (or only \r, if they were on pre-X Mac OS), so you need to parse the string properly.

    Ordinarily I do this:

    strInput = strInput.Replace("\r\n", "\n").Replace("\r", "\n");

    ...but that's obviously a performance hit, so if there's a better way I want to hear it.

  • Maurits

    str = Regex.Replace(str, "\r\n?", "\n") perhaps... it has the additional overhead of a regex, but it's just a one-pass operation instead of the two-pass replace.

    I don't know, though... the second pass Replace("\r", "\n") is pretty cheap as far as Replace calls go.  Maybe try it both ways.
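
    For comparison, a hand-rolled single pass avoids both the second Replace and the regex. This is only a sketch and has not been benchmarked against the other two approaches; NewlineNormalizer is a hypothetical name.

    ```csharp
    using System.Text;

    // Hypothetical helper class, not part of any library.
    static class NewlineNormalizer
    {
        // Converts \r\n and lone \r to \n in one pass over the input.
        public static string Normalize(string input)
        {
            StringBuilder sb = new StringBuilder(input.Length);
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c == '\r')
                {
                    // Skip the LF of a CRLF pair so the pair yields a single '\n'.
                    if (i + 1 < input.Length && input[i + 1] == '\n')
                    {
                        i++;
                    }
                    sb.Append('\n');
                }
                else
                {
                    sb.Append(c);
                }
            }
            return sb.ToString();
        }
    }
    ```

    Usage would be strInput = NewlineNormalizer.Normalize(strInput); whether it actually beats the double Replace is something to measure, not assume.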

  • sbc

    Perhaps a test should be done to see what performance hit is taken with both methods? I expect it will be negligible. The test could use either a massive string (many megabytes) with lots of line breaks, or a shorter string processed many times (resetting it after each replacement).

    When would string replacement be noticeable to the users?

  • Maurits

    I ran a performance analysis (using .NET 1.1) and the double-replace is much faster than the Regex (about 35 times faster in this run).

    Output:
    Starting double-replace...
    ... took 00:00:00.6875000
    Starting regex replace...
    ... took 00:00:24.0937500

    Code:

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace NewlineTester
    {
        /// <summary>
        /// Performance test of two ways to normalize newlines.
        /// </summary>
        class NewlineTester
        {
            private const int MIN_LINES = 500;
            private const int MAX_LINES = 1000;
            private const int MIN_WORDS_PER_LINE = 5;
            private const int MAX_WORDS_PER_LINE = 20;
            private const int MIN_LETTERS_PER_WORD = 1;
            private const int MAX_LETTERS_PER_WORD = 7;
            private const int TIMES_TO_RUN = 500;

            private System.Random r = new System.Random();

            private string GenerateSampleString()
            {
                StringBuilder sb = new StringBuilder();
                char[] alphabet = new char[26];
                string[] newlines = { "\r", "\n", "\r\n" };

                for (char i = (char)0; i < 26; i++)
                {
                    alphabet[(int)i] = (char)((int)'a' + (int)i);
                }

                int lines = r.Next(MIN_LINES, MAX_LINES);
                for (int i = 0; i < lines; i++)
                {
                    int words = r.Next(MIN_WORDS_PER_LINE, MAX_WORDS_PER_LINE);

                    for (int j = 0; j < words; j++)
                    {
                        int letters = r.Next(MIN_LETTERS_PER_WORD, MAX_LETTERS_PER_WORD);

                        for (int k = 0; k < letters; k++)
                        {
                            sb.Append(alphabet[r.Next(26)]);
                        }

                        sb.Append(" ");
                    }

                    sb.Append(newlines[r.Next(3)]);
                }

                return sb.ToString();
            }

            /// <summary>
            /// The main entry point for the application.
            /// </summary>
            [STAThread]
            static void Main(string[] args)
            {
                NewlineTester t = new NewlineTester();
                string sample = t.GenerateSampleString();
                string result;
                DateTime start;
                DateTime end;

                Console.WriteLine("Starting double-replace...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = sample.Replace("\r\n", "\n").Replace("\r", "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}", end - start);

                result = "";

                Console.WriteLine("Starting regex replace...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = Regex.Replace(sample, "\r\n?", "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}", end - start);

                result = "";
            }
        }
    }


    The "char i" stuff was an attempt to do character arithmetic, which failed.  I'm tempted to rewrite that block as:

    for (int i = 0; i < 26; i++)
    {
        alphabet[i] = (char)((int)'a' + i);
    }

    but I'm too lazy.

  • pacelvi

    W3bbo wrote:
    I heard some people say that C#'s "\n" escape character matches \r\n, \n, and \r.

    ...is this true?


    According to Section 2.4.4.4 of the C# Language Specification 1.2, the escape "\n" represents Unicode character 0x000A (decimal 10). That's the Line Feed escape.

    A platform-independent way to code a line break would be to use System.Environment.NewLine, which is defined as "A string containing "\r\n" for non-Unix platforms, or a string containing "\n" for Unix platforms."

  • amotif

    Maurits wrote:
    I ran a performance analysis (using .NET 1.1) and the double-replace is much faster than the Regex (about 35 times faster in this run).


    I've been surprised at the speed of String.Replace in the .Net 2 framework even when I "know better" how to accomplish multiple substitutions. More and more the simple solution with strings is the fast solution in v2. The internal implementation in String tends to be fast in many situations.

    Regex is a general pattern matching solution, and often "general" != "fast." ;)

  • W3bbo

    pacelvi wrote:
    W3bbo wrote:
    I heard some people say that C#'s "\n" escape character matches \r\n, \n, and \r.

    ...is this true?


    According to Section 2.4.4.4 of the C# Language Specification 1.2, the escape "\n" represents Unicode character 0x000A (decimal 10). That's the Line Feed escape.

    A platform-independent way to code a line break would be to use System.Environment.NewLine, which is defined as "A string containing "\r\n" for non-Unix platforms, or a string containing "\n" for Unix platforms."


    As I said, Environment.NewLine doesn't work in ASP.NET because it returns the newline character(s) of the server, not the client.
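
    One way to handle client text without relying on Environment.NewLine is to split on all three newline forms explicitly. A sketch, assuming the String.Split overload that takes a string array and StringSplitOptions (available from .NET 2.0); LineSplitter is a hypothetical name.

    ```csharp
    using System;

    // Hypothetical helper class, not part of any library.
    static class LineSplitter
    {
        // "\r\n" is listed first so that a CRLF pair is consumed as one
        // separator instead of producing an empty line between \r and \n.
        private static readonly string[] Newlines = { "\r\n", "\r", "\n" };

        // Splits text into lines regardless of which newline style was used.
        public static string[] SplitLines(string text)
        {
            return text.Split(Newlines, StringSplitOptions.None);
        }
    }
    ```

    For the input "a\r\nb\rc\nd" this yields the four lines a, b, c and d, whichever platform the server runs on.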

  • Foxfire

    Maurits wrote:
    I ran a performance analysis (using .NET 1.1) and the double-replace is much faster than the Regex (about 35 times faster in this run).

    Output:
    Starting double-replace...
    ... took 00:00:00.6875000
    Starting regex replace...
    ... took 00:00:24.0937500


    Sorry, but it's no wonder that your regex code is so slow. You are creating a new Regex object on every pass through the loop.

    Use: 

                Console.WriteLine("Starting regex replace...");
                start = DateTime.Now;
                Regex r = new Regex("\r\n?", RegexOptions.Compiled);

                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    //result = Regex.Replace(sample, "\r\n?", "\n");
                    result = r.Replace(sample, "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}", end - start);

    and you will get something like:

    Starting double-replace...
    ... took 00:00:00.6857796
    Starting regex replace...
    ... took 00:00:00.5143347

  • Maurits

    If the regex object is only going to be used once over its lifetime in the app, my code is the CORRECT way to judge performance.

    But it's a fair point.

    So I added a couple more tests and got this:

    (EDIT: Added a few more)

    Starting no-op as baseline...
    ... took 00:00:00

    Starting double-replace...
    ... took 00:00:00.4531221

    Starting regex replace, calling static method with Compiled...
    ... took 00:00:02.8906065

    Starting regex replace, reuse a single object with Compiled...
    ... took 00:00:02.6093583

    Starting regex replace, calling static method without Compiled...
    ... took 00:00:14.0155353

    Starting regex replace, reuse a single object without Compiled...
    ... took 00:00:14.0311602

    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace NewlineTester
    {
        /// <summary>
        /// Performance test of two ways to normalize newlines.
        /// </summary>
        class NewlineTester
        {
            private const int MIN_LINES = 500;
            private const int MAX_LINES = 1000;
            private const int MIN_WORDS_PER_LINE = 5;
            private const int MAX_WORDS_PER_LINE = 20;
            private const int MIN_LETTERS_PER_WORD = 1;
            private const int MAX_LETTERS_PER_WORD = 7;
            private const int TIMES_TO_RUN = 500;

            private System.Random r = new System.Random();

            private string GenerateSampleString()
            {
                StringBuilder sb = new StringBuilder();
                char[] alphabet = new char[26];
                string[] newlines = { "\r", "\n", "\r\n" };

                for (char i = (char)0; i < 26; i++)
                {
                    alphabet[(int)i] = (char)((int)'a' + (int)i);
                }

                int lines = r.Next(MIN_LINES, MAX_LINES);
                for (int i = 0; i < lines; i++)
                {
                    int words = r.Next(MIN_WORDS_PER_LINE, MAX_WORDS_PER_LINE);

                    for (int j = 0; j < words; j++)
                    {
                        int letters = r.Next(MIN_LETTERS_PER_WORD, MAX_LETTERS_PER_WORD);

                        for (int k = 0; k < letters; k++)
                        {
                            sb.Append(alphabet[r.Next(26)]);
                        }

                        sb.Append(" ");
                    }

                    sb.Append(newlines[r.Next(3)]);
                }

                return sb.ToString();
            }

            /// <summary>
            /// The main entry point for the application.
            /// </summary>
            [STAThread]
            static void Main(string[] args)
            {
                NewlineTester t = new NewlineTester();
                string sample = t.GenerateSampleString();
                string result;
                DateTime start;
                DateTime end;
                Regex r;

                Console.WriteLine("Starting no-op as baseline...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = "";
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";

                Console.WriteLine("Starting double-replace...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = sample.Replace("\r\n", "\n").Replace("\r", "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";

                Console.WriteLine("Starting regex replace, calling static method with Compiled...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = Regex.Replace(sample, "\r\n?", "\n", RegexOptions.Compiled);
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";

                Console.WriteLine("Starting regex replace, reuse a single object with Compiled...");
                start = DateTime.Now;
                r = new Regex("\r\n?", RegexOptions.Compiled);
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = r.Replace(sample, "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";

                Console.WriteLine("Starting regex replace, calling static method without Compiled...");
                start = DateTime.Now;
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = Regex.Replace(sample, "\r\n?", "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";

                Console.WriteLine("Starting regex replace, reuse a single object without Compiled...");
                start = DateTime.Now;
                r = new Regex("\r\n?");
                for (int i = 0; i < TIMES_TO_RUN; i++)
                {
                    result = r.Replace(sample, "\n");
                }
                end = DateTime.Now;
                Console.WriteLine("... took {0}\n", end - start);

                result = "";
            }
        }
    }



    EDIT: it has little to do with static method vs. object creation and everything to do with RegexOptions.Compiled.

    But Compiled only makes sense if you're going to reuse the regex frequently.  If you only use it once, Compiled actually slows you down.
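
    To make the reuse case concrete, here is a sketch that builds the compiled regex once in a static field and reuses it for every call, so the compilation cost is paid a single time. The class and field names are hypothetical.

    ```csharp
    using System.Text.RegularExpressions;

    // Hypothetical helper class, not part of any library.
    static class CachedNewlineRegex
    {
        // Constructed once when the class is first used, then shared by all calls.
        private static readonly Regex CrOrCrLf =
            new Regex("\r\n?", RegexOptions.Compiled);

        // Replaces \r\n and lone \r with \n using the cached regex.
        public static string Normalize(string input)
        {
            return CrOrCrLf.Replace(input, "\n");
        }
    }
    ```

    This only pays off when Normalize is called many times over the life of the app; for a one-off replacement the up-front compilation is pure overhead.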

  • Foxfire

    Maurits wrote:

    So I added a couple more tests and got this:

    Starting double-replace...
    ... took 00:00:00.5468715

    Starting regex replace, reuse a single object with Compiled...
    ... took 00:00:03.9687246

    Starting regex replace, reuse a single object without Compiled...
    ... took 00:00:18.3123828

    Starting regex replace, calling static method...
    ... took 00:00:18.1561338


    Well I forgot to mention that I'm running .Net Framework 2.0.
    Seems as if regexes are much more optimized there.

    Starting double-replace...
    ... took 00:00:00.2968750

    Starting regex replace, reuse a single object with Compiled...
    ... took 00:00:00.2500000

    Starting regex replace, reuse a single object without Compiled...
    ... took 00:00:00.3437500

    Starting regex replace, calling static method...
    ... took 00:00:00.3593750

    (by the way - this is a different computer than the one I posted the other figures from)

    Here is another run (unfortunately you create random results every run):

    Starting double-replace...
    ... took 00:00:00.2812500

    Starting regex replace, reuse a single object with Compiled...
    ... took 00:00:00.2031250

    Starting regex replace, reuse a single object without Compiled...
    ... took 00:00:00.2812500

    Starting regex replace, calling static method...
    ... took 00:00:00.2968750

  • Maurits

    Foxfire wrote:

    I'm running .Net Framework 2.0.
    Seems as if regexes are much more optimized there.


    So it seems :)

    Foxfire wrote:

    unfortunately you create random results every run


    That's a feature...
