Tech Off Thread

13 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

Simple string sanitization?

Back to Forum: Tech Off
  • User profile image
    Dr Herbie

    Hi,
        OK, so we all know (or should know) that strings have to be sanitised to prevent security issues for both the database and the HTML interface to prevent script injection, cross-site scripting etc.

    So I sat down to write a string sanitization routine and it came out really, really simple.  Now I'm worried that I've missed something.

    First let me point out that all non-string values in my middleware are stored as typed values. E.g. If I ask the user to enter a date it is stored in the middleware as a DateTime instance, not as a string.  This means that I'm only worried about string values.

    So my sanitization routine comes down to:

    Replace single quote with two single quotes -- this prevents users from closing a string parameter in SQL and adding their own SQL clauses.

    Replace less-than symbol with "&lt" and greater-than symbol with "&gt" -- this prevents any HTML script from being executed as HTML rather than displayed as text.

    public string Sanitize(string usString)
    {
        usString = usString.Replace("'", "''");
        usString = usString.Replace("<", "&lt");
        usString = usString.Replace(">", "&gt");
    }

    That's it.  It's too simple for my liking.

    Can I get a sanity check on this? Perplexed  Have I missed something glaringly ovbious?

    Thanks
    Herbie



  • User profile image
    Sven Groot

    If you're doing a number of replacements, consider using a StringBuilder. It's significantly faster, since it doesn't create a new String object for every replacement you do.

    Also, check out the HttpUtility.HtmlEncode (HttpServerUtility.HtmlEncode if you're doing ASP.NET server code). It won't do the quote replacement, but it will take care of &lt; and &gt;, as well as other characters that need to be represented as entities in HTML.

  • User profile image
    TommyCarlier

    Here's a function I wrote to encode text as HTML:

    public static string HtmlEncode(string s) {
       StringBuilder b = new StringBuilder(s.Length * 3 / 2);
       foreach(char c in s)
          // characters above ~ are converted to a hexadecimal entity
          if (c > '~') b.AppendFormat("&#x{0:x};", (long)c);
          else switch(c) {
             case '"': b.Append("&quot;"); break; // double quote
             case '&': b.Append("&amp;"); break;
             case '<': b.Append("&lt;"); break;
             case '>': b.Append("&gt;"); break;
             default: b.Append(c); break;
          }
       return b.ToString();
    }

    You could add the following case in the switch-statement to also catch the single-quote for SQL: case '\'': b.Append("''"); break;

  • User profile image
    Dr Herbie

    Sven Groot wrote:
    If you're doing a number of replacements, consider using a StringBuilder. It's significantly faster, since it doesn't create a new String object for every replacement you do.


    Have you profiled that?  I have used NProf to profile string concatenation with and without a string builder and found that for small numbers of iterations, it was actually quicker to NOT use the string builder.


  • User profile image
    Maurits

    This is horribly, horribly wrong.  STOP.

    Your use of the term "sanitize" borders on ethnic cleansing.  There's nothing wrong with the apostrophe, the less-than sign, and the greater-than sign.  Killing off these characters isn't going to make you any friends.

    Instead of sanitizing, encode.  Data is data.  How to encode it depends on where you put it... if you are printing data as HTML, HTMLEncode it:

    replace & with &amp;
    replace < with &lt;
    replace > with &gt;
    replace " with &quot;
    replace newlines with <br> plus a newline

    if you're passing data as a parameter into a SQL string, SQL-encode it:

    replace ' with ''
    (EDIT: This assumes you're passing the parameter as a string.
    If you're passing the parameter as a numeric type, cast the value to that type or you're still vulnerable to injection.)

    if you're putting data into a javascript literal, javascript-encode it:


    t = "<% = JavascriptEncode(t) %>";

    // pseudo-C#
    string JavascriptEncode(string t)
    {
        // replace \ with \\ and " with \" and ' with \'
        return /(\\|"|')/g.replace(t, "\\$1");
    }

  • User profile image
    Wells

    Don't reinvent the wheel.  I don't even need to check and I know that there will be a standard function to perform html encoding for you.

    Another point worth mentioning: if you can, rather than 'sanitizing' input by removing bad stuff, how about only allowing correct stuff.

    For example, if you are prompting the user to enter their username, use a regex to enforce something like [A-Za-z0-9]+ as well as encoding it correctly.  By checking only for bad stuff, you are always going to miss something.  Like you did in your code Wink

    So once again, use the standard functions.  There are functions for everything you will need to do: html encode, url encode, SQL escape... everything!

    EDIT: And I see you're using the "us" prefix that Joel Spolsky has written about?  That's awsome, works really well.

  • User profile image
    Dr Herbie

    Maurits wrote:
    This is horribly, horribly wrong.  STOP.

    Your use of the term "sanitize" borders on ethnic cleansing. 


    No, no, tell us what you really think. Big Smile


    You are, of course completely right.  string safety should not be done centrally, but at the point of risk.  That's the sanity check I needed. Thanks.

  • User profile image
    Dr Herbie

    Wells wrote:

    For example, if you are prompting the user to enter their username, use a regex to enforce something like [A-Za-z0-9]+ as well as encoding it correctly.  By checking only for bad stuff, you are always going to miss something.  Like you did in your code


    Personally I have always found that using regex to limit user input is error prone (having to debug the regex string is a real pain) and that user's immediately request that they can add something I didn't think of.  I discovered this the hard way back when I used Borland C++ (which had regex classes way back in the bad old days of the early 1990s).

    Wells wrote:

    So once again, use the standard functions.  There are functions for everything you will need to do: html encode, url encode, SQL escape... everything!


    Does anyone know a standard function for encoding SQL Escapes? I found encoding methods for URI, Regex and HTML, but not SQL.

    Wells wrote:

    EDIT: And I see you're using the "us" prefix that Joel Spolsky has written about?  That's awsome, works really well.


    Yes, they make more sense this time around.  After 5 or 6 years of using them in C++ I gave up on the popular version of Hungarian notation when I moved to C#, but Joel's point about it being a b*stardisation (will that get past the language filter?) of the original conept made me want to look at it again.
    So far I have been unable to come up with another prefix other than re-using the 'c' prefix for numbers that are used as increamental counts. This was pretty much the only one in the popular Hungarian notation that made sense.



  • User profile image
    Maurits

    Dr Herbie wrote:
    Does anyone know a standard function for encoding SQL Escapes? I found encoding methods for URI, Regex and HTML, but not SQL.


    If you're building a SQL string from within a T-SQL routine, you can use the T-SQL function QUOTENAME(@t, '''')

    If you're building a SQL string from an external source, there are the various "sql parameter" library functions

    If the SQL parameter is a literal string, it really is as simple as replacing ' with '' everywhere.  If the SQL parameter is of some other type, casting  to that type should do it.

  • User profile image
    Rossj

    Wells, you must be thinking of System.Web.HttpUtility.HtmlDecode() and HtmlEncode()

  • User profile image
    obiefuna

    I am thinking of complex search forms requiring ANDing and ORing of all these text string parameters in a dynamic sql WHERE clause ... I suppose that someone may actually slip in something like "anytext OR 1 eq 1  /*", giving rise to a where clause like

    WHERE name = 'anytext OR 1 eq 1  /*'

    or other contrived variants of this. How do we encode to prevent it without blocking business value?

    Maybe it should be enough to encode all sql comparison operators? Just thinking.

  • User profile image
    figuerres

    obiefuna wrote:
    I am thinking of complex search forms requiring ANDing and ORing of all these text string parameters in a dynamic sql WHERE clause ... I suppose that someone may actually slip in something like "anytext OR 1 eq 1  /*", giving rise to a where clause like

    WHERE name = 'anytext OR 1 eq 1  /*'

    or other contrived variants of this. How do we encode to prevent it without blocking business value?

    Maybe it should be enough to encode all sql comparison operators? Just thinking.


    the safest answer: do not let the user do it.

    give the user a set of options to combine.

    then write the code to build your sql.

    so you limit what the user can enter ....

    most of the time users who ask for fancy serach options will never use them.

    often you can work with the user to find out what they want to find say 80-90% of the time, fix that first.
    then deal with the edge cases...

  • User profile image
    Matthew van Eerde

    obiefuna wrote:
    WHERE name = 'anytext OR 1 eq 1  /*'


    If you're adding text into a SQL literal, replace ' with '' and you're good to go.  Or use the middleware's "param" syntactic sugar to do it for you... usually something like

    sql = "WHERE colName = '@paramName'"

    Params.Add("@paramName", varName)

    If you're scrupulous about encoding, would-be hackers will generate some vary strange-looking dynamic SQL, but all of it harmless.

    http://www.xkcd.com/327/

Conversation locked

This conversation has been locked by the site admins. No new comments can be made.