Tech Off Thread

Comma Separated Value (CSV) file

  • intel96

    To all,

    I have a Comma Separated Value (CSV) file that I am reading into my application.  Prior to reading the file in I would like to verify the format line by line to ensure it is correct.

    The file is read in using a FileStream and StreamReader.

    The format is like this:

    itemOne,itemTwo,itemThree,itemFour,itemFive,itemSix

    I want to verify that each line within the CSV has 6 ",".  If a line has fewer or more "," than that, then a MessageBox will be displayed and the CSV will not be read.

    Has anyone tried doing this before?  If so, do you have a code example I can use?


    Thanks,
    Intel96

  • Minh

    See if this'll work...



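    private void Process()
    {
       string line;
       int patternOccurrence;
       ...
      
       // Count the separators; use "\",\"" instead if every value is wrapped in quotes.
       patternOccurrence = Occurs(line, ",");

       if (patternOccurrence != 6)
       {
          ...
       }
    }

    private int Occurs(string main, string pattern)
    {
       if (string.IsNullOrEmpty(main) || string.IsNullOrEmpty(pattern)) return 0;

       int count = 0;
       int pos = main.IndexOf(pattern);

       // Resume each search just past the previous match so every occurrence is counted once.
       while (pos >= 0)
       {
          count++;
          pos = main.IndexOf(pattern, pos + pattern.Length);
       }

       return count;
    }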

  • JPeless

    I don't know what kind of processing you need to do with the row, but if you need to process each item separately you could just do Split() on the comma and see how many items are in your string[] and then process or not based on the count.
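
    A minimal sketch of that idea in C#, assuming plain values with no quotes or embedded commas. The names LineCheck and HasFieldCount are made up for illustration:

    ```csharp
    using System;

    static class LineCheck
    {
        // Hypothetical helper: true when a plain comma split yields exactly `expected` items.
        // Assumes no quoted values; an embedded comma would be counted as a separator.
        public static bool HasFieldCount(string line, int expected)
        {
            if (line == null) return false;
            return line.Split(',').Length == expected;
        }
    }
    ```

    For the sample line in the first post, HasFieldCount(line, 6) should come back true; a line with a missing or extra item fails the check.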

  • Matthew van Eerde

    CSV processing is trickier than it looks if you want to be feature complete.  You would have to worry about:

    * data values without quotes around them
    * data values with embedded commas (these need quotes)
    * data values with embedded quotes (which should be doubled)
    * data values with embedded newlines
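
    To make the list above concrete, here is a rough C# sketch of a field counter that respects quotes. It is only an illustration, not a full CSV parser; the names CsvFieldCounter and CountFields are hypothetical, and returning -1 for an unclosed quote is just one possible way to flag a value that may continue on the next line:

    ```csharp
    using System;

    static class CsvFieldCounter
    {
        // Hypothetical helper: counts fields in one CSV record, treating commas
        // inside double quotes as data and "" inside quotes as an escaped quote.
        // Returns -1 when a quote is left open, which may mean the value
        // continues on the next line (an embedded newline).
        public static int CountFields(string record)
        {
            if (record == null) return 0;

            int fields = 1;          // a record with no commas still has one field
            bool inQuotes = false;

            for (int i = 0; i < record.Length; i++)
            {
                char c = record[i];
                if (c == '"')
                {
                    if (inQuotes && i + 1 < record.Length && record[i + 1] == '"')
                        i++;                     // doubled quote: skip the second one
                    else
                        inQuotes = !inQuotes;    // opening or closing quote
                }
                else if (c == ',' && !inQuotes)
                {
                    fields++;                    // separator outside quotes
                }
            }

            return inQuotes ? -1 : fields;
        }
    }
    ```

    With this, a line such as a,"b,c",d counts as three fields, where a plain Split on the comma would report four.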

  • JPeless

    Matthew van Eerde wrote:
    CSV processing is trickier than it looks if you want to be feature complete.  You would have to worry about:

    * data values without quotes around them
    * data values with embedded commas (these need quotes)
    * data values with embedded quotes (which should be doubled)
    * data values with embedded newlines


    I was going to point this out, but I assumed the OP knows the source of the data and that it won't look like this. Of course I shouldn't assume, since we know where that leads....

    Yes, full CSVs allow for some tricky scenarios.

  • Red5

    This might work if there are no embedded commas in the data, as in the quoted-value issue mentioned above:

    Dim strReadLine As String = ""  ' this is the line you are reading from the streamreader
    strReadLine = objStreamReader.ReadLine
    Dim strData() As String
    strData = strReadLine.Split(CChar(","))

    If strData.Length <> 6 Then
       MessageBox.Show("This line does not have exactly 6 items in it.")
    Else
       'success stuff here
    End If

  • ScanIAm

    I found this on the ASP.NET forums....

    http://forums.asp.net/thread/471506.aspx

  • ronin1855

    Why don't you just use Regex? You can capture the 6 commas in a line plus the line break and carriage return. Just submit each line to the Regex object and it will either match or not. Just off the cuff, here is a possible pattern for a line: "^(?:(?:[^'",\r\n\s]*|\"[^"\r\n]*\"),\s*){6}(?:\"[^"\r\n]*\"|[^'",\r\n]*)$".

    Comment:
    improved expression
     valid input:  1, , "foo", "foo bar", "foo,bar", 233, " 123,321 anydata"

    So, in C#:

    Regex _csvExpression =
        new Regex(@"^(?:(?:[^'"",\r\n\s]*|""[^""\r\n]*""),\s*){6}(?:""[^""\r\n]*""|[^'"",\r\n]*)$", RegexOptions.IgnoreCase);

    List<String> _lines = GetCSVLinesArray();
    int _lineNumber = 0;
    foreach(String _line in _lines)
    {
        _lineNumber++;
        if(!_csvExpression.IsMatch(_line))
        {
           Console.WriteLine("Line {0}: {1} is not in correct format", _lineNumber, _line);
        }
    }

    Just an idea; some of the details were elided, but it should give you an idea of the validation function.

    Get Expresso, it's a .Net app that lets you play with regex. It's powerful and free, one of my favorite .Net language apps http://www.ultrapico.com/Expresso.htm. Disclaimer: I am not affiliated with Expresso or ultrapico, and this information is provided without warranty and may be deeply flawed. Regards, ronin1855.

    ps. if you want to check more stuff, Regex is still the way to go. You don't want to use string comparison except for trivial tasks. Regex was meant to do richer string comparisons and is a lot more efficient.

  • ronin1855

    This slightly improved expression
    "(?:(?:[^'",\s]*|\"[^"]*\"),\s*){6}(?:\"[^"'\s]*\"|[^'",]*)"

    allows for empty values, " delimited data and embedded commas (,), and expects exactly 6 comma-separated values. All data with spaces in the middle must be delimited by quotes.

    Of course it can be improved vastly, but it's a start.

  • shuff1203

    Just loop through the text file and check each line with UBound(Split(strLine,",")), which returns 5 when the array holds 6 items, since the index starts at 0.

  • ronin1855

    So what happens when there is a comma inside a quote-delimited item, such as "foo, bar"? Split breaks that item in two, so a line with only 5 real values can still give you 6 items in your array.
