To all,
I have a Comma Separated Value (CSV) file that I am reading into my application. Prior to reading the file in I would like to verify the format line by line to ensure it is correct.
The file is read in using a FileStream and StreamReader.
The format is like this:
itemOne,itemTwo,itemThree,itemFour,itemFive,itemSix
I want to verify that each line within the CSV has 6 ",". If a line has fewer or more "," than that, then a MessageBox will be displayed and the CSV will not be read.
Has anyone tried doing this before? If so, do you have a code example I can use?
Thanks,
Intel96
-
-
See if this'll work...
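private void Process()
{
    string line;
    int patternOccurrence;
    ...
    // Counts the sequence "," (quote, comma, quote), which assumes every item is quoted.
    patternOccurrence = Occurs(line, "\",\"");
    if (patternOccurrence != 6)
    {
        ...
    }
}

private int Occurs(string main, string pattern)
{
    // An empty pattern would never advance the search, so treat it like null.
    if (string.IsNullOrEmpty(main) || string.IsNullOrEmpty(pattern)) return 0;
    int count = 0;
    int startPos = 0;
    int foundPos;
    // Resume just past each match; otherwise the same match is counted repeatedly.
    while ((foundPos = main.IndexOf(pattern, startPos)) >= 0)
    {
        count++;
        startPos = foundPos + pattern.Length;
    }
    return count;
}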
-
I don't know what kind of processing you need to do with the row, but if you need to process each item separately you could just do Split() on the comma and see how many items are in your string[] and then process or not based on the count.
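A minimal sketch of that Split() idea in C#, assuming no item ever contains a comma or a quote. The names CsvSplitCheck and HasExpectedItemCount are hypothetical, and the expected count is passed in rather than hard-coded.

```csharp
using System;

static class CsvSplitCheck
{
    // Returns true when the line splits into exactly expectedItems pieces.
    // Assumption: no item contains a comma, so every comma is a separator.
    public static bool HasExpectedItemCount(string line, int expectedItems)
    {
        if (line == null) return false;
        string[] items = line.Split(',');
        return items.Length == expectedItems;
    }
}
```

For the format in the question, CsvSplitCheck.HasExpectedItemCount("itemOne,itemTwo,itemThree,itemFour,itemFive,itemSix", 6) returns true. Keep in mind that six items are separated by five commas, so pick the count to match whichever one you really mean.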
-
CSV processing is trickier than it looks if you want to be feature complete. You would have to worry about:
* data values without quotes around them
* data values with embedded commas (these need quotes)
* data values with embedded quotes (which should be doubled)
* data values with embedded newlines
-
Matthew van Eerde wrote: CSV processing is trickier than it looks if you want to be feature complete. You would have to worry about:
* data values without quotes around it
* data values with embedded commas (these need quotes)
* data values with embedded quotes (which should be doubled)
* data values with embedded newlines
I was going to point this out, but assumed the OP knows the source of the data and it won't be like this, but of course I shouldn't assume since we know where that leads....
Yes, full CSVs allow for some tricky scenarios.
-
This might work if there are no embedded commas in the data, as in the aforementioned double-quote case:
Dim strReadLine As String = objStreamReader.ReadLine() ' the line you are reading from the StreamReader
Dim strData() As String = strReadLine.Split(","c)
If strData.Length <> 6 Then
    MessageBox.Show("This line does not have exactly 6 items in it.")
Else
    ' success stuff here
End If
-
I found this on the ASP.NET forums....
http://forums.asp.net/thread/471506.aspx
-
Why don't you just use Regex? You can capture the 6 commas in a line plus the carriage return and line break. Just submit each line to the Regex object and it will either match or not. Off the cuff, here is a possible pattern for a line: "^(?:(?:[^'",\r\n\s]*|\"[^"\r\n]*\"),\s*){6}(?:\"[^"\r\n]*\"|[^'",\r\n]*)$".
Comment:
improved expression
valid input: 1, , "foo", "foo bar", "foo,bar", 233, " 123,321 anydata"
So, in C#:
// Inside a verbatim string (@"..."), each double quote is written as "".
Regex _csvExpression =
    new Regex(@"^(?:(?:[^'"",\r\n\s]*|""[^""\r\n]*""),\s*){6}(?:""[^""\r\n]*""|[^'"",\r\n]*)$", RegexOptions.IgnoreCase);
List<String> _lines = GetCSVLinesArray(); // helper elided; supplies the lines of the file
int _lineNumber = 0;
foreach (String _line in _lines)
{
    _lineNumber++;
    if (!_csvExpression.IsMatch(_line))
    {
        Console.WriteLine("Line {0}: {1} is not in correct format", _lineNumber, _line);
    }
}
Just an idea; some of the details were elided, but you can get an idea of the validation function.
Get Expresso, it's a .NET app that lets you play with regex. It's powerful and free, one of my favorite .NET language apps: http://www.ultrapico.com/Expresso.htm. Disclaimer: I am not affiliated with Expresso or Ultrapico, and this information is provided without warranty and may be deeply flawed. Regards, ronin1855.
P.S. If you want to check more stuff, Regex is still the way to go. You don't want to use string comparison except for trivial tasks. Regex was meant to do richer string comparisons and is a lot more efficient.
-
This slightly improved expression
"(?:(?:[^'",\s]*|\"[^"]*\"),\s*){6}(?:\"[^"'\s]*\"|[^'",]*)"
allows for empty values, quote-delimited data and embedded commas (,), and exactly 6 comma-separated values; all data with spaces in the middle must be delimited by quotes.
Of course it can be improved vastly, but it's a start.
-
Just loop through the text file and check each line with UBound(Split(strLine,",")). For an array of 6 items it returns 5, because the index starts at 0.
-
So what happens when there is a comma in a quote-delimited item, such as "foo, bar"? Split breaks that item in two, so a line with only five real items still gives you 6 items in your array.
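One way around that is to count only the commas that sit outside double quotes. Below is a rough C# sketch under that assumption; the names CsvQuoteAwareCheck and CountItems are made up for illustration, and it only counts items on a single line rather than parsing them.

```csharp
using System;

static class CsvQuoteAwareCheck
{
    // Counts the items on one line, ignoring commas inside double quotes.
    // A doubled quote ("") inside a quoted item flips the flag twice,
    // so it leaves the "inside quotes" state unchanged.
    // Returns -1 if the line ends inside an unterminated quoted item,
    // which is also what a value with an embedded newline looks like.
    public static int CountItems(string line)
    {
        if (line == null) return -1;
        bool inQuotes = false;
        int items = 1; // an empty line counts as one empty item
        foreach (char c in line)
        {
            if (c == '"') inQuotes = !inQuotes;
            else if (c == ',' && !inQuotes) items++;
        }
        return inQuotes ? -1 : items;
    }
}
```

With this, CsvQuoteAwareCheck.CountItems("1,\"foo, bar\",3") gives 3, where Split(',') on the same line would give 4. It does not check that quotes appear only at the start and end of an item, so treat it as a first-pass check and not a full CSV parser.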