Page 2 of 2
Posted: Wed May 21, 2008 4:05 pm
by Kiffi
pdwyer wrote:okay, here is my first port attempt
whow! The code is quite fast.
My first test shows 0.1 Seconds per 1000 lines. Your first code takes 2
Seconds per 1000 lines. (tested with a 280 MB CSV file)
Unfortunately multiple lines are not supported. But the code
is a good starting point...
Thanks for your help!
Greetings ... Kiffi
Posted: Wed May 21, 2008 4:16 pm
by DoubleDutch
Here is the one I wrote a while back to solve the quotes problem.
Code: Select all
Procedure.s X_StringField(string$,no,seperator$=",")
done=#False
count=1
len=Len(string$)
pos=0
result$=""
quotes=#False
Repeat
pos+1
If pos>len
done=#True
Else
ch$=Mid(string$,pos,1)
If ch$=seperator$ And (Not quotes)
If count=no
done=#True
Else
result$=""
count+1
EndIf
Else
result$+ch$
If ch$=Chr(34)
quotes!#True
EndIf
EndIf
EndIf
Until done
result$=Trim(result$)
If count=no
If Left(result$,1)=Chr(34)
result$=Mid(result$,2)
EndIf
If Right(result$,1)=Chr(34)
result$=Left(result$,Len(result$)-1)
EndIf
Else
result$=""
EndIf
result$=Trim(result$)
ProcedureReturn result$
EndProcedure
Seems to work ok for me.

Posted: Thu May 22, 2008 12:17 am
by pdwyer
Kiffi wrote:whow! The code is quite fast.
My first test shows 0.1 Seconds per 1000 lines. Your first code takes 2
Seconds per 1000 lines. (tested with a 280 MB CSV file)

news to me too, I hadn't tested that far. not my proc though so I can't take the credit
Kiffi wrote:
Unfortunately multiple lines are not supported. But the code
is a good starting point...
I'm not so sure I want to support that. I saw that in the spec too. I have a fast split function that I would use on the crlf first and it wouldn't work if CSV had that in it. Personally, if I thought I was going to work with fields with crlfs in there I wouldn't use CSV, I'd use a DB (eg sqlite) or a format that doesn't use crlf as a line delimeter (ie has no sense of a line as such).
Either that or I'd have some escape char for a in-field crlf and put it back in later
@DoubleDutch, haven't looked at this yet, but theres some example csv on that page I linked, it looks like it would make a good basic test for complience, I try some of these tonight when I get home
Posted: Thu May 22, 2008 10:55 am
by pdwyer
There are two bugs in the code still (that I know of) and one bug I fixed in the previous code (an empty line would not clear the array if there was data so no 6 failed.
On the table at the top here
http://www.xbeat.net/vbspeed/c_ParseCSV.php it still fails to correctly format no 18 and 22
Here is the test csv sheet
Code: Select all
a,b,c
"a",b,c
'a',b,c
a , b , c
aa,bb;cc
a
,b,
,,c
,,
"",b
" ",b
"a,b"
"a,b",c
" a , b ", c
a b,c
a"b,c
"a""b",c
a""b,c
a,b",c
a,b"",c
a,"B: ""Hi, I'm B""",c
I'm still not quite sure how easy this is to fix, the proc works differently in PB due to the original use of lenb and midb in VB with unicode
Posted: Fri May 23, 2008 2:16 pm
by pdwyer
I think I have the bugs out now, it seems to do what I want it to do, there's still some room for performance increase so I might have another look at that later.
The code at the very start of the thread has been updated with the latest version as it was starting to get confusing for people coming later as to what version I was talking about

Posted: Tue Jul 01, 2008 11:33 am
by peterb
Enjoy,
peterb
Code: Select all
;- Author : Petr Vavrin (peterb)
;- Location : Czech Republic
;- Email : pb.pb (at) centrum (dot) cz
Global characters = 0
#CSV_PARSER_COLUMNS = 200
Structure _CSVParseGlobals
numberOfColumns.l
Column$[#CSV_PARSER_COLUMNS]
EndStructure
Global CSVParse._CSVParseGlobals
Procedure ParseCSV ( csv_line.s, delimiter )
numberOfColumns = 0
in_column = #False
column_string.s = ""
CSVParse\Column$[0] = ""
line_length = Len ( csv_line ) - 1
For c = 0 To line_length
char = PeekB ( @csv_line + c )
characters + 1 ; remove this line - for testing only
If char = delimiter
If c = 0
numberOfColumns + 1
ElseIf in_column And prev_char = '"'
in_column = #False
CSVParse\Column$[ numberOfColumns ] = Left ( column_string, Len ( column_string ) - 1 )
numberOfColumns + 1
column_string = ""
ElseIf in_column = #False
CSVParse\Column$[ numberOfColumns ] = column_string
numberOfColumns + 1
column_string = ""
Else
column_string + Chr ( char )
EndIf
ElseIf char = '"'
If c = 0
in_column = #True
ElseIf in_column And c = line_length
in_column = #False
ElseIf in_column = #False And prev_char = delimiter
in_column = #True
Else
column_string + Chr ( char )
EndIf
Else
column_string + Chr ( char )
EndIf
If char <> 32
prev_char = char
EndIf
Next
CSVParse\Column$[ numberOfColumns ] = column_string
CSVParse\numberOfColumns = numberOfColumns + 1
EndProcedure
; --- speed test ---
start = GetTickCount_()
For x = 1 To 10000
pointer = ?start_data
While pointer < ?end_data
text.s = PeekS ( pointer )
pointer + StringByteLength ( text ) + 1
ParseCSV ( text, ',' )
Wend
Next
time = GetTickCount_() - start
MessageRequester("", "time: " + Str ( time ) + " ms" + Chr( 10 ) + "characters: " + Str ( characters ) + Chr(10) + Str ( characters/time ) + " chr / ms ")
Debug "time: " + Str ( time ) + " ms"
Debug "characters: " + Str ( characters )
Debug Str ( characters/time ) + " chr / ms "
Debug ""
; --- show results ---
pointer = ?start_data
While pointer < ?end_data
text.s = PeekS ( pointer )
pointer + StringByteLength ( text ) + 1
ParseCSV ( text, ',' )
OUT.s = ""
For i = 0 To CSVParse\numberOfColumns - 1
OUT + CSVParse\Column$[i] + " | "
Next
Debug text
Debug OUT
Debug ""
Wend
; --- source data ---
DataSection
start_data:
Data.s "a,b,c"
Data.s Chr(34) + "a" + Chr(34) + ",b,c"
Data.s "'a',b,c"
Data.s "a , b , c"
Data.s "aa,bb;cc"
Data.s ""
Data.s "a"
Data.s ",b,"
Data.s ",,c"
Data.s ",,"
Data.s Chr(34) + Chr(34) + ",b"
Data.s Chr(34) + " " + Chr(34) + ",b"
Data.s Chr(34) + "a,b" + Chr(34)
Data.s Chr(34) + "a,b" + Chr(34) + ",c"
Data.s Chr(34) + " a , b " + Chr(34) + ", c"
Data.s "a b,c"
Data.s "a" + Chr(34) + "b,c"
Data.s Chr(34) + "a" + Chr(34) + Chr(34) + "b" + Chr(34) + ",c"
Data.s "a" + Chr(34) + Chr(34) + "b,c"
Data.s "a,b" + Chr(34) + ",c"
Data.s "a,b"+ Chr(34) + Chr(34) + ",c"
Data.s "a," + Chr(34) + "B: " + Chr(34) + Chr(34) + "Hi, I'm B" + Chr(34) + Chr(34) + Chr(34) +",c"
end_data:
EndDataSection
Re: CSV and Quotes
Posted: Sat Aug 07, 2021 1:27 pm
by pdwyer
I'm looking for a way to speed this up (a lot)
One thing I did find which seems to be a bit faster and a lot less code is a regex version...
If anyone knows of anything significantly faster, please share. In these days of bigger data this need is getting more common.
The magic of the regex for this code comes from the same link in the original post
Code: Select all
Declare.l ParseCSV02(sExpr.s, Array CSVFieldVals.s(1))
If CreateRegularExpression(0,"(\s*"+Chr(34)+"[^"+Chr(34)+"]*"+Chr(34)+"\s*,)|(\s*[^,]*\s*,)")
Dim Vals.s(0)
OpenFile(1,"F:\Programming\PureBasicCode\csv.csv") ;change this!
While Not Eof(1)
CSVString.s = ReadString(1)
ValCount = ParseCSV02(CSVString,Vals())
Debug "Column Count: " + Str(ValCount) + " " + CSVString
For i = 0 To valcount -1
Debug vals(i)
Next
Wend
CloseFile(1)
Else
Debug RegularExpressionError()
EndIf
Procedure.l ParseCSV02(sExpr.s, Array CSVFieldVals.s(1))
mc.l = ExtractRegularExpression(0,sExpr + ",",CSVFieldVals())
For i = 0 To mc - 1
CSVFieldVals(i) = Left(CSVFieldVals(i),Len(CSVFieldVals(i))-1)
Next
ProcedureReturn mc
EndProcedure
Re: CSV and Quotes
Posted: Sat Aug 07, 2021 2:36 pm
by mk-soft
I don't know ist faster, but i use this
Link:
viewtopic.php?f=12&t=69557
Re: CSV and Quotes
Posted: Sun Aug 08, 2021 2:48 am
by pdwyer
it is faster, thanks