					
				
				Posted: Wed May 21, 2008 4:05 pm
				by Kiffi
pdwyer wrote: okay, here is my first port attempt
Wow! The code is quite fast.

My first test shows 0.1 seconds per 1000 lines; your first code takes 2 seconds per 1000 lines (tested with a 280 MB CSV file).
Unfortunately, fields that span multiple lines are not supported, but the code is a good starting point...
Thanks for your help!
Greetings ... Kiffi
 
			 
			
					
				
				Posted: Wed May 21, 2008 4:16 pm
				by DoubleDutch
				Here is the one I wrote a while back to solve the quotes problem.
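Code:
Procedure.s X_StringField(string$,no,seperator$=",")
	; Returns field number 'no' (1-based) of string$, ignoring separators inside "..."
	done=#False
	count=1          ; number of the field currently being collected
	len=Len(string$)
	pos=0
	result$=""
	quotes=#False    ; #True while we are inside a quoted section
	Repeat
		pos+1
		If pos>len
			done=#True
		Else
			ch$=Mid(string$,pos,1)
			If ch$=seperator$ And (Not quotes)
				; an unquoted separator ends the current field
				If count=no
					done=#True
				Else
					result$=""
					count+1
				EndIf
			Else
				result$+ch$
				If ch$=Chr(34)
					quotes=quotes ! #True   ; toggle the in-quotes flag (XOR)
				EndIf
			EndIf
		EndIf
	Until done
	result$=Trim(result$)
	If count=no
		; strip the surrounding quotes from a quoted field
		If Left(result$,1)=Chr(34)
			result$=Mid(result$,2)
		EndIf
		If Right(result$,1)=Chr(34)
			result$=Left(result$,Len(result$)-1)
		EndIf
	Else
		; the string has fewer than 'no' fields
		result$=""
	EndIf
	result$=Trim(result$)
	ProcedureReturn result$
EndProcedure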
Seems to work ok for me. 
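One limitation worth noting: the procedure toggles the quote flag on every `"`, so an escaped pair inside a quoted field survives as two characters (for the test row `"a""b",c`, field 1 comes back as `a""b` rather than `a"b`). Below is a minimal sketch of a helper that could stand in for the two Left/Right quote-stripping checks near the end of X_StringField, applied to result$ when count=no. It assumes RFC 4180-style escaping (`""` means a literal quote), and UnquoteField is a hypothetical name, not part of DoubleDutch's code.

```purebasic
; Hypothetical helper: trims a raw field, removes one pair of enclosing
; quotes and turns each escaped "" inside them into a single quote.
Procedure.s UnquoteField(field$)
  Protected q$ = Chr(34)
  field$ = Trim(field$)
  ; Only unquote when the field is quoted at both ends
  If Len(field$) >= 2 And Left(field$, 1) = q$ And Right(field$, 1) = q$
    field$ = Mid(field$, 2, Len(field$) - 2)
    field$ = ReplaceString(field$, q$ + q$, q$)
  EndIf
  ProcedureReturn field$
EndProcedure

Debug UnquoteField(Chr(34) + "a" + Chr(34) + Chr(34) + "b" + Chr(34))   ; a"b
Debug UnquoteField("  plain  ")                                        ; plain
```

Unlike the original, which strips a leading and a trailing quote independently, this only unquotes a field that is quoted at both ends, so an unquoted value such as `a"b` is left untouched.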

 
			 
			
					
				
				Posted: Thu May 22, 2008 12:17 am
				by pdwyer
Kiffi wrote: Wow! The code is quite fast.

My first test shows 0.1 seconds per 1000 lines; your first code takes 2 seconds per 1000 lines (tested with a 280 MB CSV file).

News to me too; I hadn't tested that far. It's not my proc, though, so I can't take the credit.
 
Kiffi wrote: Unfortunately, fields that span multiple lines are not supported, but the code is a good starting point...

I'm not so sure I want to support that. I saw it in the spec too. I have a fast split function that I would run on the CRLFs first, and that wouldn't work if the CSV had CRLFs inside fields. Personally, if I thought I was going to work with fields containing CRLFs, I wouldn't use CSV; I'd use a DB (e.g. SQLite) or a format that doesn't use CRLF as a line delimiter (i.e. has no sense of a line as such).
Either that, or I'd use some escape character for an in-field CRLF and put it back in later.
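A sketch of a different approach, not taken from this thread: instead of an escape character, read physical lines and keep appending them to the current record while it contains an odd number of quote characters, since an open quoted field is the only legitimate reason for a line break inside a record. JoinRecords is a hypothetical name, and the sketch assumes unquoted fields never contain stray quotes.

```purebasic
; Hypothetical helper, not from the thread: joins physical lines into logical
; CSV records. A record stays open while its quote count is odd, because an
; open quoted field is the only legal reason for a line break inside a record.
; Escaped "" pairs add two quotes, so they never change the parity.
Procedure JoinRecords(Array lines.s(1), List records.s())
  Protected i
  Protected pending$
  Protected inQuotes = #False
  ClearList(records())
  For i = 0 To ArraySize(lines())
    If inQuotes
      ; put back the line break (assumed LF here; use #CRLF$ if needed)
      pending$ = pending$ + #LF$ + lines(i)
    Else
      pending$ = lines(i)
    EndIf
    inQuotes = Bool(CountString(pending$, Chr(34)) % 2 = 1)
    If Not inQuotes
      AddElement(records())
      records() = pending$
    EndIf
  Next
  If inQuotes
    ; unterminated quote at end of input: keep the partial record anyway
    AddElement(records())
    records() = pending$
  EndIf
  ProcedureReturn ListSize(records())
EndProcedure

Dim phys.s(2)
phys(0) = "1," + Chr(34) + "first line"
phys(1) = "second line" + Chr(34) + ",x"
phys(2) = "2,plain,y"
NewList recs.s()
Debug JoinRecords(phys(), recs())   ; 2 logical records
ForEach recs()
  Debug recs()
Next
```

Rows such as `a"b,c` from the test sheet further down break the assumption: one unbalanced quote would swallow the following lines, so this only suits input where quotes are balanced.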
@DoubleDutch, I haven't looked at this yet, but there's some example CSV on the page I linked. It looks like it would make a good basic compliance test; I'll try some of these tonight when I get home.
 
			 
			
					
				
				Posted: Thu May 22, 2008 10:55 am
				by pdwyer
There are still two bugs in the code (that I know of), and I fixed one bug in the previous code (an empty line would not clear the array if it already held data, so no. 6 failed).
In the table at the top of http://www.xbeat.net/vbspeed/c_ParseCSV.php, it still fails to handle nos. 18 and 22 correctly.
Here is the test CSV sheet:
Code:
a,b,c
"a",b,c
'a',b,c
 a , b , c 
aa,bb;cc
a
,b,
,,c
,,
"",b
" ",b
"a,b"
"a,b",c
" a , b ", c 
a b,c
a"b,c
"a""b",c
a""b,c
a,b",c
a,b"",c
a,"B: ""Hi, I'm B""",c
I'm still not sure how easy this is to fix: the proc works differently in PB because the VB original uses LenB and MidB, which work on byte positions in Unicode strings.
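For readers porting from VB: LenB and MidB count bytes, while PureBasic's Len and Mid count characters, and StringByteLength gives the byte size. A small sketch of the difference, assuming a Unicode build (the only string mode in current PureBasic versions), where each Character is 2 bytes:

```purebasic
; Len() and Mid() work in characters; StringByteLength() reports bytes.
; In a Unicode build a Character is 2 bytes, so byte positions
; (what LenB/MidB use in VB) are not character positions.
s$ = "a,b"
Debug Len(s$)                ; 3 characters
Debug StringByteLength(s$)   ; 3 * SizeOf(Character) bytes (6 in Unicode mode)
Debug SizeOf(Character)      ; 2 in Unicode mode
Debug Mid(s$, 3, 1)          ; b (Mid counts characters, not bytes)
```

So a port should translate byte offsets into character offsets rather than reuse them directly.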
 
			 
			
					
				
				Posted: Fri May 23, 2008 2:16 pm
				by pdwyer
I think I have the bugs out now; it seems to do what I want it to do. There's still some room for performance improvement, so I might have another look at that later.
The code at the very start of the thread has been updated with the latest version, as it was starting to get confusing for people arriving later which version I was talking about.

 
			 
			
					
				
				Posted: Tue Jul 01, 2008 11:33 am
				by peterb
				Enjoy,
peterb 
Code:
;- Author   : Petr Vavrin (peterb)
;- Location : Czech Republic
;- Email    : pb.pb (at) centrum (dot) cz 
Global characters = 0
#CSV_PARSER_COLUMNS = 200
 
Structure _CSVParseGlobals
  numberOfColumns.l
  Column$[#CSV_PARSER_COLUMNS]
EndStructure
Global CSVParse._CSVParseGlobals
Procedure ParseCSV ( csv_line.s, delimiter )
  numberOfColumns       = 0
  in_column             = #False
  column_string.s       = ""
  CSVParse\Column$[0]   = ""
 
  line_length = Len ( csv_line ) - 1
  
  For c = 0 To line_length
    char  = PeekC ( @csv_line + c * SizeOf ( Character ) ) ; read character c (PeekB would read bytes and break in Unicode builds)
    characters + 1 ; remove this line - for testing only
    If char = delimiter
      If c = 0
        numberOfColumns + 1
      
      ElseIf in_column And prev_char = '"'
        in_column     = #False
        CSVParse\Column$[ numberOfColumns ] = Left ( column_string, Len ( column_string ) - 1 )
        numberOfColumns + 1
        column_string = ""
      
      ElseIf in_column = #False
        CSVParse\Column$[ numberOfColumns ] = column_string
        numberOfColumns + 1
        column_string = ""
      Else
        column_string + Chr ( char )
      
      EndIf
    
    ElseIf char = '"'
      If c = 0
        in_column = #True
      
      ElseIf in_column And c = line_length
        in_column = #False
        
      ElseIf in_column = #False And prev_char = delimiter
        in_column = #True
      Else
        column_string + Chr ( char )
      EndIf
     
    Else
      column_string + Chr ( char )
    EndIf
    
    If char <> 32
      prev_char = char
    EndIf
  
  Next
  CSVParse\Column$[ numberOfColumns ] = column_string
  CSVParse\numberOfColumns            = numberOfColumns + 1
EndProcedure
; --- speed test ---
start = ElapsedMilliseconds()
For x = 1 To 10000
  pointer = ?start_data
  While pointer < ?end_data
    text.s = PeekS ( pointer )
    pointer + StringByteLength ( text ) + SizeOf ( Character ) ; skip the string and its null terminator
    ParseCSV ( text, ',' )
  Wend
Next
time = ElapsedMilliseconds() - start
If time = 0 : time = 1 : EndIf ; avoid dividing by zero on very fast runs
MessageRequester("", "time: " + Str ( time ) + " ms" + Chr( 10 ) + "characters: " + Str ( characters ) + Chr(10) + Str ( characters/time ) + " chr / ms ")
Debug "time: " + Str ( time ) + " ms"
Debug "characters: " + Str ( characters )
Debug Str ( characters/time ) + " chr / ms "
Debug ""
; --- show results ---
pointer = ?start_data
While pointer < ?end_data
  text.s = PeekS ( pointer )
  pointer + StringByteLength ( text ) + SizeOf ( Character ) ; skip the string and its null terminator
  
  ParseCSV ( text, ',' )
  
  OUT.s = ""
  For i = 0 To CSVParse\numberOfColumns - 1
    OUT + CSVParse\Column$[i] + " | "
  Next 
  Debug text
  Debug OUT
  Debug ""
 
Wend
; --- source data ---
DataSection
  start_data:
    Data.s "a,b,c"
    Data.s Chr(34) + "a" + Chr(34) + ",b,c"
    Data.s "'a',b,c"
    Data.s "a , b , c"
    Data.s "aa,bb;cc"
    Data.s ""
    Data.s "a"
    Data.s ",b,"
    Data.s ",,c"
    Data.s ",,"
    Data.s Chr(34) + Chr(34) + ",b"
    Data.s Chr(34) + " " + Chr(34) + ",b"
    Data.s Chr(34) + "a,b" + Chr(34)
    Data.s Chr(34) + "a,b" + Chr(34) + ",c"
    Data.s Chr(34) + " a , b " + Chr(34) + ", c"
    Data.s "a b,c"
    Data.s "a" + Chr(34) + "b,c"
    Data.s Chr(34) + "a" + Chr(34) + Chr(34) + "b" + Chr(34) + ",c"
    Data.s "a" + Chr(34) + Chr(34) + "b,c"
    Data.s "a,b" + Chr(34) + ",c"
    Data.s "a,b"+ Chr(34) + Chr(34) + ",c"
    Data.s "a," + Chr(34) + "B: " + Chr(34) + Chr(34) + "Hi, I'm B" + Chr(34) + Chr(34) + Chr(34) +",c"
  end_data:
EndDataSection    
 
			 
			
					
				Re: CSV and Quotes
				Posted: Sat Aug 07, 2021 1:27 pm
				by pdwyer
I'm looking for a way to speed this up (a lot).
One thing I did find, which seems to be a bit faster and is a lot less code, is a regex version...
If anyone knows of anything significantly faster, please share. In these days of bigger data this need is getting more common.
The magic of the regex in this code comes from the same link as in the original post.
Code:
Declare.l ParseCSV02(sExpr.s, Array CSVFieldVals.s(1)) 
If CreateRegularExpression(0,"(\s*"+Chr(34)+"[^"+Chr(34)+"]*"+Chr(34)+"\s*,)|(\s*[^,]*\s*,)")
    
    Dim Vals.s(0)
    If ReadFile(1,"F:\Programming\PureBasicCode\csv.csv")  ;change this! ReadFile opens read-only and fails if the file is missing
        While Not Eof(1)
            CSVString.s = ReadString(1)
            ValCount = ParseCSV02(CSVString,Vals())
            Debug "Column Count: " + Str(ValCount) + "      " + CSVString
            
            For i = 0 To ValCount - 1
                Debug Vals(i)
            Next
        Wend
        CloseFile(1)
    Else
        Debug "Could not open the CSV file"
    EndIf
Else
    Debug RegularExpressionError()
EndIf
Procedure.l ParseCSV02(sExpr.s, Array CSVFieldVals.s(1)) 
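    ; Append a comma so the last field also matches the "...," pattern
    mc.l = ExtractRegularExpression(0,sExpr + ",",CSVFieldVals())
    
    ; Every match ends with the comma; cut it off
    For i = 0 To mc - 1
        CSVFieldVals(i) = Left(CSVFieldVals(i),Len(CSVFieldVals(i))-1)    
    Next
    ProcedureReturn mc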
EndProcedure
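As written, each match keeps its surrounding quotes, and a quoted field with escaped quotes such as `"a""b"` does not match the first alternative, so it falls through to the second one unchanged. Below is a sketch of a variant, assuming a comma delimiter and single-line records. ParseCSVRegex and the extended pattern are hypothetical, not from the linked page: the quoted alternative also accepts `""` pairs, and each match is trimmed, unquoted and unescaped.

```purebasic
; Hypothetical variant (not from the linked page): the quoted alternative
; allows "" pairs inside the quotes; everything else runs up to the comma.
Procedure.l ParseCSVRegex(regex, sExpr.s, Array fields.s(1))
  Protected q$ = Chr(34)
  Protected i, n
  ; Append a comma so the last field matches like the others
  n = ExtractRegularExpression(regex, sExpr + ",", fields())
  For i = 0 To n - 1
    ; Drop the trailing comma, then surrounding spaces
    fields(i) = Trim(Left(fields(i), Len(fields(i)) - 1))
    ; Unquote "..." fields and turn "" back into a single quote
    If Len(fields(i)) >= 2 And Left(fields(i), 1) = q$ And Right(fields(i), 1) = q$
      fields(i) = ReplaceString(Mid(fields(i), 2, Len(fields(i)) - 2), q$ + q$, q$)
    EndIf
  Next
  ProcedureReturn n
EndProcedure

q$ = Chr(34)
regex = CreateRegularExpression(#PB_Any, "\s*" + q$ + "(?:[^" + q$ + "]|" + q$ + q$ + ")*" + q$ + "\s*,|[^,]*,")
If regex
  Dim vals.s(0)
  n = ParseCSVRegex(regex, "a," + q$ + "B: " + q$ + q$ + "Hi, I'm B" + q$ + q$ + q$ + ",c", vals())
  For i = 0 To n - 1
    Debug vals(i)   ; a, then B: "Hi, I'm B", then c
  Next
Else
  Debug RegularExpressionError()
EndIf
```

Trimming unquoted fields is a design choice here; drop the Trim call if leading and trailing spaces are significant in your data.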
 
			 
			
					
				Posted: Sat Aug 07, 2021 2:36 pm
				by mk-soft
I don't know if it's faster, but I use this:
Link: viewtopic.php?f=12&t=69557
			 
			
					
				Posted: Sun Aug 08, 2021 2:48 am
				by pdwyer
It is faster, thanks.