Parsing a Line
Posted: Mon Feb 23, 2009 7:25 pm
by tbohon
I do a lot of work with files where each line contains fields delimited by a pipe character ( | ). In self-defense, I wrote a quick routine to break these lines up into an array so that I can work with each value and I thought the routine might be useful for others.
Here's the code:
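Code:
l$ = "Some|line|separated|by|pipes"
Dim ary$(4) ; 5 fields, stored at indices 0 to 4
ndx = 0     ; index of the field currently being built
s$ = ""     ; characters collected so far for the current field
For i = 1 To Len(l$)
  If Mid(l$, i, 1) <> "|"
    s$ = s$ + Mid(l$, i, 1) ; ordinary character: append it to the current field
  Else
    ary$(ndx) = s$          ; delimiter: store the finished field
    ndx = ndx + 1
    s$ = ""
  EndIf
Next i
ary$(ndx) = s$ ; store the last field, which has no trailing delimiter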
Note that I've used this same algorithm in a variety of languages and it ports easily.
Enjoy - hope it helps someone.
Best,
Tom
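As a rough illustration of Tom's remark that the algorithm carries over to other languages, here is a minimal Python sketch of the same character-by-character scan. The function name `split_fields` and its `delimiter` parameter are made up for this example, and a growable list stands in for the pre-dimensioned array.

```python
def split_fields(line, delimiter="|"):
    """Split line on a single-character delimiter by scanning it once."""
    fields = []
    current = ""  # characters collected for the field being built
    for ch in line:
        if ch != delimiter:
            current += ch           # ordinary character: extend the field
        else:
            fields.append(current)  # delimiter: store the finished field
            current = ""
    fields.append(current)          # the last field has no trailing delimiter
    return fields


print(split_fields("Some|line|separated|by|pipes"))
```

Empty fields are preserved, so a line such as "a||b" yields three entries, the middle one being an empty string.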
Posted: Mon Feb 23, 2009 8:11 pm
by Comtois
Or you can use StringField():
Code:
l$ = "Some|line|separated|by|pipes"
Nb = CountString(l$, "|") ; number of delimiters, which is also the highest 0-based field index
Dim ary$(Nb) ; set the dimension of the array for the # of fields in the line
For i = 0 To Nb
  ary$(i) = StringField(l$, i + 1, "|") ; StringField() counts fields from 1
  Debug ary$(i)
Next
Posted: Mon Feb 23, 2009 8:32 pm
by rsts
or perhaps
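Code:
l$ = "Some|line|separated|by|pipes"
Structure Chartype
  c.b ; one byte per character, so this assumes an ASCII (non-Unicode) build
EndStructure
*ptr.Chartype
*ptr = @l$
While *ptr < @l$ + Len(l$)
  If *ptr\c = $7C ; $7C is the pipe character
    Debug "pipe@ " + Str(*ptr - @l$) ; 0-based offset of the pipe
  EndIf
  *ptr + 1
Wend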
cheers
Posted: Mon Feb 23, 2009 9:21 pm
by tbohon
Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.
rsts - pointers??? Yuch!!!

Seriously, good point ... most folks find pointers to be confusing and, after 13 years of teaching programming in various languages at the college level, I tend to think simply as well.
Appreciate the comments!
Best,
Tom
Posted: Mon Feb 23, 2009 9:28 pm
by Comtois
tbohon wrote: Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.
Tom
Oops, sorry

Posted: Mon Feb 23, 2009 9:47 pm
by srod
Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.
Of course, using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both ASCII and Unicode:
Code:
l$ = "Some|line|separated|by|pipes"
*ptr.CHARACTER
*ptr = @l$
While *ptr\c ; stop at the terminating null character
  If *ptr\c = $7C ; $7C is the pipe character
    ; convert the byte offset to a character index (shift by 0 in ASCII, by 1 in Unicode)
    Debug "pipe@ " + Str((*ptr - @l$) >> (SizeOf(CHARACTER) - 1))
  EndIf
  *ptr + SizeOf(CHARACTER)
Wend
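The shift in srod's Debug line turns a byte offset into a character index. A small Python sketch of that arithmetic follows, under the assumption that characters are either 1 byte wide (modelled here with Latin-1) or 2 bytes wide (modelled with UTF-16 little-endian). The names `pipe_positions` and `char_size` are invented for this example, and a Python bytes object stands in for the raw string memory.

```python
def pipe_positions(text, char_size):
    """Return the character indices of every pipe, walking raw bytes.

    char_size is 1 (Latin-1 bytes) or 2 (UTF-16 little-endian code units).
    """
    encoding = "latin-1" if char_size == 1 else "utf-16-le"
    raw = text.encode(encoding)
    positions = []
    offset = 0  # byte offset from the start of the buffer
    while offset < len(raw):
        unit = int.from_bytes(raw[offset:offset + char_size], "little")
        if unit == 0x7C:  # 0x7C is the pipe character
            # byte offset -> character index: shift by 0 or by 1
            positions.append(offset >> (char_size - 1))
        offset += char_size
    return positions


print(pipe_positions("Some|line|separated|by|pipes", 1))
print(pipe_positions("Some|line|separated|by|pipes", 2))
```

The `>> (char_size - 1)` trick only works because the widths involved are 1 and 2; for any other width an integer division by `char_size` would be needed.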
Posted: Mon Feb 23, 2009 11:36 pm
by tbohon
Clever, srod ... thanks for sharing.
Tom
Posted: Tue Feb 24, 2009 12:06 am
by Little John
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again?
I've asked myself this, too. Well, I hope that each call starts where the previous call finished.
I think we could find the answer by doing some speed tests, but so far I haven't done so ...
Regards, Lazy John
Posted: Tue Feb 24, 2009 12:49 am
by pdwyer
This is one of the reasons I created a split function. I was using StringField() to go through very long strings and it wasn't pretty. Likewise, rewriting StringField() to handle delimiters longer than one character, while keeping the same search logic with FindString(), wasn't much better.
In the end I wrote a split() function that populates an array that you pass to it (it's here in the tips somewhere), and on the text I was working with (1 MB+) it was thousands of times faster. On small strings, though, parsestring() works just fine and is simpler.
As per the story from JoelOnSoftware, "Shlemiel the painter":
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"
"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"
It's an interesting article:
http://www.joelonsoftware.com/articles/ ... 00319.html
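To make the Shlemiel effect concrete, the following Python sketch counts how many characters are examined when every field is fetched by a lookup that restarts at the beginning of the string, compared with one pass over the line. The helper `field_from_start` is hypothetical and only models a lookup that rescans from the start; whether PureBasic's StringField() actually behaves this way is exactly the open question in this thread.

```python
def field_from_start(line, index, delimiter="|"):
    """Return (field, chars_examined) for the 1-based field `index`,
    always scanning from the first character of `line`."""
    current = 1   # number of the field the scan is currently inside
    field = ""
    examined = 0
    for ch in line:
        examined += 1
        if ch == delimiter:
            if current == index:
                return field, examined  # wanted field just ended
            current += 1
        elif current == index:
            field += ch
    # Reached the end: either the wanted field was the last one, or it does not exist
    return (field if current == index else ""), examined


def total_rescan_cost(line, delimiter="|"):
    """Characters examined when every field is fetched with a fresh scan."""
    count = line.count(delimiter) + 1
    return sum(field_from_start(line, i, delimiter)[1] for i in range(1, count + 1))


line = "|".join(["x"] * 1000)  # 1000 one-character fields
print(total_rescan_cost(line), "characters examined with restarts")
print(len(line), "characters examined in a single pass")
```

The restart cost grows roughly with the square of the number of fields, while the single pass grows linearly, which would be consistent with the large speed-up reported above for long strings.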
Posted: Sun Mar 01, 2009 10:04 am
by Little John

Very interesting indeed, thank you!
I also found Schlemiel the painter's algorithm on Wikipedia:
"Coined in 2001 [by Joel Spolsky], the term has since become part of the vernacular."
Regards, Little John
Posted: Sun Mar 01, 2009 12:39 pm
by Trond
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.
No, because Mid() also starts from the beginning each time.
Posted: Sun Mar 01, 2009 2:36 pm
by ricardo
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.
Of course, using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both ASCII and Unicode:
Code:
l$ = "Some|line|separated|by|pipes"
*ptr.CHARACTER
*ptr = @l$
While *ptr\c
  If *ptr\c = $7C
    Debug "pipe@ " + Str((*ptr - @l$) >> (SizeOf(CHARACTER) - 1))
  EndIf
  *ptr + SizeOf(CHARACTER)
Wend
StringField() allows me to go directly to the field I need, like this:
Code:
l$ = "Some|line|separated|by|pipes"
Debug StringField(l$, 3, "|")
How can I achieve this with the pointer approach?
Posted: Sun Mar 01, 2009 6:29 pm
by rsts
Guess you could add a matches counter and check it.
cheers
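One way to read rsts' suggestion, sketched in Python rather than with pointers: count delimiters during a single scan and stop as soon as the wanted field is complete. The name `nth_field` and its 1-based `index` (chosen to mirror how StringField() is called in this thread) are assumptions made for the example.

```python
def nth_field(line, index, delimiter="|"):
    """Return the 1-based field `index`, or "" if the line has fewer fields."""
    matches = 0  # delimiters seen so far
    start = 0    # position where the current field begins
    for pos, ch in enumerate(line):
        if ch == delimiter:
            matches += 1
            if matches == index:
                return line[start:pos]  # the wanted field just ended
            start = pos + 1
    # The last field has no trailing delimiter
    return line[start:] if matches == index - 1 else ""


print(nth_field("Some|line|separated|by|pipes", 3))
```

The scan still has to pass over every earlier field to find the wanted one, so a single lookup is no cheaper than a lookup that starts from the beginning; the saving only appears when all fields are collected in one pass.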
Posted: Sun Mar 01, 2009 9:35 pm
by ricardo
rsts wrote: Guess you could add a matches counter and check it.
cheers
Yes, you are right. I didn't realise (I didn't even think about it) that StringField() doesn't go straight to the match.
Sometimes I'm a little naive.

Posted: Sun Mar 01, 2009 10:25 pm
by Hroudtwolf
Not fast but regex XD
Code:
Define.s sSource = "Some|line|separated|by|pipes"
Define.i nMatches
Define *RegEx
Dim sMatch.s(0)
; Matches runs of the listed characters. The pipe is not in the class, so it acts as the separator.
*RegEx = CreateRegularExpression(#PB_Any, "[a-zA-Z_0-9,\.;\\\/\+\*\-\#\~´`'<>\&\%\§!\(\)=\}\{\^]+")
If Not *RegEx
  Debug "Error."
  End
EndIf
; Fill sMatch() with every match and get the number of matches back
nMatches = ExtractRegularExpression(*RegEx, sSource, sMatch())
If Not nMatches
  Debug "Error."
  FreeRegularExpression(*RegEx)
  End
EndIf
For nI = 1 To nMatches
  Debug sMatch(nI - 1) ; the array is 0-based
Next nI
FreeRegularExpression(*RegEx)
Regards
Wolf
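For comparison, a minimal Python sketch of the regex idea. It deliberately swaps in a simpler pattern, `[^|]+` (any run of characters other than a pipe), instead of Wolf's explicit character class, and `regex_fields` is a name made up for this example.

```python
import re


def regex_fields(line):
    """Return every maximal run of non-pipe characters in line."""
    return re.findall(r"[^|]+", line)


print(regex_fields("Some|line|separated|by|pipes"))
print(regex_fields("a||b"))
```

Because the pattern needs at least one character per match, empty fields are dropped: "a||b" gives two fields here, whereas a character-by-character split of the same line gives three.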