
Parsing a Line

Posted: Mon Feb 23, 2009 7:25 pm
by tbohon
I do a lot of work with files where each line contains fields delimited by a pipe character ( | ). In self-defense, I wrote a quick routine to break these lines up into an array so that I can work with each value and I thought the routine might be useful for others.

Here's the code:

Code:

l$ = "Some|line|separated|by|pipes"

Dim ary$(5)    ; room for the fields in the line (Dim ary$(5) gives elements 0 to 5)

For i=1 To Len(l$)
  If Mid(l$,i,1) <> "|"
    s$ = s$ + Mid(l$,i,1)
  Else
    ary$(ndx) = s$
    ndx = ndx + 1
    s$ = ""
  EndIf
Next i

ary$(ndx) = s$   ; store the last field, which has no trailing pipe

Note that I've used this same algorithm in a variety of languages and it's easily portable.

Enjoy - hope it helps someone.

Best,

Tom

Posted: Mon Feb 23, 2009 8:11 pm
by Comtois
or you can use StringField() :)

Code:

l$ = "Some|line|separated|by|pipes"

Nb = CountString(l$,"|")
Dim ary$(Nb)    ; set the dimension of the array for the # of fields in the line

For i=0 To Nb
 ary$(i)=StringField(l$,i+1,"|")
 Debug ary$(i)
Next  

Posted: Mon Feb 23, 2009 8:32 pm
by rsts
or perhaps

Code:

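l$ = "Some|line|separated|by|pipes"
Structure CharType
  c.b                ; one byte per character, so this assumes an Ascii string
EndStructure
*ptr.CharType
*ptr = @l$
While *ptr < @l$ + Len(l$)
  If *ptr\c = $7C    ; $7C is the pipe character
    Debug "pipe@ " + Str(*ptr - @l$)   ; zero-based offset of the pipe
  EndIf
  *ptr + 1
Wend

cheers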

Posted: Mon Feb 23, 2009 9:21 pm
by tbohon
Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.

rsts - pointers??? Yuch!!! :) Seriously, good point ... most folks find pointers to be confusing and, after 13 years of teaching programming in various languages at the college level, I tend to think simply as well.

Appreciate the comments!

Best,

Tom

Posted: Mon Feb 23, 2009 9:28 pm
by Comtois
tbohon wrote: Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.

Tom

Oops, sorry :P

Posted: Mon Feb 23, 2009 9:47 pm
by srod
Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.

Of course using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both Ascii and Unicode:

Code:

l$ = "Some|line|separated|by|pipes" 
*ptr.CHARACTER
*ptr=@l$ 
While *ptr\c
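  If *ptr\c = $7C   ; $7C is the pipe character
    ; Shift the byte offset right by 0 (Ascii) or 1 (Unicode) to get a character offset.
    Debug "pipe@ " + Str((*ptr-@l$)>>(SizeOf(CHARACTER)-1))
  EndIf
  *ptr+SizeOf(CHARACTER)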
Wend

Posted: Mon Feb 23, 2009 11:36 pm
by tbohon
Clever, srod ... thanks for sharing.

Tom

Posted: Tue Feb 24, 2009 12:06 am
by Little John
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again?

I've wondered about this myself, too. Well, I hope that each call starts where the previous call finished.
I think we could find the answer by running some speed tests, but so far I haven't done so ...

Regards, Lazy John
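
A rough sketch of such a speed test, offered as an illustration rather than a proper benchmark. The field count and variable names are arbitrary assumptions. It reads the first half of the fields, then the second half: if every StringField() call rescans from the start of the string, the second half should take noticeably longer.

```purebasic
#Fields = 5000   ; arbitrary test size, adjust to taste

; Build "x|x|...|x" with #Fields fields.
Define src.s = ReplaceString(Space(#Fields - 1), " ", "x|") + "x"
Define f.s
Define half = #Fields / 2

t0 = ElapsedMilliseconds()
For i = 1 To half
  f = StringField(src, i, "|")     ; fields near the start of the string
Next
t1 = ElapsedMilliseconds()
For i = half + 1 To #Fields
  f = StringField(src, i, "|")     ; fields near the end of the string
Next
t2 = ElapsedMilliseconds()

MessageRequester("StringField timing", "First half: " + Str(t1 - t0) + " ms" + #LF$ + "Second half: " + Str(t2 - t1) + " ms")
```

Timings taken with the debugger enabled are skewed, so it is probably best to run this with the debugger off. The results may also differ between PureBasic versions.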

Posted: Tue Feb 24, 2009 12:49 am
by pdwyer
This is one of the reasons I created a split function. I was using StringField() to go through very long strings and it wasn't pretty. Likewise, rewriting StringField() to handle delimiters of more than one character, while keeping the same search logic built on FindString(), wasn't much better.

In the end I wrote a split() function that populates an array that you pass it (it's here in the tips somewhere), and on the text I was working with (1 MB+) it was thousands of times faster. On small strings, though, parsestring() works just fine and is simpler.
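
For readers who want to see the idea, here is a minimal sketch of a single-pass split. It is not the split() function mentioned above, only an illustration: the name SplitLine and its parameters are made up, it assumes a single-character delimiter, and it relies on ReDim being allowed on an array passed with the Array keyword.

```purebasic
; Hypothetical helper: split Text on a one-character delimiter in a single
; pass and fill the array passed in. Returns the number of fields.
Procedure.i SplitLine(Text.s, Delimiter.s, Array Fields.s(1))
  Protected *start.Character = @Text
  Protected *ptr.Character = *start
  Protected delim = Asc(Delimiter)
  Protected count = 0

  ; Guard: an empty string is treated as one empty field.
  If Text = ""
    ReDim Fields(0)
    Fields(0) = ""
    ProcedureReturn 1
  EndIf

  ; There is one more field than there are delimiters.
  ReDim Fields(CountString(Text, Delimiter))

  While *ptr\c
    If *ptr\c = delim
      ; Copy the characters between the field start and this delimiter.
      Fields(count) = PeekS(*start, (*ptr - *start) / SizeOf(Character))
      count + 1
      *start = *ptr + SizeOf(Character)
    EndIf
    *ptr + SizeOf(Character)
  Wend

  ; The last field has no trailing delimiter.
  Fields(count) = PeekS(*start)
  ProcedureReturn count + 1
EndProcedure

Dim ary.s(0)
n = SplitLine("Some|line|separated|by|pipes", "|", ary())
For i = 0 To n - 1
  Debug ary(i)
Next
```

Each character is visited once by the loop (plus one counting pass by CountString), so the cost should grow roughly linearly with the length of the line instead of restarting from the beginning for every field.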

As per the story from JoelOnSoftware, "Shlemiel the painter" :lol:
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"


It's an interesting article http://www.joelonsoftware.com/articles/ ... 00319.html

Posted: Sun Mar 01, 2009 10:04 am
by Little John
:D
Very interesting indeed, thank you!

I also found Schlemiel the painter's Algorithm on Wikipedia:
Coined in 2001 [by Joel Spolsky], the term has since become part of the vernacular

Regards, Little John

Posted: Sun Mar 01, 2009 12:39 pm
by Trond
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.

No, because Mid() also starts from the beginning each time.

Posted: Sun Mar 01, 2009 2:36 pm
by ricardo
srod wrote: Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.

Of course using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both Ascii and Unicode:

Code:

l$ = "Some|line|separated|by|pipes" 
*ptr.CHARACTER
*ptr=@l$ 
While *ptr\c
  If (*ptr\c) = $7C
    Debug "pipe@ " + Str((*ptr-@l$)>>(SizeOf(CHARACTER)-1))
  EndIf
  *ptr+SizeOf(CHARACTER)
Wend

StringField() lets me go directly to the field I need, like this:

l$ = "Some|line|separated|by|pipes"

Debug StringField(l$,3,"|")

How can I achieve the same thing with the pointer approach?

Posted: Sun Mar 01, 2009 6:29 pm
by rsts
Guess you could add a matches counter and check it.

cheers
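
A minimal sketch of the match-counter idea, not rsts' own code. The procedure name FieldAt and its parameters are hypothetical, and it assumes a single-character delimiter and a 1-based field index like StringField().

```purebasic
; Hypothetical helper: return field number Index (1-based) by counting
; delimiters while walking the string once.
Procedure.s FieldAt(Text.s, Index, Delimiter.s)
  Protected *ptr.Character = @Text
  Protected *start.Character = *ptr
  Protected delim = Asc(Delimiter)
  Protected matches = 0

  If Text = "" Or Index < 1
    ProcedureReturn ""
  EndIf

  While *ptr\c
    If *ptr\c = delim
      matches + 1
      If matches = Index
        ; The wanted field ends at this delimiter.
        ProcedureReturn PeekS(*start, (*ptr - *start) / SizeOf(Character))
      EndIf
      *start = *ptr + SizeOf(Character)   ; next field starts after the pipe
    EndIf
    *ptr + SizeOf(Character)
  Wend

  ; End of string reached: only the last field can still match.
  If matches = Index - 1
    ProcedureReturn PeekS(*start)
  EndIf
  ProcedureReturn ""
EndProcedure

Debug FieldAt("Some|line|separated|by|pipes", 3, "|")   ; the third field
```

Like StringField(), this still walks the string from the start on every call, so it only helps when a single field is needed. To read every field, filling an array in one pass avoids the repeated scanning.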

Posted: Sun Mar 01, 2009 9:35 pm
by ricardo
rsts wrote: Guess you could add a matches counter and check it.

cheers

Yes, you are right. I didn't realise (I didn't even think about it) that StringField() doesn't go straight to the match.
Sometimes I'm a little naive :P

Posted: Sun Mar 01, 2009 10:25 pm
by Hroudtwolf
Not fast but regex XD

Code:

Define.s sSource     = "Some|line|separated|by|pipes" 
Define.i nMatches
Define   *RegEx

Dim sMatch.s ( 0 )

*RegEx = CreateRegularExpression ( #PB_Any , "[a-zA-Z_0-9,\.;\\\/\+\*\-\#\~´`'<>\&\%\§!\(\)=\}\{\^]+" )
If Not *RegEx
   Debug "Error."
   End
EndIf 

nMatches = ExtractRegularExpression( *RegEx , sSource , sMatch () )
If Not nMatches
  Debug "Error."
  FreeRegularExpression ( *RegEx )
  End
EndIf
 
For nI = 1 To nMatches
   Debug sMatch ( nI - 1 )
Next nI
 
FreeRegularExpression ( *RegEx )

Regards

Wolf