Parsing a Line

tbohon
User
Posts: 42
Joined: Sat Nov 22, 2008 4:22 am
Location: Olympia, WA USA

Post by tbohon »

I do a lot of work with files where each line contains fields delimited by a pipe character ( | ). In self-defense, I wrote a quick routine to break these lines up into an array so that I can work with each value and I thought the routine might be useful for others.

Here's the code:

Code: Select all

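l$ = "Some|line|separated|by|pipes"

Dim ary$(4)    ; 5 fields -> indices 0 to 4 (Dim takes the highest index, not the count)

ndx = 0        ; index of the field currently being built
s$ = ""        ; collects the characters of the current field

For i = 1 To Len(l$)
  If Mid(l$, i, 1) <> "|"
    s$ = s$ + Mid(l$, i, 1)   ; ordinary character: append it to the current field
  Else
    ary$(ndx) = s$            ; delimiter: store the finished field
    ndx = ndx + 1
    s$ = ""
  EndIf
Next i

ary$(ndx) = s$   ; store the last field, which has no trailing delimiter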
Note that I've used this same algorithm in a variety of languages and it ports easily.

Enjoy - hope it helps someone.

Best,

Tom
Comtois
Addict
Posts: 1431
Joined: Tue Aug 19, 2003 11:36 am
Location: Doubs - France

Post by Comtois »

or you can use StringField() :)

Code: Select all

l$ = "Some|line|separated|by|pipes"

Nb = CountString(l$,"|")
Dim ary$(Nb)    ; set the dimension of the array for the # of fields in the line

For i=0 To Nb
 ary$(i)=StringField(l$,i+1,"|")
 Debug ary$(i)
Next  
Please correct my english
http://purebasic.developpez.com/
rsts
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Post by rsts »

or perhaps

Code: Select all

l$ = "Some|line|separated|by|pipes"
Structure CharType
  c.b
EndStructure
*ptr.CharType
*ptr = @l$
While *ptr < @l$ + Len(l$)
  If *ptr\c = $7C
    Debug "pipe@ " + Str(*ptr - @l$)
  EndIf
  *ptr + 1
Wend
cheers
tbohon
User
Posts: 42
Joined: Sat Nov 22, 2008 4:22 am
Location: Olympia, WA USA

Post by tbohon »

Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.

rsts - pointers??? Yuch!!! :) Seriously, good point ... most folks find pointers to be confusing and, after 13 years of teaching programming in various languages at the college level, I tend to think simply as well.

Appreciate the comments!

Best,

Tom
Comtois
Addict
Posts: 1431
Joined: Tue Aug 19, 2003 11:36 am
Location: Doubs - France

Post by Comtois »

tbohon wrote:Comtois, that's obviously the more efficient way for PureBasic ... I was trying to keep it 'international' in terms of language.

Tom
Oops, sorry :P
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.

Of course, using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both Ascii and Unicode modes:

Code: Select all

l$ = "Some|line|separated|by|pipes" 
*ptr.CHARACTER
*ptr=@l$ 
While *ptr\c
  If (*ptr\c) = $7C 
    Debug "pipe@ " + Str((*ptr-@l$)>>(SizeOf(CHARACTER)-1)) 
  EndIf
  *ptr+SizeOf(CHARACTER) 
Wend
I may look like a mule, but I'm not a complete ass.
tbohon
User
Posts: 42
Joined: Sat Nov 22, 2008 4:22 am
Location: Olympia, WA USA

Post by tbohon »

Clever, srod ... thanks for sharing.

Tom
Little John
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

srod wrote:Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again?
I've wondered about this myself, too. Well, I hope that each call starts where the previous call finished.
I think we could find the answer by running some speed tests, but so far I haven't done so ...

Regards, Lazy John
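
A speed test along those lines could be sketched roughly as follows. This is only a sketch under assumptions: the field count in #Fields is arbitrary, the test string is made up, and the single-pass loop only counts delimiters rather than extracting fields, so the two timings are not a strict like-for-like comparison. Run it with the debugger disabled, since the debugger distorts timings.

```purebasic
; Rough timing sketch: StringField() called once per field vs. one pointer pass.
#Fields = 20000                      ; assumed test size, adjust freely

line$ = ""
For i = 1 To #Fields
  line$ + "field" + Str(i) + "|"     ; build a long pipe-delimited test line
Next

t0 = ElapsedMilliseconds()
For i = 1 To #Fields
  f$ = StringField(line$, i, "|")    ; ask for every field in turn
Next
t1 = ElapsedMilliseconds()

*ptr.Character = @line$
n = 0
While *ptr\c                         ; walk the string once, counting delimiters
  If *ptr\c = '|'
    n + 1
  EndIf
  *ptr + SizeOf(Character)
Wend
t2 = ElapsedMilliseconds()

MessageRequester("Timing", "StringField loop: " + Str(t1 - t0) + " ms" + #LF$ + "Single pass: " + Str(t2 - t1) + " ms")
```

If the first figure grows much faster than the second as #Fields is increased, that would suggest each StringField() call rescans the string from the start.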
pdwyer
Addict
Posts: 2813
Joined: Tue May 08, 2007 1:27 pm
Location: Chiba, Japan

Post by pdwyer »

This is one of the reasons I created a split function. I was using StringField() to go through very long strings and it wasn't pretty. Likewise, rewriting StringField() to handle delimiters of more than one character, but with the search using the same logic plus FindString(), wasn't much better.

In the end I wrote a split() function that populates an array you pass to it (it's here in the tips somewhere), and on the text I was working with (1 MB+) it was thousands of times faster. On small strings, though, StringField() works just fine and is simpler.
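
For readers who want the general idea without hunting for that post, here is a minimal sketch of a single-pass split. It is not the actual routine from the tips section: the name SplitSketch and its parameters are hypothetical, and it assumes a single-character delimiter.

```purebasic
; Hypothetical sketch: fills Out$() with the fields of Text$ in one pass
; and returns the number of fields. Assumes Delim$ is a single character.
Procedure.i SplitSketch(Text$, Delim$, Array Out$(1))
  Protected count = CountString(Text$, Delim$) + 1
  Protected d = Asc(Delim$)
  Protected i = 0
  Protected *ptr.Character, *start

  ReDim Out$(count - 1)              ; indices 0 .. count-1
  If Text$ = ""
    Out$(0) = ""                     ; an empty line is one empty field
    ProcedureReturn count
  EndIf

  *ptr = @Text$
  *start = *ptr                      ; start of the field being scanned
  While *ptr\c
    If *ptr\c = d
      ; copy the characters between *start and the delimiter
      Out$(i) = PeekS(*start, (*ptr - *start) / SizeOf(Character))
      i + 1
      *start = *ptr + SizeOf(Character)
    EndIf
    *ptr + SizeOf(Character)
  Wend
  Out$(i) = PeekS(*start)            ; last field runs to the end of the string

  ProcedureReturn count
EndProcedure

Dim parts$(0)
n = SplitSketch("Some|line|separated|by|pipes", "|", parts$())
For i = 0 To n - 1
  Debug parts$(i)
Next
```

The point of the design is that the string is walked exactly once, so the painter never has to go back to the paint can.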

As per the story from JoelOnSoftware, "Shlemiel the painter" :lol:
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.

The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.

The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"

"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"


It's an interesting article http://www.joelonsoftware.com/articles/ ... 00319.html
Paul Dwyer

“In nature, it’s not the strongest nor the most intelligent who survives. It’s the most adaptable to change” - Charles Darwin
“If you can't explain it to a six-year old you really don't understand it yourself.” - Albert Einstein
Little John
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John »

:D
Very interesting indeed, thank you!

I found Schlemiel the painter's Algorithm also on Wikipedia:
Coined in 2001 [by Joel Spolsky], the term has since become part of the vernacular
Regards, Little John
Trond
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

srod wrote:Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.
No, because Mid() also starts from the beginning each time.
ricardo
Addict
Posts: 2438
Joined: Fri Apr 25, 2003 7:06 pm
Location: Argentina

Post by ricardo »

srod wrote:Has the StringField() function been optimised in any way, or does each call start at the beginning of the string again? If there has been no optimising then of course using StringField() will be the slowest of the options thus far given.

Of course, using pointers is always going to be quickest. Here's rsts' code refined just a wee bit. It will also run in both Ascii and Unicode modes:

Code: Select all

l$ = "Some|line|separated|by|pipes" 
*ptr.CHARACTER
*ptr=@l$ 
While *ptr\c
  If (*ptr\c) = $7C 
    Debug "pipe@ " + Str((*ptr-@l$)>>(SizeOf(CHARACTER)-1)) 
  EndIf
  *ptr+SizeOf(CHARACTER) 
Wend
StringField() lets me go directly to the field I need, like this:

l$ = "Some|line|separated|by|pipes"

Debug StringField(l$,3,"|")

How can I achieve the same thing with the pointer approach?
rsts
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Post by rsts »

Guess you could add a matches counter and check it.

cheers
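
A minimal sketch of that counter idea, building on srod's pointer loop above. The procedure name FieldByPointer and its parameters are hypothetical (nothing like it ships with PureBasic), and it assumes a single-character delimiter and a 1-based field index, as with StringField().

```purebasic
; Hypothetical helper: returns field number Index (1-based) from Line$,
; counting delimiters while walking the string once.
Procedure.s FieldByPointer(Line$, Index, Delim$)
  Protected *ptr.Character, *start
  Protected count = 1                ; number of the field being scanned
  Protected d = Asc(Delim$)

  If Line$ = ""
    ProcedureReturn ""               ; nothing to scan
  EndIf

  *ptr = @Line$
  *start = *ptr                      ; start of the field being scanned
  While *ptr\c
    If *ptr\c = d
      If count = Index
        ; this delimiter ends the wanted field
        ProcedureReturn PeekS(*start, (*ptr - *start) / SizeOf(Character))
      EndIf
      count + 1
      *start = *ptr + SizeOf(Character)
    EndIf
    *ptr + SizeOf(Character)
  Wend

  If count = Index
    ProcedureReturn PeekS(*start)    ; last field runs to the end of the string
  EndIf
  ProcedureReturn ""                 ; fewer fields than requested
EndProcedure

Debug FieldByPointer("Some|line|separated|by|pipes", 3, "|")   ; shows "separated"
```

The scan stops as soon as the requested field is complete, so asking for an early field does not walk the whole line.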
ricardo
Addict
Posts: 2438
Joined: Fri Apr 25, 2003 7:06 pm
Location: Argentina

Post by ricardo »

rsts wrote:Guess you could add a matches counter and check it.

cheers
Yes, you are right. I hadn't realised (I hadn't even thought about it) that StringField() doesn't go straight to the match.
Sometimes I'm a little naive :P
Hroudtwolf
Addict
Posts: 803
Joined: Sat Feb 12, 2005 3:35 am
Location: Germany(Hessen)

Post by Hroudtwolf »

Not fast but regex XD

Code: Select all

Define.s sSource     = "Some|line|separated|by|pipes" 
Define.i nMatches
Define   *RegEx

Dim sMatch.s ( 0 )

*RegEx = CreateRegularExpression ( #PB_Any , "[a-zA-Z_0-9,\.;\\\/\+\*\-\#\~´`'<>\&\%\§!\(\)=\}\{\^]+" )
If Not *RegEx
   Debug "Error."
   End
EndIf 

nMatches = ExtractRegularExpression( *RegEx , sSource , sMatch () )
If Not nMatches
  Debug "Error."
  FreeRegularExpression ( *RegEx )
  End
EndIf
 
For nI = 1 To nMatches
   Debug sMatch ( nI - 1 )
Next nI
 
FreeRegularExpression ( *RegEx )
Regards

Wolf