String tokenisation

Posted: Tue Feb 17, 2004 11:45 am
by Kris_a
Code updated for 5.20+ (same as StringField())

I ported this Blitz function (which I wrote a while ago) to PB. It splits a string into pieces (separated by a delimiter of your choice) and returns a particular one. Really useful for things like 'plain English' protocols (HTTP, for example).

Pretty fast too (I hope). The test I included does 1000000 iterations in 2.1 seconds. Enjoy :D

Code:

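Procedure.s tok(txt.s, delim.s, tok)
  ; Returns piece number tok (1-based) of txt split on delim, or "" if there is no such piece
  Protected start = 1, found, length, a
  Protected l = Len(delim)
  For a = 1 To tok
    If a > 1
      If found = 0
        ProcedureReturn "" ; previous search hit the end: no more pieces
      EndIf
      start = found + l
    EndIf
    found = FindString(txt, delim, start)
    If found
      length = found - start
    Else
      length = Len(txt) - start + 1 ; last piece runs to the end of the string
    EndIf
  Next
  ProcedureReturn Mid(txt, start, length)
EndProcedure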

#NUMLOOPS = 1000000

st.s = ""

t1 = GetTickCount_()

For a = 1 To #NUMLOOPS
  st = tok("string tokeniser test", " ", 2)
Next

MessageRequester("Result", Str(#NUMLOOPS) + " in " + Str(GetTickCount_() - t1) + "ms", 0)

PS. This runs about 70% faster in PB than it does in BB 8)
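
To illustrate the 'plain English' protocol use mentioned above, here is a minimal sketch that picks apart an HTTP request line. The request line is a made-up example, and the snippet uses PureBasic's built-in StringField() so that it runs on its own; tok() could be swapped in with the index and delimiter arguments reordered.

```purebasic
; Sketch: split an HTTP request line into its three space-separated parts.
; The request line is a hypothetical example, not taken from a real session.
Define request.s = "GET /index.html HTTP/1.1"

Define method.s  = StringField(request, 1, " ")
Define path.s    = StringField(request, 2, " ")
Define version.s = StringField(request, 3, " ")

Debug method + " | " + path + " | " + version
```

Run from the IDE with the debugger enabled, this should show the three parts separated by " | " in the debug output window.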

Re: String tokenisation

Posted: Tue Feb 17, 2004 1:01 pm
by PB
PureBasic has its own string parser, which gives faster results than yours. :)

Code:

#NUMLOOPS = 1000000

st.s = ""

t1 = GetTickCount_()

For a = 1 To #NUMLOOPS
  st = StringField("string tokeniser test", 2, " ")
Next

MessageRequester("Result", Str(#NUMLOOPS) + " in " + Str(GetTickCount_() - t1) + "ms", 0)

Posted: Tue Feb 17, 2004 1:19 pm
by Kris_a
oh damn :/

that's what I get for not reading the docs

Yes but...

Posted: Tue Feb 17, 2004 6:56 pm
by Iria
The PB function does not handle spaces very well though, i.e. if space is a delimiter and you have double spaces between text guess what ... you get null items parsed, so be wary.

Also just noticed that the forum preview gives me this :)

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 69480 bytes) in /home/apache/p/ph/phpbb.myforums.net/includes/topic_review.php on line 95

WP the forum :)

Re: Yes but...

Posted: Wed Feb 18, 2004 1:20 am
by PB
> The PB function does not handle spaces very well though, i.e. if space is a
> delimiter and you have double spaces between text guess what ... you get
> null items parsed

That's to be expected, so it's not a bug or anything. The function simply
splits the string at every space, so if you have a double space then
naturally you get an empty field between the two spaces.
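
A small sketch of the behaviour discussed here, plus one possible way to skip the empty fields if they are unwanted. The test string and variable names are only for illustration, and the loop is one workaround among several, not a built-in option of StringField().

```purebasic
; Sketch: a double space produces an empty field with StringField().
Define text.s = "string  tokeniser test"   ; note the double space
Define item.s, n, i

Debug "[" + StringField(text, 1, " ") + "]"   ; [string]
Debug "[" + StringField(text, 2, " ") + "]"   ; [] (empty field between the two spaces)
Debug "[" + StringField(text, 3, " ") + "]"   ; [tokeniser]

; Possible workaround: walk every field and ignore the empty ones.
n = CountString(text, " ") + 1   ; number of fields = number of delimiters + 1
For i = 1 To n
  item = StringField(text, i, " ")
  If item <> ""
    Debug item
  EndIf
Next
```

The loop should list only string, tokeniser and test, at the cost of one extra comparison per field.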