How to quickly decompose big string? StringField too slow...

Post by firace »

The first part of the code just populates fulltext$ with dummy data. (Normally I get fulltext$ from user input in an EditorGadget.)

How can I speed up the StringField line? Any optimizations or alternatives?

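Code:

; Helper: return the MD5 hash of a string as a hex string.
; Note: newer PureBasic versions (5.40 and later, as far as I know) replace
; MD5Fingerprint() with UseMD5Fingerprint() plus StringFingerprint(r$, #PB_Cipher_MD5).
Procedure.s MD5(r$)
  ProcedureReturn MD5Fingerprint(@r$, StringByteLength(r$))
EndProcedure

; Build test data: 3000 pseudo-random lines, then double the text 8 times
g$ = "abcd"

For t = 1 To 3000
  fulltext$ + Left(MD5(g$), Random(21) + 3) + #CRLF$
  g$ = MD5(g$)
Next
For u = 1 To 8
  fulltext$ + fulltext$
Next

; fulltext$ = GetGadgetText(5)

Debug "t0 --> " + ElapsedMilliseconds()

Repeat
  lineindex + 1
  line$ = StringField(fulltext$, lineindex, #CRLF$)  ; <- too slow - how to speed up?
  ; I then need to do some processing for each line$
Until lineindex = 24000

Debug "t1 --> " + ElapsedMilliseconds()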

Post by infratec »

Hi,

This has been asked many times before.

The problem is that every StringField() call has to search the whole string from the start again, so the cost of fetching field n grows with n.
To avoid this, you have to use pointers and walk continuously through the string.
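
A minimal sketch of that pointer walk, assuming the whole text is already in one string and every field should end up in a list. SplitToList is a hypothetical helper name, not a PureBasic built-in. It moves a single cursor forward through the text, so the total work grows linearly with the text length instead of restarting for each field. Note that a trailing separator does not produce an extra empty field here; that is a design choice of this sketch.

```purebasic
EnableExplicit

; Hypothetical helper (not a built-in): splits Text$ on Separator$ in one pass,
; stores every field in Fields$() and returns the number of fields.
Procedure.i SplitToList(Text$, Separator$, List Fields$())
  Protected *FieldStart.Character = @Text$   ; first character of the current field
  Protected *Cursor.Character = @Text$       ; read position, only ever moves forward
  Protected SepLen.i = Len(Separator$)
  Protected FirstSepChar.i = Asc(Separator$) ; cheap pre-check before the full compare
  
  ClearList(Fields$())
  If SepLen = 0 Or Text$ = ""
    ProcedureReturn 0
  EndIf
  
  While *Cursor\c
    ; CompareMemoryString() stops at the terminating null, so it cannot read past the end
    If *Cursor\c = FirstSepChar And CompareMemoryString(*Cursor, @Separator$, #PB_String_CaseSensitive, SepLen) = #PB_String_Equal
      AddElement(Fields$())                  ; new elements start out as empty strings
      If *Cursor > *FieldStart
        Fields$() = PeekS(*FieldStart, (*Cursor - *FieldStart) / SizeOf(Character))
      EndIf
      *Cursor + SepLen * SizeOf(Character)   ; jump over the separator
      *FieldStart = *Cursor
    Else
      *Cursor + SizeOf(Character)
    EndIf
  Wend
  
  ; Text after the last separator, if there is any
  If *Cursor > *FieldStart
    AddElement(Fields$())
    Fields$() = PeekS(*FieldStart, (*Cursor - *FieldStart) / SizeOf(Character))
  EndIf
  
  ProcedureReturn ListSize(Fields$())
EndProcedure

; Usage: split once, then process each line
NewList Lines$()
SplitToList("alpha" + #CRLF$ + "beta" + #CRLF$ + #CRLF$ + "gamma", #CRLF$, Lines$())
ForEach Lines$()
  Debug "[" + Lines$() + "]"
Next
```

Text$ is passed by value, so the text is copied once on the call; passing a pointer instead would avoid that copy at the price of a less convenient interface.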

There are already examples here in the forum.

As one example:
http://www.purebasic.fr/english/viewtop ... 12&t=61796

Bernd

Post by firace »

Thanks a lot infratec. I will study this.

Post by infratec »

Hi,

I just wrote a solution for separators of more than one character,
but I have not timed it.

Code:

Procedure.s GetNextStringField(*String, Separator$="", Init.i=#False)
  ; Returns the next field of the string at *String, or #EOT$ when no further
  ; separator is found (text after the last separator is not returned).
  ; Call once with Init = #True and the separator; later calls continue
  ; from the saved position.
  
  Static *LastPosition.Character, StoredSeparator$, SeparatorLength.i
  Protected *CurrentPosition.Character, Result$
  
  If Init
    StoredSeparator$ = Separator$
    SeparatorLength = Len(Separator$)
    *LastPosition = *String
  EndIf
  
  If *LastPosition = 0 Or SeparatorLength = 0
    ProcedureReturn #EOT$
  EndIf
  
  *CurrentPosition = *LastPosition
  While *CurrentPosition\c
    ; CompareMemoryString() stops at the terminating null, so it cannot read past the end
    If CompareMemoryString(*CurrentPosition, @StoredSeparator$, #PB_String_CaseSensitive, SeparatorLength) = #PB_String_Equal
      If *CurrentPosition > *LastPosition
        Result$ = PeekS(*LastPosition, (*CurrentPosition - *LastPosition) / SizeOf(Character))
      EndIf
      *LastPosition = *CurrentPosition + SeparatorLength * SizeOf(Character)
      ProcedureReturn Result$
    EndIf
    *CurrentPosition + SizeOf(Character)
  Wend
  
  *LastPosition = *CurrentPosition  ; stay on the terminating null
  ProcedureReturn #EOT$
  
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
  For u = 1 To 8
    fulltext$ + "abcdefghijklm" + Chr(Random(70, 65)) + #CRLF$
  Next
  
  
  Debug fulltext$
  Debug "-------"
  
  Debug "First: " + GetNextStringField(@fulltext$, #CRLF$, #True)
  Repeat
    Result$ = GetNextStringField(@fulltext$)
    If Result$ <> #EOT$
      Debug "Next: " + Result$
    EndIf
  Until Result$ = #EOT$
CompilerEndIf

Bernd

Post by Michael Vogel »

Here's some fast code for when you need to get all fields, starting from the beginning...

Code:

DisableDebugger

Global i,n,z
Global s.s,t.s

For n=0 To 999
	s+"This is the string at position "+Str(n)+"."
Next n

Procedure StringFieldInitialize(*string.string,char)

	Global *StringFieldMem.Character
	Global StringFieldChar

	*StringFieldMem=*string
	StringFieldChar=char

EndProcedure
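Procedure.s StringFieldNext()

	Protected *StringFieldStart, Length

	; Walk forward to the next separator or to the end of the string
	*StringFieldStart=*StringFieldMem
	While *StringFieldMem\c<>StringFieldChar And *StringFieldMem\c
		*StringFieldMem+SizeOf(Character)
	Wend

	; Measure the field before skipping the separator, so a last field
	; without a trailing separator does not lose its final character
	Length=(*StringFieldMem-*StringFieldStart)/SizeOf(Character)

	If *StringFieldMem\c
		*StringFieldMem+SizeOf(Character)
	EndIf

	ProcedureReturn PeekS(*StringFieldStart,Length)

EndProcedure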

z-ElapsedMilliseconds()
For i=0 To 99
	StringFieldInitialize(@s,'.')
	For n=0 To 999
		t=StringFieldNext()
	Next n
Next i
z+ElapsedMilliseconds()
EnableDebugger

MessageRequester("Stringfield","Time: "+Str(z)+"ms"+#CR$+"String: "+t)

Post by infratec »

Hi Michael,

He searches for #CRLF$, which is two characters, not one.
I overlooked this at first too.

But he can use Trim as a workaround.
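
A rough sketch of that workaround, assuming every line ends in #CRLF$: split on the single character #LF and strip the #CR that is left at the end of each field with RTrim(). The variable names are only for illustration.

```purebasic
; Assumes CRLF line endings. A final line without a trailing line break
; would not be reported by this loop.
Define Text$ = "first" + #CRLF$ + "second" + #CRLF$ + "third" + #CRLF$
Define *Start.Character = @Text$   ; start of the current line
Define *Pos.Character = @Text$     ; read cursor
Define Line$

While *Pos\c
  If *Pos\c = #LF
    ; The field still ends in #CR here, so trim it off
    Line$ = RTrim(PeekS(*Start, (*Pos - *Start) / SizeOf(Character)), #CR$)
    Debug Line$
    *Start = *Pos + SizeOf(Character)
  EndIf
  *Pos + SizeOf(Character)
Wend
```

The same idea works with a single-character splitter such as the one above: initialize it with #LF as the separator and pass each returned field through RTrim().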

Bernd

Post by firace »

Thanks for the tips, much appreciated.

In the meantime I have been experimenting with the ideas described in this thread:
http://forums.purebasic.com/english/vie ... 12&t=21495

Post by Zebuddi123 »

Hi, an alternative using a regular expression drops all the matches into an array, ready to parse. On an i3 350M CPU it takes about 150 ms for an 800k string with 25000 fields and | as the delimiter.

zebuddi.

Code:

Procedure.s RegexFieldSeperator(sStringToSearch.s, sSeperator.s, Array sLine$(1))
	Protected iAsciiNbr.i = Asc(Right(sSeperator,1))
	Protected pattern$, iRegexAny.i, iNbr.i
	; Escape the last separator character with \ if it is a regex metacharacter
	Select iAsciiNbr
		Case 33, 34, 36 To 38, 40 To 43, 46, 47, 58, 60, 62, 63, 91 To 94, 123 To 126
			sSeperator = Left(sSeperator, Len(sSeperator)-1) + Chr(92) + Chr(iAsciiNbr)
	EndSelect
	; Each match is a non-empty field followed by the separator
	pattern$ = ".+?" + sSeperator
	iRegexAny = CreateRegularExpression(#PB_Any, pattern$, #PB_RegularExpression_AnyNewLine|#PB_RegularExpression_MultiLine)
	If iRegexAny = 0
		ProcedureReturn "Invalid pattern: " + RegularExpressionError()
	EndIf

	; Extract all matches in one call and always free the expression afterwards
	iNbr = ExtractRegularExpression(iRegexAny, sStringToSearch, sLine$())
	FreeRegularExpression(iRegexAny)
	If iNbr
		ProcedureReturn Str(iNbr)
	EndIf
	ProcedureReturn "No Matches Found"
EndProcedure

For i=1 To 25000  ; make a big string
	For f  = 1 To 25 
		q.s + Chr(Random(122, 97))
	Next	
	fulltext$ + Str(i) + ". " + q + "|"
	q = ""
Next	

If CreateFile(0, GetTemporaryDirectory() + "rubbish.txt") ; around 800k
	WriteString(0, fulltext$)
	CloseFile(0)
EndIf	

Debug " Started"

Dim line$(0)
s = ElapsedMilliseconds()

Debug RegexFieldSeperator(fulltext$, "|", Line$()) ; search  for all |  and drop in an array

ms = ElapsedMilliseconds()-s
If Ms > 1000
	Debug " took: " + FormatNumber(ms/1000) + " secs"
Else
	Debug " took: " + FormatNumber(ms) + " ms"
EndIf

CallDebugger

Post by said »

Hi,

You can also have a look at MySplitString() in this thread:
http://www.purebasic.fr/english/viewtop ... plitstring

Said