How to quickly decompose big string? StringField too slow...

Posted: Sun Feb 19, 2017 5:56 pm
by firace
The first part of the code is just to populate fulltext$ with some pseudo data. (Normally I get fulltext$ from user input in an EditorGadget).

How can I speed up the StringField line? Any optimizations or alternatives?

Code:


Procedure.s MD5(r$)
  r$ = MD5Fingerprint(@r$, StringByteLength(r$))
  ProcedureReturn r$
EndProcedure


g$ = "abcd"

For t = 1 To 3000
  fulltext$ + Left(md5(g$) , Random(21)+3 ) + #CRLF$
  g$ = md5(g$)
Next
For u = 1 To 8
  
  fulltext$ + fulltext$
Next

; fulltext$ = getgadgettext(5)

Debug "t0 --> " + ElapsedMilliseconds()

Repeat
  lineindex + 1
  line$ = StringField(fulltext$, lineindex, #CRLF$)  ; <- too slow - how to speed up?
  ; I then need to do some processing for each line$
Until lineindex = 24000

Debug "t1 --> " + ElapsedMilliseconds()


Re: How to quickly decompose big string? StringField too slo

Posted: Sun Feb 19, 2017 6:02 pm
by infratec
Hi,

This has been asked many times before...

The problem is that StringField() has to search the whole string from the start again on every call.
To avoid this, you have to use pointers and walk continuously through the string.

There are already examples here in the forum.

As one example:
http://www.purebasic.fr/english/viewtop ... 12&t=61796

Bernd
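
To put a rough number on that: assuming, as described above, that StringField() rescans from the start on every call, fetching field k costs about k field scans. Reading the first N fields one by one, as the loop in the first post does with N = 24000, then costs roughly

```latex
\sum_{k=1}^{N} k = \frac{N(N+1)}{2},
\qquad N = 24000 \;\Rightarrow\; \frac{24000 \cdot 24001}{2} = 288\,012\,000
```

field scans, against about 24000 for a single pass that remembers where the previous field ended. This is only a back-of-the-envelope estimate under that assumption, not a measurement.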

Re: How to quickly decompose big string? StringField too slo

Posted: Sun Feb 19, 2017 6:15 pm
by firace
Thanks a lot infratec. I will study this.

Re: How to quickly decompose big string? StringField too slo

Posted: Sun Feb 19, 2017 7:06 pm
by infratec
Hi,

I just wrote a solution for separators of more than one character,
but I have not timed it.

Code:

Procedure.s GetNextStringField(*String, Separator$="", Init.i=#False)
  
  Static *LastPosition, *EndPosition, Dim Separator.c(255), SeparatorLength.i
  Protected *CurrentPosition, State.i, i.i, Result$
  
  If Init
    SeparatorLength = Len(Separator$)
    *LastPosition = *String
    For i = 0 To SeparatorLength - 1
      Separator(i) = PeekC(@Separator$ + i * SizeOf(Character))
    Next i
  EndIf
  
  Dim Char.c(255)
  
  *CurrentPosition = *LastPosition
  State = #False                      ; becomes #True once the full separator has been found
  While PeekC(*CurrentPosition) <> 0  ; never read past the terminating null
    State = #True
    For i = 0 To SeparatorLength - 1
      If PeekC(*CurrentPosition + i * SizeOf(Character)) <> Separator(i)
        State = #False
        Break
      EndIf
    Next i
    If State
      *CurrentPosition + SeparatorLength * SizeOf(Character)  ; skip the whole separator
      Break
    EndIf
    *CurrentPosition + SizeOf(Character)                      ; no match here: advance one character
  Wend
  
  If State
    Result$ = PeekS(*LastPosition, (*CurrentPosition - *LastPosition - (SeparatorLength * SizeOf(Character))) / SizeOf(Character))
  Else
    Result$ = #EOT$
  EndIf
  
  *LastPosition = *CurrentPosition
  
  ProcedureReturn Result$
  
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
  For u = 1 To 8
    fulltext$ + "abcdefghijklm" + Chr(Random(70, 65)) + #CRLF$
  Next
  
  
  Debug fulltext$
  Debug "-------"
  
  Debug "First: " + GetNextStringField(@fulltext$, #CRLF$, #True)
  Repeat
    Result$ = GetNextStringField(@fulltext$)
    If Result$ <> #EOT$
      Debug "Next: " + Result$
    EndIf
  Until Result$ = #EOT$
CompilerEndIf

Bernd

Re: How to quickly decompose big string? StringField too slo

Posted: Sun Feb 19, 2017 8:11 pm
by Michael Vogel
Here's some fast code for when you need to get all fields from the beginning...

Code:

DisableDebugger

Global i,n,z
Global s.s,t.s

For n=0 To 999
	s+"This is the string at position "+Str(n)+"."
Next n

Procedure StringFieldInitialize(*string.string,char)

	Global *StringFieldMem.Character
	Global StringFieldChar

	*StringFieldMem=*string
	StringFieldChar=char

EndProcedure
Procedure.s StringFieldNext()

	Protected *StringFieldStart

	*StringFieldStart=*StringFieldMem
	While *StringFieldMem\c<>StringFieldChar And *StringFieldMem\c
		*StringFieldMem+SizeOf(Character)
	Wend

	If *StringFieldMem\c
		*StringFieldMem+SizeOf(Character)
	EndIf

	ProcedureReturn PeekS(*StringFieldStart,(*StringFieldMem-*StringFieldStart)/SizeOf(Character)-1)

EndProcedure

z-ElapsedMilliseconds()
For i=0 To 99
	StringFieldInitialize(@s,'.')
	For n=0 To 999
		t=StringFieldNext()
	Next n
Next i
z+ElapsedMilliseconds()
EnableDebugger

MessageRequester("Stringfield","Time: "+Str(z)+"ms"+#CR$+"String: "+t)

Re: How to quickly decompose big string? StringField too slo

Posted: Sun Feb 19, 2017 9:02 pm
by infratec
Hi Michael,

He is searching for #CRLF$, which is two characters, not one.
I overlooked this at first too.

But he can use Trim as a workaround.

Bernd
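
A minimal sketch of that workaround, assuming the text uses LF or CRLF line ends: walk the string once with a pointer, cut at each LF (character 10), and drop a CR (character 13) that sits directly in front of it. SplitLines() and the list it fills are hypothetical names chosen for illustration, not code from this thread, and the sketch has not been timed.

```purebasic
EnableExplicit

; Hypothetical helper: splits *Text at LF and strips a CR directly before it.
; Returns the number of lines stored in Lines().
Procedure.i SplitLines(*Text.Character, List Lines.s())
  Protected *Start.Character = *Text
  Protected *Cur.Character = *Text
  Protected Length.i

  ClearList(Lines())
  If *Text = 0                         ; an empty string variable can have a null address
    ProcedureReturn 0
  EndIf

  While *Cur\c
    If *Cur\c = 10                     ; LF ends the current line
      Length = (*Cur - *Start) / SizeOf(Character)
      If Length > 0
        If PeekC(*Cur - SizeOf(Character)) = 13
          Length - 1                   ; drop the CR of a CRLF pair
        EndIf
      EndIf
      AddElement(Lines())              ; new elements start out as an empty string
      If Length > 0
        Lines() = PeekS(*Start, Length)
      EndIf
      *Start = *Cur + SizeOf(Character)
    EndIf
    *Cur + SizeOf(Character)
  Wend

  If *Cur > *Start                     ; trailing text without a final line break
    AddElement(Lines())
    Lines() = PeekS(*Start, (*Cur - *Start) / SizeOf(Character))
  EndIf

  ProcedureReturn ListSize(Lines())
EndProcedure

Define text$ = "one" + #CRLF$ + "two" + #CRLF$ + #CRLF$ + "four"
NewList lines.s()

Debug SplitLines(@text$, lines())
ForEach lines()
  Debug "[" + lines() + "]"
Next
```

Collecting the lines in a list keeps the per-line processing out of the scanning loop; if only one line is needed at a time, the same loop body could process each line where AddElement() is called instead.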

Re: How to quickly decompose big string? StringField too slo

Posted: Mon Feb 20, 2017 12:41 am
by firace
Thanks for the tips, much appreciated.

In the meantime I have been experimenting with ideas described in this thread:
http://forums.purebasic.com/english/vie ... 12&t=21495

Re: How to quickly decompose big string? StringField too slo

Posted: Mon Feb 20, 2017 2:01 am
by Zebuddi123
Hi, here is an alternative using a regular expression: it drops all the matches into an array, ready to parse. On an i3 350M CPU it takes about 150 ms for an 800k string with 25000 fields, using | as the delimiter.

zebuddi.

Code:


Procedure.s RegexFieldSeperator(sStringToSearch.s, sSeperator.s, Array sLine$(1))
	Protected  iAsciiNbr.i  = Asc(Right(sSeperator,1))
	; 	{--  add \ escape character
	Select iAsciiNbr
		Case  33, 34, 36 To 38, 40 To 43, 46, 47, 58, 60, 62, 63,94, 123 To 126
			sSeperator = Left(sSeperator, Len(sSeperator)-1) + Chr(92) + Chr(iAsciiNbr)
	EndSelect
; 	}
	pattern$ = ".+?" + sSeperator
	Protected iRegexAny . i = CreateRegularExpression(#PB_Any, pattern$,  #PB_RegularExpression_AnyNewLine|#PB_RegularExpression_MultiLine), sReturnString.s, iNbr.i

	If MatchRegularExpression(iRegexAny, sStringToSearch)
		iNbr = ExtractRegularExpression(iRegexAny, sStringToSearch, sLine$())
		FreeRegularExpression(iRegexAny)
		ProcedureReturn Str(iNbr)
	Else
		FreeRegularExpression(iRegexAny) ; also release the expression when nothing matches
		ProcedureReturn "No Matches Found"
	EndIf
EndProcedure

For i=1 To 25000  ; make a big string
	For f  = 1 To 25 
		q.s + Chr(Random(122, 97))
	Next	
	fulltext$ + Str(i) + ". " + q + "|"
	q = ""
Next	

If CreateFile(0, GetTemporaryDirectory() + "rubbish.txt") ; around 800k
	WriteString(0, fulltext$)
	CloseFile(0)
EndIf	

Debug " Started"

Dim line$(0)
s = ElapsedMilliseconds()

Debug RegexFieldSeperator(fulltext$, "|", Line$()) ; search  for all |  and drop in an array

ms = ElapsedMilliseconds()-s
If Ms > 1000
	Debug " took: " + FormatNumber(ms/1000) + " secs"
Else
	Debug " took: " + FormatNumber(ms) + " ms"
EndIf

CallDebugger

Re: How to quickly decompose big string? StringField too slo

Posted: Mon Feb 20, 2017 6:26 am
by said
Hi,

You can also have a look at MySplitString() in this thread:
http://www.purebasic.fr/english/viewtop ... plitstring

Said