How to quickly decompose big string? StringField too slow...
Posted: Sun Feb 19, 2017 5:56 pm
by firace
The first part of the code is just to populate fulltext$ with some pseudo data. (Normally I get fulltext$ from user input in an EditorGadget).
How can I speed up the StringField line? Any optimizations or alternatives?
Code:
Procedure.s MD5(r$)
  r$ = MD5Fingerprint(@r$, StringByteLength(r$))
  ProcedureReturn r$
EndProcedure
g$ = "abcd"
For t = 1 To 3000
  fulltext$ + Left(MD5(g$), Random(21) + 3) + #CRLF$
  g$ = MD5(g$)
Next
For u = 1 To 8
  fulltext$ + fulltext$
Next
; fulltext$ = GetGadgetText(5)
Debug "t0 --> " + ElapsedMilliseconds()
Repeat
  lineindex + 1
  line$ = StringField(fulltext$, lineindex, #CRLF$) ;;;; <- too slow - how to speed up?
  ; i then need to do some processing for each line$
Until lineindex = 24000
Debug "t1 --> " + ElapsedMilliseconds()
Re: How to quickly decompose big string? StringField too slo
Posted: Sun Feb 19, 2017 6:02 pm
by infratec
Hi,
asked many times before...
The problem is that each call has to search the whole string from the start again.
To avoid this you have to use pointers and walk continuously through the string.
There are already examples here in the forum.
As one example:
http://www.purebasic.fr/english/viewtop ... 12&t=61796
Bernd
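As a rough illustration of the pointer approach described above, here is a minimal sketch that walks the text once and collects every line in a list, instead of rescanning from the start for each field. SplitLines() and its variable names are made up for this example and are not taken from the linked thread. It assumes the lines are separated by CR+LF (character codes 13 and 10).

```purebasic
EnableExplicit

; SplitLines() is a hypothetical helper, not a PureBasic built-in.
; It scans Text$ once and appends each CRLF-separated line to Lines().
Procedure.i SplitLines(Text$, List Lines.s())
  Protected *pos.Character, *lineStart.Character, *nextChar.Character

  ClearList(Lines())
  If Text$ = ""
    ProcedureReturn 0
  EndIf

  *pos = @Text$
  *lineStart = *pos

  While *pos\c
    *nextChar = *pos + SizeOf(Character)
    If *pos\c = 13 And *nextChar\c = 10      ; CR followed by LF
      AddElement(Lines())
      Lines() = PeekS(*lineStart, (*pos - *lineStart) / SizeOf(Character))
      *pos = *nextChar + SizeOf(Character)   ; step over both separator characters
      *lineStart = *pos
    Else
      *pos = *nextChar
    EndIf
  Wend

  ; Keep a trailing line that has no final CRLF.
  If *pos > *lineStart
    AddElement(Lines())
    Lines() = PeekS(*lineStart, (*pos - *lineStart) / SizeOf(Character))
  EndIf

  ProcedureReturn ListSize(Lines())
EndProcedure

; Usage with a small assumed sample text
NewList Lines.s()
Debug SplitLines("alpha" + #CRLF$ + "beta" + #CRLF$ + "gamma", Lines())
ForEach Lines()
  Debug Lines()
Next
```

The text is passed by value here, so it is copied once per call. Passing @fulltext$ as a pointer would avoid that copy, at the cost of a less self-explanatory signature.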
Re: How to quickly decompose big string? StringField too slo
Posted: Sun Feb 19, 2017 6:15 pm
by firace
Thanks a lot infratec. I will study this.
Re: How to quickly decompose big string? StringField too slo
Posted: Sun Feb 19, 2017 7:06 pm
by infratec
Hi,
I just wrote a solution for separators of more than one character,
but I did not measure the timing.
Code:
Procedure.s GetNextStringField(*String, Separator$="", Init.i=#False)
  ; State kept between calls: read position, separator characters and separator length.
  Static *LastPosition, SeparatorLength.i
  Static Dim Separator.c(255)
  Protected *CurrentPosition, Found.i, i.i, Result$
  If Init
    SeparatorLength = Len(Separator$) ; expected to be 1 to 256 characters
    *LastPosition = *String
    For i = 0 To SeparatorLength - 1
      Separator(i) = PeekC(@Separator$ + i * SizeOf(Character))
    Next i
  EndIf
  *CurrentPosition = *LastPosition
  ; Walk forward until the separator matches or the terminating null is reached.
  While PeekC(*CurrentPosition)
    For i = 0 To SeparatorLength - 1
      If PeekC(*CurrentPosition + i * SizeOf(Character)) <> Separator(i)
        Break
      EndIf
    Next i
    If i = SeparatorLength ; the For loop ran to completion: full match
      Found = #True
      Break
    EndIf
    *CurrentPosition + SizeOf(Character)
  Wend
  If Found
    Result$ = PeekS(*LastPosition, (*CurrentPosition - *LastPosition) / SizeOf(Character))
    *LastPosition = *CurrentPosition + SeparatorLength * SizeOf(Character)
  ElseIf *CurrentPosition > *LastPosition
    ; Trailing text without a final separator is returned as the last field.
    Result$ = PeekS(*LastPosition, (*CurrentPosition - *LastPosition) / SizeOf(Character))
    *LastPosition = *CurrentPosition
  Else
    Result$ = #EOT$
  EndIf
  ProcedureReturn Result$
EndProcedure
CompilerIf #PB_Compiler_IsMainFile
  For u = 1 To 8
    fulltext$ + "abcdefghijklm" + Chr(Random(70, 65)) + #CRLF$
  Next
  Debug fulltext$
  Debug "-------"
  Debug "First: " + GetNextStringField(@fulltext$, #CRLF$, #True)
  Repeat
    Result$ = GetNextStringField(@fulltext$)
    If Result$ <> #EOT$
      Debug "Next: " + Result$
    EndIf
  Until Result$ = #EOT$
CompilerEndIf
Bernd
Re: How to quickly decompose big string? StringField too slo
Posted: Sun Feb 19, 2017 8:11 pm
by Michael Vogel
Here's some fast code for when you need to get all fields from the beginning...
Code:
DisableDebugger
Global i,n,z
Global s.s,t.s
For n=0 To 999
  s+"This is the string at position "+Str(n)+"."
Next n
Procedure StringFieldInitialize(*string.string,char)
  Global *StringFieldMem.Character
  Global StringFieldChar
  *StringFieldMem=*string
  StringFieldChar=char
EndProcedure
Procedure.s StringFieldNext()
  Protected *StringFieldStart,Length
  *StringFieldStart=*StringFieldMem
  While *StringFieldMem\c<>StringFieldChar And *StringFieldMem\c
    *StringFieldMem+SizeOf(Character)
  Wend
  ; Measure the field before stepping over the separator, so a last field
  ; without a trailing separator keeps its final character.
  Length=(*StringFieldMem-*StringFieldStart)/SizeOf(Character)
  If *StringFieldMem\c
    *StringFieldMem+SizeOf(Character)
  EndIf
  ProcedureReturn PeekS(*StringFieldStart,Length)
EndProcedure
z-ElapsedMilliseconds()
For i=0 To 99
  StringFieldInitialize(@s,'.')
  For n=0 To 999
    t=StringFieldNext()
  Next n
Next i
z+ElapsedMilliseconds()
EnableDebugger
MessageRequester("Stringfield","Time: "+Str(z)+"ms"+#CR$+"String: "+t)
Re: How to quickly decompose big string? StringField too slo
Posted: Sun Feb 19, 2017 9:02 pm
by infratec
Hi Michael,
he is splitting on #CRLF$, which is two characters, not one.
I overlooked this at first too.
But he can use Trim as a workaround.
Bernd
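The workaround mentioned above could look roughly like this: split on the single character LF and strip the leftover CR from each field with RTrim(). The sample text and the variable names are assumptions for illustration only, and the loop is a simplified stand-in for a single-character splitter rather than the code posted earlier.

```purebasic
; Assumed sample data: lines separated by CR+LF.
text$ = "one" + #CRLF$ + "two" + #CRLF$ + "three"

*pos.Character = @text$
*start = *pos

While *pos\c
  If *pos\c = 10                  ; LF acts as the single-character separator
    line$ = PeekS(*start, (*pos - *start) / SizeOf(Character))
    line$ = RTrim(line$, #CR$)    ; drop the CR left over from CRLF
    Debug line$
    *start = *pos + SizeOf(Character)
  EndIf
  *pos + SizeOf(Character)
Wend

; Last field when the text does not end with a line break
If *pos > *start
  Debug RTrim(PeekS(*start), #CR$)
EndIf
```

Note that RTrim() removes every trailing CR, so a line that legitimately ends in several CR characters would lose all of them.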
Re: How to quickly decompose big string? StringField too slo
Posted: Mon Feb 20, 2017 12:41 am
by firace
Thanks for the tips, much appreciated.
In the meantime I have been experimenting with ideas described in this thread:
http://forums.purebasic.com/english/vie ... 12&t=21495
Re: How to quickly decompose big string? StringField too slo
Posted: Mon Feb 20, 2017 2:01 am
by Zebuddi123
Hi, here is an alternative using a regular expression. It drops all the matches into an array, ready to parse. On an i3 350M CPU it takes about 150 ms for an 800k string with 25000 fields and | as the delimiter.
zebuddi.
Code:
Procedure.s RegexFieldSeperator(sStringToSearch.s, sSeperator.s, Array sLine$(1))
  Protected iAsciiNbr.i = Asc(Right(sSeperator,1))
  ; {-- add \ escape character
  Select iAsciiNbr
    Case 33, 34, 36 To 38, 40 To 43, 46, 47, 58, 60, 62, 63,94, 123 To 126
      sSeperator = Left(sSeperator, Len(sSeperator)-1) + Chr(92) + Chr(iAsciiNbr)
  EndSelect
  ; }
  pattern$ = ".+?" + sSeperator
  Protected iRegexAny.i = CreateRegularExpression(#PB_Any, pattern$, #PB_RegularExpression_AnyNewLine|#PB_RegularExpression_MultiLine), sReturnString.s, iNbr.i
  If MatchRegularExpression(iRegexAny, sStringToSearch)
    iNbr = ExtractRegularExpression(iRegexAny, sStringToSearch, sLine$())
    FreeRegularExpression(iRegexAny)
    ProcedureReturn Str(iNbr)
  Else
    ProcedureReturn "No Matches Found"
  EndIf
EndProcedure
For i=1 To 25000 ; make a big string
  For f = 1 To 25
    q.s + Chr(Random(122, 97))
  Next
  fulltext$ + Str(i) + ". " + q + "|"
  q = ""
Next
If CreateFile(0, GetTemporaryDirectory()+ " rubbish.txt") ; around 800k
  WriteString(0, fulltext$)
  CloseFile(0)
EndIf
Debug " Started"
Dim line$(0)
s = ElapsedMilliseconds()
Debug RegexFieldSeperator(fulltext$, "|", Line$()) ; search for all | and drop in an array
ms = ElapsedMilliseconds()-s
If Ms > 1000
  Debug " took: " + FormatNumber(ms/1000) + " secs"
Else
  Debug " took: " + FormatNumber(ms) + " ms"
EndIf
CallDebugger
Re: How to quickly decompose big string? StringField too slo
Posted: Mon Feb 20, 2017 6:26 am
by said
Hi,
You can also have a look at MySplitString() in this thread:
http://www.purebasic.fr/english/viewtop ... plitstring
Said