Page 7 of 8
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 10:33 am
by Fred
srod wrote:Freebasic passes strings by reference so only passes the address to a function as opposed to PB in which the function itself makes a copy of the string which has been passed. This will inevitably slow things down.
This true for Procedure, but it's not for built-in functions, so there is no penalities here.
About the performance, some time is lost to StrLen() computation, which is needed to ensure the startposition is not above the end of string (it's done even with a 0 start position which is dumb, so I changed that and it resulted to a 60% gain). About the comparison to other languages, some use 'tricks' to perform well in 'real wold' scenario. It's the Python case, which looks at the end of the string first (which is exactly your case). It's indeed a special case. PB uses the 'strstr' C function which always starts to the start of the string, so your benchmark is basically the worste ever for PB, as the result is at the very end. Try to find a a string which is not found in other language, I'm curious to see the results:
Code: Select all
s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
x = FindString(s, "x")
Next
MessageRequester("", "" + Str(ElapsedMilliseconds() - t))
t = ElapsedMilliseconds()
For i = 1 To 1000
x = FindString(s, "x", 0, #PB_String_NoCase)
Next
MessageRequester("", "" + Str(ElapsedMilliseconds() - t))
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 10:50 am
by ccode
That is interesting.
"And it hits the nail on the head."
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 12:20 pm
by Mijikai
Fred wrote:About the performance, some time is lost to StrLen() computation...
String length could be determined on the fly unless is its needed for conversion (NoCase)
or if the function would offer the option to extract a part between two signatures.
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 12:51 pm
by wilbert
Fred wrote:This true for Procedure, but it's not for built-in functions, so there is no penalities here.
It would be a great addition if this would also be possible for Procedure with an additional keyword to specify the string should not be copied.
Fred wrote:PB uses the 'strstr' C function
Just out of curiosity ...
How does PB handle unicode ?
As far as I understand strstr is for Ascii strings and wcsstr is only working on Windows.
On MacOS and Linux wide char seems to be 4 bytes per character instead of 2.

Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 1:52 pm
by ccode
wilbert wrote:Just out of curiosity ...
How does PB handle unicode ?
As far as I understand strstr is for Ascii strings and wcsstr is only working on Windows.
On MacOS and Linux wide char seems to be 4 bytes per character instead of 2.

"strstr" work with unicode (UTF8)
Code: Select all
EnableExplicit
Global *s, *s2, l, x, t, i, p
CompilerIf #PB_Compiler_OS = #PB_OS_Windows
PrototypeC pstrstr(a, b)
OpenLibrary(0, "ucrtbase.dll")
Global strstr.pstrstr = GetFunction(0, "strstr")
CompilerElseIf #PB_Compiler_OS = #PB_OS_Linux
Macro strstr(s, s2)
strstr_(s, s2)
EndMacro
CompilerEndIf
OpenConsole("FS-Unicode-Test")
t = ElapsedMilliseconds()
*s = AllocateMemory(Len("Find the Ω Char.")*2)
*s2 = AllocateMemory(Len("Ω")*2)
PokeS(*s, "Find the Ω Char.", #PB_Any, #PB_UTF8) ;Unicode (UTF-8) work, with strstr !!! (It must not contain a null character!)
PokeS(*s2, "Ω", #PB_Any, #PB_UTF8)
For i = 1 To 1000
;x = FindString("Find the Ω Char.", "Ω")
x = strstr(*s, *s2)
p = (x-*s)+1
Next
PrintN(Str(ElapsedMilliseconds() - t)+" ms")
;PrintN(Str(x))
PrintN("Char: "+PeekS(x, 1, #PB_UTF8))
PrintN("Pos: "+Str(p))
FreeMemory(*s) : FreeMemory(*s2)
Input()
CloseConsole()
It needs some adjustment.
If you set all characters to 2/4 / ... Bytes (Unicode), there are no problems with the 2 bytes of "Ω". (Easier to convert)
A search for "C" would otherwise spend 13 instead of 12. (As in my example)
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 3:53 pm
by Sicro
wilbert wrote:Fred wrote:This true for Procedure, but it's not for built-in functions, so there is no penalities here.
It would be a great addition if this would also be possible for Procedure with an additional keyword to specify the string should not be copied.
You can avoid copying with pointer variables:
Code: Select all
Procedure Test(*string.Character)
While *string\c <> 0
Debug Chr(*string\c)
*string + SizeOf(Character)
Wend
EndProcedure
Define myString$ = "This is a test string"
Test(@myString$)
Test(@"Hello world")
Unfortunately this doesn't work (invalid memory access):
Code: Select all
Procedure Test(*string.String)
Debug *string\s
EndProcedure
Define myString$ = "This is a test string"
Test(@myString$)
Test(@"Hello world")
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 3:54 pm
by Mijikai
That UTF8 stuff...
Code: Select all
EnableExplicit
Import "ucrtd.lib";Windows SDK
strstr.i(*StrA,*StrB)
EndImport
Global TestString.s
Global Signature.s
Global Timer.q
Global Index.i
Global Position.i
TestString = Space(1000000) + "!"
Signature = "x"
Procedure.i cstrstr(Input.s,Signatur.s)
Protected *Offset
Protected *U1 = UTF8(Input)
Protected *U2 = UTF8(Signatur)
*Offset = strstr(*U1,*U2)
If *Offset
*Offset - *U1 + 1
EndIf
FreeMemory(*U1)
FreeMemory(*U2)
ProcedureReturn *Offset
EndProcedure
Timer = ElapsedMilliseconds()
For Index = 1 To 1000
Position = cstrstr(TestString,Signature)
Next
Timer = ElapsedMilliseconds() - Timer
MessageRequester("Performance","Time: " + Str(Timer) + #CR$ + "Position: " + Str(Position))
End
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 3:59 pm
by Mijikai
Sicro wrote:Unfortunately this doesn't work (invalid memory access):
Code: Select all
Procedure Test(*string.String)
Debug *string\s
EndProcedure
Define myString$ = "This is a test string"
Test(@myString$)
Test(@"Hello world")
You need one more step:
Code: Select all
EnableExplicit
Global TestString.s
Procedure.i Test(*StrPtr)
Protected *Str.String = @*StrPtr
Debug *Str\s
EndProcedure
TestString = "This is a test string"
Test(@TestString)
Test(@"Hello world")
End
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 4:54 pm
by fryquez
@Mijikai, you ca not just link to ucrt with Purebasic.
Linker search and found the symbol in msvcrt.lib before ucrtd.lib
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 5:09 pm
by Mijikai
fryquez wrote:@Mijikai, you ca not just link to ucrt with Purebasic.
Linker search and found the symbol in msvcrt.lib before ucrtd.lib
Any way to remove the other import?
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 5:13 pm
by skywalk
@Mijikai - Big time lost if you are creating memory within your strstr() function. Just use wcsstr().
@ccode - strstr() searches bytes. No respect for multibyte characters. And your speed test includes the build of the search string?
Excited to see the next beta and FindString() speed!
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 5:22 pm
by Mijikai
skywalk wrote:@Mijikai - Big time lost if you are creating memory within your strstr() function. Just use wcsstr().
@ccode - strstr() searches bytes. No respect for multibyte characters. And your speed test includes the build of the search string?
Excited to see the next beta and FindString() speed!
Yep im also looking forward to a possible change in FindString()
Just wanted to show a different way to use strstr() but my code seems to not correctly resolve imports.
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 5:53 pm
by ccode
The search for a string can be optimized individually for the search string. (If known)
You can replace all Unicode characters with non-contained ASCII characters and then search only with ASCII characters.
But I stay with FindString(), without parsing the string first.
It's really fast enough for me.
Re: Why FindString() is so terribly slow?
Posted: Sun Dec 02, 2018 6:39 pm
by Sicro
@Mijikai:
Thank you for reminding me that a string variable needs a pointer to the address of the string.
The address of the string can change if the memory manager needs to find a new position in RAM for the string.
Re: Why FindString() is so terribly slow?
Posted: Mon Dec 17, 2018 8:22 pm
by mestnyi
Maybe there was no need to write here, but I think it is somehow connected.
Can you help me speed up the process?
On macos I could not adapt your options.
Code: Select all
EnableExplicit
Global Text.s, String.s, a,f,f1,l,time.i, c.s = #LF$, LN.i=50000
Global NewList String.s()
Global *Buffer = AllocateMemory(LN*100)
Global *Pointer = *Buffer
time = ElapsedMilliseconds()
For a = 0 To LN
Text.s = "Item "+Str(a) + c
CopyMemoryString(PeekS(@Text), @*Pointer) ; 3
Next
Text = PeekS(*Buffer)
Text.s = Trim(Text.s, c)
Debug Str(ElapsedMilliseconds()-time) + " text collection time"
time = ElapsedMilliseconds()
For a = 0 To LN
; f=f1 : f1 = FindString(Text.s, c, f1+1) : l=f1-f
;
; If a = LN
; String.s = Trim(Mid(Text.s, f1-l), c)
; Else
; String.s = Mid(Text.s, f1-l+1, l)
; EndIf
String.s = StringField(Text.s, a+1, c) + c
AddElement(String())
String() = String.s
Next
Debug Str(ElapsedMilliseconds()-time) + " text parse time "
Debug "all count "+ListSize(String())