Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
Fred
Administrator
Administrator
Posts: 18274
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Why FindString() is so terribly slow?

Post by Fred »

srod wrote:Freebasic passes strings by reference so only passes the address to a function as opposed to PB in which the function itself makes a copy of the string which has been passed. This will inevitably slow things down.
This true for Procedure, but it's not for built-in functions, so there is no penalities here.

About the performance, some time is lost to StrLen() computation, which is needed to ensure the startposition is not above the end of string (it's done even with a 0 start position which is dumb, so I changed that and it resulted to a 60% gain). About the comparison to other languages, some use 'tricks' to perform well in 'real wold' scenario. It's the Python case, which looks at the end of the string first (which is exactly your case). It's indeed a special case. PB uses the 'strstr' C function which always starts to the start of the string, so your benchmark is basically the worste ever for PB, as the result is at the very end. Try to find a a string which is not found in other language, I'm curious to see the results:

Code: Select all

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "x")
Next
MessageRequester("", "" + Str(ElapsedMilliseconds() - t))

t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "x", 0, #PB_String_NoCase)
Next
MessageRequester("", "" + Str(ElapsedMilliseconds() - t))
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

That is interesting.

"And it hits the nail on the head."
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Fred wrote:About the performance, some time is lost to StrLen() computation...
String length could be determined on the fly unless is its needed for conversion (NoCase)
or if the function would offer the option to extract a part between two signatures.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Why FindString() is so terribly slow?

Post by wilbert »

Fred wrote:This true for Procedure, but it's not for built-in functions, so there is no penalities here.
It would be a great addition if this would also be possible for Procedure with an additional keyword to specify the string should not be copied.
Fred wrote:PB uses the 'strstr' C function
Just out of curiosity ...
How does PB handle unicode ?
As far as I understand strstr is for Ascii strings and wcsstr is only working on Windows.
On MacOS and Linux wide char seems to be 4 bytes per character instead of 2. :?
Windows (x64)
Raspberry Pi OS (Arm64)
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

wilbert wrote:Just out of curiosity ...
How does PB handle unicode ?
As far as I understand strstr is for Ascii strings and wcsstr is only working on Windows.
On MacOS and Linux wide char seems to be 4 bytes per character instead of 2. :?
"strstr" work with unicode (UTF8)

Code: Select all

EnableExplicit

Global *s, *s2, l, x, t, i, p

CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  PrototypeC pstrstr(a, b)
  OpenLibrary(0, "ucrtbase.dll")
  Global strstr.pstrstr = GetFunction(0, "strstr")
CompilerElseIf #PB_Compiler_OS = #PB_OS_Linux
  Macro strstr(s, s2)
    strstr_(s, s2)
  EndMacro
CompilerEndIf

OpenConsole("FS-Unicode-Test")

t = ElapsedMilliseconds()

*s = AllocateMemory(Len("Find the Ω Char.")*2)
*s2 = AllocateMemory(Len("Ω")*2)
PokeS(*s, "Find the Ω Char.", #PB_Any, #PB_UTF8) ;Unicode (UTF-8) work, with strstr !!! (It must not contain a null character!)
PokeS(*s2, "Ω", #PB_Any, #PB_UTF8)

For i = 1 To 1000
  ;x = FindString("Find the Ω Char.", "Ω")
  x = strstr(*s, *s2)
  p = (x-*s)+1
Next

PrintN(Str(ElapsedMilliseconds() - t)+" ms")
;PrintN(Str(x))
PrintN("Char: "+PeekS(x, 1, #PB_UTF8))
PrintN("Pos: "+Str(p))

FreeMemory(*s) : FreeMemory(*s2)

Input()
CloseConsole()
It needs some adjustment.

If you set all characters to 2/4 / ... Bytes (Unicode), there are no problems with the 2 bytes of "Ω". (Easier to convert)
A search for "C" would otherwise spend 13 instead of 12. (As in my example)
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 561
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: Why FindString() is so terribly slow?

Post by Sicro »

wilbert wrote:
Fred wrote:This true for Procedure, but it's not for built-in functions, so there is no penalities here.
It would be a great addition if this would also be possible for Procedure with an additional keyword to specify the string should not be copied.
You can avoid copying with pointer variables:

Code: Select all

Procedure Test(*string.Character)
  While *string\c <> 0
    Debug Chr(*string\c)
    *string + SizeOf(Character)
  Wend
EndProcedure

Define myString$ = "This is a test string"
Test(@myString$)

Test(@"Hello world")
Unfortunately this doesn't work (invalid memory access):

Code: Select all

Procedure Test(*string.String)
  Debug *string\s
EndProcedure

Define myString$ = "This is a test string"
Test(@myString$)

Test(@"Hello world")
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

That UTF8 stuff...

Code: Select all

EnableExplicit

Import "ucrtd.lib";Windows SDK
  strstr.i(*StrA,*StrB)
EndImport

Global TestString.s
Global Signature.s
Global Timer.q
Global Index.i
Global Position.i

TestString = Space(1000000) + "!"
Signature = "x"

Procedure.i cstrstr(Input.s,Signatur.s)
  Protected *Offset
  Protected *U1 = UTF8(Input)
  Protected *U2 = UTF8(Signatur)
  *Offset = strstr(*U1,*U2)
  If *Offset
    *Offset - *U1 + 1
  EndIf 
  FreeMemory(*U1)
  FreeMemory(*U2)
  ProcedureReturn *Offset
EndProcedure

Timer = ElapsedMilliseconds()
For Index = 1 To 1000
  Position = cstrstr(TestString,Signature)
Next
Timer = ElapsedMilliseconds() - Timer

MessageRequester("Performance","Time: " + Str(Timer) + #CR$ + "Position: " + Str(Position))

End
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Sicro wrote:Unfortunately this doesn't work (invalid memory access):

Code: Select all

Procedure Test(*string.String)
  Debug *string\s
EndProcedure

Define myString$ = "This is a test string"
Test(@myString$)

Test(@"Hello world")
You need one more step:

Code: Select all

EnableExplicit

Global TestString.s

Procedure.i Test(*StrPtr)
  Protected *Str.String = @*StrPtr
  Debug *Str\s
EndProcedure

TestString = "This is a test string"

Test(@TestString)
Test(@"Hello world")

End
fryquez
Enthusiast
Enthusiast
Posts: 391
Joined: Mon Dec 21, 2015 8:12 pm

Re: Why FindString() is so terribly slow?

Post by fryquez »

@Mijikai, you ca not just link to ucrt with Purebasic.
Linker search and found the symbol in msvcrt.lib before ucrtd.lib
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

fryquez wrote:@Mijikai, you ca not just link to ucrt with Purebasic.
Linker search and found the symbol in msvcrt.lib before ucrtd.lib
Any way to remove the other import?
User avatar
skywalk
Addict
Addict
Posts: 4224
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Why FindString() is so terribly slow?

Post by skywalk »

@Mijikai - Big time lost if you are creating memory within your strstr() function. Just use wcsstr().
@ccode - strstr() searches bytes. No respect for multibyte characters. And your speed test includes the build of the search string?

Excited to see the next beta and FindString() speed!
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

skywalk wrote:@Mijikai - Big time lost if you are creating memory within your strstr() function. Just use wcsstr().
@ccode - strstr() searches bytes. No respect for multibyte characters. And your speed test includes the build of the search string?

Excited to see the next beta and FindString() speed!
Yep im also looking forward to a possible change in FindString()

Just wanted to show a different way to use strstr() but my code seems to not correctly resolve imports.
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

The search for a string can be optimized individually for the search string. (If known)
You can replace all Unicode characters with non-contained ASCII characters and then search only with ASCII characters.
But I stay with FindString(), without parsing the string first.
It's really fast enough for me.
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 561
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: Why FindString() is so terribly slow?

Post by Sicro »

@Mijikai:
Thank you for reminding me that a string variable needs a pointer to the address of the string.
The address of the string can change if the memory manager needs to find a new position in RAM for the string.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
mestnyi
Addict
Addict
Posts: 1098
Joined: Mon Nov 25, 2013 6:41 am

Re: Why FindString() is so terribly slow?

Post by mestnyi »

Maybe there was no need to write here, but I think it is somehow connected. :)
Can you help me speed up the process?
On macos I could not adapt your options. :oops:

Code: Select all

EnableExplicit

Global Text.s, String.s, a,f,f1,l,time.i, c.s = #LF$, LN.i=50000
Global NewList String.s()

Global *Buffer = AllocateMemory(LN*100)
Global *Pointer = *Buffer


time = ElapsedMilliseconds()
For a = 0 To LN
  Text.s = "Item "+Str(a) + c
  CopyMemoryString(PeekS(@Text), @*Pointer) ; 3
Next
Text = PeekS(*Buffer)
Text.s = Trim(Text.s, c)
Debug Str(ElapsedMilliseconds()-time) + " text collection time"



time = ElapsedMilliseconds()
For a = 0 To LN
;   f=f1 : f1 = FindString(Text.s, c, f1+1) : l=f1-f
;   
;   If a = LN
;     String.s = Trim(Mid(Text.s, f1-l), c)
;   Else
;     String.s = Mid(Text.s, f1-l+1, l)
;   EndIf
  
  
  String.s = StringField(Text.s, a+1, c) + c
  
  AddElement(String()) 
  String() = String.s
Next
Debug Str(ElapsedMilliseconds()-time) + " text parse time "
Debug "all count "+ListSize(String())
Post Reply