Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
uwekel
Enthusiast
Enthusiast
Posts: 740
Joined: Sat Dec 03, 2011 5:54 pm
Location: Oldenburg (Germany)

Why FindString() is so terribly slow?

Post by uwekel »

Hi,

if i run this code, it takes around 2000ms to complete, or 3900ms with #PB_String_NoCase. With Python or FreeBASIC, the same code runs over 100x faster :o

Code: Select all

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!")
Next
Debug ElapsedMilliseconds() - t
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!", 0, #PB_String_NoCase)
Next
Debug ElapsedMilliseconds() - t
Regards
Uwe
PB 5.70 LTS (x64) - Debian Testing, Gnome 3.30.2
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Iirc. thats because PureBasic does not store the string size so it needs to calculate it every time.

Helle a member in the german forum released a fast search function (sadly its only for Ascii):

Code: Select all

http://forums.purebasic.com/german/viewtopic.php?f=3&t=28159
__________________________________________________
SID deleted
25.11.2018
RSBasic
Last edited by Mijikai on Sun Nov 25, 2018 9:08 pm, edited 2 times in total.
User avatar
RSBasic
Moderator
Moderator
Posts: 1228
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Why FindString() is so terribly slow?

Post by RSBasic »

You have to turn off the debugger!
Image
Image
uwekel
Enthusiast
Enthusiast
Posts: 740
Joined: Sat Dec 03, 2011 5:54 pm
Location: Oldenburg (Germany)

Re: Why FindString() is so terribly slow?

Post by uwekel »

It's not the debugger. Without it takes 1900ms and 3300ms. FreeBASIC take around 30ms, Python 34ms.
PB 5.70 LTS (x64) - Debian Testing, Gnome 3.30.2
User avatar
RSBasic
Moderator
Moderator
Posts: 1228
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Why FindString() is so terribly slow?

Post by RSBasic »

Okay, I was just wondering because you wrote "Debug" in the code, because it only works with Debugger. :D
Image
Image
#NULL
Addict
Addict
Posts: 1499
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: Why FindString() is so terribly slow?

Post by #NULL »

Why would FindString() calculate the length beforehand?
I think copying the string for the function argument takes quite a bit of the time.
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Findstring calls wcslen_() mutliple times.
infratec
Always Here
Always Here
Posts: 7625
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Why FindString() is so terribly slow?

Post by infratec »

Can you show me your FreeBasic code?
User avatar
RSBasic
Moderator
Moderator
Posts: 1228
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Why FindString() is so terribly slow?

Post by RSBasic »

Is it faster if you compile this code?

Code: Select all

t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(LCase(s), LCase("!"))
Next
Debug ElapsedMilliseconds() - t
(not tested)
Image
Image
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

@RSBasic

x = FindString(LCase(s), LCase("!"))

is slower!!!
Mijikai wrote:thats because PureBasic does not store the string size so it needs to calculate it every time.
- Optimized: FindString() is up to twice as fast

And this is slower:

Code: Select all

Procedure.i FindStringMems(offset.l, *String, LenStr.i, *Srch, LenSrch.i, Enc.i=#PB_Ascii)
  ; REV:  150621, skywalk   
  ;       + changes by keya: 1) added 'offset', 2) changed from 0 To 1 base As per the other algos (0=not found)
  ;       Return byte location(1-based) of *Srch in *String.
  ;       Useful to search an Ascii string in Unicode app(default as of v5.4+).
  Protected.i lenChar
  If Enc = #PB_Unicode  ; 2 bytes/char
    lenChar = 2
  Else                  ; #PB_Ascii = 1 bytes/char, #PB_UTF8 = variable bytes/char
    lenChar = 1
  EndIf
  If *Srch
    Protected.i *pos = *String + (offset - 1)
    lenStr * lenchar + *String
    While *pos <= lenStr
      If CompareMemory(*pos, *Srch, lenSrch)
        ProcedureReturn *pos - *String + 1
      EndIf
      *pos + lenChar
    Wend
  Else
    ProcedureReturn 0
  EndIf
EndProcedure
Or

Code: Select all

Procedure findMemory(*pString.Character, stringToFind.s)
   Protected *pFind.Character = @stringToFind
   Protected *start = *pString
   
   If (*pFind\c = 0)
      ProcedureReturn 1
   EndIf
   
   While *pString\c
      If (*pFind\c = *pString\c)
         *pFind + SizeOf(Character)
         *pString + SizeOf(Character)
         If (*pFind\c = 0)
            Break
         EndIf
      Else
         *pString + SizeOf(Character)
         *pFind = @stringToFind
      EndIf
   Wend
   
   If (*pFind\c = 0)
      ProcedureReturn 1 + (*pString - (*pFind - @stringToFind) - *start) / SizeOf(Character)
   Else
      ProcedureReturn 0
   EndIf
EndProcedure
We need a faster assembler routine :shock: :)
User avatar
STARGÅTE
Addict
Addict
Posts: 2237
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Why FindString() is so terribly slow?

Post by STARGÅTE »

uwekel wrote:It's not the debugger. Without it takes 1900ms and 3300ms. FreeBASIC take around 30ms, Python 34ms.
Can you show the FreeBASIC and Python code?
I can not believe that!

Even if I just run a ASM loop with 1000000 steps (your string length) it needs 629ms (compared to 2787 with FindString)

Code: Select all

OpenConsole()

t = ElapsedMilliseconds()
For i = 1 To 1000
  ! MOV rcx, 1000000
  Loop:
  ! DEC rcx
  ! JNZ l_loop
Next
PrintN(Str(ElapsedMilliseconds()-t))

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!")
Next
PrintN(Str(ElapsedMilliseconds()-t))

Input()
Even with 8th-steps (Integer steps) it needs 80ms.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

I have tested it.
FreeBasic is faster!

Code: Select all

Dim teststring As String
DIM i AS INTEGER
Dim x As Integer
Dim t As Double

t = TIMER

teststring = Space(1000000) + "!"
For i = 0 to 1000
	x = InStr(teststring, "!")
next

Print TIMER - t

Sleep
srod
PureBasic Expert
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Re: Why FindString() is so terribly slow?

Post by srod »

Freebasic passes strings by reference so only passes the address to a function as opposed to PB in which the function itself makes a copy of the string which has been passed. This will inevitably slow things down.
I may look like a mule, but I'm not a complete ass.
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

Code: Select all

;No optimized C strings are slow.

OpenConsole("FindString-Test")
s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = strcspn_(Ascii(s), @"!")
Next
PrintN(Str(x+1)) ;Index is 0
PrintN(Str(ElapsedMilliseconds() - t))
Input()
CloseConsole()
The function must be designed for optimized C strings.
For example, an optimized string:
s.s{1000001}

FreeBasic-Strings:
Strings of variable length are handled internally via a descriptor (dt: identifier). This identifier contains a pointer to the actual string and the length of the string.
Command reference entry VARPTR returns a pointer to the identifier, while command reference entry STRPTR returns a pointer to the actual string.
At the end of a string, an instruction reference entry CHR (0) is automatically appended. This can be dispensed with the transfer to external functions on time-consuming copying.
The identifier of a variable-length string contains three things: the address of the string, its length, and the space reserved for the string. For this reason, Command Reference entry outputs SIZEOF (STRING) 12 (for a 32-bit compiler) or 24 (for a 64-bit compiler). This value results from a pointer address of 4 or 8 bytes and two instruction reference entries INTEGER, also 4 and 8 bytes, respectively. The reason that the length of the string and the amount of space reserved usually do not match is simply that it does not require costly reservation and copying of the memory each time the string content is changed
https://www.freebasic-portal.de/befehls ... p-412.html
uwekel
Enthusiast
Enthusiast
Posts: 740
Joined: Sat Dec 03, 2011 5:54 pm
Location: Oldenburg (Germany)

Re: Why FindString() is so terribly slow?

Post by uwekel »

As requested, here are the other code examples.

FreeBASIC:

Code: Select all

var s = Space(1000000) + "!"
var t = Timer
for i as integer = 1 to 1000
    var x = instr(s, "!")
next
print int((Timer - t) * 1000)  & "ms"
Python:

Code: Select all

from time import time

s = ' ' * 1000000 + '!'
t = time()
for i in range(1000):
    x = s.find('!')
print(int((time()-t)*1000), 'ms')
Regards
Uwe
PB 5.70 LTS (x64) - Debian Testing, Gnome 3.30.2
Post Reply