Page 1 of 8

Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 8:57 pm
by uwekel
Hi,

if i run this code, it takes around 2000ms to complete, or 3900ms with #PB_String_NoCase. With Python or FreeBASIC, the same code runs over 100x faster :o

Code: Select all

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!")
Next
Debug ElapsedMilliseconds() - t
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!", 0, #PB_String_NoCase)
Next
Debug ElapsedMilliseconds() - t
Regards
Uwe

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:06 pm
by Mijikai
Iirc. thats because PureBasic does not store the string size so it needs to calculate it every time.

Helle a member in the german forum released a fast search function (sadly its only for Ascii):

Code: Select all

http://forums.purebasic.com/german/viewtopic.php?f=3&t=28159
__________________________________________________
SID deleted
25.11.2018
RSBasic

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:06 pm
by RSBasic
You have to turn off the debugger!

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:37 pm
by uwekel
It's not the debugger. Without it takes 1900ms and 3300ms. FreeBASIC take around 30ms, Python 34ms.

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:41 pm
by RSBasic
Okay, I was just wondering because you wrote "Debug" in the code, because it only works with Debugger. :D

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:49 pm
by #NULL
Why would FindString() calculate the length beforehand?
I think copying the string for the function argument takes quite a bit of the time.

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 9:56 pm
by Mijikai
Findstring calls wcslen_() mutliple times.

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 10:06 pm
by infratec
Can you show me your FreeBasic code?

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 10:17 pm
by RSBasic
Is it faster if you compile this code?

Code: Select all

t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(LCase(s), LCase("!"))
Next
Debug ElapsedMilliseconds() - t
(not tested)

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 10:29 pm
by ccode
@RSBasic

x = FindString(LCase(s), LCase("!"))

is slower!!!
Mijikai wrote:thats because PureBasic does not store the string size so it needs to calculate it every time.
- Optimized: FindString() is up to twice as fast

And this is slower:

Code: Select all

Procedure.i FindStringMems(offset.l, *String, LenStr.i, *Srch, LenSrch.i, Enc.i=#PB_Ascii)
  ; REV:  150621, skywalk   
  ;       + changes by keya: 1) added 'offset', 2) changed from 0 To 1 base As per the other algos (0=not found)
  ;       Return byte location(1-based) of *Srch in *String.
  ;       Useful to search an Ascii string in Unicode app(default as of v5.4+).
  Protected.i lenChar
  If Enc = #PB_Unicode  ; 2 bytes/char
    lenChar = 2
  Else                  ; #PB_Ascii = 1 bytes/char, #PB_UTF8 = variable bytes/char
    lenChar = 1
  EndIf
  If *Srch
    Protected.i *pos = *String + (offset - 1)
    lenStr * lenchar + *String
    While *pos <= lenStr
      If CompareMemory(*pos, *Srch, lenSrch)
        ProcedureReturn *pos - *String + 1
      EndIf
      *pos + lenChar
    Wend
  Else
    ProcedureReturn 0
  EndIf
EndProcedure
Or

Code: Select all

Procedure findMemory(*pString.Character, stringToFind.s)
   Protected *pFind.Character = @stringToFind
   Protected *start = *pString
   
   If (*pFind\c = 0)
      ProcedureReturn 1
   EndIf
   
   While *pString\c
      If (*pFind\c = *pString\c)
         *pFind + SizeOf(Character)
         *pString + SizeOf(Character)
         If (*pFind\c = 0)
            Break
         EndIf
      Else
         *pString + SizeOf(Character)
         *pFind = @stringToFind
      EndIf
   Wend
   
   If (*pFind\c = 0)
      ProcedureReturn 1 + (*pString - (*pFind - @stringToFind) - *start) / SizeOf(Character)
   Else
      ProcedureReturn 0
   EndIf
EndProcedure
We need a faster assembler routine :shock: :)

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 10:55 pm
by STARGĂ…TE
uwekel wrote:It's not the debugger. Without it takes 1900ms and 3300ms. FreeBASIC take around 30ms, Python 34ms.
Can you show the FreeBASIC and Python code?
I can not believe that!

Even if I just run a ASM loop with 1000000 steps (your string length) it needs 629ms (compared to 2787 with FindString)

Code: Select all

OpenConsole()

t = ElapsedMilliseconds()
For i = 1 To 1000
  ! MOV rcx, 1000000
  Loop:
  ! DEC rcx
  ! JNZ l_loop
Next
PrintN(Str(ElapsedMilliseconds()-t))

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!")
Next
PrintN(Str(ElapsedMilliseconds()-t))

Input()
Even with 8th-steps (Integer steps) it needs 80ms.

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 11:04 pm
by ccode
I have tested it.
FreeBasic is faster!

Code: Select all

Dim teststring As String
DIM i AS INTEGER
Dim x As Integer
Dim t As Double

t = TIMER

teststring = Space(1000000) + "!"
For i = 0 to 1000
	x = InStr(teststring, "!")
next

Print TIMER - t

Sleep

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 11:31 pm
by srod
Freebasic passes strings by reference so only passes the address to a function as opposed to PB in which the function itself makes a copy of the string which has been passed. This will inevitably slow things down.

Re: Why FindString() is so terribly slow?

Posted: Sun Nov 25, 2018 11:45 pm
by ccode

Code: Select all

;No optimized C strings are slow.

OpenConsole("FindString-Test")
s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = strcspn_(Ascii(s), @"!")
Next
PrintN(Str(x+1)) ;Index is 0
PrintN(Str(ElapsedMilliseconds() - t))
Input()
CloseConsole()
The function must be designed for optimized C strings.
For example, an optimized string:
s.s{1000001}

FreeBasic-Strings:
Strings of variable length are handled internally via a descriptor (dt: identifier). This identifier contains a pointer to the actual string and the length of the string.
Command reference entry VARPTR returns a pointer to the identifier, while command reference entry STRPTR returns a pointer to the actual string.
At the end of a string, an instruction reference entry CHR (0) is automatically appended. This can be dispensed with the transfer to external functions on time-consuming copying.
The identifier of a variable-length string contains three things: the address of the string, its length, and the space reserved for the string. For this reason, Command Reference entry outputs SIZEOF (STRING) 12 (for a 32-bit compiler) or 24 (for a 64-bit compiler). This value results from a pointer address of 4 or 8 bytes and two instruction reference entries INTEGER, also 4 and 8 bytes, respectively. The reason that the length of the string and the amount of space reserved usually do not match is simply that it does not require costly reservation and copying of the memory each time the string content is changed
https://www.freebasic-portal.de/befehls ... p-412.html

Re: Why FindString() is so terribly slow?

Posted: Mon Nov 26, 2018 6:33 am
by uwekel
As requested, here are the other code examples.

FreeBASIC:

Code: Select all

var s = Space(1000000) + "!"
var t = Timer
for i as integer = 1 to 1000
    var x = instr(s, "!")
next
print int((Timer - t) * 1000)  & "ms"
Python:

Code: Select all

from time import time

s = ' ' * 1000000 + '!'
t = time()
for i in range(1000):
    x = s.find('!')
print(int((time()-t)*1000), 'ms')
Regards
Uwe