Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Here is my attempt :)

Only Pointers need to be supplied, the string size will be determined on the fly.
Theres a Unicode and Ascii version.

Its pretty fast already (it beats FindString()) :D

Code (run without Debugger!):

Code: Select all

EnableExplicit

Global TestString.s
Global Signature.s
Global Index.i
Global T1.q, R1.i, *A1
Global T2.q, R2.i, *A2

Procedure.i LocStrW(*Input,*Signature);Unicode
  !mov rcx,[p.p_Input]
  !mov rdx,[p.p_Signature]
  !xor rax,rax
  !mov r10,rcx
  !cmp word[rdx],0h
  !je LocStr_Return
  !LocStr_Loop:
  !cmp word[rcx],0h
  !je LocStr_Return
  !mov bx,word[rcx]
  !cmp bx,word[rdx]
  !jne LocStr_Next
  !lea r8,[rcx+2h]
  !lea r9,[rdx+2h]
  !LocStr_Match:
  !cmp word[r9],0h
  !je LocStr_Result 
  !cmp word[r8],0h
  !je LocStr_Next
  !mov bx,word[r8]
  !cmp bx,word[r9]
  !jne LocStr_Next
  !lea r8,[r8+2h]
  !lea r9,[r9+2h]
  !jmp LocStr_Match
  !LocStr_Result:
  !sub rcx,r10
  !lea rax,[rcx+2h]
  !shr rax,1h
  !jmp LocStr_Return
  !LocStr_Next:
  !lea rcx,[rcx+2h]
  !jmp LocStr_Loop
  !LocStr_Return:
  ProcedureReturn
EndProcedure

Procedure.i LocStrA(*Input,*Signature);Ascii
  !mov rcx,[p.p_Input]
  !mov rdx,[p.p_Signature]
  !xor rax,rax
  !mov r10,rcx
  !cmp byte[rdx],0h
  !je LocStr_Return
  !LocStr_Loop:
  !cmp byte[rcx],0h
  !je LocStr_Return
  !mov bl,byte[rcx]
  !cmp bl,byte[rdx]
  !jne LocStr_Next
  !lea r8,[rcx+1h]
  !lea r9,[rdx+1h]
  !LocStr_Match:
  !cmp byte[r9],0h
  !je LocStr_Result 
  !cmp byte[r8],0h
  !je LocStr_Next
  !mov bl,byte[r8]
  !cmp bl,byte[r9]
  !jne LocStr_Next
  !inc r8
  !inc r9
  !jmp LocStr_Match
  !LocStr_Result:
  !sub rcx,r10
  !lea rax,[rcx+1h]
  !jmp LocStr_Return
  !LocStr_Next:
  !inc rcx
  !jmp LocStr_Loop
  !LocStr_Return:
  ProcedureReturn
EndProcedure

#Tries = 2e2 

TestString = Space(1000000) + "!#"
Signature = "!"

*A1 = Ascii(TestString)
*A2 = Ascii(Signature)
  
T1 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  R1 = FindString(TestString,Signature)
Next
T1 = ElapsedMilliseconds() - T1

T2 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  R2 = LocStrW(@TestString,@Signature);LocStrA(*A1,*A2)
Next
T2 = ElapsedMilliseconds() - T2

MessageRequester("Result",
                 "T1: " + Str(T1) + #CR$ + "R1: " + Str(R1) + #CR$ + #CR$ + 
                 "T2: " + Str(T2) + #CR$ + "R2: " + Str(R2))

End
Last edited by Mijikai on Thu Nov 29, 2018 6:06 pm, edited 2 times in total.
forumuser
User
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Re: Why FindString() is so terribly slow?

Post by forumuser »

Nice work, Mijikai!
It needs only half the time that FindString() needs :mrgreen:

Would a 32-bit version viable as well?
Wolfram
Enthusiast
Enthusiast
Posts: 607
Joined: Thu May 30, 2013 4:39 pm

Re: Why FindString() is so terribly slow?

Post by Wolfram »

Nice, it seams to be 4 times faster

(OSX High Sierra)

+1 for a 32bit solution ;-)
macOS Catalina 10.15.7
User avatar
skywalk
Addict
Addict
Posts: 4224
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Why FindString() is so terribly slow?

Post by skywalk »

Mijikai wrote:Here is my attempt :)
Its pretty fast already (it beats FindString()) :D
Good work, but, no significant advantage over the existing C call, wccstr(*sIn,*srch), which is available as x86/x64. :wink:

Code: Select all

; Count(n),                                            200
; findstring_+case(ms),                                325
; findstringmem_+case(ms),                             1372
; findstring_-case(ms),                                4012
; findstringmem_-case(ms),                             2007
; findstringC_+case(ms),                               224
; findstringASMU_+case(ms),                            221
; findstring_+case : findstringmem_+case(%),           322
; findstring_-case : findstringmem_-case(%),           -50
; findstring_+case : findstring_-case(%),              1134
; findstringmem_+case : findstringmem_-case(%),        46
; findstring_+case : findstringC_+case(%),             -31
; findstring_+case : findstringASMU_+case(%),          -32
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

forumuser wrote:Nice work, Mijikai!
It needs only half the time that FindString() needs :mrgreen:

Would a 32-bit version viable as well?
Thanks :)

Here are the 32bit & 64bit Versions:

Code: Select all

;LocStrA/W by mijikai

Procedure.i LocStrW(*Input,*Signature)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx,[p.p_Input]
    !mov rdx,[p.p_Signature]
    !xor rax,rax
    !mov r10,rcx
    !cmp word[rdx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp word[rcx],0h
    !je LocStr_Return
    !mov bx,word[rcx]
    !cmp bx,word[rdx]
    !jne LocStr_Next
    !lea r8,[rcx+2h]
    !lea r9,[rdx+2h]
    !LocStr_Match:
    !cmp word[r9],0h
    !je LocStr_Result 
    !cmp word[r8],0h
    !je LocStr_Next
    !mov bx,word[r8]
    !cmp bx,word[r9]
    !jne LocStr_Next
    !lea r8,[r8+2h]
    !lea r9,[r9+2h]
    !jmp LocStr_Match
    !LocStr_Result:
    !sub rcx,r10
    !lea rax,[rcx+2h]
    !shr rax,1h
    !jmp LocStr_Return
    !LocStr_Next:
    !lea rcx,[rcx+2h]
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerElse
    !mov ecx,[p.p_Input]
    !mov edx,[p.p_Signature]
    !xor eax,eax
    !cmp word[edx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp word[ecx],0h
    !je LocStr_Return
    !mov bx,word[ecx]
    !cmp bx,word[edx]
    !jne LocStr_Next
    !lea esi,[ecx+2h]
    !lea edi,[edx+2h]
    !LocStr_Match:
    !cmp word[edi],0h
    !je LocStr_Result 
    !cmp word[esi],0h
    !je LocStr_Next
    !mov bx,word[esi]
    !cmp bx,word[edi]
    !jne LocStr_Next
    !lea esi,[esi+2h]
    !lea edi,[edi+2h]
    !jmp LocStr_Match
    !LocStr_Result:
    !sub ecx,[p.p_Input]
    !lea eax,[ecx+2h]
    !shr eax,1h
    !jmp LocStr_Return
    !LocStr_Next:
    !lea ecx,[ecx+2h]
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerEndIf
EndProcedure

Procedure.i LocStrA(*Input,*Signature)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx,[p.p_Input]
    !mov rdx,[p.p_Signature]
    !xor rax,rax
    !mov r10,rcx
    !cmp byte[rdx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp byte[rcx],0h
    !je LocStr_Return
    !mov bl,byte[rcx]
    !cmp bl,byte[rdx]
    !jne LocStr_Next
    !lea r8,[rcx+1h]
    !lea r9,[rdx+1h]
    !LocStr_Match:
    !cmp byte[r9],0h
    !je LocStr_Result 
    !cmp byte[r8],0h
    !je LocStr_Next
    !mov bl,byte[r8]
    !cmp bl,byte[r9]
    !jne LocStr_Next
    !inc r8
    !inc r9
    !jmp LocStr_Match
    !LocStr_Result:
    !sub rcx,r10
    !lea rax,[rcx+1h]
    !jmp LocStr_Return
    !LocStr_Next:
    !inc rcx
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerElse
    !mov ecx,[p.p_Input]
    !mov edx,[p.p_Signature]
    !xor eax,eax
    !cmp byte[edx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp byte[ecx],0h
    !je LocStr_Return
    !mov bl,byte[ecx]
    !cmp bl,byte[edx]
    !jne LocStr_Next
    !lea esi,[ecx+1h]
    !lea edi,[edx+1h]
    !LocStr_Match:
    !cmp byte[edi],0h
    !je LocStr_Result 
    !cmp byte[esi],0h
    !je LocStr_Next
    !mov bl,byte[esi]
    !cmp bl,byte[edi]
    !jne LocStr_Next
    !inc esi
    !inc edi
    !jmp LocStr_Match
    !LocStr_Result:
    !sub ecx,[p.p_Input]
    !lea eax,[ecx+1h]
    !jmp LocStr_Return
    !LocStr_Next:
    !inc ecx
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerEndIf
EndProcedure
All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.
Last edited by Mijikai on Thu Nov 29, 2018 6:15 pm, edited 4 times in total.
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

skywalk wrote:
Mijikai wrote:Here is my attempt :)
Its pretty fast already (it beats FindString()) :D
Good work, but, no significant advantage over the existing C call, wccstr(*sIn,*srch), which is available as x86/x64. :wink:
Yeah those are pretty good solutions as well. :)
User avatar
skywalk
Addict
Addict
Posts: 4224
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Why FindString() is so terribly slow?

Post by skywalk »

Mijikai wrote:All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.
The StrStrIW() function sets the inputs to lower case? Were you thinking of a different approach?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

skywalk wrote:
Mijikai wrote:All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.
The StrStrIW() function sets the inputs to lower case? Were you thinking of a different approach?
No, i was talking about LocStrA/W().
Implementing lower case mapping is difficult and cant be done on the fly
so calls to memory functions are not avoidable.
Im not able to do it.

Anyhow, i personally would love to see a Unicode version of Helles Code as it is the fastest so far.
User avatar
skywalk
Addict
Addict
Posts: 4224
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Why FindString() is so terribly slow?

Post by skywalk »

Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

skywalk wrote:Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.
Sure ill post it again: https://www.purebasic.fr/german/viewtop ... 02#p323802

__________________________________________________
URL changed and SID deleted
29.11.2018
RSBasic
User avatar
RSBasic
Moderator
Moderator
Posts: 1228
Joined: Thu Dec 31, 2009 11:05 pm
Location: Gernsbach (Germany)
Contact:

Re: Why FindString() is so terribly slow?

Post by RSBasic »

@Mijikai
Please use the url https://www.purebasic.fr/german because it is encrypted with SSL certificate.
Image
Image
User avatar
idle
Always Here
Always Here
Posts: 5929
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Why FindString() is so terribly slow?

Post by idle »

Mijikai wrote:
skywalk wrote:Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.
Sure ill post it again: https://www.purebasic.fr/german/viewtop ... 02#p323802

__________________________________________________
URL changed and SID deleted
29.11.2018
RSBasic
just be aware that it's SSE 4.2 safer would be SSE2, ask wilbert for his findstring module
Windows 11, Manjaro, Raspberry Pi OS
Image
RASHAD
PureBasic Expert
PureBasic Expert
Posts: 4955
Joined: Sun Apr 12, 2009 6:27 am

Re: Why FindString() is so terribly slow?

Post by RASHAD »

Great topic
Very big thanks to all contributors
And thanks idle for pointing to wilbert staff
Up to now the winner is wilbert

Code: Select all

EnableASM

Global Pos
 
CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Macro rax : eax : EndMacro
  Macro rbx : ebx : EndMacro   
  Macro rcx : ecx : EndMacro
  Macro rdx : edx : EndMacro
  Macro rsi : esi : EndMacro
  Macro rdi : edi : EndMacro
  Macro rbp : ebp : EndMacro
  Macro rsp : esp : EndMacro
CompilerEndIf

Macro M_movdqa(arg1, arg2)
  !movdqa arg1, arg2
EndMacro

Procedure SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize, Pos=0, Count=#False)
   
    ; backup some registers
    mov [rsp -  8], rbx
    mov [rsp - 16], rsi
    mov [rsp - 24], rdi
   
    ; init count
    mov rax, [p.v_Count]
    sub rax, 1
    sbb rax, rax
    mov [rsp - 40], rax
   
    ; perform some checks
    mov rcx, [p.v_Pos]
    sub [p.v_HayStackSize], rcx
    !jbe .l_sse2_search8            ; exit when HayStackSize <= Pos
    add [p.p_HayStack], rcx   
    mov rax, [p.v_HayStackSize]
    mov rbx, [p.v_NeedleSize]
    sub rax, rbx
    !jc .l_sse2_search8             ; exit if NeedleSize > HaystackSize
    add rax, [p.p_HayStack] 
    mov [rsp - 32], rax             ; rsp - 32 = *SearchEnd
    cmp rbx, 1
    !jl .l_sse2_search8             ; exit if NeedleSize < 1

    ; load first two needle bytes
    !pcmpeqb xmm4, xmm4
    mov rdi, [p.p_Needle]
    movzx eax, byte [rdi]
    !je .l_sse2_search0
    mov ah, [rdi + 1]
    !pslldq xmm4, 15
    !.l_sse2_search0:
    !movd xmm2, eax
    !punpcklbw xmm2, xmm2
    !punpcklwd xmm2, xmm2
    !pshufd xmm3, xmm2, 01010101b   ; xmm3 = 16 times second needle byte
    !pshufd xmm2, xmm2, 0           ; xmm2 = 16 times first needle byte
   
    ; start search
    mov rsi, [p.p_HayStack]
    mov rcx, rsi
    shr rsi, 4                      ; align Haystack to 16 bytes
    shl rsi, 4
    sub rcx, rsi
    M_movdqa(xmm0, [rsi])           ; handle first 16 bytes
    !movdqa xmm1, xmm0
    !pcmpeqb xmm0, xmm2             ; compare against first needle byte
    !pmovmskb eax, xmm0
    !shr eax, cl                    ; shift off unwanted bytes
    !shl eax, cl
    !test eax, eax
    !jnz .l_sse2_search2
   
    ; main search loop
    !.l_sse2_search1:
    add rsi, 16                     ; next 16 bytes
    cmp rsi, [rsp - 32]
    !ja .l_sse2_search8
    M_movdqa(xmm0, [rsi])
    !movdqa xmm1, xmm0
    !pcmpeqb xmm0, xmm2             ; compare against first needle byte
    !pmovmskb eax, xmm0
    !test eax, eax
    !jz .l_sse2_search1             ; no match ? => search1
    !.l_sse2_search2:
    !pcmpeqb xmm1, xmm3             ; compare against second needle byte
    !psrldq xmm1, 1
    !por xmm1, xmm4
    !pmovmskb ecx, xmm1
    !and eax, ecx                   ; combine both searches
    !jz .l_sse2_search1             ; no match ? => search1
   
    ; compare rest of bytes
    !.l_sse2_search3:
    !bsf ecx, eax                   ; get index of first match
    !jz .l_sse2_search1
    !btr eax, ecx
    lea rdx, [rsi + rcx]            ; create a pointer to it
    cmp rdx, [rsp - 32]
    mov rcx, [p.v_NeedleSize]
    !ja .l_sse2_search8
    sub rcx, 2
    !jb .l_sse2_search5             ; NeedleSize < 2 ? => search5 (already done)
   
    !.l_sse2_search4:
    movzx ebx, word [rdx + rcx]     ; compare rest of needle right-to-left
    cmp bx, [rdi + rcx]             ; two bytes at a time
    !jne .l_sse2_search3
    sub rcx, 2
    !jae .l_sse2_search4
   
    !.l_sse2_search5:
    mov rbx, [rsp - 40]
    cmp rbx, -1
    !je .l_sse2_search6
    add rbx, 1                      ; increase count
    mov [rsp - 40], rbx
    !jmp .l_sse2_search3
    !.l_sse2_search6:
    mov rax, rdx                    ; return result
    sub rax, [p.p_HayStack]
    add rax, [p.v_Pos]   
    !.l_sse2_search7:
    mov rbx, [rsp -  8]
    mov rsi, [rsp - 16]
    mov rdi, [rsp - 24] 
    ProcedureReturn
   
    ; not found / return count
    !.l_sse2_search8:
    mov rax, [rsp - 40]
    !jmp .l_sse2_search7
   
EndProcedure 
 
HayStackstring$ = Space(1000000) + "!"
HayStackSize = Len(HayStackstring$)
*HayStack = AllocateMemory(HayStackSize)
PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)

Needlestring$ = "!"
NeedleSize = Len(Needlestring$)
*Needle = AllocateMemory(NeedleSize)
PokeS(*Needle,Needlestring$,-1,#PB_Ascii)

t1 = ElapsedMilliseconds()
For i = 1 To 1000
  Pos = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize)
Next
t2 = ElapsedMilliseconds() - t1

S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position : "+ Str(Pos)
MessageRequester("Test results", S)
Last edited by RASHAD on Fri Nov 30, 2018 12:50 am, edited 1 time in total.
Egypt my love
RASHAD
PureBasic Expert
PureBasic Expert
Posts: 4955
Joined: Sun Apr 12, 2009 6:27 am

Re: Why FindString() is so terribly slow?

Post by RASHAD »

Find repeated string

Code: Select all

EnableASM
Global Pos,count
.
.
Procedure SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize, Pos, Count)
.
.
EndProcedure 
 
HayStackstring$ = Space(10000) + "!#$" + Space(10000) + "!#$"
HayStackSize = Len(HayStackstring$)
*HayStack = AllocateMemory(HayStackSize)
PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)

Needlestring$ = "!#$"
NeedleSize = Len(Needlestring$)
*Needle = AllocateMemory(NeedleSize)
PokeS(*Needle,Needlestring$,-1,#PB_Ascii)

count = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize,0,10)
For i = 0 To count-1
  Pos = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize,0,0)
  S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position #"+Str(i+1)+" : "+ Str(Pos) + #CRLF$ + "Count : "+ Str(Count)
  MessageRequester("Test results", S)
  PokeS(*HayStack+pos,Space(NeedleSize),-1,#PB_Ascii)
Next
Egypt my love
User avatar
gurj
Enthusiast
Enthusiast
Posts: 693
Joined: Thu Jan 22, 2009 3:48 am
Location: china
Contact:

Re: Why FindString() is so terribly slow?

Post by gurj »

; my cpu is 1G
; out:
; 0
; 0
; 4 ms
; !

Code: Select all

ElapsedMilliseconds()
s.s=Space(10000)+"!"
Debug ElapsedMilliseconds()
t=ElapsedMilliseconds()
ss.s="!"
l=StringByteLength(ss)
ll=StringByteLength(s)
m=ll-l+1
Debug ElapsedMilliseconds()-t
t=ElapsedMilliseconds()
re:
k=CompareMemoryString(@ss,@s+n,#PB_String_CaseSensitive,l)
If k<>#PB_String_Equal
 If n<m
  n+l
  Goto re
 EndIf
Else
 Debug ElapsedMilliseconds()-t
 Debug PeekS(@s+n,l)
EndIf
; my cpu is 1G
; out:
; 0
; 0
; 4
; !
my pb for chinese:
http://ataorj.ys168.com
Post Reply