Page 4 of 8

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 1:26 pm
by Mijikai
Here is my attempt :)

Only Pointers need to be supplied, the string size will be determined on the fly.
Theres a Unicode and Ascii version.

Its pretty fast already (it beats FindString()) :D

Code (run without Debugger!):

Code: Select all

EnableExplicit

Global TestString.s
Global Signature.s
Global Index.i
Global T1.q, R1.i, *A1
Global T2.q, R2.i, *A2

Procedure.i LocStrW(*Input,*Signature);Unicode
  !mov rcx,[p.p_Input]
  !mov rdx,[p.p_Signature]
  !xor rax,rax
  !mov r10,rcx
  !cmp word[rdx],0h
  !je LocStr_Return
  !LocStr_Loop:
  !cmp word[rcx],0h
  !je LocStr_Return
  !mov bx,word[rcx]
  !cmp bx,word[rdx]
  !jne LocStr_Next
  !lea r8,[rcx+2h]
  !lea r9,[rdx+2h]
  !LocStr_Match:
  !cmp word[r9],0h
  !je LocStr_Result 
  !cmp word[r8],0h
  !je LocStr_Next
  !mov bx,word[r8]
  !cmp bx,word[r9]
  !jne LocStr_Next
  !lea r8,[r8+2h]
  !lea r9,[r9+2h]
  !jmp LocStr_Match
  !LocStr_Result:
  !sub rcx,r10
  !lea rax,[rcx+2h]
  !shr rax,1h
  !jmp LocStr_Return
  !LocStr_Next:
  !lea rcx,[rcx+2h]
  !jmp LocStr_Loop
  !LocStr_Return:
  ProcedureReturn
EndProcedure

Procedure.i LocStrA(*Input,*Signature);Ascii
  !mov rcx,[p.p_Input]
  !mov rdx,[p.p_Signature]
  !xor rax,rax
  !mov r10,rcx
  !cmp byte[rdx],0h
  !je LocStr_Return
  !LocStr_Loop:
  !cmp byte[rcx],0h
  !je LocStr_Return
  !mov bl,byte[rcx]
  !cmp bl,byte[rdx]
  !jne LocStr_Next
  !lea r8,[rcx+1h]
  !lea r9,[rdx+1h]
  !LocStr_Match:
  !cmp byte[r9],0h
  !je LocStr_Result 
  !cmp byte[r8],0h
  !je LocStr_Next
  !mov bl,byte[r8]
  !cmp bl,byte[r9]
  !jne LocStr_Next
  !inc r8
  !inc r9
  !jmp LocStr_Match
  !LocStr_Result:
  !sub rcx,r10
  !lea rax,[rcx+1h]
  !jmp LocStr_Return
  !LocStr_Next:
  !inc rcx
  !jmp LocStr_Loop
  !LocStr_Return:
  ProcedureReturn
EndProcedure

#Tries = 2e2 

TestString = Space(1000000) + "!#"
Signature = "!"

*A1 = Ascii(TestString)
*A2 = Ascii(Signature)
  
T1 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  R1 = FindString(TestString,Signature)
Next
T1 = ElapsedMilliseconds() - T1

T2 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  R2 = LocStrW(@TestString,@Signature);LocStrA(*A1,*A2)
Next
T2 = ElapsedMilliseconds() - T2

MessageRequester("Result",
                 "T1: " + Str(T1) + #CR$ + "R1: " + Str(R1) + #CR$ + #CR$ + 
                 "T2: " + Str(T2) + #CR$ + "R2: " + Str(R2))

End

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 2:59 pm
by forumuser
Nice work, Mijikai!
It needs only half the time that FindString() needs :mrgreen:

Would a 32-bit version viable as well?

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 5:41 pm
by Wolfram
Nice, it seams to be 4 times faster

(OSX High Sierra)

+1 for a 32bit solution ;-)

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 5:45 pm
by skywalk
Mijikai wrote:Here is my attempt :)
Its pretty fast already (it beats FindString()) :D
Good work, but, no significant advantage over the existing C call, wccstr(*sIn,*srch), which is available as x86/x64. :wink:

Code: Select all

; Count(n),                                            200
; findstring_+case(ms),                                325
; findstringmem_+case(ms),                             1372
; findstring_-case(ms),                                4012
; findstringmem_-case(ms),                             2007
; findstringC_+case(ms),                               224
; findstringASMU_+case(ms),                            221
; findstring_+case : findstringmem_+case(%),           322
; findstring_-case : findstringmem_-case(%),           -50
; findstring_+case : findstring_-case(%),              1134
; findstringmem_+case : findstringmem_-case(%),        46
; findstring_+case : findstringC_+case(%),             -31
; findstring_+case : findstringASMU_+case(%),          -32

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 5:47 pm
by Mijikai
forumuser wrote:Nice work, Mijikai!
It needs only half the time that FindString() needs :mrgreen:

Would a 32-bit version viable as well?
Thanks :)

Here are the 32bit & 64bit Versions:

Code: Select all

;LocStrA/W by mijikai

Procedure.i LocStrW(*Input,*Signature)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx,[p.p_Input]
    !mov rdx,[p.p_Signature]
    !xor rax,rax
    !mov r10,rcx
    !cmp word[rdx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp word[rcx],0h
    !je LocStr_Return
    !mov bx,word[rcx]
    !cmp bx,word[rdx]
    !jne LocStr_Next
    !lea r8,[rcx+2h]
    !lea r9,[rdx+2h]
    !LocStr_Match:
    !cmp word[r9],0h
    !je LocStr_Result 
    !cmp word[r8],0h
    !je LocStr_Next
    !mov bx,word[r8]
    !cmp bx,word[r9]
    !jne LocStr_Next
    !lea r8,[r8+2h]
    !lea r9,[r9+2h]
    !jmp LocStr_Match
    !LocStr_Result:
    !sub rcx,r10
    !lea rax,[rcx+2h]
    !shr rax,1h
    !jmp LocStr_Return
    !LocStr_Next:
    !lea rcx,[rcx+2h]
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerElse
    !mov ecx,[p.p_Input]
    !mov edx,[p.p_Signature]
    !xor eax,eax
    !cmp word[edx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp word[ecx],0h
    !je LocStr_Return
    !mov bx,word[ecx]
    !cmp bx,word[edx]
    !jne LocStr_Next
    !lea esi,[ecx+2h]
    !lea edi,[edx+2h]
    !LocStr_Match:
    !cmp word[edi],0h
    !je LocStr_Result 
    !cmp word[esi],0h
    !je LocStr_Next
    !mov bx,word[esi]
    !cmp bx,word[edi]
    !jne LocStr_Next
    !lea esi,[esi+2h]
    !lea edi,[edi+2h]
    !jmp LocStr_Match
    !LocStr_Result:
    !sub ecx,[p.p_Input]
    !lea eax,[ecx+2h]
    !shr eax,1h
    !jmp LocStr_Return
    !LocStr_Next:
    !lea ecx,[ecx+2h]
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerEndIf
EndProcedure

Procedure.i LocStrA(*Input,*Signature)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rcx,[p.p_Input]
    !mov rdx,[p.p_Signature]
    !xor rax,rax
    !mov r10,rcx
    !cmp byte[rdx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp byte[rcx],0h
    !je LocStr_Return
    !mov bl,byte[rcx]
    !cmp bl,byte[rdx]
    !jne LocStr_Next
    !lea r8,[rcx+1h]
    !lea r9,[rdx+1h]
    !LocStr_Match:
    !cmp byte[r9],0h
    !je LocStr_Result 
    !cmp byte[r8],0h
    !je LocStr_Next
    !mov bl,byte[r8]
    !cmp bl,byte[r9]
    !jne LocStr_Next
    !inc r8
    !inc r9
    !jmp LocStr_Match
    !LocStr_Result:
    !sub rcx,r10
    !lea rax,[rcx+1h]
    !jmp LocStr_Return
    !LocStr_Next:
    !inc rcx
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerElse
    !mov ecx,[p.p_Input]
    !mov edx,[p.p_Signature]
    !xor eax,eax
    !cmp byte[edx],0h
    !je LocStr_Return
    !LocStr_Loop:
    !cmp byte[ecx],0h
    !je LocStr_Return
    !mov bl,byte[ecx]
    !cmp bl,byte[edx]
    !jne LocStr_Next
    !lea esi,[ecx+1h]
    !lea edi,[edx+1h]
    !LocStr_Match:
    !cmp byte[edi],0h
    !je LocStr_Result 
    !cmp byte[esi],0h
    !je LocStr_Next
    !mov bl,byte[esi]
    !cmp bl,byte[edi]
    !jne LocStr_Next
    !inc esi
    !inc edi
    !jmp LocStr_Match
    !LocStr_Result:
    !sub ecx,[p.p_Input]
    !lea eax,[ecx+1h]
    !jmp LocStr_Return
    !LocStr_Next:
    !inc ecx
    !jmp LocStr_Loop
    !LocStr_Return:
    ProcedureReturn
  CompilerEndIf
EndProcedure
All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 5:59 pm
by Mijikai
skywalk wrote:
Mijikai wrote:Here is my attempt :)
Its pretty fast already (it beats FindString()) :D
Good work, but, no significant advantage over the existing C call, wccstr(*sIn,*srch), which is available as x86/x64. :wink:
Yeah those are pretty good solutions as well. :)

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 7:01 pm
by skywalk
Mijikai wrote:All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.
The StrStrIW() function sets the inputs to lower case? Were you thinking of a different approach?

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 7:17 pm
by Mijikai
skywalk wrote:
Mijikai wrote:All functions still miss the offset and case insensitivity option.
Offset is not complicated but case insensitivity seems to be difficult.
The StrStrIW() function sets the inputs to lower case? Were you thinking of a different approach?
No, i was talking about LocStrA/W().
Implementing lower case mapping is difficult and cant be done on the fly
so calls to memory functions are not avoidable.
Im not able to do it.

Anyhow, i personally would love to see a Unicode version of Helles Code as it is the fastest so far.

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 8:29 pm
by skywalk
Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 9:06 pm
by Mijikai
skywalk wrote:Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.
Sure ill post it again: https://www.purebasic.fr/german/viewtop ... 02#p323802

__________________________________________________
URL changed and SID deleted
29.11.2018
RSBasic

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 9:40 pm
by RSBasic
@Mijikai
Please use the url https://www.purebasic.fr/german because it is encrypted with SSL certificate.

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 9:44 pm
by idle
Mijikai wrote:
skywalk wrote:Do you have a link to "Helle's code"? The C string functions appear to work nearly as well as the asm. My guess is the same applies to Helle's code. So much more code to do, I cannot reinvent wheels that are not broken.
Sure ill post it again: https://www.purebasic.fr/german/viewtop ... 02#p323802

__________________________________________________
URL changed and SID deleted
29.11.2018
RSBasic
just be aware that it's SSE 4.2 safer would be SSE2, ask wilbert for his findstring module

Re: Why FindString() is so terribly slow?

Posted: Thu Nov 29, 2018 11:16 pm
by RASHAD
Great topic
Very big thanks to all contributors
And thanks idle for pointing to wilbert staff
Up to now the winner is wilbert

Code: Select all

EnableASM

Global Pos
 
CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Macro rax : eax : EndMacro
  Macro rbx : ebx : EndMacro   
  Macro rcx : ecx : EndMacro
  Macro rdx : edx : EndMacro
  Macro rsi : esi : EndMacro
  Macro rdi : edi : EndMacro
  Macro rbp : ebp : EndMacro
  Macro rsp : esp : EndMacro
CompilerEndIf

Macro M_movdqa(arg1, arg2)
  !movdqa arg1, arg2
EndMacro

Procedure SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize, Pos=0, Count=#False)
   
    ; backup some registers
    mov [rsp -  8], rbx
    mov [rsp - 16], rsi
    mov [rsp - 24], rdi
   
    ; init count
    mov rax, [p.v_Count]
    sub rax, 1
    sbb rax, rax
    mov [rsp - 40], rax
   
    ; perform some checks
    mov rcx, [p.v_Pos]
    sub [p.v_HayStackSize], rcx
    !jbe .l_sse2_search8            ; exit when HayStackSize <= Pos
    add [p.p_HayStack], rcx   
    mov rax, [p.v_HayStackSize]
    mov rbx, [p.v_NeedleSize]
    sub rax, rbx
    !jc .l_sse2_search8             ; exit if NeedleSize > HaystackSize
    add rax, [p.p_HayStack] 
    mov [rsp - 32], rax             ; rsp - 32 = *SearchEnd
    cmp rbx, 1
    !jl .l_sse2_search8             ; exit if NeedleSize < 1

    ; load first two needle bytes
    !pcmpeqb xmm4, xmm4
    mov rdi, [p.p_Needle]
    movzx eax, byte [rdi]
    !je .l_sse2_search0
    mov ah, [rdi + 1]
    !pslldq xmm4, 15
    !.l_sse2_search0:
    !movd xmm2, eax
    !punpcklbw xmm2, xmm2
    !punpcklwd xmm2, xmm2
    !pshufd xmm3, xmm2, 01010101b   ; xmm3 = 16 times second needle byte
    !pshufd xmm2, xmm2, 0           ; xmm2 = 16 times first needle byte
   
    ; start search
    mov rsi, [p.p_HayStack]
    mov rcx, rsi
    shr rsi, 4                      ; align Haystack to 16 bytes
    shl rsi, 4
    sub rcx, rsi
    M_movdqa(xmm0, [rsi])           ; handle first 16 bytes
    !movdqa xmm1, xmm0
    !pcmpeqb xmm0, xmm2             ; compare against first needle byte
    !pmovmskb eax, xmm0
    !shr eax, cl                    ; shift off unwanted bytes
    !shl eax, cl
    !test eax, eax
    !jnz .l_sse2_search2
   
    ; main search loop
    !.l_sse2_search1:
    add rsi, 16                     ; next 16 bytes
    cmp rsi, [rsp - 32]
    !ja .l_sse2_search8
    M_movdqa(xmm0, [rsi])
    !movdqa xmm1, xmm0
    !pcmpeqb xmm0, xmm2             ; compare against first needle byte
    !pmovmskb eax, xmm0
    !test eax, eax
    !jz .l_sse2_search1             ; no match ? => search1
    !.l_sse2_search2:
    !pcmpeqb xmm1, xmm3             ; compare against second needle byte
    !psrldq xmm1, 1
    !por xmm1, xmm4
    !pmovmskb ecx, xmm1
    !and eax, ecx                   ; combine both searches
    !jz .l_sse2_search1             ; no match ? => search1
   
    ; compare rest of bytes
    !.l_sse2_search3:
    !bsf ecx, eax                   ; get index of first match
    !jz .l_sse2_search1
    !btr eax, ecx
    lea rdx, [rsi + rcx]            ; create a pointer to it
    cmp rdx, [rsp - 32]
    mov rcx, [p.v_NeedleSize]
    !ja .l_sse2_search8
    sub rcx, 2
    !jb .l_sse2_search5             ; NeedleSize < 2 ? => search5 (already done)
   
    !.l_sse2_search4:
    movzx ebx, word [rdx + rcx]     ; compare rest of needle right-to-left
    cmp bx, [rdi + rcx]             ; two bytes at a time
    !jne .l_sse2_search3
    sub rcx, 2
    !jae .l_sse2_search4
   
    !.l_sse2_search5:
    mov rbx, [rsp - 40]
    cmp rbx, -1
    !je .l_sse2_search6
    add rbx, 1                      ; increase count
    mov [rsp - 40], rbx
    !jmp .l_sse2_search3
    !.l_sse2_search6:
    mov rax, rdx                    ; return result
    sub rax, [p.p_HayStack]
    add rax, [p.v_Pos]   
    !.l_sse2_search7:
    mov rbx, [rsp -  8]
    mov rsi, [rsp - 16]
    mov rdi, [rsp - 24] 
    ProcedureReturn
   
    ; not found / return count
    !.l_sse2_search8:
    mov rax, [rsp - 40]
    !jmp .l_sse2_search7
   
EndProcedure 
 
HayStackstring$ = Space(1000000) + "!"
HayStackSize = Len(HayStackstring$)
*HayStack = AllocateMemory(HayStackSize)
PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)

Needlestring$ = "!"
NeedleSize = Len(Needlestring$)
*Needle = AllocateMemory(NeedleSize)
PokeS(*Needle,Needlestring$,-1,#PB_Ascii)

t1 = ElapsedMilliseconds()
For i = 1 To 1000
  Pos = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize)
Next
t2 = ElapsedMilliseconds() - t1

S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position : "+ Str(Pos)
MessageRequester("Test results", S)

Re: Why FindString() is so terribly slow?

Posted: Fri Nov 30, 2018 12:07 am
by RASHAD
Find repeated string

Code: Select all

EnableASM
Global Pos,count
.
.
Procedure SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize, Pos, Count)
.
.
EndProcedure 
 
HayStackstring$ = Space(10000) + "!#$" + Space(10000) + "!#$"
HayStackSize = Len(HayStackstring$)
*HayStack = AllocateMemory(HayStackSize)
PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)

Needlestring$ = "!#$"
NeedleSize = Len(Needlestring$)
*Needle = AllocateMemory(NeedleSize)
PokeS(*Needle,Needlestring$,-1,#PB_Ascii)

count = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize,0,10)
For i = 0 To count-1
  Pos = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize,0,0)
  S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position #"+Str(i+1)+" : "+ Str(Pos) + #CRLF$ + "Count : "+ Str(Count)
  MessageRequester("Test results", S)
  PokeS(*HayStack+pos,Space(NeedleSize),-1,#PB_Ascii)
Next

Re: Why FindString() is so terribly slow?

Posted: Fri Nov 30, 2018 1:53 am
by gurj
; my cpu is 1G
; out:
; 0
; 0
; 4 ms
; !

Code: Select all

ElapsedMilliseconds()
s.s=Space(10000)+"!"
Debug ElapsedMilliseconds()
t=ElapsedMilliseconds()
ss.s="!"
l=StringByteLength(ss)
ll=StringByteLength(s)
m=ll-l+1
Debug ElapsedMilliseconds()-t
t=ElapsedMilliseconds()
re:
k=CompareMemoryString(@ss,@s+n,#PB_String_CaseSensitive,l)
If k<>#PB_String_Equal
 If n<m
  n+l
  Goto re
 EndIf
Else
 Debug ElapsedMilliseconds()-t
 Debug PeekS(@s+n,l)
EndIf
; my cpu is 1G
; out:
; 0
; 0
; 4
; !