Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

@gurj

This must be repeated 1000 times!
User avatar
Michael Vogel
Addict
Addict
Posts: 2811
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Why FindString() is so terribly slow?

Post by Michael Vogel »

RASHAD wrote:Great topic
Very big thanks to all contributors
And thanks idle for pointing to wilbert staff
Up to now the winner is wilbert

Code: Select all

:
+1 Because it works for 32/64 bit as well as for Ascii and Unicode!

Two remarks:
+1 should be added to the result (not found=-1, first position=0,...)
33359272 and other results are seen here when debugger is active?!
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

@RASHAD

Code: Select all

HayStackSize = Len(HayStackstring$)
*HayStack = AllocateMemory(HayStackSize)
PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)
Making that outside the (1000) For-Loop does not make any meaningful time-comparison.


@Michael Vogel
Michael Vogel wrote:Because it works ... as well as for Ascii and Unicode!
No, the code ignores Unicode simply by converting everything to Ascii (do this also outside the For-Loop).
sorry for my bad english
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

RASHAD wrote:Great topic
Very big thanks to all contributors
And thanks idle for pointing to wilbert staff
Up to now the winner is wilbert
The code u posted does not work -> Pos = 0!

I assume Wilberts original code is SwarmiSearchSSE2() which does work.
Its just amazing what Wilbert and Helle can do. :)
But both functions are Ascii & x64 only!


Code: Select all

;SwarmiSearchSSE2 -> 
;~ 9 ms naked (all sting lengths are known) 
;~ 12 - 90 ms (if 1 string length is known) 
;~ 92 ms (both string lengths are unknown)

;FindString_SSE42_64Bit -> ~ 85 ms string length is not needed
;LocStrA -> ~ 128 ms string length is not needed
Heres the testcode:

Code: Select all

EnableExplicit

Global TestString.s
Global Signature.s
Global Index.i
Global T1.q, R1.i, *A1
Global T2.q, R2.i, *A2
Global L1.i, L2.i
  
Procedure.q FindString_SSE42_64Bit(pStr1, pStr2, StartPos)
  !mov rcx,[p.v_pStr1]
  !mov rdx,[p.v_pStr2]
  !mov rax,[p.v_StartPos]  
  
  !mov rax,r8                               ;need RAX for return-value ProcedureReturn, R8=StartPos
  !cmp rax,1
  !jge @f
  !mov rax,1                                ;or End or Error
  !@@:
  !mov r8,rcx                               ;need RCX for return-value PCMPISTRI, RCX=pStr1 
  !sub r8,1
  !add r8,rax                               ;rax=StartPos
  !movdqu xmm0,[rdx]                        ;first max.16 Bytes of Find$, RDX=pStr2
  !xor rcx,rcx
  !Search:
  !add rax,rcx
  !add r8,rcx
  !pcmpistri xmm0,dqword[r8],00001100b      ;Least Significant Index, Positive Polarity (IntRes2=IntRes1), Equal Ordered, Unsigned Bytes (=No Unicode)  
  !jz Base$End                              ;Zero-Flag is set if found end of Base$
  !js Find$End                              ;Sign-Flag is set if found end of Find$  
  !jrcxz @f                                 ;jump if RCX is zero 
  !jmp Search
  !@@:
  !movdqa xmm1,xmm0                         ;save xmm0
  !mov r9,r8
  !mov r10,rdx
  !@@:
  !add r10,16
  !add r9,16                                ;next 16 Bytes of Base$
  !movdqu xmm1,[r10]                        ;next 16 Bytes of Find$
  !pcmpistri xmm1,dqword[r9],00001100b      ;compare the strings 
  !jz Base$End
  !js Find$End
  !jrcxz @b
  !jmp Search
  !Base$End:
  !cmp rcx,16
  !jne Found
  !xor rax,rax
  !jmp NoFound
  !Find$End:
  !or rcx,rcx                               ;RCX Zero? Conventional
  !jnz Search
  !Found:
  !add rax,rcx
  !NoFound:
  ProcedureReturn                            ;rax 
EndProcedure 

Procedure SwarmiSearchSSE2(*haystack,len,*needle,lenN,pos=0);Result is wrong by 1 Character!
  ;Swarmi search SSE2 bit parallel search algorithm for for 1 to 16 byte needles   
  ;performs ~ 1:1 With booyer moore And other variants For a 16 byte needle With a 0-255 alphabet range 
  ;efficiency increases with parallelism. 2:1 for 8 byte needle, 4:1 for 4 byte needle, 8:1 for 2 byte needle and 16:1 1 byte needle 
  
  Protected offset
  !align 32
  DataSection 
  !m16:
  !dq 0xffffffffffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffffffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffffff
  !dq 0xffffffffffffffff
  !dq 0xffffff
  !dq 0xffffffffffffffff
  !dq 0xffff
  !dq 0xffffffffffffffff
  !dq 0xff
  !dq 0xffffffffffffffff
  !dq 0x0
  !dq 0xffffffffffffff
  !dq 0x0
  !dq 0xffffffffffff
  !dq 0x0
  !dq 0xffffffffff
  !dq 0x0
  !dq 0xffffffff
  !dq 0x0
  !dq 0xffffff
  !dq 0x0
  !dq 0xfff
  !dq 0x0
  !dq 0xff
  !dq 0x0
  EndDataSection   
  
  !mov rcx,16 
  !sub rcx,[p.v_lenN]
  !imul rcx,16
  
  !lea r8, [m16]
  !movdqu xmm8, [r8+rcx]   ;load trailing byte mask
     
  !mov r9, [p.p_needle]   ;build First char mask eg "a0a0a0a0a0a0a0"
  !movzx eax,byte [r9]  
  !movd xmm7, eax
  !punpcklbw xmm7, xmm7
  !punpcklwd xmm7, xmm7
  !pshufd xmm7, xmm7, 0    ;FirstCharMask
   
  !mov r8,[p.p_haystack]   ;align haysack 
  !mov rax,r8 
  !and rax ,15
  !mov [p.v_offset],rax    ;store offset 
  !sub r8,rax
  
  !movdqu xmm5,[r9]        ;store needle 
  !mov r14,[p.v_len]          
  !mov r13,[p.v_pos]
  !sub r13,16
        
  !Swarmi_While_Len:
  !cmp r13, r14
  !jge Swarmi_Wend
     !add r13,16  
     !movdqa xmm0,[r8+r13]  ;Get Haystack and check against mask chars"
     !pcmpeqb xmm0,xmm7     ;FirstCharMask 
     !pmovmskb eax,xmm0
     !cmp eax,0
  !je Swarmi_While_Len     
     !mov r11,r13       ;current position 
     !Swarmi_Redo:
     !bsf rdx,rax       ;if there was a match rax holds 1 at the byte positions where a match is found in mask   
     !btr rax,rdx
     !add r11,rdx       ;align the haystack step forward x bytes    
     !movdqu xmm0,[r8+r11] ;haystack 
     !pcmpeqb xmm0,xmm5   ;check equal haystack vs needle   
     !pandn xmm0,xmm8
     !pmovmskb edx,xmm0
     !cmp edx,0  
     !je Swarmi_Found
     !Swarmi_Test_Redo:
     !cmp eax,0           ;test if any remaining positions to process    
   !jne Swarmi_Redo       ;  
  !jmp Swarmi_While_Len 
   !Swarmi_Found:
     !cmp r13,r14
     !jge Swarmi_Wend 
     !mov rax, r11 
     !sub rax,[p.v_offset]
     ProcedureReturn 
  !Swarmi_Wend:  
  !mov rax, -1
  ProcedureReturn 
  
EndProcedure

Procedure.i LocStrA(*Input,*Signature);Ascii
  !mov rcx,[p.p_Input]
  !mov rdx,[p.p_Signature]
  !xor rax,rax
  !mov r10,rcx
  !cmp byte[rdx],0h
  !je LocStr_Return
  !LocStr_Loop:
  !cmp byte[rcx],0h
  !je LocStr_Return
  !mov bl,byte[rcx]
  !cmp bl,byte[rdx]
  !jne LocStr_Next
  !lea r8,[rcx+1h]
  !lea r9,[rdx+1h]
  !LocStr_Match:
  !cmp byte[r9],0h
  !je LocStr_Result 
  !cmp byte[r8],0h
  !je LocStr_Next
  !mov bl,byte[r8]
  !cmp bl,byte[r9]
  !jne LocStr_Next
  !inc r8
  !inc r9
  !jmp LocStr_Match
  !LocStr_Result:
  !sub rcx,r10
  !lea rax,[rcx+1h]
  !jmp LocStr_Return
  !LocStr_Next:
  !inc rcx
  !jmp LocStr_Loop
  !LocStr_Return:
  ProcedureReturn
EndProcedure

#Tries = 2e2 

TestString = Space(1000000) + "8!#"
Signature = "8!"

*A1 = Ascii(TestString)
*A2 = Ascii(Signature)

L1 = Len(TestString)
L2 = Len(Signature)

T1 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  ;L1 = Len(TestString);<- needs to be here unless u only work with known strings!
  ;L2 = Len(Signature)
  R1 = SwarmiSearchSSE2(*A1,L1,*A2,L2)
Next
T1 = ElapsedMilliseconds() - T1

T2 = ElapsedMilliseconds()
For Index = 1 To #Tries;10000000
  R2 = FindString_SSE42_64Bit(*A1,*A2,0);LocStrA(*A1,*A2);LocStrA(*A1,*A2)
Next
T2 = ElapsedMilliseconds() - T2

MessageRequester("Result",
                 "T1: " + Str(T1) + #CR$ + "R1: " + Str(R1) + #CR$ + #CR$ + 
                 "T2: " + Str(T2) + #CR$ + "R2: " + Str(R2))

End
User avatar
idle
Always Here
Always Here
Posts: 5929
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Why FindString() is so terribly slow?

Post by idle »

Mijikai wrote:
RASHAD wrote:Great topic
Very big thanks to all contributors
And thanks idle for pointing to wilbert staff
Up to now the winner is wilbert
The code u posted does not work -> Pos = 0!

I assume Wilberts original code is SwarmiSearchSSE2() which does work.
Its just amazing what Wilbert and Helle can do (but both are ascii only!).
No SwarmiSearchSSE2() was my creation but Wilberts SSE2_Find is a little better and it also supports 32 bit.
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

idle wrote: No SwarmiSearchSSE2() was my creation but Wilberts SSE2_Find is a little better and it also supports 32 bit.
My mistake then all credits go to you, thanks :)
(Where can i find Wilberts code?)
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

@Mijikai
Same problems as with Rashads code.

The test is irrelevant if you make preparings outside the for loop.

Code: Select all

EnableExplicit

Global TestString$
Global Signature$
Global i
Global T1.q, R1, *A1
Global T2.q, R2, *A2
Global T3.q, R3, *A3
Global L1, L2
  
  ...
  ...
  ...
  ...

#Tries = 1000

TestString$ = Space(1000000) + "8!#"
Signature$  = "8!"

T1 = ElapsedMilliseconds()
For i = 1 To #Tries
  L1 = Len(TestString$)
  L2 = Len(Signature$)
  *A1 = Ascii(TestString$)
  *A2 = Ascii(Signature$)
  R1 = SwarmiSearchSSE2(*A1,L1,*A2,L2)
  FreeMemory (*A1)
  FreeMemory (*A2)
Next
T1 = ElapsedMilliseconds() - T1

T2 = ElapsedMilliseconds()
For i = 1 To #Tries
  *A1 = Ascii(TestString$)
  *A2 = Ascii(Signature$)
  R2 = FindString_SSE42_64Bit(*A1,*A2,0)
  FreeMemory (*A1)
  FreeMemory (*A2)
Next
T2 = ElapsedMilliseconds() - T2

T3 = ElapsedMilliseconds()
For i = 1 To #Tries
  R3 = FindString(TestString$, Signature$)
Next
T3 = ElapsedMilliseconds() - T3


MessageRequester("Result",
                 "T1: " + T1 + #CR$ + "R1: " + R1 + #CR$ + #CR$ + 
                 "T2: " + T2 + #CR$ + "R2: " + R2 + #CR$ + #CR$ + 
                 "T3: " + T3 + #CR$ + "R3: " + R3)
In addition, there is no real Unicode search in your procedures.
sorry for my bad english
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Why FindString() is so terribly slow?

Post by davido »

@Mijikai,

wilbert's FindData Module:
viewtopic.php?p=467148
DE AA EB
RASHAD
PureBasic Expert
PureBasic Expert
Posts: 4955
Joined: Sun Apr 12, 2009 6:27 am

Re: Why FindString() is so terribly slow?

Post by RASHAD »

@Josh
You are absolutely right
A real comparison between wilbert and PB gave me the same result
Conclusion : I love PB :)

Code: Select all

.
.
.
t1 = ElapsedMilliseconds()
  HayStackstring$ = Space(1000000) + "!"
  HayStackSize = Len(HayStackstring$)
  *HayStack = AllocateMemory(HayStackSize)
  PokeS(*HayStack,HayStackstring$,-1,#PB_Ascii)
 
  Needlestring$ = "!"
  NeedleSize = Len(Needlestring$)
  *Needle = AllocateMemory(NeedleSize)
  PokeS(*Needle,Needlestring$,-1,#PB_Ascii)

  Pos = SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize)
t2 = ElapsedMilliseconds() - t1

S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position : "+ Str(Pos+NeedleSize)
MessageRequester("Test results", S)

HayStackstring$ = ""
Needlestring$ = ""

t1 = ElapsedMilliseconds()
  Stackstring$ = Space(1000000) + "!"
  Nstring$ = "!"
  pos = FindString(Stackstring$,Nstring$,1)
t2 = ElapsedMilliseconds() - t1
S.s = "SSE2 Find : " + Str(t2) + "ms" + #CRLF$ + "At Position : "+ Str(Pos)
MessageRequester("Test results", S)
Egypt my love
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Josh wrote:@Mijikai
The test is irrelevant if you make preparings outside the for loop.
In addition, there is no real Unicode search in your procedures.
The preparations u refering to are done for both functions before they enter the loop so i dont see you point.
I posted Unicode and Ascii versions of my functions for both x86 and x64.
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

davido wrote:@Mijikai,

wilbert's FindData Module
Thanks :)
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

Mijikai wrote:The preparations u refering to are done for both functions before they enter the loop so i dont see you point.
The 1000 loops here are to get better comparisons in the times. In practice, you will hardly try to find the same searchstring in a string 1000 times in a row.

So, all your preparations have to be done in the loop. Then you have comparable results and you will notice that there are only marginale time differences between your code and the native Pb FindString().
sorry for my bad english
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Josh wrote:
Mijikai wrote:The preparations u refering to are done for both functions before they enter the loop so i dont see you point.
The 1000 loops here are probably only there to get better comparisons in the times. In practice, you will hardly try to find the same searchstring in a string 1000 times in a row.

So, all your preparations have to be done in the loop. Then you have comparable results and you will notice that there are only marginale time differences between your code and the native Pb FindString().
I thought you are interested in compring the speeds betweend different search routines.
Including the Ascii() function makes no sense unless u like to make the result less realistic.
You can always measure the timing of Ascii() but then also dont include other calls.

Ur example makes further no sense since FindString() will operate in unicode mode.
What obviously will result in bogus results.
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

Mijikai wrote:I thought you are interested in compring the speeds betweend different search routines.
Yes, i want nothing else then comparing the speeds between different search routines. In my code here, I even included the native Pb FindString ().
Mijikai wrote:Including the Ascii() function makes no sense unless u like to make the result less realistic. You can always measure the timing of Ascii() but then also dont include other calls.
You don't understand, the Asc() functions must be in the loop!!! Your code does not bring realistic conditions. The times your code spits out are completely irrelevant for realistic comparisons with the native Pb FindString().
Mijikai wrote:Ur example makes further no sense since FindString() will operate in unicode mode.
What obviously will result in bogus results.
For some versions, Pb works only in Unicode mode and FindString () logically too. Please tell me, why this will result in bogus results.
sorry for my bad english
User avatar
skywalk
Addict
Addict
Posts: 4224
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Why FindString() is so terribly slow?

Post by skywalk »

Once PB went natively unicode, scanning ascii fast had to be done in memory, either using C calls or ASM.
Adding case sensitivity and other options to the ASM code brings them closer to straight C calls.
It is excellent to see the ASM and this topic in general, but the majority of my parsing requirements work with straight C calls or PB calls in threads.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Post Reply