Why FindString() is so terribly slow?

Just starting out? Need help? Post your questions and find answers here.
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Josh wrote:Yes, i want nothing else then comparing the speeds between different search routines. In my code here, I even included the native Pb FindString ().
You cant just call multiple functions and expect to get a realistic timing on one - thats just crazy.
At most u cover a single case condition and then still you dont have a accurate timing.
Josh wrote:For some versions, Pb works only in Unicode mode and FindString () logically too. Please tell me, why this will result in bogus results.
Really so theres no difference for you?
You want compare two functions that have the same conditions vs. something completely different?
Mby. im just insane and thats why i compared FindString() to the unicode version of my function...
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

skywalk wrote:Once PB went natively unicode, scanning ascii fast had to be done in memory, either using C calls or ASM.
Adding case sensitivity and other options to the ASM code brings them closer to straight C calls.
I agree, but i still would like to see some more unicode asm versions :)
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

Mijikai wrote:You cant just call multiple functions and expect to get a realistic timing on one - thats just crazy.
At most u cover a single case condition and then still you dont have a accurate timing.
Sorry, I can not follow you. Your time measurements are simply irrelevant and useless. But no matter, everyone should do what he wants.

Mijikai wrote:Ur example makes further no sense since FindString() will operate in unicode mode.
What obviously will result in bogus results.
What is the 'Ur example' example for you? For me, the 'Ur example' is the first code in this thread:

Code: Select all

s.s = Space(1000000) + "!"
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!")
Next
Debug ElapsedMilliseconds() - t
t = ElapsedMilliseconds()
For i = 1 To 1000
  x = FindString(s, "!", 0, #PB_String_NoCase)
Next
Debug ElapsedMilliseconds() - t
I can't see, why this will result in bogus results.
sorry for my bad english
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Josh wrote: What is the 'Ur example' example for you?
Well...
Josh wrote:In my code here, I even included the native Pb FindString ().

Code: Select all

EnableExplicit

Global TestString$
Global Signature$
Global i
Global T1.q, R1, *A1
Global T2.q, R2, *A2
Global T3.q, R3, *A3
Global L1, L2
  
  ...
  ...
  ...
  ...

#Tries = 1000

TestString$ = Space(1000000) + "8!#"
Signature$  = "8!"

T1 = ElapsedMilliseconds()
For i = 1 To #Tries
  L1 = Len(TestString$)
  L2 = Len(Signature$)
  *A1 = Ascii(TestString$)
  *A2 = Ascii(Signature$)
  R1 = SwarmiSearchSSE2(*A1,L1,*A2,L2)
  FreeMemory (*A1)
  FreeMemory (*A2)
Next
T1 = ElapsedMilliseconds() - T1

T2 = ElapsedMilliseconds()
For i = 1 To #Tries
  *A1 = Ascii(TestString$)
  *A2 = Ascii(Signature$)
  R2 = FindString_SSE42_64Bit(*A1,*A2,0)
  FreeMemory (*A1)
  FreeMemory (*A2)
Next
T2 = ElapsedMilliseconds() - T2

T3 = ElapsedMilliseconds()
For i = 1 To #Tries
  R3 = FindString(TestString$, Signature$)
Next
T3 = ElapsedMilliseconds() - T3


MessageRequester("Result",
                 "T1: " + T1 + #CR$ + "R1: " + R1 + #CR$ + #CR$ + 
                 "T2: " + T2 + #CR$ + "R2: " + R2 + #CR$ + #CR$ + 
                 "T3: " + T3 + #CR$ + "R3: " + R3)
And all the quoting that ensued...
User avatar
Michael Vogel
Addict
Addict
Posts: 2811
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Why FindString() is so terribly slow?

Post by Michael Vogel »

By the way, the original FindString is very fast in ASCII mode but could be also a used in unicode - here are the results of wilbert's routines including PureBasic's FindString (32 Bit):

Code: Select all

Test 2
----------
Tailed Substring : 	584ms	125
QuickSearch : 	722ms	125
Boyer-Moore : 	759ms	125
FastSearch : 	263ms	125
SSE2 Find : 	143ms	125
FindString : 	12ms		126
Wilbert's code hasn't been changed except some few things like:

Code: Select all

HayStackText.s="A small test with a small haystack and short needle."+Space(MoreSpaces)+" This will perform differently because some algorithms take some time to initialize"
NeedleText.s="initial"
PokeS(*HayStack,HayStackText, -1, #PB_Ascii)
PokeS(*Needle,NeedleText, -1, #PB_Ascii)

Anyhow SSE2_Find seems to be the perfect solution when using 64 bit!
For replacing original code using FindString, it would be fine to add a compatible offset handling and return identical results - the time consuming MemoryStringLength calculations can be handled seperately (I would assume that the length of the search phrase is known many times)...

Code: Select all

Procedure MyFindString(*String,*Search,StartPos=1)
		If StartPos>1
			*String+StartPos-1
		Else
			StartPos=1
		EndIf
		CompilerIf #PB_Compiler_Processor=#PB_Processor_x86
			*String=FindData::SSE2_Find(*String,MyStringLen(*String),*Search,MyStringLen(*Search))
		CompilerElse
			*String=FindData::SSE2_Find(*String,MemoryStringLength(*String,#PB_Ascii),*Search,MemoryStringLength(*Search,#PB_Ascii))
		CompilerEndIf
		If *String<0
			ProcedureReturn #Null
		Else
			ProcedureReturn StartPos+*String
		EndIf
	EndProcedure
And this one is faster than MemoryStringLength (32 bit only):

Code: Select all

Procedure.l MyStringLen(*String)

		!MOV     Eax, [p.p_String]
		!LEA     Edx, [Eax+3]
		!L2_RUN:
		!MOV     Ebx, [Eax]
		!ADD     Eax, 4
		!LEA     Ecx, [Ebx-01010101h]
		!NOT     Ebx
		!AND     Ecx, Ebx
		!AND     Ecx, 80808080h
		!JZ      L2_RUN

		!MOV     Ebx, Ecx
		!SHR     Ebx, 16
		!TEST    Ecx, 00008080h
		!cmovz   Ecx, Ebx
		!LEA     Ebx, [Eax+2]
		!cmovz   Eax, Ebx
		!ADD     cl,  cl
		!SBB     Eax, Edx
		ProcedureReturn

	EndProcedure
User avatar
Mijikai
Addict
Addict
Posts: 1520
Joined: Sun Sep 11, 2016 2:17 pm

Re: Why FindString() is so terribly slow?

Post by Mijikai »

Michael Vogel wrote:By the way, the original FindString is very fast in ASCII mode but could be also a used in unicode - here are the results of wilbert's routines including PureBasic's FindString (32 Bit):
Pls. post the Testcode.
User avatar
Michael Vogel
Addict
Addict
Posts: 2811
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Why FindString() is so terribly slow?

Post by Michael Vogel »

Mijikai wrote:Pls. post the Testcode.

Code: Select all

DisableDebugger

#BinaryTest=	0
#CalculateLength=1
#AddSpaces=	1;500
#TestCount1=	10000
#TestCount2=	50000

; Define Wilbert's Search Module
	DeclareModule FindData

		; all routines return a zero based index or -1 (not found)

		Declare TS(*HayStack, HayStackSize, *Needle, NeedleSize.l)          ; Tailed Substring algorithm
		Declare QS(*HayStack, HayStackSize, *Needle, NeedleSize.l)          ; Quick Search algorithm
		Declare BM(*HayStack, HayStackSize, *Needle, NeedleSize.u)          ; Boyer-Moore algorithm
		Declare FastSearch(*HayStack, HayStackSize, *Needle, NeedleSize)    ; Based on the Python fast search algorithm
		Declare SSE2_Find(*Haystack, HaystackSize, *Needle, NeedleSize, Count = #False)

	EndDeclareModule
	Module FindData

		EnableASM
		EnableExplicit
		DisableDebugger

		CompilerIf #PB_Compiler_Processor=#PB_Processor_x86
			Macro rax : eax : EndMacro
			Macro rbx : ebx : EndMacro
			Macro rcx : ecx : EndMacro
			Macro rdx : edx : EndMacro
			Macro rsi : esi : EndMacro
			Macro rdi : edi : EndMacro
			Macro rbp : ebp : EndMacro
			Macro rsp : esp : EndMacro
		CompilerEndIf

		Macro M_movdqa(arg1, arg2)
			!movdqa arg1, arg2
		EndMacro


		; *** Tailed Substring algorithm code ***
		Macro M_TS_Phase(phase)
			!finddata.l_ts_#phase#0:
			movzx rax, byte [rsi + rcx]
			!finddata.l_ts_#phase#1:
			cmp al, [rdi + rcx]
			!je finddata.l_ts_#phase#2
			inc rdi
			cmp rdi, rbp
			!jbe finddata.l_ts_#phase#1
			!jmp finddata.l_ts_exit_notfound
			!finddata.l_ts_#phase#2:
			sub rdx, rdx
			!finddata.l_ts_#phase#3:
			movzx rax, byte [rsi + rdx]
			cmp al, [rdi + rdx]
			!jne finddata.l_ts_#phase#4
			inc rdx
			cmp rdx, rbx
			!jne finddata.l_ts_#phase#3
			mov rax, rdi
			sub rax, [p.p_HayStack]
			!jmp finddata.l_ts_exit_found
			!finddata.l_ts_#phase#4:
			CompilerIf phase = 1
				mov rdx, rcx
				dec rdx
				!js finddata.l_ts_#phase#6
				movzx rax, byte [rsi + rcx]
				!finddata.l_ts_#phase#5:
				cmp al, [rsi + rdx]
				!je finddata.l_ts_#phase#6
				dec rdx
				!jns finddata.l_ts_#phase#5
				!finddata.l_ts_#phase#6:
				mov rax, rcx
				sub rax, rdx
				cmp rax, [rsp - 48]             ; if i-h > dim => {k=i; dim=i-h;}
				!jng finddata.l_ts_#phase#7
				mov [rsp - 40], rcx
				mov [rsp - 48], rax
				!finddata.l_ts_#phase#7:
				add rdi, rcx                    ;  s + i
				sub rdi, rdx                    ;  s - h
				dec rcx
				cmp rcx, [rsp - 48]
				!jb finddata.l_ts_phase2        ; if i < dim => proceed with phase 2
			CompilerElse
				; phase 2
				add rdi, [rsp - 48]
			CompilerEndIf
			cmp rdi, rbp
			!jbe finddata.l_ts_#phase#0
			!jmp finddata.l_ts_exit_notfound

		EndMacro
		Procedure TS(*HayStack, HayStackSize, *Needle, NeedleSize.l)

			; backup some registers
			mov [rsp -  8], rbx
			mov [rsp - 16], rsi
			mov [rsp - 24], rdi
			mov [rsp - 32], rbp

			; init some things
			mov ebx, [p.v_NeedleSize]
			mov ecx, ebx
			dec ecx
			!js finddata.l_ts_exit_notfound   ; exit when NeedleSize < 1
			mov rbp, [p.v_HayStackSize]
			sub rbp, rbx
			!jc finddata.l_ts_exit_notfound   ; exit when NeedleSize > HayStackSize
			mov rsi, [p.p_Needle]
			mov rdi, [p.p_HayStack]
			add rbp, rdi
			mov rax, 1
			mov [rsp - 40], rcx               ; k
			mov [rsp - 48], rax               ; dim

			; search
			M_TS_Phase(1)
			!finddata.l_ts_phase2:
			mov rcx, [rsp - 40]
			M_TS_Phase(2)

			; exit
			!finddata.l_ts_exit_notfound:
			mov rax, -1
			!finddata.l_ts_exit_found:
			mov rbx, [rsp -  8]
			mov rsi, [rsp - 16]
			mov rdi, [rsp - 24]
			mov rbp, [rsp - 32]
			ProcedureReturn

		EndProcedure

		; *** Code based on Quick Search / Sunday algorithm ***
		Macro M_QS_Search(n = 0)
			!finddata.l_qs_search0#n:
			!xor ecx, ecx
			!finddata.l_qs_search1#n:
			movzx eax, byte [rsi + rcx]
			cmp al, [rdi + rcx]
			!je finddata.l_qs_continue#n
			CompilerIf n = 0
				movzx eax, byte [rdi + rbx]
				mov eax, [rsp + rax * 4 - 1024]
				add rdi, rax
				cmp rdi, rdx
				!jb finddata.l_qs_search0#n
				!je finddata.l_qs_finalcheck
			CompilerEndIf
			!jmp finddata.l_qs_exit_notfound
			!finddata.l_qs_continue#n:
			!inc ecx
			!cmp ecx, ebx
			!jb finddata.l_qs_search1#n
			mov rax, rdi
			sub rax, [p.p_HayStack]
			!jmp finddata.l_qs_exit_found
		EndMacro
		Procedure QS(*HayStack, HayStackSize, *Needle, NeedleSize.l)

			; backup some registers
			mov [rsp - 1032], rbx
			mov [rsp - 1040], rsi
			mov [rsp - 1048], rdi

			; perform some checks
			mov ebx, [p.v_NeedleSize]
			cmp ebx, 1
			!jl finddata.l_qs_exit_notfound
			mov rdx, [p.v_HayStackSize]
			sub rdx, rbx
			!jc finddata.l_qs_exit_notfound   ; exit when NeedleSize > HayStackSize

			; prepare
			lea rdi, [rsp - 1024]
			lea eax, [ebx + 1]
			mov ecx, 256
			cld
			rep stosd
			mov rsi, [p.p_Needle]
			mov rdx, rsi
			mov ecx, ebx
			!finddata.l_qs_prep_table:
			movzx eax, byte [rdx]
			mov [rsp + rax * 4 - 1024], ecx
			inc rdx
			dec ecx
			!jnz finddata.l_qs_prep_table

			; search
			mov rdi, [p.p_HayStack]
			mov rdx, [p.v_HayStackSize]
			sub rdx, rbx
			!jz finddata.l_qs_finalcheck      ; jump to finalcheck if NeedleSize = HayStackSize
			add rdx, rdi
			M_QS_Search(0)
			!finddata.l_qs_finalcheck:
			M_QS_Search(1)

			; exit
			!finddata.l_qs_exit_notfound:
			mov rax, -1
			!finddata.l_qs_exit_found:
			mov rbx, [rsp - 1032]
			mov rsi, [rsp - 1040]
			mov rdi, [rsp - 1048]
			ProcedureReturn

		EndProcedure

		; *** Code based on Boyer-Moore algorithm ***
		Macro M_BM(n=0)
			CompilerIf n=0
				add rsi, rcx
			CompilerElse
				add rsi, 1
			CompilerEndIf
			mov ecx, ebx
			cmp rsi, rdx
			!jna finddata.l_bm_search1
			!jmp finddata.l_bm_exit
		EndMacro
		Procedure BM(*HayStack, HayStackSize, *Needle, NeedleSize.u)

			; Boyer-Moore algorithm
			; based on code from 'schic'
			; http://www.purebasic.fr/english/viewtopic.php?p=130032#p130032

			; backup some registers
			mov [rsp - 520], rbx
			mov [rsp - 528], rsi
			mov [rsp - 536], rdi

			; perform some checks
			movzx ebx, word [p.v_NeedleSize]   ; exit if NeedleSize = 0
			cmp ebx, 0
			!je finddata.l_bm_exit
			mov rdx, [p.v_HayStackSize]        ; exit if NeedleSize > HayStackSize
			sub rdx, rbx
			!jc finddata.l_bm_exit

			; prepare
			lea rdi, [rsp - 512]
			mov eax, 0xff
			mov ecx, 512
			cld
			rep stosb
			mov rdi, [p.p_Needle]
			mov ecx, ebx
			mov rsi, rdi
			!finddata.l_bm_table0:
			movzx eax, byte [rsi]
			add rsi, 1
			sub ecx, 1
			mov [rsp + rax * 2 - 512], cx
			!jnz finddata.l_bm_table0
			mov rsi, [p.p_HayStack]
			add rdx, rsi

			; search
			mov ecx, ebx

			!finddata.l_bm_search1:
			movzx eax, byte [rsi + rcx - 1]
			cmp al, [rdi + rcx - 1]
			!je finddata.l_bm_search4

			movzx eax, word [rsp + rax * 2 - 512]
			cmp ax, 0xffff
			!jne finddata.l_bm_search2
			M_BM()

			!finddata.l_bm_search2:
			add ecx, eax
			sub ecx, ebx
			!jna finddata.l_bm_search3
			M_BM()

			!finddata.l_bm_search3:
			M_BM(1)

			!finddata.l_bm_search4:
			sub ecx, 1
			!jnz finddata.l_bm_search1

			mov rax, rsi
			sub rax, *HayStack
			!finddata.l_bm_return:
			mov rbx, [rsp - 520]
			mov rsi, [rsp - 528]
			mov rdi, [rsp - 536]
			ProcedureReturn

			; exit not found
			!finddata.l_bm_exit:
			mov rax, -1
			!jmp finddata.l_bm_return

		EndProcedure

		; *** Code based on Python fast search algorithm ***
		Procedure FastSearch(*HayStack, HayStackSize, *Needle, NeedleSize)

			; Based on the Python fast search algorithm (Python License)

			Protected.i reg_bx, reg_si, reg_di
			Protected.i hs_end, skip

			; backup registers
			mov [p.v_reg_bx], rbx
			mov [p.v_reg_si], rsi
			mov [p.v_reg_di], rdi

			; perform some checks
			mov rbx, [p.v_NeedleSize]
			mov rcx, [p.v_HayStackSize]
			cmp rbx, rcx
			!jg finddata.l_fs_exit                ; exit when NeedleSize > HayStackSize
			cmp rbx, 1
			!jg finddata.l_fs_prep
			!jne finddata.l_fs_exit               ; exit when NeedleSize < 1

			; handle needle size 1
			mov rsi, [p.p_HayStack]
			mov rdi, [p.p_Needle]
			mov rax, rcx
			mov bl, [rdi]
			!finddata.l_fs_sns0:
			cmp bl, [rsi]
			!je finddata.l_fs_sns1
			inc rsi
			dec rcx
			!jnz finddata.l_fs_sns0
			!jmp finddata.l_fs_exit
			!finddata.l_fs_sns1:
			sub rax, rcx
			!jmp finddata.l_fs_return

			; prepare (skip and register rdx mask)
			!finddata.l_fs_prep:
			mov rdi, [p.p_Needle]
			lea rsi, [rbx - 2]
			mov [p.v_skip], rsi
			mov rdx, 0
			movzx ecx, byte [rdi + rsi + 1]
			bts rdx, rcx
			!finddata.l_fs_prep0:
			movzx eax, byte [rdi + rsi]
			bts rdx, rax
			sub rsi, 1
			!js finddata.l_fs_prep2
			cmp eax, ecx
			!jne finddata.l_fs_prep0
			lea rax, [rsi + 1]
			sub [p.v_skip], rax
			!finddata.l_fs_prep1:
			movzx eax, byte [rdi + rsi]
			bts rdx, rax
			sub rsi, 1
			!jns finddata.l_fs_prep1
			!finddata.l_fs_prep2:

			; search
			mov rax, [p.v_HayStackSize]
			sub rax, rbx
			mov rsi, [p.p_HayStack]
			add rax, rsi
			mov [p.v_hs_end], rax
			movzx ecx, byte [rdi + rbx - 1]
			dec rsi

			!finddata.l_fs_search0:
			inc rsi
			cmp rsi, [p.v_hs_end]                 ; check for haystack end boundary
			!jae finddata.l_fs_search7
			cmp cl, [rsi + rbx - 1]               ; compare last needle byte
			!je finddata.l_fs_search2             ; equal => continue with search2
			movzx eax, byte [rsi + rbx]
			bt rdx, rax                           ; 'Sunday' check
			!jc finddata.l_fs_search0
			!finddata.l_fs_search1:
			add rsi, rbx
			!jmp finddata.l_fs_search0

			!finddata.l_fs_search2:               ; compare first two needle bytes
			movzx eax, word [rdi]
			cmp ax, [rsi]
			!je finddata.l_fs_search4             ; equal => continue with search4
			!finddata.l_fs_search3:
			movzx eax, byte [rsi + rbx]           ; 'Sunday' check
			bt rdx, rax
			!jnc finddata.l_fs_search1
			add rsi, [p.v_skip]                   ; 'Horspool' skip
			!jmp finddata.l_fs_search0

			!finddata.l_fs_search4:               ; compare remaining needle bytes
			mov rax, rbx
			sub rax, 3
			!jna finddata.l_fs_search9            ; no more bytes left => found !
			push rcx                              ; keep a copy of rcx
			lea rcx, [rax + 1]
			shr rcx, 1
			!finddata.l_fs_search5:
			movzx eax, word [rdi + rcx * 2]       ; compare two bytes at a time
			cmp ax, [rsi + rcx * 2]
			!jne finddata.l_fs_search6            ; not equal => restore rcx and back to search3
			dec rcx
			!jnz finddata.l_fs_search5            ; no more bytes left ?
			pop rcx                               ; restore rcx
			!jmp finddata.l_fs_search9            ; => found !
			!finddata.l_fs_search6:
			pop rcx
			!jmp finddata.l_fs_search3

			!finddata.l_fs_search7:               ; code to handle haystack exactly on end boundary
			!ja finddata.l_fs_exit                ; boundary exceeded => exit
			!finddata.l_fs_search8:               ; final check
			movzx eax, byte [rdi + rbx - 1]
			cmp al, [rsi + rbx - 1]
			!jne finddata.l_fs_exit
			dec rbx
			!jnz finddata.l_fs_search8
			!finddata.l_fs_search9:               ; return result
			mov rax, rsi
			sub rax, [p.p_HayStack]

			; restore registers & return result
			!finddata.l_fs_return:
			mov rbx, [p.v_reg_bx]
			mov rsi, [p.v_reg_si]
			mov rdi, [p.v_reg_di]
			ProcedureReturn

			; exit not found
			!finddata.l_fs_exit:
			mov rax, -1
			!jmp finddata.l_fs_return

		EndProcedure

		; *** End of code based on Python fast search algorithm ***
		Procedure SSE2_Find(*Haystack, HaystackSize, *Needle, NeedleSize, Count = #False)

			; backup some registers
			mov [rsp -  8], rbx
			mov [rsp - 16], rsi
			mov [rsp - 24], rdi

			; init count
			mov rax, [p.v_Count]
			sub rax, 1
			sbb rax, rax
			mov [rsp - 40], rax

			; perform some checks
			mov rax, [p.v_HaystackSize]
			mov rbx, [p.v_NeedleSize]
			sub rax, rbx
			!jc finddata.l_sse2_search8     ; exit if NeedleSize > HaystackSize
			add rax, [p.p_Haystack]
			mov [rsp - 32], rax             ; rsp - 32 = *SearchEnd
			cmp rbx, 1
			!jl finddata.l_sse2_search8     ; exit if NeedleSize < 1

			; load first two needle bytes
			!pcmpeqb xmm4, xmm4
			mov rdi, [p.p_Needle]
			movzx eax, byte [rdi]
			!je finddata.l_sse2_search0
			mov ah, [rdi + 1]
			!pslldq xmm4, 15
			!finddata.l_sse2_search0:
			!movd xmm2, eax
			!punpcklbw xmm2, xmm2
			!punpcklwd xmm2, xmm2
			!pshufd xmm3, xmm2, 01010101b   ; xmm3 = 16 times second needle byte
			!pshufd xmm2, xmm2, 0           ; xmm2 = 16 times first needle byte

			; start search
			mov rsi, [p.p_Haystack]
			mov rcx, rsi
			shr rsi, 4                      ; align Haystack to 16 bytes
			shl rsi, 4
			sub rcx, rsi
			M_movdqa(xmm0, [rsi])           ; handle first 16 bytes
			!movdqa xmm1, xmm0
			!pcmpeqb xmm0, xmm2             ; compare against first needle byte
			!pmovmskb eax, xmm0
			!shr eax, cl                    ; shift off unwanted bytes
			!shl eax, cl
			!test eax, eax
			!jnz finddata.l_sse2_search2

			; main search loop
			!finddata.l_sse2_search1:
			add rsi, 16                     ; next 16 bytes
			cmp rsi, [rsp - 32]
			!ja finddata.l_sse2_search8
			M_movdqa(xmm0, [rsi])
			!movdqa xmm1, xmm0
			!pcmpeqb xmm0, xmm2             ; compare against first needle byte
			!pmovmskb eax, xmm0
			!test eax, eax
			!jz finddata.l_sse2_search1     ; no match ? => search1
			!finddata.l_sse2_search2:
			!pcmpeqb xmm1, xmm3             ; compare against second needle byte
			!psrldq xmm1, 1
			!por xmm1, xmm4
			!pmovmskb ecx, xmm1
			!and eax, ecx                   ; combine both searches
			!jz finddata.l_sse2_search1     ; no match ? => search1

			; compare rest of bytes
			!finddata.l_sse2_search3:
			!bsf ecx, eax                   ; get index of first match
			!jz finddata.l_sse2_search1
			!btr eax, ecx
			lea rdx, [rsi + rcx]            ; create a pointer to it
			cmp rdx, [rsp - 32]
			mov rcx, [p.v_NeedleSize]
			!ja finddata.l_sse2_search8
			sub rcx, 2
			!jb finddata.l_sse2_search5     ; NeedleSize < 2 ? => search5 (already done)

			!finddata.l_sse2_search4:
			movzx ebx, word [rdx + rcx]     ; compare rest of needle right-to-left
			cmp bx, [rdi + rcx]             ; two bytes at a time
			!jne finddata.l_sse2_search3
			sub rcx, 2
			!jae finddata.l_sse2_search4

			!finddata.l_sse2_search5:
			mov rbx, [rsp - 40]
			cmp rbx, -1
			!je finddata.l_sse2_search6
			add rbx, 1                      ; increase count
			mov [rsp - 40], rbx
			!jmp finddata.l_sse2_search3
			!finddata.l_sse2_search6:
			mov rax, rdx                    ; return result
			sub rax, [p.p_Haystack]
			!finddata.l_sse2_search7:
			mov rbx, [rsp -  8]
			mov rsi, [rsp - 16]
			mov rdi, [rsp - 24]
			ProcedureReturn

			; not found / return count
			!finddata.l_sse2_search8:
			mov rax, [rsp - 40]
			!jmp finddata.l_sse2_search7

		EndProcedure

	EndModule
; EndDefine
; Define Search Routines
	Procedure.i Ascii(Unicode.s)
		Protected *b=AllocateMemory(StringByteLength(Unicode)+1)
		PokeS(*b,Unicode,-1,#PB_Ascii)
		ProcedureReturn *b
	EndProcedure
	Procedure.i LocStrW(*Input,*Signature)
		CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
			!mov rcx,[p.p_Input]
			!mov rdx,[p.p_Signature]
			!xor rax,rax
			!mov r10,rcx
			!cmp word[rdx],0h
			!je LocStr_Return1
			!LocStr_Loop1:
			!cmp word[rcx],0h
			!je LocStr_Return1
			!mov bx,word[rcx]
			!cmp bx,word[rdx]
			!jne LocStr_Next1
			!lea r8,[rcx+2h]
			!lea r9,[rdx+2h]
			!LocStr_Match1:
			!cmp word[r9],0h
			!je LocStr_Result1
			!cmp word[r8],0h
			!je LocStr_Next1
			!mov bx,word[r8]
			!cmp bx,word[r9]
			!jne LocStr_Next1
			!lea r8,[r8+2h]
			!lea r9,[r9+2h]
			!jmp LocStr_Match1
			!LocStr_Result1:
			!sub rcx,r10
			!lea rax,[rcx+2h]
			!shr rax,1h
			!jmp LocStr_Return1
			!LocStr_Next1:
			!lea rcx,[rcx+2h]
			!jmp LocStr_Loop1
			!LocStr_Return1:
			ProcedureReturn
		CompilerElse
			!mov ecx,[p.p_Input]
			!mov edx,[p.p_Signature]
			!xor eax,eax
			!cmp word[edx],0h
			!je LocStr_Return1
			!LocStr_Loop1:
			!cmp word[ecx],0h
			!je LocStr_Return1
			!mov bx,word[ecx]
			!cmp bx,word[edx]
			!jne LocStr_Next1
			!lea esi,[ecx+2h]
			!lea edi,[edx+2h]
			!LocStr_Match1:
			!cmp word[edi],0h
			!je LocStr_Result1
			!cmp word[esi],0h
			!je LocStr_Next1
			!mov bx,word[esi]
			!cmp bx,word[edi]
			!jne LocStr_Next1
			!lea esi,[esi+2h]
			!lea edi,[edi+2h]
			!jmp LocStr_Match1
			!LocStr_Result1:
			!sub ecx,[p.p_Input]
			!lea eax,[ecx+2h]
			!shr eax,1h
			!jmp LocStr_Return1
			!LocStr_Next1:
			!lea ecx,[ecx+2h]
			!jmp LocStr_Loop1
			!LocStr_Return1:
			ProcedureReturn
		CompilerEndIf
	EndProcedure
	Procedure.i LocStrA(*Input,*Signature)

		CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
			!mov rcx,[p.p_Input]
			!mov rdx,[p.p_Signature]
			!xor rax,rax
			!mov r10,rcx
			!cmp byte[rdx],0h
			!je LocStr_Return2
			!LocStr_Loop2:
			!cmp byte[rcx],0h
			!je LocStr_Return2
			!mov bl,byte[rcx]
			!cmp bl,byte[rdx]
			!jne LocStr_Next2
			!lea r8,[rcx+1h]
			!lea r9,[rdx+1h]
			!LocStr_Match2:
			!cmp byte[r9],0h
			!je LocStr_Result2
			!cmp byte[r8],0h
			!je LocStr_Next2
			!mov bl,byte[r8]
			!cmp bl,byte[r9]
			!jne LocStr_Next2
			!inc r8
			!inc r9
			!jmp LocStr_Match2
			!LocStr_Result2:
			!sub rcx,r10
			!lea rax,[rcx+1h]
			!jmp LocStr_Return2
			!LocStr_Next2:
			!inc rcx
			!jmp LocStr_Loop2
			!LocStr_Return2:
			ProcedureReturn
		CompilerElse
			!mov ecx,[p.p_Input]
			!mov edx,[p.p_Signature]
			!xor eax,eax
			!cmp byte[edx],0h
			!je LocStr_Return2
			!LocStr_Loop2:
			!cmp byte[ecx],0h
			!je LocStr_Return2
			!mov bl,byte[ecx]
			!cmp bl,byte[edx]
			!jne LocStr_Next2
			!lea esi,[ecx+1h]
			!lea edi,[edx+1h]
			!LocStr_Match2:
			!cmp byte[edi],0h
			!je LocStr_Result2
			!cmp byte[esi],0h
			!je LocStr_Next2
			!mov bl,byte[esi]
			!cmp bl,byte[edi]
			!jne LocStr_Next2
			!inc esi
			!inc edi
			!jmp LocStr_Match2
			!LocStr_Result2:
			!sub ecx,[p.p_Input]
			!lea eax,[ecx+1h]
			!jmp LocStr_Return2
			!LocStr_Next2:
			!inc ecx
			!jmp LocStr_Loop2
			!LocStr_Return2:
			ProcedureReturn
		CompilerEndIf
	EndProcedure
	Procedure.l StringFind(*String,*Search,StartPos=1)

		Protected MemString=MemoryStringLength(*String);		StringLen(*String);
		Protected MemSearch=MemoryStringLength(*Search);	StringLen(*Search);

		!MOV Esi,[p.p_Search]
		!MOV Edx,[p.v_MemSearch]

		!Or Edx,Edx
		!JZ l_FSFertig		; Länge des Suchtexts= 0

		!MOV Edi,[p.p_String]
		!ADD Edi,[p.v_StartPos]
		!DEC Edi
		!MOV Ecx,[p.v_MemString]
		!SUB Ecx,[p.v_StartPos]
		!ADD Ecx,2
		!SUB Ecx,Edx
		!JB l_FSFertig		; Suchtext länger als String

		!CLD				; clear direction flag -> DF=0
		!l_FSSearch:
		!LODSB				; = mov al, [esi]: if DF=0 esi+1 (else esi-1)
		; first char of pattern to al

		!REPNZ SCASB		; Al - [edi]:edi+1:ecx-1 and repeat while not null or ecx>0
		; compare first char of pattern with source until it matches or ecx is counted down

		!JNZ l_FSFertig		; Al - [edi]<>0 but ecx=0 (end of SrcMem reached)

		!MOV Eax,Edi
		!MOV Ebx,Ecx
		!MOV Ecx,Edx
		!DEC Ecx
		!REPZ CMPSB		; [esi] - [edi]: (if DF=0) esi+1: edi+1: ecx-1 and repeat while null or ecx>0
		!JZ l_FSFound		; [esi] - [edi] was 0 and ecx is 0 -> whole pattern matched

		; else ecx is 0 but [esi]-[edi] <> 0
		!MOV Edi,Eax
		!MOV Ecx,Ebx
		!MOV Esi,[p.p_Search]
		!JMP l_FSSearch

		!l_FSFound:
		!SUB Eax,[p.p_String]
		ProcedureReturn

		!l_FSFertig:
		!MOV eax,0
		ProcedureReturn

	EndProcedure
	Procedure.l MyStringLen(*String)

		!MOV     Eax, [p.p_String]
		!LEA     Edx, [Eax+3]
		!L2_RUN:
		!MOV     Ebx, [Eax]
		!ADD     Eax, 4
		!LEA     Ecx, [Ebx-01010101h]
		!NOT     Ebx
		!AND     Ecx, Ebx
		!AND     Ecx, 80808080h
		!JZ      L2_RUN

		!MOV     Ebx, Ecx
		!SHR     Ebx, 16
		!TEST    Ecx, 00008080h
		!cmovz   Ecx, Ebx
		!LEA     Ebx, [Eax+2]
		!cmovz   Eax, Ebx
		!ADD     cl,  cl
		!SBB     Eax, Edx
		ProcedureReturn

	EndProcedure

	Procedure MyFindString(*String,*Search,StartPos=1)
		If StartPos>1
			*String+(StartPos-1)<<#PB_Compiler_Unicode
		Else
			StartPos=1
		EndIf
		CompilerIf #PB_Compiler_Processor=#PB_Processor_x86; And #PB_Compiler_Unicode=#Null
			*String=FindData::SSE2_Find(*String,MyStringLen(*String),*Search,MyStringLen(*Search))
		CompilerElse
			*String=FindData::SSE2_Find(*String,MemoryStringLength(*String,#PB_Ascii),*Search,MemoryStringLength(*Search,#PB_Ascii))
		CompilerEndIf
		If *String<0
			ProcedureReturn #Null
		Else
			ProcedureReturn StartPos+*String
		EndIf
	EndProcedure

; EndDefine

; Define Computer Type

	CompilerIf #PB_Compiler_OS = #PB_OS_Windows
		S.s = "Windows "
	CompilerElseIf #PB_Compiler_OS = #PB_OS_MacOS
		S.s = "OSX "
	CompilerElse
		S.s = "Linux "
	CompilerEndIf

	CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
		S + "(x86) "
	CompilerElse
		S + "(x64) "
	CompilerEndIf

	S + #CRLF$ + CPUName() + #CRLF$ + #CRLF$

; EndDefine
; Define Initialize Memory

	HayStackSize = 65536
	NeedleSize = 64

	*HayStack = AllocateMemory(HayStackSize)
	*Needle = AllocateMemory(NeedleSize)

; EndDefine

; Define Search Binary Data

	If #BinaryTest

		RandomSeed(0)
		RandomData(*HayStack, HayStackSize)
		CopyMemory(*HayStack + HayStackSize - NeedleSize - 32, *Needle, NeedleSize)

		t0 = ElapsedMilliseconds()
		For i = 1 To #TestCount1
			x1 = FindData::TS(*HayStack, HayStackSize, *Needle, NeedleSize)
		Next
		t1 = ElapsedMilliseconds()
		For i = 1 To #TestCount1
			x2 = FindData::QS(*HayStack, HayStackSize, *Needle, NeedleSize)+1
		Next
		t2 = ElapsedMilliseconds()
		For i = 1 To #TestCount1
			x3 = FindData::BM(*HayStack, HayStackSize, *Needle, NeedleSize)+1
		Next
		t3 = ElapsedMilliseconds()
		For i = 1 To #TestCount1
			x4 = FindData::FastSearch(*HayStack, HayStackSize, *Needle, NeedleSize)+1
		Next
		t4 = ElapsedMilliseconds()
		For i = 1 To #TestCount1
			x5 = FindData::SSE2_Find(*HayStack, HayStackSize, *Needle, NeedleSize)+1
		Next
		t5 = ElapsedMilliseconds()


		S + "Binary Data Search Test" + #CRLF$
		S + "-------------------------" + #CRLF$
		S + "Tailed Substring : " + #TAB$+Str(t1-t0) + "ms" + #TAB$+Str(x1)+#CRLF$
		S + "QuickSearch : " + #TAB$+Str(t2-t1) + "ms" + #TAB$+Str(x2)+#CRLF$
		S + "Boyer-Moore : " + #TAB$+Str(t3-t2) + "ms" + #TAB$+Str(x3)+#CRLF$
		S + "FastSearch : " + #TAB$+Str(t4-t3) + "ms" + #TAB$+Str(x4)+#CRLF$
		S + "SSE2 Find : " + #TAB$+Str(t5-t4) + "ms" + #TAB$+Str(x5)+#CRLF$
		S + "FindString : " + #TAB$+"-" + #TAB$+"-"+#CRLF$

	EndIf

; EndDefine
; Define Search String Data

	Macro CalculateLength
		HayStackSize2 = MemoryStringLength(*HayStack,#PB_Ascii)
		NeedleSize2 = MemoryStringLength(*Needle,#PB_Ascii)
	EndMacro
	Macro LengthCalcuation
		CompilerIf #CalculateLength
			CalculateLength
		CompilerEndIf
	EndMacro

	HayStackText.s="A small test with a small haystack and short needle."+Space(#AddSpaces)+"This will perform differently because some algorithms take some time to initialize"
	NeedleText.s="initial"
	PokeS(*HayStack,HayStackText, -1, #PB_Ascii)
	PokeS(*Needle,NeedleText, -1, #PB_Ascii)
	CalculateLength


	t0 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		LengthCalcuation
		x1 = FindData::TS(*HayStack, HayStackSize2, *Needle, NeedleSize2)+1
	Next
	t1 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		LengthCalcuation
		x2 = FindData::QS(*HayStack, HayStackSize2, *Needle, NeedleSize2)+1
	Next
	t2 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		LengthCalcuation
		x3 = FindData::BM(*HayStack, HayStackSize2, *Needle, NeedleSize2)+1
	Next
	t3 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		LengthCalcuation
		x4 = FindData::FastSearch(*HayStack, HayStackSize2, *Needle, NeedleSize2)+1
	Next
	t4 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		LengthCalcuation
		x5 = FindData::SSE2_Find(*HayStack, HayStackSize2, *Needle, NeedleSize2)+1
	Next
	t5 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		x6 = StringFind(@HayStackText,@NeedleText)
	Next
	t6 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		CompilerIf #PB_Compiler_Unicode
			x7 = LocStrW(@HayStackText,@NeedleText)
		CompilerElse
			x7 = LocStrA(@HayStackText,@NeedleText)
		CompilerEndIf
	Next
	t7 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		x8 = FindString(HayStackText,NeedleText)
	Next
	t8 = ElapsedMilliseconds()
	For i = 1 To #TestCount2
		x9 = MyFindString(@HayStackText,@NeedleText)
	Next
	t9 = ElapsedMilliseconds()

	S + #CRLF$
	S + "String Search Test" + #CRLF$
	S + "-------------------" + #CRLF$
	S + "Tailed Substring : " + #TAB$+Str(t1-t0) + "ms" + #TAB$+Str(x1)+#CRLF$
	S + "QuickSearch : " + #TAB$+Str(t2-t1) + "ms" + #TAB$+Str(x2)+#CRLF$
	S + "Boyer-Moore : " + #TAB$+Str(t3-t2) + "ms" + #TAB$+Str(x3)+#CRLF$
	S + "FastSearch : " + #TAB$+Str(t4-t3) + "ms" + #TAB$+Str(x4)+#CRLF$
	S + "SSE2 Find : " + #TAB$+Str(t5-t4) + "ms" + #TAB$+Str(x5)+#CRLF$
	S + "StringFind : " + #TAB$+Str(t6-t5) + "ms" + #TAB$+Str(x6)+#CRLF$
	S + "LocStr (A/W) : " + #TAB$+Str(t7-t6) + "ms" + #TAB$+Str(x7)+#CRLF$
	S + "PB FindString : " + #TAB$+Str(t8-t7) + "ms" + #TAB$+Str(x8)+#CRLF$
	S + "My FindString : " + #TAB$+Str(t9-t8) + "ms" + #TAB$+Str(x9)+#CRLF$

; EndDefine

;SetClipboardText(S)
MessageRequester("Test results", S)

ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

Code: Select all

;Fast enough ?

Global *s

OpenConsole("FindString-Test")

*s = AllocateMemory(Len(Space(1000000) + "!"))
PokeS(*s, Space(1000000) + "!", #PB_Any, #PB_Ascii)

t = ElapsedMilliseconds()
For i = 1 To 1000
  x = strcspn_(*s, @"!")
Next
PrintN(Str(x+1)) ;Index is 0
PrintN(Str(ElapsedMilliseconds() - t))
Input()
CloseConsole()
Takes less than 100 ms (Linux)
With Windows it takes over 2500 ms (and does not work properly)
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Why FindString() is so terribly slow?

Post by Josh »

ccode wrote:

Code: Select all

;Fast enough ?

Global *s

OpenConsole("FindString-Test")

*s = AllocateMemory(Len(Space(1000000) + "!"))
PokeS(*s, Space(1000000) + "!", #PB_Any, #PB_Ascii)

t = ElapsedMilliseconds()
For i = 1 To 1000
  x = strcspn_(*s, @"!")
Next
PrintN(Str(x+1)) ;Index is 0
PrintN(Str(ElapsedMilliseconds() - t))
Input()
CloseConsole()
Takes less than 100 ms (Linux)
With Windows it takes over 2500 ms (and does not work properly)
Sorry ccode, the same problem as with other codes here and as I have repeatedly criticized.
  • Why do you think that AllocateMemory () and PokeS () are not part of the timekeeping? The two commands are needed to execute your code. Therefore, they are also part of the time measurement and have to be places INSIDE the for loop, including a FreeMemory().
  • Why should you just convert a string from Unicode to Ascii and then expect a real result? This still works for the characters 1 to 127, but if there are proper Unicode characters in your strings, the result is a lottery game.
@All
I strongly advise against using the codes here in this topic (not sure I've seen it all). Sorry for the direct testimony, but this is hokum and charlatanism.

@ccode
Sorry ccode to place this hard critic directly to your posting, but your code is so short and compact. You can see the problem more easily than with other codes here with several hundred lines.
sorry for my bad english
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

NoNo!

Code: Select all

;Fast enough ?

;Windows: Very slow ! ( > 5000 ms) ??? (pelles-c)
;Linux: Very fast ! ( < 80 ms ) (gcc)

EnableExplicit

Global *s, *s2, l, x, t, i

OpenConsole("FS-Unicode-Test")

t = ElapsedMilliseconds()

CompilerSelect #PB_Compiler_OS
  CompilerCase #PB_OS_Windows
    l = Len(Space(1000000) + "!")*2
    *s = AllocateMemory(l)
    *s2 = AllocateMemory(Len("!")*2)
    PokeS(*s, Space(1000000) + "!", #PB_Any)
    PokeS(*s2, "!", #PB_Any)
  CompilerCase #PB_OS_Linux
    *s = AllocateMemory(Len(Space(1000000) + "!"))
    *s2 = AllocateMemory(Len("!"))
    PokeS(*s, Space(1000000) + "!", #PB_Any, #PB_UTF8)
    PokeS(*s2, "!", #PB_Any, #PB_UTF8)
CompilerEndSelect

For i = 1 To 1000
  x = strstr_(*s, *s2) ; Linux ( < 80 ms)
  ;x = strcspn_(*s, *s2) ; Linux ( < 110 ms )
Next
PrintN(Str(ElapsedMilliseconds() - t)+" ms")
;PrintN(Str(x+1)) ;Index is 0
CompilerSelect #PB_Compiler_OS
  CompilerCase #PB_OS_Windows
    PrintN("Char: "+PeekS(x,-1))
    PrintN("Pos: "+Str(((x-*s)/2)+1))
  CompilerCase #PB_OS_Linux
    PrintN("Char: "+PeekS(x,-1,#PB_UTF8))
    PrintN("Pos: "+Str(((x-*s))+1)) ;I get 63 ms (Linux)
CompilerEndSelect

Input()
CloseConsole()
fryquez
Enthusiast
Enthusiast
Posts: 391
Joined: Mon Dec 21, 2015 8:12 pm

Re: Why FindString() is so terribly slow?

Post by fryquez »

@ccode
Windows is slow in your example because you use the old msvcrt.dll
If you have Windows 10, try ucrtbase.dll, it beats all the "code" I see here.

Code: Select all

PrototypeC pstrstr(a, b)
OpenLibrary(0, "ucrtbase.dll")
Global strstr.pstrstr = GetFunction(0, "strstr")
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

@fryquez

Thanks for the information!

ucrtbase.dll is faster!

But it's still a little faster under Linux!
User avatar
Michael Vogel
Addict
Addict
Posts: 2811
Joined: Thu Feb 09, 2006 11:27 pm
Contact:

Re: Why FindString() is so terribly slow?

Post by Michael Vogel »

It's clear that an universal killer routine which works fast in ASCII and unicode, 32 and 64 bit (for different OS) and for short and long strings is not an easy job. Depending what kind of comparism is needed and which plattform is used, some routines here are quite good and if speed matters a collection of four FindString commands (FS32Ascii, FS64Ascii, FS32Uni, FS64Uni) is not a bad idea.

One part which could still get a boost is the StringLength part, here's the top performer I found in the forum here:

Code: Select all

	DataSection
		STRINGTBL:
		Data.l 0, 0, 0, $ff
		Data.l 0, $ffff, 0, $ffffff
		Data.l 0, $ffffffff, $ff, $ffffffff
		Data.l $ffff, $ffffffff, $ffffff, $ffffffff
	EndDataSection

	Procedure.l MyStringLen(*String)

		!MOV eax,[p.p_String]
		!pxor     mm1, mm1
		!MOV      ecx, eax
		!MOV      edx, eax
		!And      ecx, -8
		!And      eax, 7
		!movq     mm0, [ecx]
		!por      mm0, [l_stringtbl+eax*8]
		!@@:
		!ADD      ecx, 8
		!pcmpeqb  mm0, mm1
		!packsswb mm0, mm0
		!movd     eax, mm0
		!movq     mm0, [ecx]
		!TEST     eax, eax
		!JZ       @b
		!BSF      eax, eax
		!SHR      eax, 2
		!LEA      ecx, [ecx+eax-8]
		!SUB      ecx, edx
		!MOV eax, ecx
		!emms
		ProcedureReturn

	EndProcedure
ccode
User
User
Posts: 99
Joined: Sat Jun 23, 2018 5:21 pm

Re: Why FindString() is so terribly slow?

Post by ccode »

Thanks fryquez !

Update!

Code: Select all

;Fast enough ?

;Windows: A little bit slow ! ( > 400 ms) (ucrtbase)
;Linux: Very fast ! ( < 80 ms ) (gcc)

EnableExplicit

Global *s, *s2, l, x, t, i

CompilerIf #PB_Compiler_OS = #PB_OS_Windows
  PrototypeC pstrstr(a, b)
  OpenLibrary(0, "ucrtbase.dll")
  Global strstr.pstrstr = GetFunction(0, "strstr")
CompilerElseIf #PB_Compiler_OS = #PB_OS_Linux
  Macro strstr(s, s2)
    strstr_(s, s2)
  EndMacro
CompilerEndIf

OpenConsole("FS-Unicode-Test")

t = ElapsedMilliseconds()

*s = AllocateMemory(Len(Space(1000000) + "!"))
*s2 = AllocateMemory(Len("!"))
PokeS(*s, Space(1000000) + "!", #PB_Any, #PB_UTF8)
PokeS(*s2, "!", #PB_Any, #PB_UTF8)

For i = 1 To 1000
  x = strstr(*s, *s2) ; Linux ( < 80 ms)
 ;x = strcspn_(*s, *s2) ; Linux ( < 110 ms )
Next
PrintN(Str(ElapsedMilliseconds() - t)+" ms")
;PrintN(Str(x+1)) ;Index is 0
PrintN("Char: "+PeekS(x,-1,#PB_UTF8))
PrintN("Pos: "+Str(((x-*s))+1)) ;I get 63 ms (Linux)

Input()
CloseConsole()
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3943
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Why FindString() is so terribly slow?

Post by wilbert »

Michael Vogel wrote:It's clear that an universal killer routine which works fast in ASCII and unicode, 32 and 64 bit (for different OS) and for short and long strings is not an easy job.
The speed difference between 32 and 64 bit usually isn't very big if you use SSE2 instructions.
What you are searching for does make a big difference. If you are searching for a single character, the best approach is completely different compared to searching for a string of 16 characters.

If you are using OS functionality, especially case insensitive search varies a lot depending on what OS you are using.
The standard c functions for wide char (wcsstr) unfortunately can't be used on MacOS and Linux with PB unicode strings.

A better SSE2 find for unicode strings could be created especially if string addresses are always aligned on a 2 byte boundary. A 8 or 16 byte boundary would be ideal but I don't know how PB memory alignment is for strings across all supported OS.
Windows (x64)
Raspberry Pi OS (Arm64)
Post Reply