Page 2 of 3

Posted: Sun Nov 11, 2007 3:57 pm
by kinglestat
a point of interest
myfind fails with search patterns long 1 character

a very informative post
cheers

Posted: Mon Nov 12, 2007 12:07 am
by pdwyer
Opps, didn't I update that bug fix :oops:

I'll get to that when I get home.

Actually, if there is a high chance of a search being only a couple of characters then this algorithm is not going to do much for you.

Thanks for spotting that though!

Posted: Mon Nov 12, 2007 9:15 am
by kinglestat
you do score a good point
but I have the knack of finding these akward bugs
I point them out as it happens to me at times that somehting doesnt work as it should and banging my head against the wall doesnt really work!

good work

Posted: Mon Nov 12, 2007 10:25 am
by pdwyer
If it gets the bugs out then keep reading!

The code will only get better

Code: Select all


Structure MemoryArray 
    Byte.c[0] 
EndStructure 

Procedure.l Quicksearch(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

    *MainByteArray.MemoryArray = *MainMem
    *FindByteArray.MemoryArray = *FindMem 
    StartPos =- 1

    ; Build BadChr Array
    Dim BadChar.l(255)
    
    ; set all alphabet to max shift pos (length of find string plus 1)
    For i = 0 To 255
        BadChar(i)  =  FindLen + 1
    Next
    
    ;Update chars that are in the find string to their position from the end.
    For i = 0 To findlen -1
        BadChar(*FindByteArray\byte[i]) = findlen - i    
    Next     

    MainArrayLoop.l = StartPos 
    EndSearchPos.l = MainLen - (FindLen -1)
    
    While MainArrayLoop <= EndSearchPos
    
        If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
            FoundPos = MainArrayLoop + 1
            Break
        EndIf 
        
        ;Didn't find the string so shift as per the table.
        MainArrayLoop + BadChar(*MainByteArray.MemoryArray\byte[MainArrayLoop + FindLen])
    
    Wend

    ProcedureReturn FoundPos

EndProcedure 


Debug Quicksearch(1, @"ABCDEF", 6, @"A", 1) 
Debug Quicksearch(1, @"ABCDEF", 6, @"F", 1)
Debug Quicksearch(1, @"ABCDEF", 6, @"BC", 2)


Posted: Wed Aug 27, 2008 3:57 pm
by Little John
Thanks a lot Paul, your QuickSearch() function is very useful! A big advantage over the built-in FindString() function is, that QuickSearch() can be used for arbitrary chunks of memory, regardless whether or not they contain NULL bytes! Very much appreciated!
I have a tiny suggestion for cosmetic improvement: When all variables inside the procedure were declared, then the code would work with EnableExplicit without the need of any change.
And a question, please: Does QuickSearch() work only in ASCII mode, or does it work in Unicode mode as well?

Regards, Little John

Posted: Wed Aug 27, 2008 4:17 pm
by pdwyer
bit of a sore point that. The .c type is what I want to use as it's unsigned but changes size with unicode and so will screw up there. This should work fine with .b as I'm not shifting bits. (need to test)

Then, since it's a memory search and not a string one it will fine whatever you look for provided the data is the same. (ie if you are looking for unicode in unicode text with pointers would be fine)

Posted: Wed Aug 27, 2008 6:53 pm
by Little John
Unfortunately, I can't claim that I exactly understand how QuickSearch() works, and it was the following part which made me insecure regarding Unicode:

Code: Select all

   ; set all alphabet to max shift pos [...]
   For i = 0 To 255
So I assumed only the alphabet from 0 to 255 would be supported.

Regards, Little John

Posted: Thu Aug 28, 2008 4:39 am
by Little John
Paul, referring to your most recent code:

Instead of

Code: Select all

StartPos =- 1
I think it must be

Code: Select all

StartPos - 1
Regards, Little John

Posted: Thu Aug 28, 2008 2:51 pm
by pdwyer
:oops:

I just checked the code in my include file where I keep such things and the line is completely commented out

Sorry about that

Posted: Sun Nov 16, 2008 12:30 pm
by mback2k
I have updated your code to be fully Unicode compatible. :)

Code: Select all

Structure MemoryArray
  Byte.b[0]
EndStructure

Procedure.l QuickSearch(StartPos, *MainMem, MainLen, *FindMem, FindLen)
  Protected MainArrayLoop, EndSearchPos, *MainByteArray.MemoryArray, *FindByteArray.MemoryArray
  
  *MainByteArray = *MainMem
  *FindByteArray = *FindMem
  StartPos =- 1
  
  ; Build BadChr Array
  Protected Dim BadChar(255)
  
  ; set all alphabet to max shift pos (length of find string plus 1)
  For i = 0 To 255
    BadChar(i)  =  FindLen + 1
  Next
  
  ;Update chars that are in the find string to their position from the end.
  For i = 0 To FindLen -1
    BadChar(*FindByteArray\Byte[i]) = FindLen - i   
  Next     
  
  MainArrayLoop = StartPos
  EndSearchPos = MainLen - (FindLen -1)
  
  While MainArrayLoop <= EndSearchPos
    
    If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
      FoundPos = MainArrayLoop + 1
      Break
    EndIf
    
    ;Didn't find the string so shift as per the table.
    MainArrayLoop + BadChar(*MainByteArray.MemoryArray\Byte[MainArrayLoop + FindLen])
    
  Wend
  
  ProcedureReturn FoundPos
  
EndProcedure

Procedure QuickStringSearch(StartPos, String$, StringToFind$)
  
  CompilerIf #PB_Compiler_Unicode
    ProcedureReturn QuickSearch(StartPos, @String$, StringByteLength(String$), @StringToFind$, StringByteLength(StringToFind$))/2+1
  CompilerElse
    ProcedureReturn QuickSearch(StartPos, @String$, Len(String$), @StringToFind$, Len(StringToFind$))
  CompilerEndIf

EndProcedure

Debug QuickStringSearch(1, "ABCDEF34637347347", "A")
Debug QuickStringSearch(1, "ABCDEF34637347347", "F")
Debug QuickStringSearch(1, "ABCDEF34637347347", "BC")

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Tue Jan 31, 2012 4:38 pm
by kinglestat
Thanks for this...exactly what I was looking for...though still trying it

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Sat Feb 18, 2012 3:21 pm
by Michael Vogel
Did some tests to check the QuickSearch speed:

Code: Select all

Procedure.l StringFind(*String,*Search,StartPos)

	Protected MemString=MemoryStringLength(*String);		StringLen(*String);
	Protected MemSearch=MemoryStringLength(*Search);	StringLen(*Search);

	MOV Esi,*Search
	MOV Edx,MemSearch

	OR Edx,Edx
	JZ l_FSFertig		; Länge des Suchtexts= 0

	MOV Edi,*String
	ADD Edi,StartPos
	DEC Edi
	MOV Ecx,MemString
	SUB Ecx,StartPos
	ADD Ecx,2
	SUB Ecx,Edx
	JB l_FSFertig		; Suchtext länger als String

	CLD				; clear direction flag -> DF=0
	!l_FSSearch:
	LODSB				; = mov al, [esi]: if DF=0 esi+1 (else esi-1)
	; first char of pattern to al

	REPNZ SCASB		; Al - [edi]:edi+1:ecx-1 and repeat while not null or ecx>0
	; compare first char of pattern with source until it matches or ecx is counted down

	JNZ l_FSFertig		; Al - [edi]<>0 but ecx=0 (end of SrcMem reached)

	MOV Eax,Edi
	MOV Ebx,Ecx
	MOV Ecx,Edx
	DEC Ecx
	REPZ CMPSB		; [esi] - [edi]: (if DF=0) esi+1: edi+1: ecx-1 and repeat while null or ecx>0
	JZ l_FSFound		; [esi] - [edi] was 0 and ecx is 0 -> whole pattern matched

	; else ecx is 0 but [esi]-[edi] <> 0
	MOV Edi,Eax
	MOV Ecx,Ebx
	MOV Esi,*Search
	JMP l_FSSearch

	!l_FSFound:
	SUB Eax,*String
	ProcedureReturn

	!l_FSFertig:
	MOV eax,0
	ProcedureReturn

EndProcedure
Procedure.l QuickSearch(StartPos,*MainMem,MainLen,*FindMem,FindLen)

	Structure MemoryArray
		Byte.b[0]
	EndStructure

	Protected MainArrayLoop,EndSearchPos,*MainByteArray.MemoryArray,*FindByteArray.MemoryArray

	*MainByteArray=*MainMem
	*FindByteArray=*FindMem
	StartPos=- 1

	; Build BadChr Array
	Protected Dim BadChar(255)

	; set all alphabet to max shift pos (length of find string plus 1)
	For i=0 To 255
		BadChar(i)=FindLen+1
	Next

	;Update chars that are in the find string to their position from the end.
	For i=0 To FindLen-1
		BadChar(*FindByteArray\Byte[i])=FindLen-i
	Next

	MainArrayLoop=StartPos
	EndSearchPos=MainLen-(FindLen-1)

	While MainArrayLoop<=EndSearchPos

		If CompareMemory(*MainMem+MainArrayLoop,*FindMem,FindLen)=1
			FoundPos=MainArrayLoop+1
			Break
		EndIf

		;Didn't find the string so shift as per the table.
		MainArrayLoop+BadChar(*MainByteArray.MemoryArray\Byte[MainArrayLoop+FindLen])

	Wend

	ProcedureReturn FoundPos

EndProcedure
Procedure QuickStringSearch(StartPos,String$,StringToFind$)

	CompilerIf #PB_Compiler_Unicode

		ProcedureReturn QuickSearch(StartPos,@String$,StringByteLength(String$),@StringToFind$,StringByteLength(StringToFind$))/2+1

	CompilerElse

		ProcedureReturn QuickSearch(StartPos,@String$,Len(String$),@StringToFind$,Len(StringToFind$))

	CompilerEndIf

EndProcedure


Global a.s="This is a simple string for testing several find procedures ~ the results will show, if any of the tested functions are faster than the internal FindString() function"

Global f1.s="find"
Global f2.s="Find"
Global f3.s="FIND"

Procedure test(function)

	Protected s

	#Loop=100000
	
	t=-ElapsedMilliseconds()
	
	For i=0 To #Loop

		Select function

		Case 1
			s1=FindString(a,f1)
			s1+FindString(a,f2)
			s1+FindString(a,f3)

		Case 2
			s2=StringFind(@a,@f1,1)
			s2+StringFind(@a,@f2,1)
			s2+StringFind(@a,@f3,1)

		Case 3
			s3=QuickSearch(1,@a,Len(a),@f1,4)
			s3+QuickSearch(1,@a,Len(a),@f2,4)
			s3+QuickSearch(1,@a,Len(a),@f3,4)

		EndSelect

	Next i

	ProcedureReturn t+ElapsedMilliseconds()

EndProcedure

s.s=""
For i=1 To 3
	s+"Test #"+Str(i)+": "+Str(test(i))+" ms"+#CR$
Next i

MessageRequester("Results",s)
The assembler routine is the fastest here, quicksearch takes a little bit longer...

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Sun Feb 19, 2012 12:53 pm
by wilbert
Michael Vogel wrote:The assembler routine is the fastest here, quicksearch takes a little bit longer...
Are you sure ?
When I test on OS X, the built in FindString is about twice as fast compared to your StringFind routine.

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Sun Feb 19, 2012 1:06 pm
by Little John
Just do not forget the decisive difference:
Little John wrote:A big advantage over the built-in FindString() function is, that QuickSearch() can be used for arbitrary chunks of memory, regardless whether or not they contain NULL bytes!
Regards, Little John

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Tue Feb 28, 2012 1:46 pm
by Helle
@ Michael Vogel: For Fun a Procedure for FindString with SSE4.2:

Code: Select all

;- FindString with SSE4.2, 32-Bit-Version. no Unicode
;- "Helle" Klaus Helbing, 28.02.2012, PB 4.60 (x86)
;- Versions for 64-Bit and Unicode exists

Procedure.l FindString_SSE42_32Bit(pStr1, pStr2, StartPos)
  !mov eax,[p.v_StartPos] 
  !cmp eax,1
  !jge @f
  !mov eax,1                                ;or End or Error
!@@:
  !mov edx,[p.v_pStr2]
  !mov ecx,[p.v_pStr1]                      ;push ebx!
  !push ebx
  !push edi
  !push esi
  !mov ebx,ecx
  !sub ebx,1
  !add ebx,eax                              ;eax=StartPos
  !movdqu xmm0,[edx]                        ;first max.16 Bytes of Find$
  !xor ecx,ecx
!Search:
  !add eax,ecx
  !add ebx,ecx
  !pcmpistri xmm0,dqword[ebx],00001100b     ;Least Significant Index, Positive Polarity (IntRes2=IntRes1), Equal Ordered, Unsigned Bytes (=No Unicode)  
  !jz Base$End                              ;Zero-Flag is set if found end of Base$
  !js Find$End                              ;Sign-Flag is set if found end of Find$  
  !jecxz @f                                 ;jump if ECX is zero 
  !jmp Search
!@@:
  !movdqa xmm1,xmm0                         ;save xmm0
  !mov edi,ebx
  !mov esi,edx
!@@:
  !add esi,16
  !add edi,16                               ;next 16 Bytes of Base$
  !movdqu xmm1,[esi]                        ;next 16 Bytes of Find$
  !pcmpistri xmm1,dqword[edi],00001100b     ;compare the strings 
  !jz Base$End
  !js Find$End
  !jecxz @b
  !jmp Search
!Base$End:
  !cmp ecx,16
  !jne Found
  !xor eax,eax
  !jmp NoFound
!Find$End:
  !or ecx,ecx                               ;ECX Zero? Conventional
  !jnz Search
!Found:
  !add eax,ecx
!NoFound:
  !pop esi
  !pop edi
  !pop ebx
 ProcedureReturn                            ;EAX 
EndProcedure 

Base$ = "This is a simple string for testing several find procedures ~ the results will show, if any of the tested functions are faster than the internal FindString() function"
Find$ = "find"

Pos = 1                                     ;Start-Position

FindSSE42 = FindString_SSE42_32Bit(@Base$, @Find$, Pos)
SSE42$ = "Found String at Position : " + Str(FindSSE42)

PB$ = "Test with PB : " + Str(FindString(Base$, Find$, Pos))

MessageRequester("FindString with SSE4.2", SSE42$ + #LFCR$ + PB$)

I hope, this is "waterproof", I will make another tests. The speed-factor is 5-6 with PB.
Have Fun!
Helle