Page 1 of 3

QuickSearch Algorithm (Blitzes Findstring)

Posted: Sun Jul 08, 2007 2:42 pm
by pdwyer
An Offshoot from this thread http://www.purebasic.fr/english/viewtopic.php?t=27845 (Inmem etc)

After hitting some performance issues on this other thread I took a look on the net and Wikipedia pointed me to http://www-igm.univ-mlv.fr/~lecroq/stri ... CTION00190.

Boyer & Moore have quite a few variations and this is one of the better ones and easy to code. On the same test as the Inmem link the results for finding a 32char string inside a 3mb file 5000 times were

InMem: 111.875 secs :?
PB's Findstring: 48.390 secs 8)
Quicksearch: 5.734 secs :shock:


Note though that there is an overhead on quick search at the beginning so small search, especially where the "Findmem" is short will be slower. The larger the "findmem" the more the algorithm can "jump" or "shift". There's still some room to squeeze more power out of it, 9x Findstring and it works with binary is fine with me.

Finding things like this makes programming all the more worthwhile! :D

Code: Select all

Procedure.l Quicksearch(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l) 

    *MainByteArray.MemoryArray = *MainMem
    *FindByteArray.MemoryArray = *FindMem 

    ; Build BadChr Array
    Dim BadChar.l(255)
    
    ; set all alphabet to max shift pos (length of find string plus 1)
    For i = 0 To 255
        BadChar(i)  =  FindLen + 1
    Next
    
    ;Update chars that are in the find string to their position from the end.
    For i = 0 To findlen -1
        BadChar(*FindByteArray\char[i]) = findlen - i    
    Next     

    MainArrayLoop.l = StartPos 
    EndSearchPos.l = MainLen - FindLen
    
    While MainArrayLoop <= EndSearchPos
    
        If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
            FoundPos = MainArrayLoop + 1
            Break
        EndIf 
        
        ;Didn't find the string so shift as per the table.
        MainArrayLoop + BadChar(*MainByteArray.MemoryArray\char[MainArrayLoop + FindLen])
    
    Wend

    ProcedureReturn FoundPos

EndProcedure 

Posted: Mon Jul 09, 2007 9:59 am
by Max
Very good job :D :D :D

But you 've forget the memoryArray structure:

Code: Select all

Structure MemoryArray
    char.c[0]
EndStructure
my results on 5MB string (5000 searches) are:
FindString: 93.578 secs
QuickSearch: 19.766 secs

Posted: Mon Jul 09, 2007 1:17 pm
by Dare
Nifty

Posted: Mon Jul 09, 2007 1:31 pm
by PB
Can you please give an example of how to use it?

Posted: Mon Jul 09, 2007 2:55 pm
by Max
PB wrote:Can you please give an example of how to use it?

Code: Select all

OpenConsole()
  Texto.s = "This is a text of any number of characters"
  LenBase = 5 * 1024 * 1024
  StringBase.s = space(LenBase)  
  *MemBase     = @StringBase
  For x = 0 To LenBase-1
    PokeB(*MemBase + x, Random(240)+15)  ; avoid x00 character
  Next
  PosString = LenBase - 100    ; to the end of string
  PokeS(*MemBase + PosString, Texto)

print( "Search with FindString : " )
  time = ElapsedMilliseconds()
  For x = 1 To 5000
    posic = FindString(StringBase, Texto, 1)
  Next x
  printn(str(posic)+" : "+Str(ElapsedMilliseconds()-time) )
printn("")

print( "Search with QuickSearch: " )
  time = ElapsedMilliseconds()
  For x = 1 To 5000
    posic = QuickSearch(1, *MemBase, LenBase, @Texto, len(Texto))
  Next x
  printn(str(posic)+" : "+Str(ElapsedMilliseconds()-time) )

input()
END

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Mon Jul 09, 2007 4:52 pm
by horst
pdwyer wrote:Finding things like this makes programming all the more worthwhile! :D
Fully agree :) . It took me some time to find out what's going on, so I thought it would be helpful to make the wording a little more meaningful. I did not change the algorithm, just the wording (and the order of the parameters!!).

Code: Select all

; The idea is obvious, when you consider the situation after a CompareMemory():
; If the next character (current location +length) does not occur in the search string, 
; then you can safely skip length+1, because there would be no match anyhow. 
; If the next character does occur in the string, the number of characters to skip 
; depends on its position in the string. This is prepared in the SkipLength array,
; so the evaluation will be very fast in the main loop.

Structure CharArray : char.c[0] : EndStructure 

Procedure Quicksearch(*Object.CharArray, ObjectLength.l, StartOffset.l, *String.CharArray, StringLength.l)

  Dim SkipLength.l(255)  ; for each ascii char: length to skip ..
 
  ; default to skip after non-match (if following char in object does not occur in string)
  For i = 0 To 255
    SkipLength(i) = StringLength + 1
  Next
  ; length to skip for each char in string (acc. to position from end) is put into table 
  For i = 0 To StringLength -1
    SkipLength(*String\char[i]) = StringLength - i   
  Next     

  CurrentOffset.l = StartOffset
  MaxOffset.l = ObjectLength - StringLength
 
  While CurrentOffset <= MaxOffset
    If CompareMemory(*Object + CurrentOffset, *String, StringLength) = 1
      FoundPos = CurrentOffset + 1
      Break
    EndIf
    ; skip number of chars (picked from table) depending on next char. 
    CurrentOffset + SkipLength(*Object\char[CurrentOffset + StringLength])
  Wend
  ProcedureReturn FoundPos
EndProcedure 
Edited: emphasis

Posted: Tue Jul 10, 2007 1:19 am
by pdwyer
@Max, Thanks for catching that, I left it outside the proc... oops
@Max, In your example you are using strings so you avoid the Null but generally the aglorithm is fine with nulls in there. (I guess you knew that but I thought I'd mention it anyway)

@horst, Cheers, that makes it clearer. :)

I did some other perf tests with it, I also tried just a string version where you parse two strings but all the asc(mid(MyString,i,1)) stuff slooooowed it down to the point it was useless. I also tried parsing two string, assigning pointers to the string then doing the same as this function above, it was much better than asc(mid()) etc but still too slow. The reason was the PB LEN() function :(

It seems that PB strings being null terminated rather than BSTRs don't have their size pre calced and so when you do a len() (guessing a bit here) the string needs to be traversed to find the null. If you think about other string functions, most would require the length to be calculated at some stage making them quite slow (compared to say VB strings).

Consider... (still guessing though)
C$ = B$ + A$

Memory needs to be allocated for C$ so the len of A$ and B$ need to be calculated. Once assigned though, the length of C$ is not saved anywhere so C$ = C$ + "Hello" later would need the len calced again.

In a BSTR the length is in the first long of the string structure so if the string is 1mb it doesn't have to traverse this, the length of C$ can be calced from the known lengths of A$ and B$ then written to the C$ string length area.


Moral of the story, only use strings in PB when you need to (like working with the UI). Get the strings into pointers asap or start them there from FileLoad() and do everything with pointers for by far best performance.

I guess PB veterens already know this though so I'm preaching to the choir so to speak but as a PB noob I'm just figuring this out so bear with me :oops:

Re: QuickSearch Algorithm (Blitzes Findstring)

Posted: Tue Jul 10, 2007 1:26 am
by NoahPhense
Maybe I'm doing something wrong, but it's not
working on my 4.10 beta 2.

- np

Posted: Tue Jul 10, 2007 1:41 am
by pdwyer
I'm not quite sure why but it's crashing when I use horst's version. Something to do with the pointer allocation at the beginning being changed perhaps... :? I'll take a closer look later

Not sure if that's what you mean.

EDIT: Actually, it could be that max's example is for my function but the params have changed for horsts. So probably not a problem on Horsts side at all, apologies

Posted: Tue Jul 10, 2007 8:52 am
by Max
pdwyer wrote: It seems that PB strings being null terminated rather than BSTRs don't have their size pre calced and so when you do a len() (guessing a bit here) the string needs to be traversed to find the null. If you think about other string functions, most would require the length to be calculated at some stage making them quite slow (compared to say VB strings).
I'm not sure..., try this code (using my previous example):

Code: Select all

  time = ElapsedMilliseconds()
  leng = memorystringlength(@Stringbase)
  printn("base "+str(leng)+" : "+Str(ElapsedMilliseconds()-time) )
  time = ElapsedMilliseconds()
  leng = len(Stringbase)
  printn("base "+str(leng)+" : "+Str(ElapsedMilliseconds()-time) )

  time = ElapsedMilliseconds()
  leng = memorystringlength(@texto)
  printn("text "+str(leng)+" : "+Str(ElapsedMilliseconds()-time) )
  time = ElapsedMilliseconds()
  leng = len(texto)
  printn("text "+str(leng)+" : "+Str(ElapsedMilliseconds()-time) )
As you can see, "MemoryStringLength()" really counts the number of strings characters (in my old PC 16 ms for Stringbase and 1 ms for Texto),
but "len()" say 0 ms in both cases.
I think PB saves the length of their strings (defined as "string"). Perhaps in a quad (8bytes) previous to memory allocation?.

Posted: Tue Jul 10, 2007 9:14 am
by pdwyer
Take a look at this though: Borrowing a little of your code, one string is 5mb the other is only about 30 bytes, if Len() was being held somewhere then these should return in about the same time but there is a huge difference. For an exe, the 5mb string takes about 15secs on my pc where as the 30byte string takes 0 or 16ms (basically the smallest increment that ElapsedMilliseconds can handle)

Code: Select all

  LenBase = 5 * 1024 * 1024 
  StringBase.s = Space(LenBase)  
  *MemBase     = @StringBase 
  For x = 0 To LenBase-1 
    PokeB(*MemBase + x, Random(240)+15)  ; avoid x00 character 
  Next
  
  time = ElapsedMilliseconds() 
  
  For i = 1 To 2000
    Len(StringBase)
  Next
  
  MessageRequester("Long String Len() time (2000 loops)", Str(ElapsedMilliseconds() - time) + "ms")
  
  StringBase = "This is a much shorter string"
  
  time = ElapsedMilliseconds() 
  
  For i = 1 To 2000
    Len(StringBase)
  Next
  
  MessageRequester("Short String Len() time (2000 loops)", Str(ElapsedMilliseconds() - time) + "ms")

Posted: Tue Jul 10, 2007 9:24 am
by Max
You're right.
In my example, I've tried only 1 times and it's not significant.
It must try a high number of "len()"....

Posted: Tue Jul 10, 2007 11:19 am
by pdwyer
I seem to remember reading somewhere that ElapsedMilliseconds() and other windows timer functions except that whatever-performancecounter API one are only sensitive to about 15-16ms.

Another perf benefit area of quicksearch and similar algorithms is that if you remove that "Break" statement it keeps going so if you previously had a loop with findstring and you were updating the start pos each time you wouldn't need to keep calling the function. Also, quicksearch wouldn't need to regenerate the shift bad-char table each time so your overhead would be reduced (per search). You could put a function call in at the break point or if this function was in a lib/dll have a callback function in there.

Brilliant !

Posted: Tue Jul 24, 2007 10:01 am
by HarryE
This Quicksearch routine seems to be faster than strstr C routine in both Windoze&Linux form :!:

Posted: Sat Jul 28, 2007 10:37 am
by schic
I have collected some search algorithm over the time
and done some own on base of that.

Maybe an interesting suggestion to this topic.
Try different patterns...

Code: Select all

Global GbTmp.l 
Global tmpcount

Structure MemoryArray
  Char.c[0]
EndStructure

Procedure.l Quicksearch(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l)
  ; Quicksearch Boyer-Moore Algorithm im Basic from Paul Dwyer
  
  *MainByteArray.MemoryArray = *MainMem
  *FindByteArray.MemoryArray = *FindMem
  
  ; Build BadChr Array
  Dim BadChar.l(255)
  
  ; set all alphabet to max shift pos (length of find string plus 1)
  For i = 0 To 255
    BadChar(i)  =  FindLen + 1
  Next
  
  ;Update chars that are in the find string to their position from the end.
  For i = 0 To FindLen -1
    BadChar(*FindByteArray\Char[i]) = FindLen - i   
  Next     
  
  MainArrayLoop.l = StartPos
  EndSearchPos.l = MainLen - FindLen
  
  While MainArrayLoop <= EndSearchPos
    
    If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
      FoundPos = MainArrayLoop + 1
      Break
    EndIf
    
    ;Didn't find the string so shift as per the table.
    MainArrayLoop + BadChar(*MainByteArray.MemoryArray\Char[MainArrayLoop + FindLen])
    
  Wend
  
  ProcedureReturn FoundPos
  
EndProcedure

Procedure.l MyFind( *Source.l,SrcLen, *Pattern.l,PatLen, StartPos.l)
  
  ;
  ; MyFind routine, who needs Bayer & Moore???
  ;
  ; This uses two tricks I discovered some 12 odd years ago.
  ; First assumption: Find out the length of BOTH strings.
  ; Second test all strings through the instructions "REPNE SCASB" for the length of a string (NULL ending).
  ; Compare strings using the instruction "REPE   CMPSB"
  ; Third test the first character, if found test the LAST character in the string to search,
  ;   If a match TEST BOTH strings,
  ;   if NO match continue searching where we found the last character.
  ; Pro's:
  ;   1) We do ! Not compare ALL found strings.
  ;   2) Easy to implement
  ;   3) No intensive comparing of found first string match in de string.
  ;
  ; Con's: We have to do a liniar search, no 'skipping and sliding' like in Bayer-Moore...
  ;
  ; NEEDS to run as a EXE ! Not from the IDE, or we get false results!!!
  ;
  ! Xor      Eax, Eax      ; EAX = 0
  ! Xor    Ecx, Ecx      ; ECX = 0
  ! Not    Ecx            ; ECX = -1, or 4,294,967,295 max bytes to search, bloody BIG number!!
  ;
  MOV      Edi, *Pattern   ; Adress of the patern in EDI..
  CLD                  ; Looking UP (to the end if the string)..
  ; REPNE   scasb         ; Find the NULL byte..
  ; ! Not      Ecx            ; ECX is strlen (but in minus and +2), invers and included is a -1..
  ; DEC      Ecx            ; ECX -1, ECX now contains the length of the Pattern string..
  MOV Ecx, PatLen
  ; ECX now has the length of the pattern string, test if it is longer than zero. Otherwise we DO have ! Nothing to search for..
  ;
  CMP      Ecx, 1
  JG      label1
  MOV      Eax, 0         ; Return = ! Not FOUND..
  JMP      labelExit
  ;   
  ! label1:
  ; First leg is done, we know length of the search pattern. Keep it in EBX for reference..
  MOV      Ebx, Ecx
  ;
  ; Find the length of the source string, to prevent we search ALL memory (OOPS!)...
  SUB    Ecx, Ecx      ; ECX = 0
  ! Not      Ecx            ; ECX = -1, or 4,294,967,295
  SUB      al, al         ; ZERO AL or AL = 0 the byte we look for..
  MOV      Edi, *Source   ; Adress in EDI..
  CLD                  ; Looking UP (to the end if the string)..
  ; REPNE   scasb         ; Find the NULL byte..
  ; ! Not      Ecx            ; ECX is strlen (but in minus and +2), invers and included a -1..
  ; DEC      Ecx            ; ECX -1, ECX now contains the length of the Source string..
  MOV Ecx, SrcLen
  ; Test if the source is a NULL string?
  CMP      Ecx, 1
  JG      label2
  MOV      Eax, 0         ; Return = ! Not FOUND..
  JMP      labelExit
  ;   
  
  ! label2 :
  MOV      Edx, Ecx      ; Keep a record for the length later on.
  ;
  ; Load into AH the LAST! Character of the search pattern...
  MOV      Esi, *Pattern   ; Adress of the patern in ESI..
  MOV      AH, [Esi+Ebx-1]     ; Now we have the  LAST  char of the search string..
  MOV      al, [Esi]        ; Here we have the FIRST char of the search string..
  ;
  MOV      Edi, *Source   ; Adress of the Source in EDI..
  ! label3:
  CLD                  ; Looking UP (to the end if the Source string)..
  REPNE   scasb         ; Find the FIRST character of the search string (a byte)..
  JECXZ   labelZeroExit   ; ECX is ZERO so ! Not found in the Source, jump exit..
  JMP      labelCont      ; Continue, jump..
  ;
  ! labelZeroExit:
  MOV      Eax, 0         ; ! Not found..
  ProcedureReturn
  
  ! labelCont:
  ;
  ; We have a HIT!!
  ; Look if the LAST char of the Pattern string (which is in AH) is the same as the in the Source string on the same position!!
  ; If so we have a real hit if ! Not, go back and search further..
  CMP      AH, [Edi+Ebx-2]   ; the trick of storing the length of the Pattern string. The -2 is needed because EDI shoots one further AFTER a REPNE..
  JNE      label3         ; IF ! Not jump and search...
  ;
  ; We have the FIRST and LAST position of the Patern string found in the Source string,
  ;   now compare the rest of the characters (from the start+1). Keep the start in the source string AND the count..
  PUSH   Edi            ; save these we need them if we DO ! Not have a match!!
  PUSH   Ecx
  ;
  DEC      Edi            ; Correct the Source pointer..
  MOV      Ecx, Ebx      ; The length of the Pattern string we want to find ! Not of the source string!!..
  CLD
  REPE   cmpsb         ; Compare the two strings (REPE = REPeat while Equal)..
  ;
  POP      Ecx            ; Restore these in case we do ! Not have a match..
  POP      Edi
  ;
  JE      Label4         ; Jump if EQUAL (strings)..
  ;
  ; So ! Not equal ! Not found!! Search further..
  MOV      Esi, *Pattern   ; Reload the adress of the Pattern in ESI..
  JMP      label3
  
  ! Label4:
  ;
  ; YES!!! We have found the string!!! And where is in ECX (coded)..
  MOV      Eax, Edx      ; retrieve the length of the Source string..
  SUB      Eax, Ecx      ; Now we have the length, ready!! Result = EAX register!!
  
  ProcedureReturn
  ; JMP      labelExit
  
  ! labelExit:
  MOV      Eax, 0         ; Error and/or ! Not found..
  ProcedureReturn
EndProcedure

Procedure.l inMemStr5(StartPos.l,*Source.l,SrcLen.l,*Pattern.l,PatLen.l)
  
  ; Boyer-Moore-Algorithmus in Flat-Assembler
  ; von schic nach Ausführungen der Uni Flensburg 
  ; http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/bm2.htm
  ; siehe auch Darstellung von Strother Moore auf 
  ; http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html
  
  If StartPos >=SrcLen Or *Source = 0 Or SrcLen < PatLen Or SrcLen < 2 Or *Pattern < 1
    ProcedureReturn -1
  EndIf
  
  result.l
  PMax.l
  PMore.l
  
  shiftTable.l = AllocateMemory($400)
  
  MOV Ebx, PatLen
  
  CMP Ebx,    1
  JG  _lbl1
  MOV result, -1            ; Muster muss mind. 2 Zeichen sein
  JMP _lblExit
  
  ! _lbl1:
  MOV Esi,    *Source
  ADD Esi,    SrcLen
  SUB Esi,    Ebx
  MOV Edx,    Esi           ; setze Ausstiegslänge auf Source-Ende - Musterlänge
  
  ; ---------------------------------------------------------------------
  ; Tabelle füllen, mit Position für Zeichen die in Findstring vorkommen
  ; ---------------------------------------------------------------------
  MOV Ecx,          Ebx             ; Muster-Länge in ECX
  MOV Esi,          *Pattern        ; Adresse von Muster in ESI
  MOV Edi,          shiftTable      ; Adresse von Shift-Tabelle in Edi
  
  ! Xor Eax,          Eax           ; Eax leeren, da nur al gefüllt, aber EAX zum Weiterarbeiten verwendet wird
  ! _lblFillTable:
  MOV al,           [Esi]           ; Zeichen holen
  INC Esi                           ; Zähler für Muster Eins hoch
  MOV [Edi+Eax*4],  Ecx             ; shift-Wert für Zeichen an ASCII-Stelle in Tabelle
  DEC Ecx                           ; shift-Wert um Eins runter
  JNS _lblFillTable                ; wend while Ecx > 0
  
  ; -------------------------------------------
  ; setup für Schleife "While t<=TMax And p>=0"
  ; -------------------------------------------
  MOV PMax,         Ebx
  DEC PMax                          ; PMax=Len(pattern)-1
  MOV PMore,        Ebx             ; PMore=PMax+1
  MOV Ebx,          shiftTable      ; Zeiger auf shiftTabelle
  MOV Esi,          *Source         ; esi = *Source
  ADD Esi,          StartPos        ; + Startposition
  MOV Edi,          *Pattern        ; Edi = *pattern
  
  ; ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  ! _lblWhileTMax:
  CMP Edx,          Esi
  JL _lblExit                      ; While t<=TMax
  
  MOV Ecx,          PMax            ; Zähler zurücksetzen p=PMax
  ! Xor Eax,          Eax             ; Eax leeren, da nur al gefüllt, aber Eax als long zum Weiterarbeiten verwendet wird
  ! _lblWhile_p_greater_0:         ; zwischen-Einsprung, da Source-Zeiger nicht verändert, kann obiger Vergleich gespart werden
  MOV al,           [Esi+Ecx]       ; Zeichen von Source nach al
  ; CMP al, 65 
  ; JL doneCase
  ; CMP al, 90 
  ; JG doneCase
  ; ADD al, 32
  ; ! doneCase:
  CMP al,           [Edi+Ecx]       ; mit Zeichen in Muster vergleichen (If PeekB(t+p)<>PeekB(@pattern+p))
  JE _lblElse                      ; ein Zeichen ist gleich -> weitere vergleichen
  
  MOV Eax,          [Ebx+Eax*4]     ; shift-Wert für source-Zeichen, von shift-Tabelle laden
  CMP Eax,          0               ; wenn kein Wert (Zeichen kommt nicht im Muster vor)
  JNZ _lblCalcShift                  ; berechne Verschiebung
  
  ADD Esi,          Ecx
  INC Esi                           ; t=t+p+1
  JMP _lblWhileTMax                ; Wend
  
  ! _lblCalcShift:
  ADD Eax,          Ecx
  SUB Eax,          PMore           ; tmpSkip=tmpSkip+p-PMore
  JNL _lblAdd_Shift                  ; wenn Verschiebung < 0
  
  MOV Eax,          1               ; kleinste Verschiebung ist 1
  
  ! _lblAdd_Shift:
  ADD Esi,          Eax             ; t=t+tmpSkip
  JMP _lblWhileTMax                ; Wend
  
  ! _lblElse:
  DEC Ecx                           ; p-1
  JNS _lblWhile_p_greater_0        ; Wend While p>=0
  ; ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  
  SUB Esi,          *Source 
  INC Esi                           ; t+1-@text
  MOV result,       Esi
  
  ! _lblExit:
  FreeMemory(shiftTable)
  
  ProcedureReturn result
EndProcedure

Procedure.l inMemStr2(*SrcMem, memSizeSrc, *Pattern, memSizePattern, StartPos)
  
  ; memSizePattern = MemoryStringLength(*Pattern)                                     
  ; memSizeSrc = MemoryStringLength(*SrcMem)
  
  result.l
  
  MOV Esi,*Pattern          
  MOV Edx,memSizePattern
  
  ! Or Edx,Edx
  JZ l_fertig              ;Länge *Pattern = 0
  MOV Edi,*SrcMem
  ADD Edi,StartPos
  DEC Edi
  MOV Ecx,memSizeSrc
  SUB Ecx,StartPos
  ADD Ecx,2
  SUB Ecx,Edx
  JB l_fertig              ;memSizePattern > memSizeSrc     
  
  CLD                      ;clear direction flag -> DF=0
  !l_search:
  LODSB                    ;= mov al, [esi]: if DF=0 esi+1 (else esi-1)
  ; first char of pattern to al
  
  REPNZ scasb              ;Al - [edi]:edi+1:ecx-1 and repeat while not null or ecx>0
  ; compare first char of pattern with source until it matches or ecx is counted down
  
  JNZ l_fertig             ;Al - [edi]<>0 but ecx=0 (end of SrcMem reached)
  
  MOV Eax,Edi
  MOV Ebx,Ecx
  MOV Ecx,Edx
  DEC Ecx                  ;
  REPZ cmpsb               ; [esi] - [edi]: (if DF=0) esi+1: edi+1: ecx-1 and repeat while null or ecx>0
  JZ l_found               ; [esi] - [edi] was 0 and ecx is 0 -> whole pattern matched
    
  ;else ecx is 0 but [esi]-[edi] <> 0
  MOV Edi,Eax
  MOV Ecx,Ebx
  MOV Esi,*Pattern
  JMP l_search
  
  !l_found:        
  SUB Eax,*SrcMem
  MOV result,Eax
  
  !l_fertig:
  
  ProcedureReturn result 
EndProcedure 

Procedure BMBinSearch (StartPos.l,lpSource.l,srcLngth.l,lpSubStr.l,subLngth.l)
  ; Boyer Moore Exact pattern Matching Algorithms
  ; ASM-Code from
  ; Steve Hutchesson
  ; transfered to (PB-)FASM with slight modifications from schic
  
  ;LOCAL cval   :DWORD
  cVal.l
  ;LOCAL shift_table[256]:DWORD
  shift_table = AllocateMemory($400)
  result.l
  
  MOV Ebx, subLngth
  
  CMP Ebx, 1
  JG jump1
  MOV result, -2                 ; string too short, must be > 1
  JMP Cleanup
  ! jump1:
  MOV Esi, lpSource
  ADD Esi, srcLngth
  SUB Esi, Ebx
  MOV Edx, Esi            ; set Exit Length: edx = pointer to end of source - srcLngth
  
  ; ----------------------------------------
  ; load shift table with value in subLngth
  ; ----------------------------------------
  MOV Ecx, 256
  MOV Eax, Ebx
  MOV Edi, shift_table
  REP stosd
  
  ; ----------------------------------------------
  ; load decending count values into shift table
  ; ----------------------------------------------
  MOV Ecx, Ebx                ; SubString length in ECX
  DEC Ecx                     ; correct for zero based index
  MOV Esi, lpSubStr           ; address of SubString in ESI
  MOV Edi, shift_table
  
  ! Xor Eax, Eax
  
  ! Write_Shift_Chars:
  MOV al, [Esi]               ; get the character
  INC Esi
  MOV [Edi+Eax*4], Ecx        ; write shift for each character
  DEC Ecx                     ; to ascii location in table
  JNZ Write_Shift_Chars
  
  ; -----------------------------
  ; set up for main compare loop
  ; -----------------------------
  MOV Ecx, Ebx          ;Ecx = subLngth
  DEC Ecx               ;Ecx - 1
  MOV cVal, Ecx         ;cval = subLngth -1
  
  MOV Esi, lpSource     ;esi = lpSource
  MOV Edi, lpSubStr     ;Edi = lpSubStr
  ADD Esi, StartPos           ; add starting position
  JMP Pre_Loop
  ; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  ! Calc_Suffix_Shift:    ;source-char is in SubStr
  ADD Eax, Ecx            ;add counter in SubStr to value of shift-table
  SUB Eax, cVal               ; sub loop count (cval = subLngth -1)
  JNS Add_Suffix_Shift
  MOV Eax, 1                  ; minimum shift is 1
  
  ! Add_Suffix_Shift:
  ADD Esi, Eax                ; add SUFFIX shift
  MOV Ecx, cVal               ; reset counter in compare loop
  
  ! Test_Length:
  CMP Edx, Esi                ; test exit condition: esi <= pointer to end of source - srcLngth
  JL No_Match            ;if esi > edx (pointer to end of source) then search is ending -> no match
  
  ! Pre_Loop:
  ! Xor Eax, Eax                ; zero EAX for following partial writes
  MOV al, [Esi+Ecx]      ;char from lpSource to al
  
  CMP al, [Edi+Ecx]      ;compare characters from lpSource (esi) to char from lpSubStr (Edi)
  JE Jump2
  ;MOV eax, shift_table[eax*4]
  SHL   Eax,      2h     ;if lpSource (esi) = char from lpSubStr clc shift-value
  ADD Eax,shift_table
  MOV Eax, [Eax]         ;load shift value for source-char from shift-table
  CMP Ebx, Eax           ;compare shift value with subLngth
  JNE Calc_Suffix_Shift;Add_Suffix_Shift   ;if source-char not in SubStr -> bypass SUFFIX calculations
  LEA Esi, [Esi+Ecx+1]   ;if source-char is in SubStr -> add BAD CHAR shift (esi=lpSource+lpSubStr+1)
  JMP Test_Length
  ! Jump2:
  DEC Ecx
  ! Xor Eax, Eax                ; zero EAX for following partial writes
  
  ! Cmp_Loop:
  MOV al, [Esi+Ecx]
  CMP al, [Edi+Ecx]           ; cmp characters in ESI / EDI
  JNE Set_Shift               ; if not equal, get next shift
  DEC Ecx
  JNS Cmp_Loop
  JMP match                   ; fall through on match
  
  ! Set_Shift:
  ;MOV eax, shift_table[eax*4]
  SHL   Eax,      2h 
  ADD Eax,shift_table
  MOV Eax, [Eax]
  CMP Ebx, Eax
  JNE Calc_Suffix_Shift       ;if source-char <> subLngth -> source-char is in SubStr -> run SUFFIX calculations
  LEA Esi, [Esi+Ecx+1]        ; add BAD CHAR shift
  JMP Test_Length
  
  ; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  ! match:
  SUB Esi, lpSource           ; sub source from ESI
  ;MOV eax, esi               ; put length in eax
  INC Esi
  MOV result, Esi
  
  JMP Cleanup
  
  ! No_Match:
  MOV Eax, -1
  
  ! Cleanup:
  FreeMemory(shift_table)
  
  ProcedureReturn result 
EndProcedure

Procedure.l inMemStr(StartPos, *Source, memSizeSrc, strStr, memSizePattern)
  
  ; naiver Algorithmus überprüft ein gesuchtes Muster an 
  ; allen Positionen i eines Speicherbereichs
  ; Ende des Speicherbereiches muß null-terminiert sein
  
  result.l
  
  ;init pointers etc.
  MOV Ebx,strStr    ; Ebx = Pointer to first Char in string
  MOV Ecx,*Source   ; Ecx = Pointer to akt. Char in source
  MOV Edx, Ecx
  ADD Edx, memSizeSrc
  ADD Ecx,StartPos  ; set source-pointer to startposition
  DEC Ecx      ; StartPos - 1
  
  ! rpt_Src:        ; startpoint for loop scanning through the source
  MOV al,byte[Ecx]
  INC Ecx           ; Ecx + 1
  CMP Edx,Ecx          ; if null (end of source-string)
  JS endProc        ; -> end Procedure, Result=0
  CMP al, 65 
  JS endCase1
  CMP al, 90 
  JNS endCase1
  ADD al, 32
  ! endCase1:
  CMP byte[Ebx],al  ; if found first Char of strStr look for the rest
  JNE rpt_Src       ; else go on with next Char in source
  
  ;found the first Char of strStr in source
  ;now look if the rest does match
  ! Xor Esi,Esi             ; esi = 0, esi = Pointer to akt. Char in strStr
  ! rpt_Str:              ; startpoint for loop pounding the string
  MOV Edi, Ecx
  CMP byte[Ebx+Esi],0     ; if 0 (end of string) 
  JE got_it               ; -> got it, all Chars of the string did match
  ADD Edi, Esi
  CMP Edx,Edi               ; if null then end of source 
  JS endProc              ; -> end Procedure, Result=0
  INC Esi                 ; esi + 1 (the first Char in string is already found)
  MOV al,byte[Edi]        ; move actual Char in source to accumulator, have 
  CMP al, 65 
  JS endCase
  CMP al, 90 
  JNS endCase
  ADD al, 32
  ! endCase:
  CMP byte[Ebx+Esi],al    ; if actual Char in source (Ebx+esi) = act Char in string (al)
  JE rpt_Str              ; then take next Char
  JMP rpt_Src             ; and go on with scanning source
  
  ! got_it:
  ;Result = Ecx - Source, to get the place in the source-string
  SUB Ecx,*Source
  MOV result,Ecx
  
  ! endProc:
  ProcedureReturn result
EndProcedure


Macro FindReport()
  PrintN ("pattern found at position: " + Str(a) + " in " + Str(timeGetTime_()-StartTime ) + " milliseconds")
  PrintN (" ")
EndMacro

Text.s="axbbaaxbbaaxbbabbal axbbaaxbbaaxbbabbal axbbaaxbbaaxbbabbalaxbbaaxbbaaxbbabbal This is any text of any number of characters with no content"
       ;123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789
       ;  0        10          20        30         40      50      60          70        80        90        100       110       120         130     140         150     160

findstr.s="gxbba" ; "gxbba" very adverse for Boyer-Moore algorithm with the "axbba..."-text

;findstr.s= "anypattern" 

OpenConsole()
  
For i = 1 To 8
  Text=Text+Text
Next
Text=Text+findstr
  
loopcount = 1500

PrintN ("Quicksearch Boyer-Moore Algorithm im Basic from Paul Dwyer")
lentxt=Len(Text) 
lenfind=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=Quicksearch(1, @Text,lentxt,@findstr,lenfind)
Next i
FindReport()

PrintN ("MyFind ASM, from Jan Vooijs")
lentxt=Len(Text) 
lenfind=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=MyFind(@Text,lentxt,@findstr,lenfind,1)
Next i
FindReport()

PrintN ("PB FindString")
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=FindString(Text, findstr, 1)
Next i
FindReport()

PrintN ("inMemStr5 ASM - Boyer Moore 2")
lentxt=Len(Text)
lenfind=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=inMemStr5(1,@Text, lentxt, @findstr,lenfind)
Next i
FindReport()

PrintN ("inMemStr2 ASM")
lentxt=Len(Text)
lenfindstr=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=inMemStr2(@Text, lentxt, @findstr, lenfindstr, 1)
Next i
FindReport()

PrintN ("BMBinSearch ASM from Steve Hutchesson")
lentxt=Len(Text)
lenfindstr=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=inMemStr2(@Text, lentxt, @findstr, lenfindstr, 1)
Next i
FindReport()

PrintN ("Search in ASM naive Alg.")
lentxt=Len(Text)
lenfindstr=Len(findstr)
StartTime = timeGetTime_()
For i = 1 To loopcount
  a=inMemStr(1, @Text, lentxt, @findstr, lenfindstr)
Next i
FindReport()

Input()
CloseConsole()
End