This demo is for anyone in the future having the same challenge (read file buffer which may contain nulls, search it for a substring which also may contain nulls).
It's basically just a modification of wilberts code above but 'ready to go' in terms of searching a large file for a substring, so thankyou wilbert! plus i also added 'non-naive' variant of 'naive', plus another algo by skywalk, so thankyou especially to all the authors of the search algorithms involved!! very cool work
In my case im searching a 640mb .ISO file for a 7-byte string that only exists once, at nearly the very end of the file.
Code: Select all
EnableASM
DisableDebugger ;this is about speed, baby!
Global GbTmp.l
Global tmpcount
Structure MemoryArray
Char.c[0]
EndStructure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.i FindStringMems(offset.l, *String, LenStr.i, *Srch, LenSrch.i, Enc.i=#PB_Ascii)
; REV: 150621, skywalk
; + changes by keya: 1) added 'offset', 2) changed from 0 To 1 base As per the other algos (0=not found)
; Return byte location(1-based) of *Srch in *String.
; Useful to search an Ascii string in Unicode app(default as of v5.4+).
Protected.i lenChar
If Enc = #PB_Unicode ; 2 bytes/char
lenChar = 2
Else ; #PB_Ascii = 1 bytes/char, #PB_UTF8 = variable bytes/char
lenChar = 1
EndIf
If *Srch
Protected.i *pos = *String + (offset - 1)
lenStr * lenchar + *String
While *pos <= lenStr
If CompareMemory(*pos, *Srch, lenSrch)
ProcedureReturn *pos - *String + 1
EndIf
*pos + lenChar
Wend
Else
ProcedureReturn 0
EndIf
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l Quicksearch(StartPos.l, *MainMem, MainLen.l, *FindMem, FindLen.l)
; Quicksearch Boyer-Moore Algorithm im Basic from Paul Dwyer
*MainByteArray.MemoryArray = *MainMem
*FindByteArray.MemoryArray = *FindMem
; Build BadChr Array
Dim BadChar.l(255)
; set all alphabet to max shift pos (length of find string plus 1)
For i = 0 To 255
BadChar(i) = FindLen + 1
Next
;Update chars that are in the find string to their position from the end.
For i = 0 To FindLen -1
BadChar(*FindByteArray\Char[i]) = FindLen - i
Next
MainArrayLoop.l = StartPos
EndSearchPos.l = MainLen - FindLen
While MainArrayLoop <= EndSearchPos
If CompareMemory(*MainMem + MainArrayLoop, *FindMem, FindLen) = 1
FoundPos = MainArrayLoop + 1
Break
EndIf
;Didn't find the string so shift as per the table.
MainArrayLoop + BadChar(*MainByteArray.MemoryArray\Char[MainArrayLoop + FindLen])
Wend
ProcedureReturn FoundPos
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l MyFind( *Source.Long,SrcLen, *Pattern.Long,PatLen, StartPos.l)
;
; MyFind routine, who needs Bayer & Moore???
;
; This uses two tricks I discovered some 12 odd years ago.
; First assumption: Find out the length of BOTH strings.
; Second test all strings through the instructions "REPNE SCASB" for the length of a string (NULL ending).
; Compare strings using the instruction "REPE CMPSB"
; Third test the first character, if found test the LAST character in the string to search,
; If a match TEST BOTH strings,
; if NO match continue searching where we found the last character.
; Pro's:
; 1) We do ! Not compare ALL found strings.
; 2) Easy to implement
; 3) No intensive comparing of found first string match in de string.
;
; Con's: We have to do a liniar search, no 'skipping and sliding' like in Bayer-Moore...
;
; NEEDS to run as a EXE ! Not from the IDE, or we get false results!!!
;
! Xor Eax, Eax ; EAX = 0
! Xor Ecx, Ecx ; ECX = 0
! Not Ecx ; ECX = -1, or 4,294,967,295 max bytes to search, bloody BIG number!!
;
MOV Edi, *Pattern ; Adress of the patern in EDI..
CLD ; Looking UP (to the end if the string)..
; REPNE scasb ; Find the NULL byte..
; ! Not Ecx ; ECX is strlen (but in minus and +2), invers and included is a -1..
; DEC Ecx ; ECX -1, ECX now contains the length of the Pattern string..
MOV Ecx, PatLen
; ECX now has the length of the pattern string, test if it is longer than zero. Otherwise we DO have ! Nothing to search for..
;
CMP Ecx, 1
JG label1
MOV Eax, 0 ; Return = ! Not FOUND..
JMP labelExit
;
! label1:
; First leg is done, we know length of the search pattern. Keep it in EBX for reference..
MOV Ebx, Ecx
;
; Find the length of the source string, to prevent we search ALL memory (OOPS!)...
SUB Ecx, Ecx ; ECX = 0
! Not Ecx ; ECX = -1, or 4,294,967,295
SUB al, al ; ZERO AL or AL = 0 the byte we look for..
MOV Edi, *Source ; Adress in EDI..
CLD ; Looking UP (to the end if the string)..
; REPNE scasb ; Find the NULL byte..
; ! Not Ecx ; ECX is strlen (but in minus and +2), invers and included a -1..
; DEC Ecx ; ECX -1, ECX now contains the length of the Source string..
MOV Ecx, SrcLen
; Test if the source is a NULL string?
CMP Ecx, 1
JG label2
MOV Eax, 0 ; Return = ! Not FOUND..
JMP labelExit
;
! label2 :
MOV Edx, Ecx ; Keep a record for the length later on.
;
; Load into AH the LAST! Character of the search pattern...
MOV Esi, *Pattern ; Adress of the patern in ESI..
MOV AH, [Esi+Ebx-1] ; Now we have the LAST char of the search string..
MOV al, [Esi] ; Here we have the FIRST char of the search string..
;
MOV Edi, *Source ; Adress of the Source in EDI..
! label3:
CLD ; Looking UP (to the end if the Source string)..
REPNE scasb ; Find the FIRST character of the search string (a byte)..
JECXZ labelZeroExit ; ECX is ZERO so ! Not found in the Source, jump exit..
JMP labelCont ; Continue, jump..
;
! labelZeroExit:
MOV Eax, 0 ; ! Not found..
ProcedureReturn
! labelCont:
;
; We have a HIT!!
; Look if the LAST char of the Pattern string (which is in AH) is the same as the in the Source string on the same position!!
; If so we have a real hit if ! Not, go back and search further..
CMP AH, [Edi+Ebx-2] ; the trick of storing the length of the Pattern string. The -2 is needed because EDI shoots one further AFTER a REPNE..
JNE label3 ; IF ! Not jump and search...
;
; We have the FIRST and LAST position of the Patern string found in the Source string,
; now compare the rest of the characters (from the start+1). Keep the start in the source string AND the count..
PUSH Edi ; save these we need them if we DO ! Not have a match!!
PUSH Ecx
;
DEC Edi ; Correct the Source pointer..
MOV Ecx, Ebx ; The length of the Pattern string we want to find ! Not of the source string!!..
CLD
REPE cmpsb ; Compare the two strings (REPE = REPeat while Equal)..
;
POP Ecx ; Restore these in case we do ! Not have a match..
POP Edi
;
JE Label4 ; Jump if EQUAL (strings)..
;
; So ! Not equal ! Not found!! Search further..
MOV Esi, *Pattern ; Reload the adress of the Pattern in ESI..
JMP label3
! Label4:
;
; YES!!! We have found the string!!! And where is in ECX (coded)..
MOV Eax, Edx ; retrieve the length of the Source string..
SUB Eax, Ecx ; Now we have the length, ready!! Result = EAX register!!
ProcedureReturn
; JMP labelExit
! labelExit:
MOV Eax, 0 ; Error and/or ! Not found..
ProcedureReturn
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l inMemStr5(StartPos.l,*Source.Long,SrcLen.l,*Pattern.Long,PatLen.l)
; Boyer-Moore-Algorithmus in Flat-Assembler
; von schic nach Ausführungen der Uni Flensburg
; http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/bm2.htm
; siehe auch Darstellung von Strother Moore auf
; http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/index.html
If StartPos >=SrcLen Or *Source = 0 Or SrcLen < PatLen Or SrcLen < 2 Or *Pattern < 1
ProcedureReturn -1
EndIf
result.l
PMax.l
PMore.l
shiftTable.l = AllocateMemory($400)
MOV Ebx, PatLen
CMP Ebx, 1
JG _lbl1
MOV result, -1 ; Muster muss mind. 2 Zeichen sein
JMP _lblExit
! _lbl1:
MOV Esi, *Source
ADD Esi, SrcLen
SUB Esi, Ebx
MOV Edx, Esi ; setze Ausstiegslänge auf Source-Ende - Musterlänge
; ---------------------------------------------------------------------
; Tabelle füllen, mit Position für Zeichen die in Findstring vorkommen
; ---------------------------------------------------------------------
MOV Ecx, Ebx ; Muster-Länge in ECX
MOV Esi, *Pattern ; Adresse von Muster in ESI
MOV Edi, shiftTable ; Adresse von Shift-Tabelle in Edi
! Xor Eax, Eax ; Eax leeren, da nur al gefüllt, aber EAX zum Weiterarbeiten verwendet wird
! _lblFillTable:
MOV al, [Esi] ; Zeichen holen
INC Esi ; Zähler für Muster Eins hoch
MOV [Edi+Eax*4], Ecx ; shift-Wert für Zeichen an ASCII-Stelle in Tabelle
DEC Ecx ; shift-Wert um Eins runter
JNS _lblFillTable ; wend while Ecx > 0
; -------------------------------------------
; setup für Schleife "While t<=TMax And p>=0"
; -------------------------------------------
MOV PMax, Ebx
DEC PMax ; PMax=Len(pattern)-1
MOV PMore, Ebx ; PMore=PMax+1
MOV Ebx, shiftTable ; Zeiger auf shiftTabelle
MOV Esi, *Source ; esi = *Source
ADD Esi, StartPos ; + Startposition
MOV Edi, *Pattern ; Edi = *pattern
; ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
! _lblWhileTMax:
CMP Edx, Esi
JL _lblExit ; While t<=TMax
MOV Ecx, PMax ; Zähler zurücksetzen p=PMax
! Xor Eax, Eax ; Eax leeren, da nur al gefüllt, aber Eax als long zum Weiterarbeiten verwendet wird
! _lblWhile_p_greater_0: ; zwischen-Einsprung, da Source-Zeiger nicht verändert, kann obiger Vergleich gespart werden
MOV al, [Esi+Ecx] ; Zeichen von Source nach al
; CMP al, 65
; JL doneCase
; CMP al, 90
; JG doneCase
; ADD al, 32
; ! doneCase:
CMP al, [Edi+Ecx] ; mit Zeichen in Muster vergleichen (If PeekB(t+p)<>PeekB(@pattern+p))
JE _lblElse ; ein Zeichen ist gleich -> weitere vergleichen
MOV Eax, [Ebx+Eax*4] ; shift-Wert für source-Zeichen, von shift-Tabelle laden
CMP Eax, 0 ; wenn kein Wert (Zeichen kommt nicht im Muster vor)
JNZ _lblCalcShift ; berechne Verschiebung
ADD Esi, Ecx
INC Esi ; t=t+p+1
JMP _lblWhileTMax ; Wend
! _lblCalcShift:
ADD Eax, Ecx
SUB Eax, PMore ; tmpSkip=tmpSkip+p-PMore
JNL _lblAdd_Shift ; wenn Verschiebung < 0
MOV Eax, 1 ; kleinste Verschiebung ist 1
! _lblAdd_Shift:
ADD Esi, Eax ; t=t+tmpSkip
JMP _lblWhileTMax ; Wend
! _lblElse:
DEC Ecx ; p-1
JNS _lblWhile_p_greater_0 ; Wend While p>=0
; ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
SUB Esi, *Source
INC Esi ; t+1-@text
MOV result, Esi
! _lblExit:
FreeMemory(shiftTable)
ProcedureReturn result
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l inMemStr2(*SrcMem, memSizeSrc, *Pattern, memSizePattern, StartPos)
; memSizePattern = MemoryStringLength(*Pattern)
; memSizeSrc = MemoryStringLength(*SrcMem)
result.l
MOV Esi,*Pattern
MOV Edx,memSizePattern
! Or Edx,Edx
JZ l_fertig ;Länge *Pattern = 0
MOV Edi,*SrcMem
ADD Edi,StartPos
DEC Edi
MOV Ecx,memSizeSrc
SUB Ecx,StartPos
ADD Ecx,2
SUB Ecx,Edx
JB l_fertig ;memSizePattern > memSizeSrc
CLD ;clear direction flag -> DF=0
!l_search:
LODSB ;= mov al, [esi]: if DF=0 esi+1 (else esi-1)
; first char of pattern to al
REPNZ scasb ;Al - [edi]:edi+1:ecx-1 and repeat while not null or ecx>0
; compare first char of pattern with source until it matches or ecx is counted down
JNZ l_fertig ;Al - [edi]<>0 but ecx=0 (end of SrcMem reached)
MOV Eax,Edi
MOV Ebx,Ecx
MOV Ecx,Edx
DEC Ecx ;
REPZ cmpsb ; [esi] - [edi]: (if DF=0) esi+1: edi+1: ecx-1 and repeat while null or ecx>0
JZ l_found ; [esi] - [edi] was 0 and ecx is 0 -> whole pattern matched
;else ecx is 0 but [esi]-[edi] <> 0
MOV Edi,Eax
MOV Ecx,Ebx
MOV Esi,*Pattern
JMP l_search
!l_found:
SUB Eax,*SrcMem
MOV result,Eax
!l_fertig:
ProcedureReturn result
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure BMBinSearch (StartPos.l,lpSource.l,srcLngth.l,lpSubStr.l,subLngth.l)
; Boyer Moore Exact pattern Matching Algorithms
; ASM-Code from
; Steve Hutchesson
; transfered to (PB-)FASM with slight modifications from schic
;LOCAL cval :DWORD
cVal.l
;LOCAL shift_table[256]:DWORD
shift_table = AllocateMemory($400)
result.l
MOV Ebx, subLngth
CMP Ebx, 1
JG jump1
MOV result, -2 ; string too short, must be > 1
JMP Cleanup
! jump1:
MOV Esi, lpSource
ADD Esi, srcLngth
SUB Esi, Ebx
MOV Edx, Esi ; set Exit Length: edx = pointer to end of source - srcLngth
; ----------------------------------------
; load shift table with value in subLngth
; ----------------------------------------
MOV Ecx, 256
MOV Eax, Ebx
MOV Edi, shift_table
REP stosd
; ----------------------------------------------
; load decending count values into shift table
; ----------------------------------------------
MOV Ecx, Ebx ; SubString length in ECX
DEC Ecx ; correct for zero based index
MOV Esi, lpSubStr ; address of SubString in ESI
MOV Edi, shift_table
! Xor Eax, Eax
! Write_Shift_Chars:
MOV al, [Esi] ; get the character
INC Esi
MOV [Edi+Eax*4], Ecx ; write shift for each character
DEC Ecx ; to ascii location in table
JNZ Write_Shift_Chars
; -----------------------------
; set up for main compare loop
; -----------------------------
MOV Ecx, Ebx ;Ecx = subLngth
DEC Ecx ;Ecx - 1
MOV cVal, Ecx ;cval = subLngth -1
MOV Esi, lpSource ;esi = lpSource
MOV Edi, lpSubStr ;Edi = lpSubStr
ADD Esi, StartPos ; add starting position
JMP Pre_Loop
; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
! Calc_Suffix_Shift: ;source-char is in SubStr
ADD Eax, Ecx ;add counter in SubStr to value of shift-table
SUB Eax, cVal ; sub loop count (cval = subLngth -1)
JNS Add_Suffix_Shift
MOV Eax, 1 ; minimum shift is 1
! Add_Suffix_Shift:
ADD Esi, Eax ; add SUFFIX shift
MOV Ecx, cVal ; reset counter in compare loop
! Test_Length:
CMP Edx, Esi ; test exit condition: esi <= pointer to end of source - srcLngth
JL No_Match ;if esi > edx (pointer to end of source) then search is ending -> no match
! Pre_Loop:
! Xor Eax, Eax ; zero EAX for following partial writes
MOV al, [Esi+Ecx] ;char from lpSource to al
CMP al, [Edi+Ecx] ;compare characters from lpSource (esi) to char from lpSubStr (Edi)
JE Jump2
;MOV eax, shift_table[eax*4]
SHL Eax, 2h ;if lpSource (esi) = char from lpSubStr clc shift-value
ADD Eax,shift_table
MOV Eax, [Eax] ;load shift value for source-char from shift-table
CMP Ebx, Eax ;compare shift value with subLngth
JNE Calc_Suffix_Shift;Add_Suffix_Shift ;if source-char not in SubStr -> bypass SUFFIX calculations
LEA Esi, [Esi+Ecx+1] ;if source-char is in SubStr -> add BAD CHAR shift (esi=lpSource+lpSubStr+1)
JMP Test_Length
! Jump2:
DEC Ecx
! Xor Eax, Eax ; zero EAX for following partial writes
! Cmp_Loop:
MOV al, [Esi+Ecx]
CMP al, [Edi+Ecx] ; cmp characters in ESI / EDI
JNE Set_Shift ; if not equal, get next shift
DEC Ecx
JNS Cmp_Loop
JMP foundmatch ; fall through on match
! Set_Shift:
;MOV eax, shift_table[eax*4]
SHL Eax, 2h
ADD Eax,shift_table
MOV Eax, [Eax]
CMP Ebx, Eax
JNE Calc_Suffix_Shift ;if source-char <> subLngth -> source-char is in SubStr -> run SUFFIX calculations
LEA Esi, [Esi+Ecx+1] ; add BAD CHAR shift
JMP Test_Length
; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
! foundmatch:
SUB Esi, lpSource ; sub source from ESI
;MOV eax, esi ; put length in eax
INC Esi
MOV result, Esi
JMP Cleanup
! No_Match:
MOV Eax, -1
! Cleanup:
FreeMemory(shift_table)
ProcedureReturn result
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l inMemStrNaive(StartPos, *Source, memSizeSrc, strStr, memSizePattern)
; naiver Algorithmus überprüft ein gesuchtes Muster an
; allen Positionen i eines Speicherbereichs
; Ende des Speicherbereiches muß null-terminiert sein
result.l
;init pointers etc.
MOV Ebx,strStr ; Ebx = Pointer to first Char in string
MOV Ecx,*Source ; Ecx = Pointer to akt. Char in source
MOV Edx, Ecx
ADD Edx, memSizeSrc
ADD Ecx,StartPos ; set source-pointer to startposition
DEC Ecx ; StartPos - 1
! rpt_Src: ; startpoint for loop scanning through the source
MOV al,byte[Ecx]
INC Ecx ; Ecx + 1
CMP Edx,Ecx ; if null (end of source-string)
JS endProc ; -> end Procedure, Result=0
CMP al, 65
JS endCase1
CMP al, 90
JNS endCase1
ADD al, 32
! endCase1:
CMP byte[Ebx],al ; if found first Char of strStr look for the rest
JNE rpt_Src ; else go on with next Char in source
;found the first Char of strStr in source
;now look if the rest does match
! Xor Esi,Esi ; esi = 0, esi = Pointer to akt. Char in strStr
! rpt_Str: ; startpoint for loop pounding the string
MOV Edi, Ecx
CMP byte[Ebx+Esi],0 ; if 0 (end of string)
JE got_it ; -> got it, all Chars of the string did match
ADD Edi, Esi
CMP Edx,Edi ; if null then end of source
JS endProc ; -> end Procedure, Result=0
INC Esi ; esi + 1 (the first Char in string is already found)
MOV al,byte[Edi] ; move actual Char in source to accumulator, have
CMP al, 65
JS endCase
CMP al, 90
JNS endCase
ADD al, 32
! endCase:
CMP byte[Ebx+Esi],al ; if actual Char in source (Ebx+esi) = act Char in string (al)
JE rpt_Str ; then take next Char
JMP rpt_Src ; and go on with scanning source
! got_it:
;Result = Ecx - Source, to get the place in the source-string
SUB Ecx,*Source
MOV result,Ecx
! endProc:
ProcedureReturn result
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Procedure.l inMemNonNaive(StartPos, *Source, memSizeSrc, strStr, memSizePattern)
;This is schic's original "Naive" (case-insensitive) search... but I simply commented-out the
;case-insensitivity code to make this "Non-Naive" Case-sensitive exact string match variant :P
; naiver Algorithmus überprüft ein gesuchtes Muster an
; allen Positionen i eines Speicherbereichs
; Ende des Speicherbereiches muß null-terminiert sein
result.l
;init pointers etc.
MOV Ebx,strStr ; Ebx = Pointer to first Char in string
MOV Ecx,*Source ; Ecx = Pointer to akt. Char in source
MOV Edx, Ecx
ADD Edx, memSizeSrc
ADD Ecx,StartPos ; set source-pointer to startposition
DEC Ecx ; StartPos - 1
! rpt_SrcNN: ; startpoint for loop scanning through the source
MOV al,byte[Ecx]
INC Ecx ; Ecx + 1
CMP Edx,Ecx ; if null (end of source-string)
JS endProcNN ; -> end Procedure, Result=0
;CMP al, 65 ;\
;JS endCase1 ;|
;CMP al, 90 ;| Case sensitivity code
;JNS endCase1 ;| commented out by Keya (if i can take credit for commenting-out!?!? :)
;ADD al, 32 ;|
;! endCase1: ;/
CMP byte[Ebx],al ; if found first Char of strStr look for the rest
JNE rpt_SrcNN ; else go on with next Char in source
;found the first Char of strStr in source
;now look if the rest does match
! Xor Esi,Esi ; esi = 0, esi = Pointer to akt. Char in strStr
! rpt_StrNN: ; startpoint for loop pounding the string
MOV Edi, Ecx
CMP byte[Ebx+Esi],0 ; if 0 (end of string)
JE got_itNN ; -> got it, all Chars of the string did match
ADD Edi, Esi
CMP Edx,Edi ; if null then end of source
JS endProcNN ; -> end Procedure, Result=0
INC Esi ; esi + 1 (the first Char in string is already found)
MOV al,byte[Edi] ; move actual Char in source to accumulator, have
;CMP al, 65 ;\
;JS endCase ;|
;CMP al, 90 ;| Case sensitivity code
;JNS endCase ;|
;ADD al, 32 ;|
;! endCase: ;/
CMP byte[Ebx+Esi],al ; if actual Char in source (Ebx+esi) = act Char in string (al)
JE rpt_StrNN ; then take next Char
JMP rpt_SrcNN ; and go on with scanning source
! got_itNN:
;Result = Ecx - Source, to get the place in the source-string
SUB Ecx,*Source
MOV result,Ecx
! endProcNN:
ProcedureReturn result
EndProcedure
;---------------------------------------------------------------------------------------------------------------------------------
Macro FindReport()
PrintN ("pattern found at position: " + Str(a) + " in " + Str(timeGetTime_()-StartTime ) + " milliseconds")
PrintN (" ")
EndMacro
;PARAMETERS:
HaystackFile.s = "c:\temp\TEST.ISO" ;The file containing the haystack buffer
*needle = AllocateMemory(64) ;Allocate some memory for our needle search buffer
PokeS(*needle, "333333p") ;The needle substring to find in the file
lenneedle = 7 ;Length of needle buf, but we dont want to use Len() as thats for nul-terminated strings
loopcount = 1 ;How many loops to use (good for time tests)
offset.l = 1 ;Offset To start searching from
OpenConsole()
Print("Loading file into memory... ")
hFile = ReadFile(#PB_Any, HaystackFile.s)
If hFile
lenhaystack.l = Lof(hFile)
*haystack = AllocateMemory(lenhaystack)
If *haystack
lenhaystack = ReadData(hFile, *haystack, lenhaystack)
PrintN("read " + Str(lenhaystack))
Else
PrintN("Error from AllocateMemory"): Input()
End
EndIf
CloseFile(hFile)
Else
PrintN("Error from ReadFile"): Input()
End
EndIf
PrintN(" ")
;TESTS ---------------------------------------------------------------------------------------------------------------------------
;PrintN ("PB FindString") ;cannot use PB native FindString for this test due to nulls
;StartTime = timeGetTime_()
;For i = 1 To loopcount
; a=FindString(*haystack, findstr, 1)
;Next i
;FindReport()
;Do one search first for cache reasons
a=Quicksearch(offset, *haystack,lenhaystack,*needle,lenneedle)
PrintN ("Quicksearch Boyer-Moore Algorithm in PB, from Paul Dwyer")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=Quicksearch(offset, *haystack,lenhaystack,*needle,lenneedle)
Next i
FindReport()
PrintN ("MyFind ASM, from Jan Vooijs")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=MyFind(*haystack,lenhaystack,*needle,lenneedle,offset)
Next i
FindReport()
PrintN ("inMemStr5 ASM - Boyer Moore 2, from schic")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=inMemStr5(offset,*haystack, lenhaystack, *needle,lenneedle)
Next i
FindReport()
PrintN ("inMemStr2 ASM, from schic")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=inMemStr2(*haystack, lenhaystack, *needle, lenneedle, offset)
Next i
FindReport()
PrintN ("FindStringMems in PB, from skywalk")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=FindStringMems(offset, *haystack,lenhaystack,*needle,lenneedle, #PB_Ascii)
Next i
FindReport()
PrintN ("'Naive' ASM (case-insensitive), from schic")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=inMemStrNaive(offset, *haystack, lenhaystack, *needle, lenneedle)
Next i
FindReport()
PrintN ("'NON-Naive' ASM (case-sensitive), from schic")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=inMemNonNaive(offset, *haystack, lenhaystack, *needle, lenneedle)
Next i
FindReport()
PrintN ("BMBinSearch ASM, from Steve Hutchesson")
StartTime = timeGetTime_()
For i = 1 To loopcount
a=BMBinSearch(offset, *haystack, lenhaystack, *needle, lenneedle)
Next i
FindReport()
Print("Done! Press enter to continue...")
Input()
CloseConsole()
End
Very impressive performance! And perhaps some candidates for a PB native FindData() function *wink wink nudge nudge*
Of course, everyones search criteria are different (small/large haystack, small/large needle, repeating data, partial matches and so on!) and i know its very important to choose your search algorithm accordingly, but wow that inMemStr5 routine looks an absolute winner for my particular needs heehee!
Im not sure why the BM search is failing. It does work in some situations, but other times when it manages to finish it's finding just 1 of the search bytes and exiting... but not the first occurance of that byte, so im really not sure whats going on. And other times still it'll just get stuck in an infinite loop. Perhaps it just doesnt like the latest compilers and needs a redo, i dunno.
"Naive" is actually a case-insensitive search, but as you can see she's still very fast! ... so I simply commented-out the case-sensitivity code to create a "non-Naive" variant, which, as it turns out is actually the 2nd fastest of them all heehee! a hidden gem there!
Anyway i am now no longer stuck and have a few verrryy nice algorithms to choose from which accomplish this first part of my task, so I can proceed to the next stage now the shackles are slowly coming off heehee, thankyou!!
btw im pretty sure all of the above algorithms use single-byte ("cmp al") searching. It would be interesting to see one that uses 32bits:) (but can Boyer-Moore etc be used in that way? im not sure)
ps. wilbert just one minor thingy in the code you posted, youre doing the inMemStr2 test twice (the second time is meant to be call to BMBinSearch!)
psps. if anyone gets bored or has the time there are
in masm32\m32lib\ (bm.asm/bmh.asm/sbm.asm) thatd make sweet ports to Purebasic! their code is tiny (BMBinsearch above is one of them)