IdeasVacuum wrote:Looks like a convincing result in favour of the Tuned Boyer-Moore algorithm. What is the performance though with a huge file? Say 1 gig?
The bigger problem is the time used to read the file, 33MB take about 1 Sec in exe mode and 9-12 Sec in debug mode.
How can I improve the read file speed? I'm reading one byte/time.Look:
Code: Select all
Procedure.b GetByte(FileName.s,Array Output.a(1), Size.l=0, START.l=0)
Protected Status.a, i.l
If FileSize(FileName)
Status = OpenFile(1,FileName)
If Status <> 0
If Size>0
Else
Size=FileSize(FileName)
EndIf
ReDim Output(Size)
FileSeek(1,START)
While i<=(Size)
Output(i)=ReadAsciiCharacter(1)
i+1
Wend
CloseFile(1)
ProcedureReturn Status
Else
ProcedureReturn Status
EndIf
EndIf
EndProcedure
Here is the code, try it, you must change for string search.
Code: Select all
Procedure.l Max(Number1.l,Number2.l)
If Number1>Number2
ProcedureReturn Number1
Else
ProcedureReturn Number2
EndIf
EndProcedure
Procedure suffixes(Array Source.a(1), Array Suff.i(1));
;void suffixes(char *x, int m, int *suff)
Protected.i F, g, i, m
m = ArraySize(Source())
Suff(m - 1) = m;
g = m - 1;
For i = m - 2 To 0 Step -1 ; i >= 0; --i)
If (i > g And Suff(i + m - 1 - F) < i - g)
Suff(i) = Suff(i + m - 1 - F);
Else
If (i < g)
g = i;
EndIf
F = i;
While (g >= 0 And Source(g) = Source(g + m - 1 - F))
g-1
Wend
Suff(i) = F - g;
EndIf
Next
EndProcedure
Procedure preBmBc(Array Source.a(1),Array bmBc.i(1))
Protected.i i, m;
m = ArraySize(Source())
While i<=255
bmBc(i) = m;
i + 1
Wend
i = 0
While i < m
bmBc(Source(i)) = m - i - 1;
i + 1
Wend
EndProcedure
Procedure preBmGs(Array Source.a(1),Array bmGs.i(1))
Protected.i i, j, m
Protected Dim Suff(255);
m = ArraySize(Source())
suffixes(Source(), Suff());
For i = 0 To m
bmGs(i) = m;
Next
j = 0;
For i = m - 1 To 0 Step -1 ; i >= 0; --i
If Suff(i) = i + 1
For j = 0 To m - 1 - i
If bmGs(j) = m
bmGs(j) = m - 1 - i;
EndIf
Next
EndIf
Next
For i = 0 To m - 2 ; i <= m - 2; ++i)
bmGs(m - 1 - Suff(i)) = m - 1 - i;
Next
EndProcedure
;Boyer-Moore
Procedure BM(START.l,Array Source.a(1), Array Pattern.a(1))
Protected.i i, j, m, n
Protected Dim bmGs(255)
Protected Dim bmBc(255)
m = ArraySize(Pattern())
n = ArraySize(Source())
; Preprocessing
preBmGs(Pattern(), bmGs())
preBmBc(Pattern(), bmBc())
; Searching
j = Max(0,START-1)
While j <= n - m
i = m - 1
While i >= 0 And (Pattern(i) = Source(i + j))
i - 1
Wend
If i < 0
ProcedureReturn(j)
j + bmGs(0)
Else
If bmGs(i)> bmBc(Source(i + j)) - m + 1 + i
j + bmGs(i)
Else
j + bmBc(Source(i + j)) - m + 1 + i
EndIf
EndIf
Wend
ProcedureReturn 0
EndProcedure
; Tuned Boyer-Moore
Procedure TUNEDBM(START.l,Array Source.a(1), Array Pattern.a(1))
Protected.i i, j, m, n, k, match, shift
Protected.i Dim bmBc(255)
m = ArraySize(Pattern())
n = ArraySize(Source())
; Preprocessing
preBmBc(Pattern(), bmBc())
shift = Max(bmBc(Pattern(m - 1)),1)
bmBc(Pattern(m - 1)) = 0
; Searching
j = Max(0,START-1)
While j < n
While (j + m < n) And (bmBc(Source(j + m -1)) <> 0)
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
Wend
While i <= m-1
If Source(j + i) = Pattern(i)
match+1
If match + 1 = m
ProcedureReturn j
EndIf
Else
match=0
i=0
Break
EndIf
i+1
Wend
j + shift ; shift
Wend
ProcedureReturn 0
EndProcedure
This is adapted to Array search.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.