Page 1 of 2
Fastest Array(Byte) Search Procedure
Posted: Tue Oct 19, 2010 2:59 am
by Vitor_Boss®
Hi, I'm new on PB and my application need a pattern search algorithm and i wrote this.
Brute Force algorithm
Code: Select all
Procedure.l FindArray (Start.l ,Array Source.a(1), Array Pattern.a(1))
Protected.l Position, Pas, Found, Begin, PatternSize, Size
If Start <> 0
Position = Start
EndIf
Begin = Pattern(0)
PatternSize = ArraySize(Pattern())-1
Size = ArraySize(Source()) - PatternSize
While Position <> Size
If Source(Position) = Begin
While Pas < PatternSize
Pas+1
If (Source(Position + Pas) = Pattern(Pas)); Or Pattern(Pas) = $FF; The last comparator is for ignore
Else
Pas=0
Break
EndIf
If Pas = PatternSize
ProcedureReturn Position
EndIf
Wend
EndIf
Position+1
Wend
ProcedureReturn 0
EndProcedure
The problem is the time used for compare both, I'm using PB demo 4.50 x86 on a x64 OS. How to increase the speed??
Re: Array Search Procedure
Posted: Tue Oct 19, 2010 4:52 pm
by IdeasVacuum
Not entirely sure how your code works Vitor but you could try PB's Regular Expressions. They are not so easy to define and the Help is virtually non-existent but if you can apply a regex to your problem then it should be fast.
Re: Array Search Procedure
Posted: Tue Oct 19, 2010 8:22 pm
by Vitor_Boss®
Could you give me an example?
I'm new here.
Re: Array Search Procedure
Posted: Wed Oct 20, 2010 3:18 am
by IdeasVacuum
...I think my help would be much more a hindrance in this case
Didelphodon has made a terrific little app which you can use to experiment with regex:
http://www.purebasic.fr/english/viewtop ... 12&start=0
...and these links explain how to make a regex:
http://www.regular-expressions.info/tutorialcnt.html
http://en.wikipedia.org/wiki/Regular_ex ... s#Examples
Plus of course you need to take a look at the PB Help:
http://www.purebasic.com/documentation/ ... index.html
Re: Array Search Procedure
Posted: Wed Oct 20, 2010 7:40 am
by blueznl
Re: Array Search Procedure
Posted: Wed Oct 20, 2010 10:26 pm
by Vitor_Boss®
I've found one really faster search algorithm but need translate.
Tuned Boyer-Moore algorithm
Code: Select all
Procedure preBmBc(Array Source.a(1),Array bmBc.i(1))
Protected.i i, m;
m = ArraySize(Source())
While i<=255
bmBc(i) = m
i + 1
Wend
i = 0
While i < m
bmBc(Source(i)) = m - i - 1
i + 1
Wend
EndProcedure
Procedure TUNEDBM(Start.l,Array Source.a(1), Array Pattern.a(1))
Protected.i i, j, m, n, k, match, shift
Protected.i Dim bmBc(255)
m = ArraySize(Pattern())
n = ArraySize(Source())
; Preprocessing
preBmBc(Pattern(), bmBc())
shift = Max(bmBc(Pattern(m - 1)),1)
bmBc(Pattern(m - 1)) = 0
; Searching
j = Max(0,Start-1)
While j < n
k = bmBc(Source(j + m -1))
While k <> 0
j + k
k = bmBc(Source(j + m -1))
j + k
k = bmBc(Source(j + m -1))
j + k
If (j+m-1)> n
Break
EndIf
k = bmBc(Source(j + m -1))
Wend
While i <= m-1
If Source(j + i) = Pattern(i)
match+1
If match + 1 = m
ProcedureReturn j
EndIf
Else
match=0
i=0
Break
EndIf
i+1
Wend
j + shift ; shift
Wend
ProcedureReturn 0
EndProcedure
Who can help?
Re: Array Search Procedure
Posted: Thu Oct 21, 2010 4:02 am
by rsts
There are several fast search algorithms in the forum if you search.
Re: Array Search Procedure
Posted: Thu Oct 21, 2010 9:37 pm
by idle
It might also help to explain what your data is and what your searching for, might be faster alternatives.
Re: Array Search Procedure
Posted: Thu Oct 21, 2010 9:41 pm
by Vitor_Boss®
I've tried search, the forums search is a bad engine.
This is a ported
Knuth-Morris-Pratt algorithm. I spend less than 20 minutes to port this.
Code: Select all
Procedure preKmp(Array Source.a(1), Array kmpNext.i(1))
Protected.i i, j, m
m = ArraySize(Source())
i = 0
kmpNext(0) = -1
j = kmpNext(0)
While i < m
While j > -1 And Source(i) <> Source(j)
j = kmpNext(j)
Wend
i + 1
j + 1
If Source(i) = Source(j)
kmpNext(i) = kmpNext(j)
Else
kmpNext(i) = j
EndIf
Wend
EndProcedure
Procedure kmp_search(Start.l,Array Source.a(1), Array Pattern.a(1))
Protected.i m, i, j, n
Protected Dim kmpNext.i(ArraySize(Pattern()))
preKmp(Pattern(), kmpNext())
j = ArraySize(Source())
n = ArraySize(Pattern())
m = Start-1
While m+i < j
If Pattern(i) = Source(m+i) Or Pattern(i) = $FF
If i = n-1
ProcedureReturn m
EndIf
i + 1
Else
m = m + i - kmpNext(i)
If kmpNext(i) > -1
i = kmpNext(i)
Else
i = 0
EndIf
EndIf
Wend
ProcedureReturn 0
EndProcedure
My data is a Sony Ericsson Firmware and I load it as a byte Array and search for random pattern for Automatic patch porting.
This is a list of string search algorithms:
http://www-igm.univ-mlv.fr/~lecroq/string/node1.html
Lets port some stuff.

Re: Array Search Procedure
Posted: Fri Oct 22, 2010 1:22 am
by IdeasVacuum
The Forum search is not so great but there is a really excellent way to use Goggle to specifically search the PB forum:
http://www.purebasic.fr/english/viewtop ... 61&start=0
It turns-up one interesting discussion in particular:
http://www.purebasic.fr/english/viewtopic.php?p=297990
Re: Array Search Procedure
Posted: Sat Oct 23, 2010 1:04 am
by Vitor_Boss®
Finally I've found a better solution, I've ported the Boyer-Moore algorithm and it is very faster.
The difference between KMP algorithm and BM algorithm searching a 32 bytes pattern on a 33MB file is about 410ms on my PC.
Re: Array Search Procedure
Posted: Mon Nov 01, 2010 2:38 pm
by Vitor_Boss®
Searching for a 8 bytes Pattern length in a 33MB file i got this results:
Code: Select all
1DD8AA4- Tuned Boyer-Moore algorithm = 47.0ms
1DD8AA4- Smith algorithm = 94.0ms
1DD8AA4- Boyer-Moore algorithm = 93.0ms
1DD8AA4- Turbo BM algorithm = 156.0ms
1DD8AA4- Galil-Giancarlo algorithm = 421.0ms
1DD8AA4- Knuth-Morris-Pratt algorithm = 375.0ms
Another file:
92D26E- uned Boyer-Moore algorithm = 15.0ms
92D26E- Smith algorithm = 31.0ms
92D26E- Boyer-Moore algorithm = 32.0ms
92D26E- Turbo BM algorithm = 46.0ms
92D26E- Galil-Giancarlo algorithm = 125.0ms
92D26E- Knuth-Morris-Pratt algorithm = 125.0ms
If someone need the code post here.

Re: Array Search Procedure
Posted: Tue Nov 02, 2010 10:22 pm
by IdeasVacuum
Looks like a convincing result in favour of the Tuned Boyer-Moore algorithm. What is the performance though with a huge file? Say 1 gig?
Re: Array Search Procedure
Posted: Wed Nov 03, 2010 12:16 am
by rsts
Vitor_Boss® wrote:Searching for a 8 bytes Pattern length in a 33MB file i got this results:
Code: Select all
1DD8AA4- Tuned Boyer-Moore algorithm = 47.0ms
1DD8AA4- Smith algorithm = 94.0ms
1DD8AA4- Boyer-Moore algorithm = 93.0ms
1DD8AA4- Turbo BM algorithm = 156.0ms
1DD8AA4- Galil-Giancarlo algorithm = 421.0ms
1DD8AA4- Knuth-Morris-Pratt algorithm = 375.0ms
Another file:
92D26E- uned Boyer-Moore algorithm = 15.0ms
92D26E- Smith algorithm = 31.0ms
92D26E- Boyer-Moore algorithm = 32.0ms
92D26E- Turbo BM algorithm = 46.0ms
92D26E- Galil-Giancarlo algorithm = 125.0ms
92D26E- Knuth-Morris-Pratt algorithm = 125.0ms
If someone need the code post here.

I'm using the assembler Boyer-Moore algorithm from
http://www.purebasic.fr/english/viewtop ... d&start=22
Yes, I would like to see your code. I would be interested in testing it against that, unless you're using equivalent code.
cheers
Re: Array Search Procedure
Posted: Wed Nov 03, 2010 2:04 am
by Vitor_Boss®
IdeasVacuum wrote:Looks like a convincing result in favour of the Tuned Boyer-Moore algorithm. What is the performance though with a huge file? Say 1 gig?
The bigger problem is the time used to read the file, 33MB take about 1 Sec in exe mode and 9-12 Sec in debug mode.
How can I improve the read file speed? I'm reading one byte/time.Look:
Code: Select all
Procedure.b GetByte(FileName.s,Array Output.a(1), Size.l=0, START.l=0)
Protected Status.a, i.l
If FileSize(FileName)
Status = OpenFile(1,FileName)
If Status <> 0
If Size>0
Else
Size=FileSize(FileName)
EndIf
ReDim Output(Size)
FileSeek(1,START)
While i<=(Size)
Output(i)=ReadAsciiCharacter(1)
i+1
Wend
CloseFile(1)
ProcedureReturn Status
Else
ProcedureReturn Status
EndIf
EndIf
EndProcedure
Here is the code, try it, you must change for string search.
Code: Select all
Procedure.l Max(Number1.l,Number2.l)
If Number1>Number2
ProcedureReturn Number1
Else
ProcedureReturn Number2
EndIf
EndProcedure
Procedure suffixes(Array Source.a(1), Array Suff.i(1));
;void suffixes(char *x, int m, int *suff)
Protected.i F, g, i, m
m = ArraySize(Source())
Suff(m - 1) = m;
g = m - 1;
For i = m - 2 To 0 Step -1 ; i >= 0; --i)
If (i > g And Suff(i + m - 1 - F) < i - g)
Suff(i) = Suff(i + m - 1 - F);
Else
If (i < g)
g = i;
EndIf
F = i;
While (g >= 0 And Source(g) = Source(g + m - 1 - F))
g-1
Wend
Suff(i) = F - g;
EndIf
Next
EndProcedure
Procedure preBmBc(Array Source.a(1),Array bmBc.i(1))
Protected.i i, m;
m = ArraySize(Source())
While i<=255
bmBc(i) = m;
i + 1
Wend
i = 0
While i < m
bmBc(Source(i)) = m - i - 1;
i + 1
Wend
EndProcedure
Procedure preBmGs(Array Source.a(1),Array bmGs.i(1))
Protected.i i, j, m
Protected Dim Suff(255);
m = ArraySize(Source())
suffixes(Source(), Suff());
For i = 0 To m
bmGs(i) = m;
Next
j = 0;
For i = m - 1 To 0 Step -1 ; i >= 0; --i
If Suff(i) = i + 1
For j = 0 To m - 1 - i
If bmGs(j) = m
bmGs(j) = m - 1 - i;
EndIf
Next
EndIf
Next
For i = 0 To m - 2 ; i <= m - 2; ++i)
bmGs(m - 1 - Suff(i)) = m - 1 - i;
Next
EndProcedure
;Boyer-Moore
Procedure BM(START.l,Array Source.a(1), Array Pattern.a(1))
Protected.i i, j, m, n
Protected Dim bmGs(255)
Protected Dim bmBc(255)
m = ArraySize(Pattern())
n = ArraySize(Source())
; Preprocessing
preBmGs(Pattern(), bmGs())
preBmBc(Pattern(), bmBc())
; Searching
j = Max(0,START-1)
While j <= n - m
i = m - 1
While i >= 0 And (Pattern(i) = Source(i + j))
i - 1
Wend
If i < 0
ProcedureReturn(j)
j + bmGs(0)
Else
If bmGs(i)> bmBc(Source(i + j)) - m + 1 + i
j + bmGs(i)
Else
j + bmBc(Source(i + j)) - m + 1 + i
EndIf
EndIf
Wend
ProcedureReturn 0
EndProcedure
; Tuned Boyer-Moore
Procedure TUNEDBM(START.l,Array Source.a(1), Array Pattern.a(1))
Protected.i i, j, m, n, k, match, shift
Protected.i Dim bmBc(255)
m = ArraySize(Pattern())
n = ArraySize(Source())
; Preprocessing
preBmBc(Pattern(), bmBc())
shift = Max(bmBc(Pattern(m - 1)),1)
bmBc(Pattern(m - 1)) = 0
; Searching
j = Max(0,START-1)
While j < n
While (j + m < n) And (bmBc(Source(j + m -1)) <> 0)
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
j + bmBc(Source(j + m -1))
If j + m >= n: Break: EndIf
Wend
While i <= m-1
If Source(j + i) = Pattern(i)
match+1
If match + 1 = m
ProcedureReturn j
EndIf
Else
match=0
i=0
Break
EndIf
i+1
Wend
j + shift ; shift
Wend
ProcedureReturn 0
EndProcedure
This is adapted to Array search.