Fastest Array(Byte) Search Procedure

Everything else that doesn't fall into one of the other PB categories.
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Fastest Array(Byte) Search Procedure

Post by Vitor_Boss® »

Hi, I'm new on PB and my application need a pattern search algorithm and i wrote this.
Brute Force algorithm

Code: Select all

Procedure.l FindArray (Start.l ,Array Source.a(1), Array Pattern.a(1))
  Protected.l Position, Pas, Found, Begin, PatternSize, Size
  If Start <> 0 
    Position = Start
  EndIf
  Begin = Pattern(0)
  PatternSize = ArraySize(Pattern())-1
  Size = ArraySize(Source()) - PatternSize
  While Position <> Size
    If Source(Position) = Begin
      While Pas < PatternSize
        Pas+1
        If (Source(Position + Pas) = Pattern(Pas)); Or Pattern(Pas) = $FF; The last comparator is for ignore
        Else
          Pas=0
          Break
        EndIf
        If Pas = PatternSize
          ProcedureReturn Position
        EndIf    
      Wend  
    EndIf  
    Position+1  
  Wend
  ProcedureReturn 0
EndProcedure
The problem is the time used for compare both, I'm using PB demo 4.50 x86 on a x64 OS. How to increase the speed??
Last edited by Vitor_Boss® on Wed Jan 26, 2011 9:12 pm, edited 3 times in total.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: Array Search Procedure

Post by IdeasVacuum »

Not entirely sure how your code works Vitor but you could try PB's Regular Expressions. They are not so easy to define and the Help is virtually non-existent but if you can apply a regex to your problem then it should be fast.
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

Could you give me an example?

I'm new here.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: Array Search Procedure

Post by IdeasVacuum »

...I think my help would be much more a hindrance in this case :)

Didelphodon has made a terrific little app which you can use to experiment with regex:
http://www.purebasic.fr/english/viewtop ... 12&start=0

...and these links explain how to make a regex:
http://www.regular-expressions.info/tutorialcnt.html
http://en.wikipedia.org/wiki/Regular_ex ... s#Examples

Plus of course you need to take a look at the PB Help:
http://www.purebasic.com/documentation/ ... index.html
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
User avatar
blueznl
PureBasic Expert
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Re: Array Search Procedure

Post by blueznl »

Here's another PureBasic oriented explanation:

http://www.xs4all.nl/~bluez/purebasic/p ... 26.htm#top
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

I've found one really faster search algorithm but need translate.
Tuned Boyer-Moore algorithm

Code: Select all

Procedure preBmBc(Array Source.a(1),Array bmBc.i(1))
   Protected.i i, m;
   m = ArraySize(Source())
   While i<=255
     bmBc(i) = m
     i + 1
   Wend
   i = 0
   While i < m
     bmBc(Source(i)) = m - i - 1
     i + 1
   Wend
EndProcedure

Procedure TUNEDBM(Start.l,Array Source.a(1), Array Pattern.a(1))
   Protected.i i, j, m, n, k, match, shift
   Protected.i Dim bmBc(255)   
   m = ArraySize(Pattern())
   n = ArraySize(Source()) 
   
   ; Preprocessing 
   preBmBc(Pattern(), bmBc())
   shift = Max(bmBc(Pattern(m - 1)),1)
   bmBc(Pattern(m - 1)) = 0   

   ; Searching
   j = Max(0,Start-1)
   While j < n
	  k = bmBc(Source(j + m -1))
	  While k <> 0
		 j + k
		 k = bmBc(Source(j + m -1))
		 j + k
		 k = bmBc(Source(j + m -1))
		 j + k
		 If (j+m-1)> n
		   Break
		 EndIf  
		 k = bmBc(Source(j + m -1))
	  Wend
	  While i <= m-1
      If Source(j + i) = Pattern(i)        
        match+1
        If match + 1 = m
          ProcedureReturn j
        EndIf
		  Else
		    match=0
		    i=0
		    Break
		  EndIf
		  i+1
	  Wend    
	  j + shift ; shift
	Wend
	ProcedureReturn 0
EndProcedure
Who can help?
Last edited by Vitor_Boss® on Mon Nov 01, 2010 2:21 pm, edited 1 time in total.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
rsts
Addict
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Re: Array Search Procedure

Post by rsts »

There are several fast search algorithms in the forum if you search.
User avatar
idle
Always Here
Always Here
Posts: 5838
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Array Search Procedure

Post by idle »

It might also help to explain what your data is and what your searching for, might be faster alternatives.
Windows 11, Manjaro, Raspberry Pi OS
Image
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

I've tried search, the forums search is a bad engine.

This is a ported Knuth-Morris-Pratt algorithm. I spend less than 20 minutes to port this.

Code: Select all

Procedure preKmp(Array Source.a(1), Array kmpNext.i(1))
   Protected.i i, j, m
   m = ArraySize(Source())
   i = 0
   kmpNext(0) = -1
   j = kmpNext(0)
   While i < m
	  While j > -1 And Source(i) <> Source(j)
		 j = kmpNext(j)
	  Wend
	  i + 1
	  j + 1
	  If Source(i) = Source(j)
		 kmpNext(i) = kmpNext(j)
	  Else
		 kmpNext(i) = j
	  EndIf
   Wend
EndProcedure

Procedure kmp_search(Start.l,Array Source.a(1), Array Pattern.a(1))
  Protected.i m, i, j, n
  Protected Dim kmpNext.i(ArraySize(Pattern()))
  preKmp(Pattern(), kmpNext())
  j = ArraySize(Source())
  n = ArraySize(Pattern()) 
	m = Start-1
    While m+i < j
        If Pattern(i) = Source(m+i) Or Pattern(i) = $FF
          If i = n-1
            ProcedureReturn m
          EndIf
          i + 1
        Else
            m = m + i - kmpNext(i)
            If kmpNext(i) > -1
              i = kmpNext(i)
            Else
              i = 0
            EndIf  
        EndIf    
    Wend          
  ProcedureReturn 0
EndProcedure   
My data is a Sony Ericsson Firmware and I load it as a byte Array and search for random pattern for Automatic patch porting.

This is a list of string search algorithms: http://www-igm.univ-mlv.fr/~lecroq/string/node1.html
Lets port some stuff. :D
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: Array Search Procedure

Post by IdeasVacuum »

The Forum search is not so great but there is a really excellent way to use Goggle to specifically search the PB forum:

http://www.purebasic.fr/english/viewtop ... 61&start=0


It turns-up one interesting discussion in particular:

http://www.purebasic.fr/english/viewtopic.php?p=297990
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

Finally I've found a better solution, I've ported the Boyer-Moore algorithm and it is very faster.
The difference between KMP algorithm and BM algorithm searching a 32 bytes pattern on a 33MB file is about 410ms on my PC.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

Searching for a 8 bytes Pattern length in a 33MB file i got this results:

Code: Select all

1DD8AA4- Tuned Boyer-Moore algorithm = 47.0ms
1DD8AA4- Smith algorithm = 94.0ms
1DD8AA4- Boyer-Moore algorithm = 93.0ms
1DD8AA4- Turbo BM algorithm = 156.0ms
1DD8AA4- Galil-Giancarlo algorithm = 421.0ms
1DD8AA4- Knuth-Morris-Pratt algorithm = 375.0ms
Another file:
92D26E- uned Boyer-Moore algorithm = 15.0ms
92D26E- Smith algorithm = 31.0ms
92D26E- Boyer-Moore algorithm = 32.0ms
92D26E- Turbo BM algorithm = 46.0ms
92D26E- Galil-Giancarlo algorithm = 125.0ms
92D26E- Knuth-Morris-Pratt algorithm = 125.0ms
If someone need the code post here. 8)
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
IdeasVacuum
Always Here
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK
Contact:

Re: Array Search Procedure

Post by IdeasVacuum »

Looks like a convincing result in favour of the Tuned Boyer-Moore algorithm. What is the performance though with a huge file? Say 1 gig?
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
rsts
Addict
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Re: Array Search Procedure

Post by rsts »

Vitor_Boss® wrote:Searching for a 8 bytes Pattern length in a 33MB file i got this results:

Code: Select all

1DD8AA4- Tuned Boyer-Moore algorithm = 47.0ms
1DD8AA4- Smith algorithm = 94.0ms
1DD8AA4- Boyer-Moore algorithm = 93.0ms
1DD8AA4- Turbo BM algorithm = 156.0ms
1DD8AA4- Galil-Giancarlo algorithm = 421.0ms
1DD8AA4- Knuth-Morris-Pratt algorithm = 375.0ms
Another file:
92D26E- uned Boyer-Moore algorithm = 15.0ms
92D26E- Smith algorithm = 31.0ms
92D26E- Boyer-Moore algorithm = 32.0ms
92D26E- Turbo BM algorithm = 46.0ms
92D26E- Galil-Giancarlo algorithm = 125.0ms
92D26E- Knuth-Morris-Pratt algorithm = 125.0ms
If someone need the code post here. 8)
I'm using the assembler Boyer-Moore algorithm from http://www.purebasic.fr/english/viewtop ... d&start=22

Yes, I would like to see your code. I would be interested in testing it against that, unless you're using equivalent code.

cheers
Vitor_Boss®
User
User
Posts: 81
Joined: Thu Sep 23, 2010 4:22 am

Re: Array Search Procedure

Post by Vitor_Boss® »

IdeasVacuum wrote:Looks like a convincing result in favour of the Tuned Boyer-Moore algorithm. What is the performance though with a huge file? Say 1 gig?
The bigger problem is the time used to read the file, 33MB take about 1 Sec in exe mode and 9-12 Sec in debug mode.
How can I improve the read file speed? I'm reading one byte/time.Look:

Code: Select all

Procedure.b GetByte(FileName.s,Array Output.a(1), Size.l=0, START.l=0)
  Protected Status.a, i.l
  If FileSize(FileName)
    Status = OpenFile(1,FileName)
    If Status <> 0
      If Size>0      
      Else
        Size=FileSize(FileName)
      EndIf    
      ReDim Output(Size)
      FileSeek(1,START)      
      While i<=(Size)       
        Output(i)=ReadAsciiCharacter(1)
        i+1    
      Wend
      CloseFile(1)
      ProcedureReturn Status
    Else
      ProcedureReturn Status
    EndIf
  EndIf  
EndProcedure
Here is the code, try it, you must change for string search.

Code: Select all

Procedure.l Max(Number1.l,Number2.l)
  If Number1>Number2
    ProcedureReturn Number1
  Else  
    ProcedureReturn Number2    
  EndIf  
EndProcedure
Procedure suffixes(Array Source.a(1), Array Suff.i(1));
;void suffixes(char *x, int m, int *suff) 
   Protected.i F, g, i, m
   m = ArraySize(Source())
   Suff(m - 1) = m;
   g = m - 1;
   For i = m - 2 To 0 Step -1 ; i >= 0; --i)
      If (i > g And Suff(i + m - 1 - F) < i - g)
         Suff(i) = Suff(i + m - 1 - F);
      Else
         If (i < g)
            g = i;
         EndIf   
         F = i;
         While (g >= 0 And Source(g) = Source(g + m - 1 - F))
            g-1
         Wend   
         Suff(i) = F - g;
      EndIf         
   Next
EndProcedure
Procedure preBmBc(Array Source.a(1),Array bmBc.i(1))
   Protected.i i, m;
   m = ArraySize(Source())
   While i<=255
     bmBc(i) = m;
     i + 1
   Wend
   i = 0
   While i < m
     bmBc(Source(i)) = m - i - 1;
     i + 1
   Wend
EndProcedure
Procedure preBmGs(Array Source.a(1),Array bmGs.i(1))
    Protected.i i, j, m
    Protected Dim Suff(255);
    m = ArraySize(Source())
    suffixes(Source(), Suff()); 
   For i = 0 To m
      bmGs(i) = m;
   Next   
   j = 0;
   For i = m - 1 To 0 Step -1 ; i >= 0; --i
      If Suff(i) = i + 1
         For j = 0 To m - 1 - i
            If bmGs(j) = m
              bmGs(j) = m - 1 - i;
            EndIf
          Next
      EndIf    
   Next           
   For i = 0 To m - 2 ; i <= m - 2; ++i)
     bmGs(m - 1 - Suff(i)) = m - 1 - i;
   Next  
EndProcedure      
;Boyer-Moore
Procedure BM(START.l,Array Source.a(1), Array Pattern.a(1))
   Protected.i i, j, m, n
   Protected Dim bmGs(255)
   Protected Dim bmBc(255)   
   m = ArraySize(Pattern())
   n = ArraySize(Source()) 
 
   ; Preprocessing 
   preBmGs(Pattern(), bmGs())
   preBmBc(Pattern(), bmBc())

   ; Searching 
   j = Max(0,START-1)
  While j <= n - m
	  i = m - 1
	  While i >= 0 And (Pattern(i) = Source(i + j))
		  i - 1
	  Wend
	  If i < 0
		 ProcedureReturn(j)
		 j + bmGs(0)
	  Else
	    If bmGs(i)> bmBc(Source(i + j)) - m + 1 + i	      
	      j + bmGs(i)
	    Else
	      j + bmBc(Source(i + j)) - m + 1 + i
	    EndIf  
	  EndIf
	Wend
	ProcedureReturn 0
EndProcedure
; Tuned Boyer-Moore
Procedure TUNEDBM(START.l,Array Source.a(1), Array Pattern.a(1))
   Protected.i i, j, m, n, k, match, shift
   Protected.i Dim bmBc(255)
   m = ArraySize(Pattern())
   n = ArraySize(Source()) 
   
   ; Preprocessing 
   preBmBc(Pattern(), bmBc())
   shift = Max(bmBc(Pattern(m - 1)),1)
   bmBc(Pattern(m - 1)) = 0

   ; Searching
   j = Max(0,START-1)
   While j < n
	  While (j + m < n) And (bmBc(Source(j + m -1)) <> 0)
	    j + bmBc(Source(j + m -1))		  
	    If j + m >= n: Break: EndIf
	    j + bmBc(Source(j + m -1))
	    If j + m >= n: Break: EndIf
	    j + bmBc(Source(j + m -1))
	    If j + m >= n: Break: EndIf
	  Wend
	  While i <= m-1
      If Source(j + i) = Pattern(i)
        match+1
        If match + 1 = m
          ProcedureReturn j
        EndIf
		  Else
		    match=0
		    i=0
		    Break
		  EndIf
		  i+1
	  Wend    
	  j + shift ; shift
	Wend
	ProcedureReturn 0
EndProcedure
This is adapted to Array search.
Sorry by bad English.
HP Pavilion DV6-2155DX: Intel i3-330m 2.13 / 4GB DDR3 / 500GB Sata2 HD / Display 15.6" LED / Win7 Ultimate x64 / PB 4.50 x86 demo.
Post Reply