Rabin-Karp rolling hash example

Share your advanced PureBasic knowledge/code with the community.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Rabin-Karp rolling hash example

Post by wilbert »

An example of the Rabin-Karp rolling hash.

For multiplication 257 is chosen. This seems to work pretty well and can also be implemented with a shift and add instead of a multiplication (see RKHash asm code).
For modulo 2^32 is chosen so the modulo calculation can be omitted when using 32 bit variables.
So you end up with

Power.l = 257 ^ WindowSize
RKHash.l = (RKHash * 257) + UnsignedByteValueToAdd - (UnsignedByteValueToRemove * Power)


or with shift and add

RKHash.l = (RKHash << 8 ) + RKHash + UnsignedByteValueToAdd - (UnsignedByteValueToRemove * Power)

In its current form it's not very useful since a simple compare would be faster but it can be useful when you want to search for multiple patterns.
I that case you can compare the rolling hash against multiple pattern hashes.

Code: Select all

Procedure.l MPow(Base, Exponent)
  ; Fast computation of
  ; Base^Exponent Mod 2^32
  !     mov   edx, [p.v_Base]
  !     mov   ecx, [p.v_Exponent]
  !     mov   eax, 1
  !.l0: test  ecx, 1
  !     jz    .l1
  !     imul  eax, edx
  !.l1: imul  edx, edx
  !     shr   ecx, 1
  !     jnz   .l0
  ProcedureReturn
EndProcedure

Procedure.l RKHash(*Memory, Size)
  ; Rabin-Karp hash
  ; Mul 257, Mod 2^32
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !     mov   rdx, [p.p_Memory]
    !     mov   rcx, [p.v_Size]
    !     push  rbx
    !     xor   eax, eax
    !     test  rcx, rcx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [rdx]
    !     add   rdx, 1
    !     add   eax, ebx
    !     sub   rcx, 1
    !     jnz   .l0
    !.l1: pop   rbx
  CompilerElse
    !     mov   edx, [p.p_Memory]
    !     mov   ecx, [p.v_Size]
    !     push  ebx
    !     xor   eax, eax
    !     test  ecx, ecx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [edx]
    !     add   edx, 1
    !     add   eax, ebx
    !     sub   ecx, 1
    !     jnz   .l0
    !.l1: pop   ebx
  CompilerEndIf
  ProcedureReturn
EndProcedure



; Rabin-Karp rolling hash example 

Text.s = "A small demonstration of a rolling hash"
Pattern.s = "rolling"

TxtByteLen = StringByteLength(Text)
PatByteLen = StringByteLength(Pattern)

Power.l = MPow(257, PatByteLen)

PatHash.l = RKHash(@Pattern, PatByteLen)
TxtHash.l = RKHash(@Text, PatByteLen)

*Old.Ascii = @Text
*New.Ascii = @Text + PatByteLen
*End       = @Text + TxtByteLen

Repeat
  
  ; Compare hash
  If PatHash = TxtHash And CompareMemory(@Pattern, *Old, PatByteLen)
    BytePosFound = *Old - @Text
    Debug "Pattern found at byte position: " + Str(BytePosFound)
  EndIf
  
  ; Break if end is reached
  If *New = *End
    Break
  EndIf
  
  ; Update rolling hash
  TxtHash = TxtHash * 257 + *New\a - (*Old\a * Power)
  *Old + 1
  *New + 1
  
ForEver
Last edited by wilbert on Wed Sep 04, 2019 8:15 am, edited 4 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

Thank you!

To confirm the effectiveness of this approach, here is speed test "rolling vs usual fast 32-bit hash calculation"
Keep in mind that this is a comparison hash functions speed test not the search algorithm!
(which in itself can be significantly improved and in the tests was used the simplest Tichy’s [Tic84] greedy algorithm)

I don’t know if it’s possible to optimize the code even more, but the current numbers are already impressive ~x2 faster!

Code: Select all

Search in 1MB
Pattern found at byte position: 1048550  Time: 260 ms
Pattern found at byte position: 1048550  Time: 480 ms

Search in 2MB
Pattern found at byte position: 2097126  Time: 486 ms
Pattern found at byte position: 2097126  Time: 955 ms

Search in 3MB
Pattern found at byte position: 3145702  Time: 727 ms
Pattern found at byte position: 3145702  Time: 1430 ms

Search in 4MB
Pattern found at byte position: 4194278  Time: 967 ms
Pattern found at byte position: 4194278  Time: 1904 ms

Search in 5MB
Pattern found at byte position: 5242854  Time: 1211 ms
Pattern found at byte position: 5242854  Time: 2389 ms

Search in 12MB
Pattern found at byte position: 12582886  Time: 3 seconds
Pattern found at byte position: 12582886  Time: 6 seconds

Search in 16MB
Pattern found at byte position: 16777190  Time: 4 seconds
Pattern found at byte position: 16777190  Time: 8 seconds

Search in 20MB
Pattern found at byte position: 20971494  Time: 5 seconds
Pattern found at byte position: 20971494  Time: 10 seconds

Search in 26MB
Pattern found at byte position: 27262950  Time: 6 seconds
Pattern found at byte position: 27262950  Time: 12 seconds

Search in 30MB
Pattern found at byte position: 31457254  Time: 7 seconds
Pattern found at byte position: 31457254  Time: 14 seconds

Search in 34MB
Pattern found at byte position: 35651558  Time: 8 seconds
Pattern found at byte position: 35651558  Time: 16 seconds

Search in 42MB
Pattern found at byte position: 44040166  Time: 11 seconds
Pattern found at byte position: 44040166  Time: 20 seconds

Search in 45MB
Pattern found at byte position: 47185894  Time: 11 seconds
Pattern found at byte position: 47185894  Time: 22 seconds

Search in 50MB
Pattern found at byte position: 52428774  Time: 12 seconds
Pattern found at byte position: 52428774  Time: 25 seconds
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

I think you forgot to turn the debugger off. That's very important when doing timings.
Here's an updated example which seeks for 3 different patterns of 32 bytes within 50 megabyte of data.
If you would have a lot of patterns, it makes sense to use the lowest bits of the hashes to place them into different buckets to reduce the amount of compares.
Converting the search info asm could further improve the speed.

Code: Select all

DisableDebugger

Procedure.l MPow(Base, Exponent)
  ; Fast computation of
  ; Base^Exponent Mod 2^32
  !     mov   edx, [p.v_Base]
  !     mov   ecx, [p.v_Exponent]
  !     mov   eax, 1
  !.l0: test  ecx, 1
  !     jz    .l1
  !     imul  eax, edx
  !.l1: imul  edx, edx
  !     shr   ecx, 1
  !     jnz   .l0
  ProcedureReturn
EndProcedure

Procedure.l RKHash(*Memory, Size)
  ; Rabin-Karp hash
  ; Mul 257, Mod 2^32
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !     mov   rdx, [p.p_Memory]
    !     mov   rcx, [p.v_Size]
    !     push  rbx
    !     xor   eax, eax
    !     test  rcx, rcx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [rdx]
    !     add   rdx, 1
    !     add   eax, ebx
    !     sub   rcx, 1
    !     jnz   .l0
    !.l1: pop   rbx
  CompilerElse
    !     mov   edx, [p.p_Memory]
    !     mov   ecx, [p.v_Size]
    !     push  ebx
    !     xor   eax, eax
    !     test  ecx, ecx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [edx]
    !     add   edx, 1
    !     add   eax, ebx
    !     sub   ecx, 1
    !     jnz   .l0
    !.l1: pop   ebx
  CompilerEndIf
  ProcedureReturn
EndProcedure



; >> Rabin-Karp rolling hash example <<

; Allocate 50 megabyte of data
; and 3 patterns of 32 bytes each

BigDataSize = 52428800; 50 Megabyte
PatternSize = 32; 32 Bytes

*BigData = AllocateMemory(BigDataSize, #PB_Memory_NoClear)
*Pattern1 = AllocateMemory(PatternSize, #PB_Memory_NoClear)
*Pattern2 = AllocateMemory(PatternSize, #PB_Memory_NoClear)
*Pattern3 = AllocateMemory(PatternSize, #PB_Memory_NoClear)

; Fill BigData and Patterns with random bytes
RandomSeed(0)
RandomData(*BigData, BigDataSize)
RandomData(*Pattern1, PatternSize)
RandomData(*Pattern2, PatternSize)
RandomData(*Pattern3, PatternSize)

; Inject patterns into BigData at different positions
CopyMemory(*Pattern1, *BigData + 45, PatternSize)
CopyMemory(*Pattern1, *BigData + 167, PatternSize)
CopyMemory(*Pattern2, *BigData + 3297, PatternSize)
CopyMemory(*Pattern2, *BigData + BigDataSize - 1000, PatternSize)
CopyMemory(*Pattern3, *BigData + BigDataSize - 100, PatternSize)


Power.l = MPow(257, PatternSize)
BigDataHash.l = RKHash(*BigData, PatternSize)
Pattern1Hash.l = RKHash(*Pattern1, PatternSize)
Pattern2Hash.l = RKHash(*Pattern2, PatternSize)
Pattern3Hash.l = RKHash(*Pattern3, PatternSize)

OpenWindow(0, 0, 0, 420, 320, "Rabin-Karp rolling hash example", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
EditorGadget(0, 10, 10, 400, 300)

*TailOfSlidingWindow.Ascii = *BigData
*HeadOfSlidingWindow.Ascii = *BigData + PatternSize
*EndOfBigData = *BigData + BigDataSize

t1 = ElapsedMilliseconds()

Repeat
  
  ; Compare hash
  Select BigDataHash
    Case Pattern1Hash:
      If CompareMemory(*Pattern1, *TailOfSlidingWindow, PatternSize)
        AddGadgetItem(0, -1, "Found pattern1 at position: " + Str(*TailOfSlidingWindow - *BigData))
      EndIf
    Case Pattern2Hash:
      If CompareMemory(*Pattern2, *TailOfSlidingWindow, PatternSize)
        AddGadgetItem(0, -1, "Found pattern2 at position: " + Str(*TailOfSlidingWindow - *BigData))
      EndIf
    Case Pattern3Hash:
      If CompareMemory(*Pattern3, *TailOfSlidingWindow, PatternSize)
        AddGadgetItem(0, -1, "Found pattern3 at position: " + Str(*TailOfSlidingWindow - *BigData))
      EndIf
  EndSelect
      
  ; Break if end is reached
  If *HeadOfSlidingWindow = *EndOfBigData
    Break
  EndIf
  
  ; Update rolling hash
  BigDataHash = BigDataHash * 257 + *HeadOfSlidingWindow\a - (*TailOfSlidingWindow\a * Power)
  *TailOfSlidingWindow + 1
  *HeadOfSlidingWindow + 1
  
ForEver

t2 = ElapsedMilliseconds()
AddGadgetItem(0, -1, #CRLF$ + "Processing time in seconds: " + StrD((t2 - t1)*0.001, 3))

Repeat
  Event = WaitWindowEvent()
Until Event = #PB_Event_CloseWindow

FreeMemory(*BigData)
FreeMemory(*Pattern1)
FreeMemory(*Pattern2)
FreeMemory(*Pattern3)
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

wilbert wrote:I think you forgot to turn the debugger off
Not forgot I just didn’t :)
I wanted to show how many times one way is faster than other.

Here is results without debugger:

Code: Select all

Search 26 byte pattern in the end of xxMB data

Search in 1MB
Pattern found at byte position: 1048550  Time: 8.00 ms
Pattern found at byte position: 1048550  Time: 15.00 ms

Search in 4MB
Pattern found at byte position: 4194278  Time: 29.00 ms
Pattern found at byte position: 4194278  Time: 60.00 ms

Search in 8MB
Pattern found at byte position: 8388582  Time: 55.00 ms
Pattern found at byte position: 8388582  Time: 116.00 ms

Search in 16MB
Pattern found at byte position: 16777190  Time: 109.00 ms
Pattern found at byte position: 16777190  Time: 231.00 ms

Search in 24MB
Pattern found at byte position: 25165798  Time: 166.00 ms
Pattern found at byte position: 25165798  Time: 356.00 ms

Search in 32MB
Pattern found at byte position: 33554406  Time: 218.00 ms
Pattern found at byte position: 33554406  Time: 463.00 ms

Search in 50MB
Pattern found at byte position: 52428774  Time: 343.00 ms
Pattern found at byte position: 52428774  Time: 724.00 ms

Search in 600MB
Pattern found at byte position: 629145574  Time: 4.13 seconds
Pattern found at byte position: 629145574  Time: 8.69 seconds

Search in 1000MB
Pattern found at byte position: 1048575974  Time: 6.89 seconds
Pattern found at byte position: 1048575974  Time: 14.48 seconds
Same ~2x faster 8)

Also note that for comparison I used the fastest assembler hash that I had, I think if we compared with the standard 32-bit hashing algos, the difference will be tens of times or more :wink:
So well done! Thx!
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5353
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Rabin-Karp rolling hash example

Post by Kwai chang caine »

Hello WILBERT :D
W10 X64 / v5.70 x86

Results for this code
viewtopic.php?p=541368#p541368
Found pattern1 at position: 45
Found pattern1 at position: 167
Found pattern2 at position: 3297
Found pattern2 at position: 52427800
Found pattern3 at position: 52428700

Processing time in seconds: 0.235
:wink:
ImageThe happiness is a road...
Not a destination
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

Kwai chang caine wrote:Results for this code
viewtopic.php?p=541368#p541368
Found pattern1 at position: 45
Found pattern1 at position: 167
Found pattern2 at position: 3297
Found pattern2 at position: 52427800
Found pattern3 at position: 52428700

Processing time in seconds: 0.235
:wink:
That's the result it should give. :)
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

A little bit more tests just for fun.

Code: Select all

Search 26 byte pattern in the end of xxMB data

vs MurmurHash3 (asm version) ~x4 faster
Search in 1MB
Pattern found at byte position: 1048550    Time:  7.00 ms
Pattern found at byte position: 1048550    Time: 29.00 ms
Search in 100MB
Pattern found at byte position: 104857574  Time: 693.00 ms
Pattern found at byte position: 104857574  Time:   2.58 seconds

vs Adler32 (asm version) ~more than x8 faster
Search in 1MB
Pattern found at byte position: 1048550  Time: 7.00 ms
Pattern found at byte position: 1048550  Time: 62.00 ms
Search in 100MB
Pattern found at byte position: 104857574  Time: 697.00 ms
Pattern found at byte position: 104857574  Time: 5.81 seconds
And what's more important it have approximately same number of collisions as XXH32, MurmurHash3, Meiyan and obviously totally beats Adler32
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

It seems I didn’t quite understand how to cook Rabin-Karp :|
I wrote a quick, slightly messy example, but everything should be clear with it:

Code: Select all

;DisableDebugger

; Create 2 equal mem blocks
Define ASize.i = 1024*10
Define *A = AllocateMemory(ASize) 
RandomData(*A, ASize)
Define BSize.i = ASize 
Define *B = AllocateMemory(BSize) 
CopyMemory(*A, *B, BSize)
;
Structure HashTable
	Offset.i
	WeakHash.l 
EndStructure
;
Global Size.i = ASize ; just testing 
Global ChunkSize.i
Global LastChunk.i
;
Global ChunkSize = Size/64
Global Dim HT.HashTable(Size/ChunkSize)

Procedure.l MPow(Base, Exponent)
  ; Fast computation of
  ; Base^Exponent Mod 2^32
  !     mov   edx, [p.v_Base]
  !     mov   ecx, [p.v_Exponent]
  !     mov   eax, 1
  !.l0: test  ecx, 1
  !     jz    .l1
  !     imul  eax, edx
  !.l1: imul  edx, edx
  !     shr   ecx, 1
  !     jnz   .l0
  ProcedureReturn
EndProcedure

Procedure.l RKHash(*Memory, Size)
  ; Rabin-Karp hash
  ; Mul 257, Mod 2^32
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !     mov   rdx, [p.p_Memory]
    !     mov   rcx, [p.v_Size]
    !     push  rbx
    !     xor   eax, eax
    !     test  rcx, rcx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [rdx]
    !     add   rdx, 1
    !     add   eax, ebx
    !     sub   rcx, 1
    !     jnz   .l0
    !.l1: pop   rbx
  CompilerElse
    !     mov   edx, [p.p_Memory]
    !     mov   ecx, [p.v_Size]
    !     push  ebx
    !     xor   eax, eax
    !     test  ecx, ecx
    !     jz    .l1
    !.l0: mov   ebx, eax
    !     shl   ebx, 8
    !     add   eax, ebx
    !     movzx ebx, byte [edx]
    !     add   edx, 1
    !     add   eax, ebx
    !     sub   ecx, 1
    !     jnz   .l0
    !.l1: pop   ebx
  CompilerEndIf
  ProcedureReturn
EndProcedure

Procedure.q BinarySearch(Hash.l)
  ;
  Protected Position.q  = -1
  Protected Start.q     = 0
  Protected ArraySize.q = ArraySize(HT())
  Protected Middle.q
  
  While Start <= ArraySize
    ; We find the middle of the research space. 
    Middle = Round((Start + ArraySize) / 2,0)
    ; Keep half of right or left of the search space
    If Hash < HT(Middle)\WeakHash
      ArraySize = Middle - 1
      
    ElseIf Hash > HT(Middle)\WeakHash
      Start = Middle + 1
      ;If it's equal, we found - return the middle position    
    Else    
      Start = ArraySize + 1
      Position = Middle                
    EndIf 
  Wend
  ;
  ProcedureReturn Position
EndProcedure 

Procedure GenerateHashTable(*Mem)

  Protected BlocksCount = Size/ChunkSize 
  LastChunk   = Size - (BlocksCount * ChunkSize)
  Protected x  
  
  BlocksCount - 1
  For x = 0 To BlocksCount 
    HT(x)\Offset   = x*ChunkSize
    HT(x)\WeakHash = RKHash(*Mem+HT(x)\Offset, ChunkSize) ; looks like we need to generate here not calc?
  Next 
  
  ; LastChunk - compare later byte-wise or hash it here anyway...
  
; For x = 0 To ArraySize(HT())-1
;   Debug Hex(HT(x)\WeakHash, #PB_Long) ; get 0 sometimes (mabe magic number involve or just a bug in old code)
; Next  

EndProcedure

Procedure WalkHashTable(*Mem)
  Protected Size.i
  Protected Hash.l
  Protected Equal.i
  Protected x.i
  Protected Position.i
  Protected Power.l = MPow(257, ChunkSize) ; make it global sounds sane
  Size - ChunkSize - 1 
  Hash = RKHash(*Mem, ChunkSize) ; zero step
  Protected *A.Ascii = *Mem             ;+ 1         
  Protected *B.Ascii = *Mem + ChunkSize ;+ 1   
  
  For x = 1 To Size 
    Hash = Hash * 257 +  *A\a - (*B\a * Power) 
    ;
    Position = BinarySearch(Hash) ; just for speedup
    If Not Position = -1                                                                ; [!] first (init one) hash out of processing here...  ignoring this fact for this test, other must match anyway
      ;If Position >= 1 And Position < ArraySize(HT())-1
        ; handle zeroed blocks & some other patterns* \removed
        ; heuristic and merge matches \removed     
      ;EndIf   
        Equal +1 
      EndIf 
    *A + 1
    *B + 1
  Next  
  
  ; [!] LastChunk code here 
  
  PrintN("Equals: "+Equal+" \ "+ArraySize(HT())) ; must be a lot of dummy equals - it's ok for unhandled output
EndProcedure

OpenConsole()
GenerateHashTable(*A)
WalkHashTable(*B)
Input()

In a nutshell - if successful, we must get a large number of matches, now we have 0.
How to adjust calculations for hashes match?
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

While I did post this Rabin-Karp example, I'm not an expert on the subject.
I believe Idle knows a lot more about it.
One thing however I do notice in your code is that it looks like you are trying a binary search on unsorted data.
That won't work. A binary search requires the data to be sorted.
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

wilbert wrote:binary search on unsorted data
SortStructuredArray(HT(), #PB_Sort_Ascending, OffsetOf(HashTable\WeakHash), TypeOf(HashTable\WeakHash))
Sry, just forgot to copy this line from original code.

The whole idea with a rolling hash was conceived for use in this context...
Ok, I will try to figure it out.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

I'm not sure what you are exactly trying to do.
As far as I understand, a rolling hash is very useful if for example a few bytes are added to or removed from a binary file.
That way you can slide the window one byte at a time and track where a certain part of the original file went. This wouldn't work if you compare the two files after every 16 bytes or so.
If the data you want to compare is text or contains for example zero bytes on a regular base, you could compute a fixed hash of every line or parts between the zero bytes. That way you wouldn't need a rolling hash.
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

wilbert wrote:This wouldn't work if you compare the two files after every 16 bytes or so
Am I doing that?

WalkHashTable() slides the window one byte at a time and validate it by HashTable
Number\size of chunks hashed in HashTable can be any values.
Optimal block size ~16 - 64 bytes but it could be either 1024, why not? (in the example above it's 160 bytes).
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

Everything wrote:Am I doing that?
No, sorry. It was just a general remark.

I looked a bit closer at your code.
There's a few problems with your WalkHashTable procedure ...
The size you need to walk over is equal to MemorySize(*Mem) - ChunkSize .
In the code you posted it wasn't.
The other problem is that with a rolling hash like

Code: Select all

Hash = Hash * 257 +  *A\a - (*B\a * Power)
*A needs to be the bigger value. You mixed those two up.
So you either need to swap *A and *B inside the line above or make *A equal to *Mem + ChunkSize and *B to *Mem.
Windows (x64)
Raspberry Pi OS (Arm64)
Everything
Enthusiast
Enthusiast
Posts: 224
Joined: Sat Jul 07, 2018 6:50 pm

Re: Rabin-Karp rolling hash example

Post by Everything »

wilbert wrote:MemorySize(*Mem) - ChunkSize
Another "quick and dirty copy paste" inadvertence. My bad.
wilbert wrote:need to swap *A and *B
Here it is! Thank you!

BTW
Am I understand right that we can't really use "fast hash calc" in GenerateHashTable() because of ChunkSize stepping?
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Rabin-Karp rolling hash example

Post by wilbert »

Everything wrote:BTW
Am I understand right that we can't really use "fast hash calc" in GenerateHashTable() because of ChunkSize stepping?
Sorry but I don't understand your question; especially what you mean with "fast hash calc" :(
Can you explain in a bit more detail ?
Windows (x64)
Raspberry Pi OS (Arm64)
Post Reply