Murmur3 hash
Posted: Sat Nov 07, 2015 12:22 pm
ive been trying to write the Murmur3 hash in Purebasic, google suggests it hasnt been done before but it seems like a simple enough algorithm and from my reading today it has a lot of excellent qualities such as collision resistance, distribution, and exceptional speed mainly as it works on 4 bytes at a time - this should be the fastest hash Purebasic has ever seen if benchmarks are anything to go by, and we probably wont bother with CRC32 again!
This is unfinished so perhaps i shouldnt be posting it to Tips&Tricks just yet lol (i did set In Progress in the topic!), but i'm sure we'll get there soon!!!
It is currently working fine for one of the test vectors, so the code is nearrrrly there, but im at the stage now where im struggling to work out the bits i've gotten wrong lol
my version is based on code from:
https://en.wikipedia.org/wiki/MurmurHash
http://code.google.com/p/smhasher/wiki/MurmurHash3
http://smhasher.googlecode.com/svn/trun ... rHash3.cpp
http://programmers.stackexchange.com/qu ... -and-speed
and I found this useful page with lots of test vectors...
http://stackoverflow.com/questions/1474 ... st-vectors
currently mine just works against one of them, "aaaa", returning the expected hash. There are some issues still to deal with, because it works with Unsigned 32bit, so Purebasic's 32bit signed Long is throwing me off a bit there. The "tail" section in particular needs some work (that section isn't used in this demo as "aaaa" is a perfect 4-byte chunk, so there's no tail)
As you can see she's not TOOOO hard... just finicky in dotting the i's and crossing the t's heehee

Anyway I post it here to hopefully spur some interest in it!
[EDIT] see wilbert's excellent asm implementation here!
This is unfinished so perhaps i shouldnt be posting it to Tips&Tricks just yet lol (i did set In Progress in the topic!), but i'm sure we'll get there soon!!!
It is currently working fine for one of the test vectors, so the code is nearrrrly there, but im at the stage now where im struggling to work out the bits i've gotten wrong lol
my version is based on code from:
https://en.wikipedia.org/wiki/MurmurHash
http://code.google.com/p/smhasher/wiki/MurmurHash3
http://smhasher.googlecode.com/svn/trun ... rHash3.cpp
http://programmers.stackexchange.com/qu ... -and-speed
and I found this useful page with lots of test vectors...
http://stackoverflow.com/questions/1474 ... st-vectors
currently mine just works against one of them, "aaaa", returning the expected hash. There are some issues still to deal with, because it works with Unsigned 32bit, so Purebasic's 32bit signed Long is throwing me off a bit there. The "tail" section in particular needs some work (that section isn't used in this demo as "aaaa" is a perfect 4-byte chunk, so there's no tail)
As you can see she's not TOOOO hard... just finicky in dotting the i's and crossing the t's heehee


Anyway I post it here to hopefully spur some interest in it!
[EDIT] see wilbert's excellent asm implementation here!
Code: Select all
CompilerIf #PB_Compiler_Unicode = 1
CompilerError("Unicode not supported for this demo, use Ascii compile")
CompilerEndIf
Structure longx
l.Long[0]
EndStructure
Structure asciix
a.Ascii[0]
EndStructure
Procedure.l murmur3_32(key, llen.i, seed.l)
c1.l = $cc9e2d51
c2.l = $1b873593
r1.l = 15
r2.l = 13
m.l = 5
n.l = $e6546b64
k.l = 0
i.l = 0
hash.l = seed
nblocks.l = llen / 4
*blocks.longx = key
For i = 0 To nblocks-1
k = PeekL(*blocks\l[i]) <-- it seems i might be doing a memory read twice here instead of the intended once, as im having to use PeekL to read the value, otherwise we're using the address not the value
k * c1
k = (k << r1) | (k >> (32 - r1))
k * c2
hash = hash ! k
hash = ((hash << r2) | (hash >> (32 - r2))) * m + n
Next i
*tail.asciix = key + (nblocks * 4)
k1.l = 0
;i dont understand this part because only Case 1 modifies the hash variable
Select (llen & 3)
Case 3:
k1 = k1 ! (PeekA(*tail\a[2]) << 16)
Case 2:
k1 = k1 ! (PeekA(*tail\a[1]) << 8)
Case 1:
k1 = k1 ! (PeekA(*tail\a[0]))
k1 * c1
k1 << 15
k1 * c2
hash ! k1
EndSelect
hash ! llen
hash ! (hash >> 16)
hash * $85ebca6b
hash ! (hash >> 13)
hash * $c2b2ae35
hash ! (hash >> 16)
ProcedureReturn hash
EndProcedure
;// murmur3_32("aaaa", len=4, seed=$9747b28c) = $5A97808A according to http://stackoverflow.com/questions/14747343/murmurhash3-test-vectors
test.s = "aaaa"
llen.l = Len(test)
seed.l = $9747b28c ;0
hash.l = murmur3_32(@test, llen, seed)
Debug(Hex(hash,#PB_Long))