Page 1 of 2

Using hashes -- pb's vs xxhash and murmur3

Posted: Wed Sep 25, 2013 11:29 pm
by jassing
I have a custom patch app for a remote research station (satellite data connection = $$$/minute)
The files are huge; one problem I was having was with hashes for file pre-test and post-patch validation. (to ensure it was the expected file to patch, and the resulting file is what was expected)

The whole system works; but the hash calc times were killing me.
I read up on other hash algorithms and tested them. What I'm interested in -- does anyone have any 1st hand dealings with xxhash or murmur3? They seem to be well received. both are lightning fast compared to pb's crc32 and md5 (producing similar results)

Running some tests on an cheap laptop:
Read File Overhead: 34.41 seconds, File size: 775876608

xxHash
Result: 38939D0D
Time: 0.64
-vs-
PB's CRC32
Result: 7668A68B
Time: 7.91

-------

Murmur3 (128 bit)
Result: 526d08ff75ffff194e2f52ff485fff59
Time: 1.31
-vs-
PB's MD5 (128 bit)
Result: 7a5b70050b484ecaa5909c0d9abc0e6c
Time: 8.11

(SHA1 ran about 1/2 second longer for a 192bit result, which I was using, but I wanted to compare 128 to 128 bits)

PB's sha1 (192 bit)
Result: e5ac274db89f38718b25db4e1dd335bf9c1ded16
Time: 8.60

So on the surface, i'm leaning towards using them vs pb's hashes.
I am running some collision tests now; but so far, xxhash and murmur3 are on par crc32 and better than (less collisions) than md5

But Since I have no real world experience using them, before investing in them...
If anyone has any interest, I can post the test code.

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 12:52 am
by idle
you could try fnv32 inline while you read the file

Code: Select all

Global ghash

Macro InitHash() 
   ghash = 2166136261 
EndMacro    

Global byte.a

file.s = OpenFileRequester("read a file","","*.*",1)
If file <> "" 
   fn = ReadFile(#PB_Any,file) 
   If fn 
      InitHash() 
      While Not Eof(fn) 
        Byte = ReadByte(fn) 
        ghash ! byte 
        ghash *  $01000193 
     Wend 
    CloseFile(fn) 
EndIf    
  
MessageRequester("hash",Str(ghash)) 
EndIf 


Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 2:13 am
by jassing
fnv hash is horrible for collisions....

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 3:10 am
by sec
jassing wrote:fnv hash is horrible for collisions....
How about SHA3, http://www.purebasic.fr/english/viewtop ... 40&t=56420 ?

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 3:30 am
by jassing
if you have a sha3 algorithm that beats those times...

so I take it that no one,so far, has experience with xxhash or murmur3...?

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 3:33 am
by sec
jassing wrote:if you have a sha3 algorithm that beats those times...

so I take it that no one,so far, has experience with xxhash or murmur3...?
Do you want one xxhash version in PB? :)

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 5:04 am
by idle
jassing wrote:fnv hash is horrible for collisions....
I never found any problem with it 32 bit produces ~100 for 1,000,000 items
but then you can always check the file length

but the 64 bit shouldn't produce any collisions
though if you're running it on 32 bit it might slow you down a bit

Code: Select all

;test for collisions takes a while  
;then do a speed test 
;note choose a small file < 10 mb 

Global ghash.q

Macro InitFNV64() 
   ghash = 14695981039346656037
EndMacro   
Macro FNV64a(byte) 
     ghash ! byte 
     ghash *  1099511628211
EndMacro 

Global byte.a 
Global size = 4*1024*1024
Global NewMap hashmap.q(size) 

For a = 1 To size/2 
   InitFNV64() 
   For b = 0 To 1023 
      byte = Random(255) 
      FNV64a(byte) 
   Next 
   shash.s = Str(ghash) 
   If Not FindMapElement(hashmap(),shash)   
      AddMapElement(hashmap(),shash) 
      hashmap() = ghash 
   Else 
       count+1  
   EndIf 
Next 

MessageRequester("collisions",Str(count) + " from  " + Str(a-1) + " iterations :"  + shash) 

file.s = OpenFileRequester("read a file","","*.*",1)

If file <> "" 
   For a = 1 To 10    
      fn = ReadFile(#PB_Any,file) 
      If fn 
         flen = Lof(fn)
         InitFNV64() 
         st = ElapsedMilliseconds() 
         While Not Eof(fn) 
            Byte = ReadByte(fn) 
            FNV64a(byte) 
         Wend 
         et + (ElapsedMilliseconds() - st)
         CloseFile(fn) 
      EndIf    
   Next   
   
   For a = 1 To 10    
      fn = ReadFile(#PB_Any,file) 
      If fn 
        st1 = ElapsedMilliseconds() 
         While Not Eof(fn) 
            Byte = ReadByte(fn) 
          Wend 
         et1 + (ElapsedMilliseconds() - st1) 
         CloseFile(fn) 
      EndIf    
   Next   
   et2 = (et - et1) / 10
   
   rmbps.f = ((flen/(1024*1024)) / (et1/10)) * 1000
   mbps.f = ((flen/(1024*1024)) / (et/10)) * 1000  
   MessageRequester("hash","read mbps : " + StrF(rmbps,2) +  " hash mbps : " + StrF(mbps,2) + " avg hash time  : " + StrF(et2,6) + " milliseconds") 
EndIf 



Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Thu Sep 26, 2013 6:17 am
by wilbert
jassing wrote:if you have a sha3 algorithm that beats those times...

so I take it that no one,so far, has experience with xxhash or murmur3...?
MurmurHash3
http://www.purebasic.fr/english/viewtop ... 66#p375166

In the same thread there's also an implementation of the Meiyan hash algorithm
http://www.purebasic.fr/english/viewtop ... 38#p376038

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 6:35 am
by wilbert
@Jassing, did you already port xxHash to PureBasic ?
I've been looking into the code and it doesn't seem overly complicated and does seem very fast.
I'm considering making a PB version if you haven't done already but there doesn't seem to be any test strings to verify if an implementation would be correct.

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 7:35 am
by jassing
wilbert wrote:@Jassing, did you already port xxHash to PureBasic ?
No, I compiled it w/C as a static lib, then import the function...
[I've been looking into the code and it doesn't seem overly complicated and does seem very fast.
It's fast -- and has lower rates of collision...
I'm considering making a PB version if you haven't done already but there doesn't seem to be any test strings to verify if an implementation would be correct.
You can test it against my lib if you want... Or you can just compile it as C / exe and test your own strings against your pb version.

I was surprised at some of the pb-based hash results -- some were fast, some were horrendously slow, even asm ones... Could be cpu optimizations; but since the lib is working for me; I'm fine with that. Would be interesting to compare times...

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 8:10 am
by idle
Did you bother to test my code above?

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 6:27 pm
by jassing
idle wrote:Did you bother to test my code above?
If you are asking me, and the fnv code, yes, I tested it -- and for my situation, had too many collisions.

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 9:06 pm
by idle
jassing wrote:
idle wrote:Did you bother to test my code above?
If you are asking me, and the fnv code, yes, I tested it -- and for my situation, had too many collisions.
you mean the fnva64 code 5 posts up?

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Fri Sep 27, 2013 10:33 pm
by jassing
Yes, too slow, your time calculation for the hash does not take into account the time to build the table.
Using a static small (38meg) file for both tests using pb 5.20 x64.

fvn64 (64 bit hash) - about 58.45 seconds (w/ table build) (PB Code)
murmur3 (128 bit hash) - .08 seconds (C code, staticly linked)

Ignoring the table-build time, fvn64 was still slower (3.84 seconds)

The biggest difference is tha the murmur3 was writen in C... And yes, the murmur3 time included memory allocation..

test was performed using x64 pb 5.20

Pre test:
Open,Read, and close test file 10 times

FNV64:
disable debugger, start timer
calc table
open file
calc hash
stop timer
close file
enabledebugger show results

Murmur3:
start timer
open file
allocate memory needed
read file to memory
close file
perform calculations
stop timer
display results

I can run the tests in reverse order (murmur3, then fnv64 and results are similar)
Note: Murmur3 had debugger enabled for it's whole run...

Altering your code to not read byte-by-byte to memory didn't escape the long table-build time, and didn't help times enough (.93 seconds)

fnv64 is a fine hash; it's faster than crc32 (ignoring table build time) and gives you 2x hash size. But murmur3 is still faster and provides a hash size double that of fnv64.

Re: Using hashes -- pb's vs xxhash and murmur3

Posted: Sat Sep 28, 2013 1:00 am
by idle
Don't really know why murmor is faster than FNV
but I would have thought processing a hash inline while reading the file would be faster
than doing it from memory.

Code: Select all

Procedure FNV64a(*buf,len) 
   !mov rax, 14695981039346656037
   !mov r8, [p.v_len]
   !mov rcx, [p.p_buf]  
   !mov r9, 1099511628211
   !xor r10, r10
   !fnv64a_loop:
   !mov r10b, [rcx]
   !xor rax, r10
   !mul r9
   !inc rcx
   !dec r8
   !jnz fnv64a_loop
  ProcedureReturn 
EndProcedure