It is currently Fri Jun 22, 2018 7:59 pm

All times are UTC + 1 hour




Post new topic Reply to topic  [ 15 posts ] 
Author Message
 Post subject: Hash Functions
PostPosted: Wed Aug 24, 2011 12:45 am 
Offline
Addict
Addict
User avatar

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3255
Location: New Zealand
FNV32a

Code:
Procedure.q FNV32(*pdata.i,len)
 !mov esi,  [p.p_pdata]     
 !mov ecx, [p.v_len]       
 !mov eax, 2166136261 
 !mov edi, 0x01000193   
 !xor ebx, ebx
 !ntbyte:
 !mov bl, [esi]
 !xor eax, ebx
 !mul edi
 !inc esi
 !dec ecx
 !jnz ntbyte
 ProcedureReturn
 
EndProcedure 

_________________
Got winter blues?
Enjoy a Caravan Trip into, "The Land of Grey and Pink", wine and punk weed optional!
https://www.youtube.com/watch?v=D5iX9YhCCp8


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sat Aug 27, 2011 3:38 pm 
Offline
Addict
Addict
User avatar

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1243
You need to preserve ebx, edi and esi.
Otherwise it could cause unexpected behaviour.

Code:
Procedure.q FNV32(*pdata.i,len)
!push ebx
!push edi
!push esi

!mov esi,  [p.p_pdata+12]     
!mov ecx, [p.v_len+12]       
!mov eax, 2166136261
!mov edi, 0x01000193   
!xor ebx, ebx
!ntbyte:
!mov bl, [esi]
!xor eax, ebx
!mul edi
!inc esi
!dec ecx
!jnz ntbyte

!pop esi
!pop edi
!pop ebx
ProcedureReturn

EndProcedure 


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun Aug 28, 2011 5:51 am 
Offline
Addict
Addict
User avatar

Joined: Fri Sep 21, 2007 5:52 am
Posts: 3255
Location: New Zealand
thanks

_________________
Got winter blues?
Enjoy a Caravan Trip into, "The Land of Grey and Pink", wine and punk weed optional!
https://www.youtube.com/watch?v=D5iX9YhCCp8


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Mon Feb 27, 2012 2:45 pm 
Offline
New User
New User

Joined: Thu Feb 23, 2012 1:06 pm
Posts: 8
Thorium wrote:
You need to preserve ebx, edi and esi.
Otherwise it could cause unexpected behaviour.

Code:
Procedure.q FNV32(*pdata.i,len)
!push ebx
!push edi
!push esi

!mov esi,  [p.p_pdata+12]     
!mov ecx, [p.v_len+12]       
!mov eax, 2166136261
!mov edi, 0x01000193   
!xor ebx, ebx
!ntbyte:
!mov bl, [esi]
!xor eax, ebx
!mul edi
!inc esi
!dec ecx
!jnz ntbyte

!pop esi
!pop edi
!pop ebx
ProcedureReturn

EndProcedure 


Nice, but why not PUSHAD/POPAD ?


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Mon Feb 27, 2012 5:02 pm 
Offline
Addict
Addict
User avatar

Joined: Sat Aug 15, 2009 6:59 pm
Posts: 1243
No specific reason, you can use them as well.


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Mon Feb 27, 2012 7:44 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
MurmurHash 3 (32 bit version)
http://code.google.com/p/smhasher/

Code:
; **********************************************
; * MurmurHash3 was written by Austin Appleby, *
; * and is placed in the public domain.        *
; * The author disclaims copyright to this     *
; * source code.                               *
; *                                            *
; * PureBasic conversion by Wilbert            *
; * Last update : 2012/02/29                   *
; **********************************************

Procedure.l MurmurHash3(*key, len.l, seed.l = 0)
  EnableASM
  MOV eax, seed
  MOV ecx, len
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key
    !push ebx
    !push ecx
  CompilerElse
    MOV rdx, *key
    !push rbx
    !push rcx
  CompilerEndIf
  !mov ebx, eax
  !sub ecx, 4
  !js mh3_tail
  ; body
  !mh3_body_loop:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov eax, [edx]
    !add edx, 4
  CompilerElse
    !mov eax, [rdx]
    !add rdx, 4
  CompilerEndIf
  !imul eax, 0xcc9e2d51
  !rol eax, 15
  !imul eax, 0x1b873593
  !xor ebx, eax
  !rol ebx, 13
  !imul ebx, 5
  !add ebx, 0xe6546b64
  !sub ecx, 4
  !jns mh3_body_loop
  ; tail
  !mh3_tail:
  !xor eax, eax
  !add ecx, 3
  !js mh3_finalize
  !jz mh3_t1
  !dec ecx
  !jz mh3_t2
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov al, [edx + 2]
    !shl eax, 16
    !mh3_t2: mov ah, [edx + 1]
    !mh3_t1: mov al, [edx]
  CompilerElse
    !mov al, [rdx + 2]
    !shl eax, 16
    !mh3_t2: mov ah, [rdx + 1]
    !mh3_t1: mov al, [rdx]
  CompilerEndIf
  !imul eax, 0xcc9e2d51
  !rol eax, 15
  !imul eax, 0x1b873593
  !xor ebx, eax
  ; finalization
  !mh3_finalize:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ecx
  CompilerElse
    !pop rcx
  CompilerEndIf
  !xor ebx, ecx
  !mov eax, ebx
  !shr ebx, 16
  !xor eax, ebx
  !imul eax, 0x85ebca6b
  !mov ebx, eax
  !shr ebx, 13
  !xor eax, ebx
  !imul eax, 0xc2b2ae35
  !mov ebx, eax
  !shr ebx, 16
  !xor eax, ebx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ebx
  CompilerElse
    !pop rbx 
  CompilerEndIf
  DisableASM
  ProcedureReturn
EndProcedure


Last edited by wilbert on Wed Feb 29, 2012 6:28 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Tue Feb 28, 2012 6:09 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
While I was converting MurmurHash3, I noticed the FNV32a procedure can be made significantly faster.
First a remark, the return type of FNV32a should be a long instead of a quad.

Here's the improved FNV32a code
Code:
Procedure.l FNV32a(*key, len.l)
  EnableASM
  MOV edx, *key
  MOV ecx, len
  !push ebx
  !mov eax, 2166136261
  !fnv32a_loop:
  !movzx ebx, byte [edx]
  !xor eax, ebx
  !imul eax, 0x01000193
  !inc edx
  !dec ecx
  !jnz fnv32a_loop
  !pop ebx
  DisableASM
  ProcedureReturn
EndProcedure

I also created an alternative one for zero terminated strings that doesn't require a length parameter
Code:
Procedure FNV32s(*key)
  EnableASM
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key
  CompilerElse
    MOV rdx, *key
  CompilerEndIf
  !mov eax, 2166136261
  !jmp fnv32s_entry
  !fnv32s_loop:
  !xor eax, ecx
  !imul eax, 0x01000193
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    CompilerIf #PB_Compiler_Unicode
      !add edx, 2   
      !fnv32s_entry:
      !movzx ecx, word [edx]
    CompilerElse
      !add edx, 1
      !fnv32s_entry:
      !movzx ecx, byte [edx]
    CompilerEndIf
  CompilerElse
    CompilerIf #PB_Compiler_Unicode
      !add rdx, 2   
      !fnv32s_entry:
      !movzx ecx, word [rdx]
    CompilerElse
      !add rdx, 1
      !fnv32s_entry:
      !movzx ecx, byte [rdx]
    CompilerEndIf
  CompilerEndIf
  !cmp ecx, 0
  !jne fnv32s_loop
  DisableASM
  ProcedureReturn
EndProcedure


A speed comparison of the FNV32 code with the 'push' fix added by Thorium, the modified version I made and MurmurHash3.
Code:
s.s = "The quick brown fox jumps over the lazy dog"
s_len.l = Len(s)
cycles.l = 10000000

cycles - 1
tm1 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = FNV32(@s, s_len)
Next
tm2 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = FNV32a(@s, s_len)
Next
tm3 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = MurmurHash3(@s, s_len)
Next
tm4 = ElapsedMilliseconds()

result.s = "FNV32 : " + Str(tm2 - tm1) + Chr(10)
result   + "FNV32a : " + Str(tm3 - tm2) + Chr(10)
result   + "MurmurHash3 : " + Str(tm4 - tm3)

MessageRequester("Result", result)

The results of this test on my computer :
FNV32 : 796
FNV32a : 499
MurmurHash3 : 280


Last edited by wilbert on Sun Mar 11, 2012 1:44 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Fri Mar 09, 2012 4:53 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
Very fast hash algorithm
Code:
; ******************************************
; * The Meiyan hash algorithm              *
; * was written by Sanmayce                *
; * http://www.sanmayce.com/Fastest_Hash/  *
; *                                        *
; * PureBasic conversion by Wilbert        *
; * Last update : 2012/03/09               *
; ******************************************

Procedure MeiyanHash(*key, len.l)
  EnableASM
  MOV ecx, len
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key
    !push ebx
    !mov ebx, edx
  CompilerElse
    MOV r8, *key
  CompilerEndIf
  !mov eax, 2166136261
  !sub ecx, 8
  !js meiyan_tail
  ; body
  !meiyan_body_loop:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov edx, [ebx]
    !rol edx, 5
    !xor eax, edx
    !mov edx, [ebx + 4]
    !add ebx, 8
  CompilerElse
    !mov edx, [r8]
    !rol edx, 5
    !xor eax, edx
    !mov edx, [r8 + 4]
    !add r8, 8
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !sub ecx, 8
  !jns meiyan_body_loop
  ; tail
  !meiyan_tail:
  !add ecx, 8
  !test ecx, 4
  !jz meiyan_t2
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, word [ebx]
    !xor eax, edx
    !imul eax, 709607
    !movzx edx, word [ebx + 2]
    !add ebx, 4
  CompilerElse
    !movzx edx, word [r8]
    !xor eax, edx
    !imul eax, 709607
    !movzx edx, word [r8 + 2]
    !add r8, 4
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t2:
  !test ecx, 2
  !jz meiyan_t3
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, word [ebx]
    !add ebx, 2
  CompilerElse
    !movzx edx, word [r8]
    !add r8, 2
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t3:
  !test ecx, 1
  !jz meiyan_t4
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, byte [ebx]
  CompilerElse
    !movzx edx, byte [r8]
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t4:
  !mov edx, eax
  !shr edx, 16
  !xor eax, edx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ebx
  CompilerEndIf
  DisableASM
  ProcedureReturn
EndProcedure

MurmurHash3 seems to have less collisions compared to MeiyanHash.


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Thu Apr 16, 2015 6:53 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
Larson hash
Code:
Procedure.l LarsonHash(*String); Paul Larson hash
  !xor eax, eax
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_String]
    !jmp larsonhash_l1
    !larsonhash_l0:
    !imul eax, 101
    CompilerIf #PB_Compiler_Unicode
      !add rdx, 2
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, word [rdx]
    CompilerElse
      !add rdx, 1
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, byte [rdx]
    CompilerEndIf   
  CompilerElse
    !mov edx, [p.p_String]
    !jmp larsonhash_l1
    !larsonhash_l0:
    !imul eax, 101
    CompilerIf #PB_Compiler_Unicode
      !add edx, 2
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, word [edx]
    CompilerElse
      !add edx, 1
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, byte [edx]
    CompilerEndIf
  CompilerEndIf
  !test ecx, ecx
  !jnz larsonhash_l0
  ProcedureReturn
EndProcedure



s.s = "Paul Larson hash test"
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  hash.l = LarsonHash(@s)
Next
t2 = ElapsedMilliseconds()
MessageRequester(Str(hash),Str(t2-t1))



FastHash64 (x64 only)
Code:
; FastHash64 algorithm by Zilong Tan

Procedure.q FastHash64(*Buffer, Len, Seed.q=0)
  !mov r10, 0x2127599bf4325c37
  !mov r11, 0x880355f21e6d1965
  !mov rdx, [p.p_Buffer]
  !mov rcx, [p.v_Len]
  !mov rax, rcx         ; h = seed ^ (len * m);
  !imul rax, r11
  !xor rax, [p.v_Seed]
  !sub rcx, 8
  !jc .l1
  ; 8 byte loop 
  !.l0:
  !mov r8, [rdx]        ; v = *pos++;
  !add rdx, 8
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  !sub rcx, 8
  !jnc .l0
  ; remaining bytes
  !.l1:
  !add rcx, 8
  !jz .l5
  !xor r8, r8
  !test rcx, 4
  !jz .l2
  ; get 4 bytes
  !mov r8d, [rdx]
  !add rdx, 4
  !ror r8, 32
  !.l2:
  !test rcx, 2
  !jz .l3
  ; get 2 bytes
  !movzx r9d, word [rdx]
  !add rdx, 2
  !xor r8, r9
  !ror r8, 16
  !.l3:
  !test rcx, 1
  !jz .l4
  ; get 1 byte
  !movzx r9d, byte [rdx]
  !xor r8, r9
  !ror r8, 8
  !.l4:
  !and rcx, 7
  !shl rcx, 3
  !rol r8, cl
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  ; -- mix(h) start --
  !.l5:
  !mov r9, rax
  !shr r9, 23
  !xor rax, r9
  !imul rax, r10
  !mov r9, rax
  !shr r9, 47
  !xor rax, r9
  ; -- mix end --
  ProcedureReturn       ; return mix(h);
EndProcedure

_________________
MacOS 10.13 High Sierra, PB 5.60 x64


Last edited by wilbert on Fri May 25, 2018 5:15 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun May 20, 2018 7:51 am 
Offline
User
User

Joined: Wed Apr 18, 2018 8:24 am
Posts: 25
Thanks for providing all these hash functions!

@wilbert
Is it possible to use one of them (in this case the MurmurHash3 one) on files (instead of strings) as well?

I need to compare larger binary files if they are not equal and the PB inbuild file hashing routines are just
a bit slow for these kind of comparisons (e.g. the md5 file fingerprint needs >5 seconds to calculate the
hash of a 1 GB file and the SHA-1 isnt't that much faster). Using just the file size as a check if files are
different is not enough, their names can be different...


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun May 20, 2018 4:10 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
forumuser wrote:
@wilbert
Is it possible to use one of them (in this case the MurmurHash3 one) on files (instead of strings) as well?

To process files, the procedure would have to be split into different parts so that multiple blocks of data can be processed before the finalize function is called.
Are you compiling for 32 or 64 bit ?
There seems to be a very fast hash function for 64 bit named XXH64.

Another thing you could consider is to use two hashes for files larger than 1MB; one for the first MB and one for the remaining part.
That way you could first compare the first part of a file and only continue with the remaining part if the hashes of the first part are equal.

_________________
MacOS 10.13 High Sierra, PB 5.60 x64


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun May 20, 2018 4:32 pm 
Offline
User
User

Joined: Wed Apr 18, 2018 8:24 am
Posts: 25
Quote:
Are you compiling for 32 or 64 bit ?

Unfortunately, for both

Quote:
There seems to be a very fast hash function for 64 bit named XXH64

https://cyan4973.github.io/xxHash/

Wow, yeah, that's really fast! Do you plan to implement these algorithms (XXH64 / XXH32)?

Quote:
Another thing you could consider is to use two hashes for files larger than 1MB; one for the first MB and one for the remaining part.
That way you could first compare the first part of a file and only continue with the remaining part if the hashes of the first part are equal.

That sounds like a viable option as well. Do you have any sample code lying around that demonstrates how this should be done in practice?

Regards,
Jack


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun May 20, 2018 7:58 pm 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
forumuser wrote:
Wow, yeah, that's really fast! Do you plan to implement these algorithms (XXH64 / XXH32)?

I could.
XXH32 would be about 2.5x faster compared to MurmurHash3 and could be used with both the 32 and 64 bit version of PB.
XXH64 would be quite a bit faster but it could only be used with PB 64 bit and the generated hashes are different from XXH32.

forumuser wrote:
That sounds like a viable option as well. Do you have any sample code lying around that demonstrates how this should be done in practice?

Unfortunately not.
You would have to allocate a memory buffer, load the first part of a file into it and use Fingerprint to generate the hash.

_________________
MacOS 10.13 High Sierra, PB 5.60 x64


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Sun May 20, 2018 8:39 pm 
Offline
User
User

Joined: Wed Apr 18, 2018 8:24 am
Posts: 25
Quote:
XXH32 would be about 2.5x faster compared to MurmurHash3 and could be used with both the 32 and 64 bit version of PB
XXH64 would be quite a bit faster but it could only be used with PB 64 bit and the generated hashes are different from XXH32


I guess both would serve their purpose but ofc it's up to you (if at all) if you have the time / motivation to implement only one or both...

This should be a correct implementation, right? Tested it on the 2GB file itself and the first part as a single file separately (split with HxD under Windows, at $10000). Both hashes are identical.

Code:
; Read a part of a file into memory and calculate the checksum
; Default size = 1 MB and default cipher = MD5
Procedure.s GetHashFromFilePart(file.s, size.q=$100000, cipher=#PB_Cipher_MD5)
  Protected.i hFile
  Protected.s hash = ""
  Protected *buffer

  hFile = ReadFile(#PB_Any, file)
  If hFile
    *buffer = AllocateMemory(size)
    If *buffer
      If ReadData(hFile, *buffer, size)
        hash = StringFingerprint(PeekS(*buffer, size), cipher)
      Else
        Debug "Could not read data!"
      EndIf
      FreeMemory(*buffer)
    Else
      Debug "Couldn't allocate the requested memory!"
    EndIf
    CloseFile(hFile)
  EndIf

  ProcedureReturn PeekS(@hash)
EndProcedure

UseMD5Fingerprint()
Debug GetHashFromFilePart("D:\2GB.iso")


Top
 Profile  
Reply with quote  
 Post subject: Re: Hash Functions
PostPosted: Mon May 21, 2018 6:26 am 
Offline
PureBasic Expert
PureBasic Expert

Joined: Sun Aug 08, 2004 5:21 am
Posts: 3145
Location: Netherlands
forumuser wrote:
I guess both would serve their purpose but ofc it's up to you (if at all) if you have the time / motivation to implement only one or both...

I'm trying XXH32.
viewtopic.php?f=12&t=70735

forumuser wrote:
This should be a correct implementation, right? Tested it on the 2GB file itself and the first part as a single file separately (split with HxD under Windows, at $10000). Both hashes are identical.


It's possible a file is less than 1MB and in that case you should process only the bytes that are read.
Therefore I would do it like this ...

Code:
; Read a part of a file into memory and calculate the checksum
; Default size = 1 MB and default cipher = MD5
Procedure.s GetHashFromFilePart(file.s, size.q=$100000, cipher=#PB_Cipher_MD5, bits=0)
  Protected hFile.i, bytesRead.q
  Protected Dim buffer.a(size-1)
  hFile = ReadFile(#PB_Any, file)
  If hFile
    bytesRead = ReadData(hFile, @buffer(), size)
    CloseFile(hFile)
    ProcedureReturn Fingerprint(@buffer(), bytesRead, cipher, bits)
  Else
    ProcedureReturn ""
  EndIf
EndProcedure

_________________
MacOS 10.13 High Sierra, PB 5.60 x64


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 15 posts ] 

All times are UTC + 1 hour


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  

 


Powered by phpBB © 2008 phpBB Group
subSilver+ theme by Canver Software, sponsor Sanal Modifiye