Hash Functions

Bare metal programming in PureBasic, for experienced users
User avatar
idle
Always Here
Always Here
Posts: 5018
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Hash Functions

Post by idle »

FNV32a

Code: Select all

Procedure.q FNV32(*pdata.i,len)
 !mov esi,  [p.p_pdata]     
 !mov ecx, [p.v_len]        
 !mov eax, 2166136261  
 !mov edi, 0x01000193   
 !xor ebx, ebx
 !ntbyte:
 !mov bl, [esi]
 !xor eax, ebx
 !mul edi
 !inc esi
 !dec ecx
 !jnz ntbyte
 ProcedureReturn 
 
EndProcedure  
Windows 11, Manjaro, Raspberry Pi OS
Image
Thorium
Addict
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: Hash Functions

Post by Thorium »

You need to preserve ebx, edi and esi.
Otherwise it could cause unexpected behaviour.

Code: Select all

Procedure.q FNV32(*pdata.i,len)
!push ebx
!push edi
!push esi

!mov esi,  [p.p_pdata+12]     
!mov ecx, [p.v_len+12]       
!mov eax, 2166136261 
!mov edi, 0x01000193   
!xor ebx, ebx
!ntbyte:
!mov bl, [esi]
!xor eax, ebx
!mul edi
!inc esi
!dec ecx
!jnz ntbyte

!pop esi
!pop edi
!pop ebx
ProcedureReturn

EndProcedure  
User avatar
idle
Always Here
Always Here
Posts: 5018
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Hash Functions

Post by idle »

thanks
Windows 11, Manjaro, Raspberry Pi OS
Image
evilgravedigger
New User
New User
Posts: 8
Joined: Thu Feb 23, 2012 1:06 pm

Re: Hash Functions

Post by evilgravedigger »

Thorium wrote:You need to preserve ebx, edi and esi.
Otherwise it could cause unexpected behaviour.

Code: Select all

Procedure.q FNV32(*pdata.i,len)
!push ebx
!push edi
!push esi

!mov esi,  [p.p_pdata+12]     
!mov ecx, [p.v_len+12]       
!mov eax, 2166136261 
!mov edi, 0x01000193   
!xor ebx, ebx
!ntbyte:
!mov bl, [esi]
!xor eax, ebx
!mul edi
!inc esi
!dec ecx
!jnz ntbyte

!pop esi
!pop edi
!pop ebx
ProcedureReturn

EndProcedure  
Nice, but why not PUSHAD/POPAD ?
Thorium
Addict
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: Hash Functions

Post by Thorium »

No specific reason, you can use them as well.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

MurmurHash 3 (32 bit version)
http://code.google.com/p/smhasher/

Code: Select all

; **********************************************
; * MurmurHash3 was written by Austin Appleby, *
; * and is placed in the public domain.        *
; * The author disclaims copyright to this     *
; * source code.                               *
; *                                            *
; * PureBasic conversion by Wilbert            *
; * Last update : 2012/02/29                   *
; **********************************************

Procedure.l MurmurHash3(*key, len.l, seed.l = 0)
  EnableASM
  MOV eax, seed
  MOV ecx, len
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key 
    !push ebx
    !push ecx
  CompilerElse
    MOV rdx, *key
    !push rbx
    !push rcx
  CompilerEndIf
  !mov ebx, eax
  !sub ecx, 4
  !js mh3_tail
  ; body
  !mh3_body_loop:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov eax, [edx]
    !add edx, 4
  CompilerElse
    !mov eax, [rdx]
    !add rdx, 4
  CompilerEndIf
  !imul eax, 0xcc9e2d51
  !rol eax, 15
  !imul eax, 0x1b873593
  !xor ebx, eax
  !rol ebx, 13
  !imul ebx, 5
  !add ebx, 0xe6546b64
  !sub ecx, 4
  !jns mh3_body_loop
  ; tail
  !mh3_tail:
  !xor eax, eax
  !add ecx, 3
  !js mh3_finalize
  !jz mh3_t1
  !dec ecx
  !jz mh3_t2
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov al, [edx + 2]
    !shl eax, 16
    !mh3_t2: mov ah, [edx + 1]
    !mh3_t1: mov al, [edx]
  CompilerElse
    !mov al, [rdx + 2]
    !shl eax, 16
    !mh3_t2: mov ah, [rdx + 1]
    !mh3_t1: mov al, [rdx]
  CompilerEndIf
  !imul eax, 0xcc9e2d51
  !rol eax, 15
  !imul eax, 0x1b873593
  !xor ebx, eax
  ; finalization
  !mh3_finalize:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ecx
  CompilerElse
    !pop rcx
  CompilerEndIf
  !xor ebx, ecx
  !mov eax, ebx
  !shr ebx, 16
  !xor eax, ebx
  !imul eax, 0x85ebca6b
  !mov ebx, eax
  !shr ebx, 13
  !xor eax, ebx
  !imul eax, 0xc2b2ae35
  !mov ebx, eax
  !shr ebx, 16
  !xor eax, ebx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ebx
  CompilerElse
    !pop rbx  
  CompilerEndIf
  DisableASM
  ProcedureReturn
EndProcedure
Last edited by wilbert on Wed Feb 29, 2012 6:28 am, edited 1 time in total.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

While I was converting MurmurHash3, I noticed the FNV32a procedure can be made significantly faster.
First a remark, the return type of FNV32a should be a long instead of a quad.

Here's the improved FNV32a code

Code: Select all

Procedure.l FNV32a(*key, len.l)
  EnableASM
  MOV edx, *key
  MOV ecx, len
  !push ebx
  !mov eax, 2166136261 
  !fnv32a_loop:
  !movzx ebx, byte [edx]
  !xor eax, ebx
  !imul eax, 0x01000193
  !inc edx
  !dec ecx
  !jnz fnv32a_loop
  !pop ebx
  DisableASM
  ProcedureReturn
EndProcedure
I also created an alternative one for zero terminated strings that doesn't require a length parameter

Code: Select all

Procedure FNV32s(*key)
  EnableASM
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key
  CompilerElse
    MOV rdx, *key
  CompilerEndIf
  !mov eax, 2166136261
  !jmp fnv32s_entry
  !fnv32s_loop:
  !xor eax, ecx
  !imul eax, 0x01000193
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    CompilerIf #PB_Compiler_Unicode
      !add edx, 2    
      !fnv32s_entry:
      !movzx ecx, word [edx]
    CompilerElse
      !add edx, 1
      !fnv32s_entry:
      !movzx ecx, byte [edx]
    CompilerEndIf
  CompilerElse
    CompilerIf #PB_Compiler_Unicode
      !add rdx, 2    
      !fnv32s_entry:
      !movzx ecx, word [rdx]
    CompilerElse
      !add rdx, 1
      !fnv32s_entry:
      !movzx ecx, byte [rdx]
    CompilerEndIf
  CompilerEndIf
  !cmp ecx, 0
  !jne fnv32s_loop
  DisableASM
  ProcedureReturn
EndProcedure
A speed comparison of the FNV32 code with the 'push' fix added by Thorium, the modified version I made and MurmurHash3.

Code: Select all

s.s = "The quick brown fox jumps over the lazy dog"
s_len.l = Len(s)
cycles.l = 10000000

cycles - 1
tm1 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = FNV32(@s, s_len)
Next
tm2 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = FNV32a(@s, s_len)
Next
tm3 = ElapsedMilliseconds()
For i = 0 To cycles
  a.l = MurmurHash3(@s, s_len)
Next
tm4 = ElapsedMilliseconds()

result.s = "FNV32 : " + Str(tm2 - tm1) + Chr(10)
result   + "FNV32a : " + Str(tm3 - tm2) + Chr(10)
result   + "MurmurHash3 : " + Str(tm4 - tm3) 

MessageRequester("Result", result)
The results of this test on my computer :
FNV32 : 796
FNV32a : 499
MurmurHash3 : 280
Last edited by wilbert on Sun Mar 11, 2012 1:44 pm, edited 1 time in total.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

Very fast hash algorithm

Code: Select all

; ******************************************
; * The Meiyan hash algorithm              *
; * was written by Sanmayce                *
; * http://www.sanmayce.com/Fastest_Hash/  *
; *                                        *
; * PureBasic conversion by Wilbert        *
; * Last update : 2012/03/09               *
; ******************************************

Procedure MeiyanHash(*key, len.l)
  EnableASM
  MOV ecx, len
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    MOV edx, *key
    !push ebx
    !mov ebx, edx
  CompilerElse
    MOV r8, *key
  CompilerEndIf
  !mov eax, 2166136261
  !sub ecx, 8
  !js meiyan_tail
  ; body
  !meiyan_body_loop:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov edx, [ebx]
    !rol edx, 5
    !xor eax, edx
    !mov edx, [ebx + 4]
    !add ebx, 8
  CompilerElse
    !mov edx, [r8]
    !rol edx, 5
    !xor eax, edx
    !mov edx, [r8 + 4]
    !add r8, 8
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !sub ecx, 8
  !jns meiyan_body_loop
  ; tail
  !meiyan_tail:
  !add ecx, 8
  !test ecx, 4
  !jz meiyan_t2
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, word [ebx]
    !xor eax, edx
    !imul eax, 709607
    !movzx edx, word [ebx + 2]
    !add ebx, 4
  CompilerElse
    !movzx edx, word [r8]
    !xor eax, edx
    !imul eax, 709607
    !movzx edx, word [r8 + 2]
    !add r8, 4
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t2:
  !test ecx, 2
  !jz meiyan_t3
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, word [ebx]
    !add ebx, 2
  CompilerElse
    !movzx edx, word [r8]
    !add r8, 2
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t3:
  !test ecx, 1
  !jz meiyan_t4
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !movzx edx, byte [ebx]
  CompilerElse
    !movzx edx, byte [r8]
  CompilerEndIf
  !xor eax, edx
  !imul eax, 709607
  !meiyan_t4:
  !mov edx, eax
  !shr edx, 16
  !xor eax, edx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !pop ebx
  CompilerEndIf
  DisableASM
  ProcedureReturn
EndProcedure
MurmurHash3 seems to have less collisions compared to MeiyanHash.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

GoodOAAT

Code: Select all

; GoodOAAT algorithm by Sokolov Yura aka funny-falcon

Procedure.l GoodOAAT(*key.Ascii, len, seed.l = 0)
  !mov edx, [p.v_seed]
  !mov ecx, [p.v_len]
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rax, [p.p_key]
    !push rbx
    !push rsi
    !mov rsi, rax
  CompilerElse
    !mov eax, [p.p_key]
    !push ebx
    !push esi
    !mov esi, eax
  CompilerEndIf  
  !mov eax, edx
  !xor edx, 0x3b00
  !rol eax, 15
  !sub ecx, 1
  !jc .l1
  !.l0:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !movzx ebx, byte [rsi]
    !add rsi, 1
  CompilerElse
    !movzx ebx, byte [esi]
    !add esi, 1
  CompilerEndIf   
  !add edx, ebx
  !lea edx, [edx+edx*8]
  !add eax, edx
  !rol eax, 7
  !lea eax, [eax+eax*4]
  !sub ecx, 1
  !jnc .l0
  !.l1:
  !mov ecx, eax
  !xor edx, eax
  !rol ecx, 14
  !add edx, ecx
  !mov ecx, edx
  !xor eax, edx
  !rol ecx, 26
  !add eax, ecx
  !mov ecx, eax
  !xor edx, eax
  !rol ecx, 5
  !add edx, ecx
  !mov ecx, edx
  !xor eax, edx
  !rol ecx, 24
  !add eax, ecx
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !pop rsi
    !pop rbx
  CompilerElse
    !pop esi
    !pop ebx
  CompilerEndIf   
  ProcedureReturn
EndProcedure

Larson hash

Code: Select all

Procedure.l LarsonHash(*String); Paul Larson hash
  !xor eax, eax
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_String]
    !jmp larsonhash_l1
    !larsonhash_l0:
    !imul eax, 101
    CompilerIf #PB_Compiler_Unicode
      !add rdx, 2
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, word [rdx]
    CompilerElse
      !add rdx, 1
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, byte [rdx]
    CompilerEndIf    
  CompilerElse
    !mov edx, [p.p_String]
    !jmp larsonhash_l1
    !larsonhash_l0:
    !imul eax, 101
    CompilerIf #PB_Compiler_Unicode
      !add edx, 2
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, word [edx]
    CompilerElse
      !add edx, 1
      !add eax, ecx
      !larsonhash_l1:
      !movzx ecx, byte [edx]
    CompilerEndIf 
  CompilerEndIf
  !test ecx, ecx
  !jnz larsonhash_l0
  ProcedureReturn
EndProcedure



s.s = "Paul Larson hash test"
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  hash.l = LarsonHash(@s)
Next
t2 = ElapsedMilliseconds()
MessageRequester(Str(hash),Str(t2-t1))

FastHash64 (x64)
updated implementation

Code: Select all

; FastHash64 algorithm by Zilong Tan

Procedure.q FastHash64(*Buffer, Len, Seed.q=0)
  !mov r10, 0x2127599bf4325c37
  !mov r11, 0x880355f21e6d1965
  !mov rdx, [p.p_Buffer]
  !mov rcx, [p.v_Len]
  !mov rax, rcx         ; h = seed ^ (len * m);
  !imul rax, r11
  !xor rax, [p.v_Seed]
  !jmp .l1
  ; 8 byte loop  
  !.l0:
  !mov r8, [rdx]        ; v = *pos++;
  !add rdx, 8
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  !.l1:
  !sub rcx, 8
  !jnc .l0
  ; remaining bytes
  !and ecx, 7
  !jz .l5
  !xor r8, r8
  !btr ecx, 2
  !jnc .l2
  ; get 4 bytes
  !mov r8d, [rdx + rcx]
  !.l2:
  !btr ecx, 1
  !jnc .l3
  ; get 2 bytes
  !shl r8, 16
  !mov r8w, [rdx + rcx]
  !.l3:
  !btr ecx, 0
  !jnc .l4
  ; get 1 byte
  !shl r8, 8
  !mov r8b, [rdx]
  !.l4:
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  ; -- mix(h) start --
  !.l5:
  !mov r9, rax
  !shr r9, 23
  !xor rax, r9
  !imul rax, r10
  !mov r9, rax
  !shr r9, 47
  !xor rax, r9
  ; -- mix end --
  ProcedureReturn       ; return mix(h);
EndProcedure

old implementation

Code: Select all

; FastHash64 algorithm by Zilong Tan

Procedure.q FastHash64(*Buffer, Len, Seed.q=0)
  !mov r10, 0x2127599bf4325c37
  !mov r11, 0x880355f21e6d1965
  !mov rdx, [p.p_Buffer]
  !mov rcx, [p.v_Len]
  !mov rax, rcx         ; h = seed ^ (len * m);
  !imul rax, r11
  !xor rax, [p.v_Seed]
  !sub rcx, 8
  !jc .l1
  ; 8 byte loop  
  !.l0:
  !mov r8, [rdx]        ; v = *pos++;
  !add rdx, 8
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  !sub rcx, 8
  !jnc .l0
  ; remaining bytes
  !.l1:
  !add rcx, 8
  !jz .l5
  !xor r8, r8
  !test rcx, 4
  !jz .l2
  ; get 4 bytes
  !mov r8d, [rdx]
  !add rdx, 4
  !ror r8, 32
  !.l2:
  !test rcx, 2
  !jz .l3
  ; get 2 bytes
  !movzx r9d, word [rdx]
  !add rdx, 2
  !xor r8, r9
  !ror r8, 16
  !.l3:
  !test rcx, 1
  !jz .l4
  ; get 1 byte
  !movzx r9d, byte [rdx]
  !xor r8, r9
  !ror r8, 8
  !.l4:
  !and rcx, 7
  !shl rcx, 3
  !rol r8, cl
  ; -- mix(v) start --
  !mov r9, r8
  !shr r9, 23
  !xor r8, r9
  !imul r8, r10
  !mov r9, r8
  !shr r9, 47
  !xor r8, r9
  ; -- mix end --
  !xor rax, r8          ; h ^= mix(v);
  !imul rax, r11        ; h *= m;
  ; -- mix(h) start --
  !.l5:
  !mov r9, rax
  !shr r9, 23
  !xor rax, r9
  !imul rax, r10
  !mov r9, rax
  !shr r9, 47
  !xor rax, r9
  ; -- mix end --
  ProcedureReturn       ; return mix(h);
EndProcedure

FastHash64 (x86)

Code: Select all

; FastHash64 algorithm by Zilong Tan

Macro FH64___mix_v()
  ; v ^= v >> 23;
  !mov	 ebx, eax
  !shrd	 ebx, edx, 23
  !xor	 eax, ebx
  !mov	 ebx, edx
  !shr	 ebx, 23
  !xor	 edx, ebx
  ; v *= 0x2127599bf4325c37ULL;
  !imul	 edx, 0xf4325c37
  !imul	 ebx, eax, 0x2127599b
  !add	 ebx, edx
  !mov   edx, 0xf4325c37
  !mul   edx
  !add   edx, ebx
  ; v ^= v >> 47;
  !mov	 ebx, edx
  !shr	 ebx, 15
  !xor	 eax, ebx
EndMacro

Macro FH64___h_mul_m()
  ; h *= m;
  !imul	 ebp, 0x1e6d1965
  !imul	 ebx, edi, 0x880355f2
  !mov	 eax, 0x1e6d1965
  !add	 ebp, ebx
  !mul	 edi
  !add   ebp, edx
  !mov	 edi, eax
EndMacro

Macro FH64___h_xor_v()
  ; h ^= v;  
  !xor   ebp, edx
  !xor   edi, eax
EndMacro

Procedure.q FastHash64(*buf, size, seed.q=0)
  !mov   edx, [p.p_buf]
  !mov   ecx, [p.v_size]
  !lea   eax, [p.v_seed]
  !push  ebp
  !push  edi
  !push  esi
  !push  ebx
  !mov   esi, edx
  ; h = seed
  !mov   ebp, [eax+4]
  !mov   edi, [eax]  
  ; v = len * m
  !imul	 ebx, ecx, 0x880355f2
  !mov	 eax, 0x1e6d1965
  !mul	 ecx
  !add   edx, ebx
  FH64___h_xor_v()
  !sub   ecx, 8
  !jc    .l1
  ; >> main loop <<
  !.l0:
  !mov   edx, [esi+4]
  !mov   eax, [esi]
  !add   esi, 8
  FH64___mix_v()
  FH64___h_xor_v()  
  FH64___h_mul_m()
  !sub   ecx, 8
  !jnc   .l0
  ; >> remaining bytes <<
  !.l1:
  !and   ecx, 7
  !jz    .l6
  !xor   edx, edx
  !cmp   ecx, 4
  !jb    .l3
  !je    .l2
  ; >> 4-7 bytes <<
  !mov	 edx, [esi+ecx-4]
  !imul  ecx, -8
  !shr   edx, cl
  !.l2:
  !mov   eax, [esi]
  !jmp   .l5
  ; >> 0-3 bytes <<
  !.l3:
  !xor   eax, eax
  !test  ecx, 2
  !jz    .l4
  !movzx eax, word [esi+ecx-2]
  !.l4:
  !test  ecx, 1
  !jz    .l5
  !movzx ebx, byte [esi]
  !shl   eax, 8
  !or    eax, ebx
  !.l5:
  FH64___mix_v()
  FH64___h_xor_v()  
  FH64___h_mul_m()  
  !.l6:
  ; v = h
  !mov   eax, edi
  !mov   edx, ebp
  FH64___mix_v()
  !pop   ebx
  !pop   esi
  !pop   edi
  !pop   ebp
  ProcedureReturn
EndProcedure
Last edited by wilbert on Wed Aug 23, 2023 5:22 am, edited 6 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
forumuser
User
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Re: Hash Functions

Post by forumuser »

Thanks for providing all these hash functions!

@wilbert
Is it possible to use one of them (in this case the MurmurHash3 one) on files (instead of strings) as well?

I need to compare larger binary files if they are not equal and the PB inbuild file hashing routines are just
a bit slow for these kind of comparisons (e.g. the md5 file fingerprint needs >5 seconds to calculate the
hash of a 1 GB file and the SHA-1 isnt't that much faster). Using just the file size as a check if files are
different is not enough, their names can be different...
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

forumuser wrote:@wilbert
Is it possible to use one of them (in this case the MurmurHash3 one) on files (instead of strings) as well?
To process files, the procedure would have to be split into different parts so that multiple blocks of data can be processed before the finalize function is called.
Are you compiling for 32 or 64 bit ?
There seems to be a very fast hash function for 64 bit named XXH64.

Another thing you could consider is to use two hashes for files larger than 1MB; one for the first MB and one for the remaining part.
That way you could first compare the first part of a file and only continue with the remaining part if the hashes of the first part are equal.
Windows (x64)
Raspberry Pi OS (Arm64)
forumuser
User
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Re: Hash Functions

Post by forumuser »

Are you compiling for 32 or 64 bit ?
Unfortunately, for both
There seems to be a very fast hash function for 64 bit named XXH64
https://cyan4973.github.io/xxHash/

Wow, yeah, that's really fast! Do you plan to implement these algorithms (XXH64 / XXH32)?
Another thing you could consider is to use two hashes for files larger than 1MB; one for the first MB and one for the remaining part.
That way you could first compare the first part of a file and only continue with the remaining part if the hashes of the first part are equal.
That sounds like a viable option as well. Do you have any sample code lying around that demonstrates how this should be done in practice?

Regards,
Jack
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

forumuser wrote:Wow, yeah, that's really fast! Do you plan to implement these algorithms (XXH64 / XXH32)?
I could.
XXH32 would be about 2.5x faster compared to MurmurHash3 and could be used with both the 32 and 64 bit version of PB.
XXH64 would be quite a bit faster but it could only be used with PB 64 bit and the generated hashes are different from XXH32.
forumuser wrote:That sounds like a viable option as well. Do you have any sample code lying around that demonstrates how this should be done in practice?
Unfortunately not.
You would have to allocate a memory buffer, load the first part of a file into it and use Fingerprint to generate the hash.
Windows (x64)
Raspberry Pi OS (Arm64)
forumuser
User
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Re: Hash Functions

Post by forumuser »

XXH32 would be about 2.5x faster compared to MurmurHash3 and could be used with both the 32 and 64 bit version of PB
XXH64 would be quite a bit faster but it could only be used with PB 64 bit and the generated hashes are different from XXH32
I guess both would serve their purpose but ofc it's up to you (if at all) if you have the time / motivation to implement only one or both...

This should be a correct implementation, right? Tested it on the 2GB file itself and the first part as a single file separately (split with HxD under Windows, at $10000). Both hashes are identical.

Code: Select all

; Read a part of a file into memory and calculate the checksum
; Default size = 1 MB and default cipher = MD5
Procedure.s GetHashFromFilePart(file.s, size.q=$100000, cipher=#PB_Cipher_MD5)
  Protected.i hFile
  Protected.s hash = ""
  Protected *buffer

  hFile = ReadFile(#PB_Any, file)
  If hFile
    *buffer = AllocateMemory(size)
    If *buffer
      If ReadData(hFile, *buffer, size)
        hash = StringFingerprint(PeekS(*buffer, size), cipher)
      Else
        Debug "Could not read data!"
      EndIf
      FreeMemory(*buffer)
    Else
      Debug "Couldn't allocate the requested memory!"
    EndIf
    CloseFile(hFile)
  EndIf

  ProcedureReturn PeekS(@hash)
EndProcedure

UseMD5Fingerprint()
Debug GetHashFromFilePart("D:\2GB.iso")
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Hash Functions

Post by wilbert »

forumuser wrote:I guess both would serve their purpose but ofc it's up to you (if at all) if you have the time / motivation to implement only one or both...
I'm trying XXH32.
viewtopic.php?f=12&t=70735
forumuser wrote:This should be a correct implementation, right? Tested it on the 2GB file itself and the first part as a single file separately (split with HxD under Windows, at $10000). Both hashes are identical.
It's possible a file is less than 1MB and in that case you should process only the bytes that are read.
Therefore I would do it like this ...

Code: Select all

; Read a part of a file into memory and calculate the checksum
; Default size = 1 MB and default cipher = MD5
Procedure.s GetHashFromFilePart(file.s, size.q=$100000, cipher=#PB_Cipher_MD5, bits=0)
  Protected hFile.i, bytesRead.q
  Protected Dim buffer.a(size-1)
  hFile = ReadFile(#PB_Any, file)
  If hFile
    bytesRead = ReadData(hFile, @buffer(), size)
    CloseFile(hFile)
    ProcedureReturn Fingerprint(@buffer(), bytesRead, cipher, bits)
  Else
    ProcedureReturn ""
  EndIf
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)
Post Reply