SHA512/384 Fingerprint and FileFingerprint
- netmaestro
- PureBasic Bullfrog
- Posts: 8451
- Joined: Wed Jul 06, 2005 5:42 am
- Location: Fort Nelson, BC, Canada
Re: SHA512/384 Fingerprint and FileFingerprint
I can make the code use a macro; what I don't understand is how to write the asm as a macro instead of a procedure. I think that's what I want to do, but remember I may not know what I'm talking about.
BERESHEIT
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
I just wrote them inline into the existing macros (code below), but the speed doesn't improve.
I had to make two temporary vars in the R() macro because I don't know how to access a structured variable with a static array in asm. If there's a way to directly use the vars from the asm then I would be able to trim some fat.
Code: Select all
Macro R(t)
t2 = w\w[t-2]
t15 = w\w[t-15]
!mov eax, [p.v_t2]
!mov edx, eax
!mov ecx, eax
!ror eax, 17
!ror edx, 19
!shr ecx, 10
!xor eax, edx
!xor eax, ecx
!mov [p.v_t2], eax
!mov eax, [p.v_t15]
!mov edx, eax
!mov ecx, eax
!ror eax, 7
!ror edx, 18
!shr ecx, 3
!xor eax, edx
!xor eax, ecx
!mov [p.v_t15], eax
w\w[t] = t2 + w\W[t - 7] + t15 + w\w[t - 16]
EndMacro
Macro P(a,b,c,d,e,f,g,h,x,K)
temp1 = h + F1(e,f,g) + k + x
!mov eax, [p.v_#e]
!mov edx, eax
!mov ecx, eax
!ror eax, 6
!ror edx, 11
!ror ecx, 25
!xor eax, edx
!xor eax, ecx
!add [p.v_temp1], eax
temp2 = F0(a,b,c)
!mov eax, [p.v_#a]
!mov edx, eax
!mov ecx, eax
!ror eax, 2
!ror edx, 13
!ror ecx, 22
!xor eax, edx
!xor eax, ecx
!add [p.v_temp2], eax
d + temp1
h = temp1 + temp2
EndMacro
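For readers who want to check what the rotate/shift/xor sequences in the two macros above compute, here is a minimal sketch in Python (not PureBasic, and not code from this thread). The function names are made up for illustration; the rotation amounts are read straight off the asm, and they match the usual SHA-256 sigma functions.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    # R() macro, applied to w[t-15]: ror 7, ror 18, shr 3
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)


def small_sigma1(x):
    # R() macro, applied to w[t-2]: ror 17, ror 19, shr 10
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)


def big_sigma0(x):
    # P() macro, applied to a: ror 2, ror 13, ror 22
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)


def big_sigma1(x):
    # P() macro, applied to e: ror 6, ror 11, ror 25
    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)


def expand(w, t):
    """Message schedule step, mirroring the last line of R(t)."""
    return (small_sigma1(w[t - 2]) + w[t - 7]
            + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32
```

Feeding the same word into the asm and into these functions should give identical results, which makes it a cheap way to test a hand-written macro in isolation.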

BERESHEIT
Re: SHA512/384 Fingerprint and FileFingerprint
Your code seems fine.
As for the speed, I had a similar experience with the Endian64 function.
I converted it to a macro inside the process function, since that is where it is called the most, and couldn't see any significant speed difference.
I don't see much room inside your code for significant speed improvement.
Is it still much slower compared to the other library you referred to?

- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
It's the Hashslash file hasher found here: http://www.softpedia.com/get/System/Fil ... lash.shtml
My code is now around 40% slower compared to 350% slower before you guys started helping with the asm.
But then that may have nothing to do with this code per se. Maybe they're using File Mapping and simply reading the data in faster.
BERESHEIT
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
ACK!! I am such a retard. It's a wonder I can feed myself. When I did the code change on the sigmas and tested the speed there was no difference. But I have two versions of the SHA256 program, called sha256.pbi and sha256.pb for use as an include and for compiling with tailbite, respectively. I just noticed my test code had the line IncludeFile "sha256.pb" at the top and I was working all this time on sha256.pbi never knowing that my tests didn't reflect my code changes.
So I reverted to the inline asm inside the macros as above and it did indeed make about a 20% improvement in the speed of the code.
All's well that ends well I guess.
BERESHEIT
Re: SHA512/384 Fingerprint and FileFingerprint
ACK!! I am such a retard. It's a wonder I can feed myself

It's actually an easy mistake to make (not noticing that the include is the other version).
Another area that could be optimized is here; maybe it could use a lookup table instead of LCase(RSet(Hex(...))):
Code: Select all
For j = 0 To iterations
PokeS( *output+j*2, LCase( RSet( Hex( PeekA(*sha512sum+j) ), 2, "0" ) ) )
Next
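The lookup-table idea can be sketched as follows. This is Python rather than PureBasic, and the names `HEX_PAIRS` and `to_hex` are invented for the illustration. The point is only that the 256 possible two-character strings are built once, so the per-byte work becomes a table index instead of a Hex/RSet/LCase call chain.

```python
# Build the table once: index b holds the lowercase, zero-padded hex of byte b.
HEX_PAIRS = ["%02x" % b for b in range(256)]


def to_hex(digest):
    """Convert a bytes-like digest to a lowercase hex string via table lookup."""
    return "".join(HEX_PAIRS[b] for b in digest)
```

In PureBasic the equivalent would presumably be a global array of 256 two-character strings (or a 512-byte memory block) filled at startup; whether it is measurably faster would need testing, since the hex conversion runs only once per digest.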


Re: SHA512/384 Fingerprint and FileFingerprint
Netmaestro, I realized that the plain asm approach you used in your other (32-bit) SHA routines might be faster after all for 64 bit, compared to SSE2.
The speed increase that was visible was in the block tests that handle two quads at a time, but that isn't the case anymore.
Here's my plain ASM approach with sigma macros and the suggestion Thorium made earlier to use some extra SSE2 registers for temporary storage.
The speed difference between Sigma01 and Sigma23 as a procedure compared to as a macro is very small, so procedures might be better if the size of the compiled code is important.
Code: Select all
;====================================================================================
;
; Library Commands: sha512FingerPrint()
; sha512FileFingerPrint()
; sha384FingerPrint()
; sha384FileFingerPrint()
;
; Author: Lloyd Gallant (netmaestro)
;
; Contributors: Thanks to wilbert, Danilo, thorium, infratec and idle
; for their help with the asm routines
; and to Christopher Devine for the
; c code this program is based on.
;
; Date: August 15, 2011
; Target Compiler: Purebasic 4 and up
; Target OS: Windows, Linux, MacOS
;
; License: GNU General Public License
;
; This program is free software; you can redistribute it and/or modify
; it under the terms of the GNU General Public License as published by
; the Free Software Foundation; either version 2 of the License, or
; (at your option) any later version.
;
; This program is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
;
; The logic for this program is based on sha256.c found here:
;
; http://www.spale.com/download/scrypt/scrypt1.0/
;
; You can test the accuracy of this program by comparing results with
; test data at:
;
; http://csrc.nist.gov/groups/ST/toolkit/documents/Examples/SHA2_Additional.pdf
;
;===================================================================================
;
;
; Usage:
;
; result$ = sha512Fingerprint(*address, length [, *progress])
; result$ = sha512FileFingerprint(file$ [, *progress])
;
; result$ = sha384Fingerprint(*address, length [, *progress])
; result$ = sha384FileFingerprint(file$ [, *progress])
;
; Progress callback function:
;
; Procedure MyCallBack(value.i)
; ; value is 0 to 100 representing percentage completed
; EndProcedure
;
;
;================================================================
; STRUCTURES
;================================================================
Structure sha512_context
total.q
state.q [8]
buffer.a [128]
EndStructure
Structure UINT8_BUFFER
b.a[128]
EndStructure
Structure UINT64_BUFFER
w.q[80]
EndStructure
Structure msglen
lowpart.q
highpart.q
EndStructure
;================================================================
; HELPER MACROS
;================================================================
Macro F0(x,y,z)
((x & y) | (z & (x | y)))
EndMacro
Macro F1(x,y,z)
(z ! (x & (y ! z)))
EndMacro
Macro SIGMA01() ; By wilbert
!movd xmm0, ebx
!mov eax,dword [v_SigmaTemp02]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp02 + 4]
; rotate right 1
!shrd eax, edx, 1
; rotate right 8
!shrd ebx, edx, 8
; shift right 7
!shrd ecx, edx, 7
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp02], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 1
!shrd ebx, eax, 8
!shr ecx, 7
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp02 + 4], edx
!mov eax,dword [v_SigmaTemp13]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp13 + 4]
; rotate right 19
!shrd eax, edx, 19
; rotate right 61 = left 3
!shld ebx, edx, 3
; shift right 6
!shrd ecx, edx, 6
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp13], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 19
!shld ebx, eax, 3
!shr ecx, 6
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp13 + 4], edx
!movd ebx, xmm0
EndMacro
Global.q SigmaTemp02, SigmaTemp13
Macro SIGMA23() ; By wilbert
!movd xmm0, ebx
!mov eax,dword [v_SigmaTemp02]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp02 + 4]
; rotate right 28
!shrd eax, edx, 28
; rotate right 34 = left 30
!shld ebx, edx, 30
; rotate right 39 = left 25
!shld ecx, edx, 25
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp02], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 28
!shld ebx, eax, 30
!shld ecx, eax, 25
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp02 + 4], edx
!mov eax,dword [v_SigmaTemp13]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp13 + 4]
; rotate right 14
!shrd eax, edx, 14
; rotate right 18
!shrd ebx, edx, 18
; rotate right 41 = left 23
!shld ecx, edx, 23
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp13], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 14
!shrd ebx, eax, 18
!shld ecx, eax, 23
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp13 + 4], edx
!movd ebx, xmm0
EndMacro
Procedure.q ChgEnd64Addr__() ; By idle
; small adaptation by Wilbert to
; work around a Purebasic OS X bug
!mov eax,ChgEnd64_start
ProcedureReturn
!ChgEnd64_start:
!mov eax, [esp + 8]
!mov edx, [esp + 4]
!bswap eax
!bswap edx
!ret
EndProcedure
PrototypeC.q ProtoChgEnd64(value.q)
Global ChangeEndian64.ProtoChgEnd64 = ChgEnd64Addr__()
Macro R(t)
SigmaTemp02 = w\w[t-15]
SigmaTemp13 = w\w[t-2]
Sigma01()
w\w[t] = SigmaTemp13 + w\W[t - 7] + SigmaTemp02 + w\w[t - 16]
EndMacro
Macro P(a,b,c,d,e,f,g,h,x,K)
SigmaTemp02 = a
SigmaTemp13 = e
Sigma23()
temp1 = h + SigmaTemp13 + F1(e,f,g) + k + x
temp2 = SigmaTemp02 + F0(a,b,c)
d + temp1
h = temp1 + temp2
EndMacro
;================================================================
; LOCAL FUNCTIONS
;================================================================
Procedure sha384_starts(*ctx.sha512_context )
*ctx\state[0] = $cbbb9d5dc1059ed8
*ctx\state[1] = $629a292a367cd507
*ctx\state[2] = $9159015a3070dd17
*ctx\state[3] = $152fecd8f70e5939
*ctx\state[4] = $67332667ffc00b31
*ctx\state[5] = $8eb44a8768581511
*ctx\state[6] = $db0c2e0d64f98fa7
*ctx\state[7] = $47b5481dbefa4fa4
EndProcedure
Procedure sha512_starts( *ctx.sha512_context )
*ctx\state[0] = $6a09e667f3bcc908
*ctx\state[1] = $bb67ae8584caa73b
*ctx\state[2] = $3c6ef372fe94f82b
*ctx\state[3] = $a54ff53a5f1d36f1
*ctx\state[4] = $510e527fade682d1
*ctx\state[5] = $9b05688c2b3e6c1f
*ctx\state[6] = $1f83d9abfb41bd6b
*ctx\state[7] = $5be0cd19137e2179
EndProcedure
Procedure sha512_process( *ctx.sha512_context, *bytes.UINT64_BUFFER)
W.UINT64_BUFFER
Define.q A, B, C, D, E, F, G, H
Define.q temp1, temp2
For i=0 To 15
w\w[i] = ChangeEndian64( *bytes\w[i] )
Next
A = *ctx\state[0]
B = *ctx\state[1]
C = *ctx\state[2]
D = *ctx\state[3]
E = *ctx\state[4]
F = *ctx\state[5]
G = *ctx\state[6]
H = *ctx\state[7]
For ii=16 To 79
R(ii)
Next
P( A, B, C, D, E, F, G, H, w\w[ 0], $428A2F98D728AE22 )
P( H, A, B, C, D, E, F, G, w\w[ 1], $7137449123EF65CD )
P( G, H, A, B, C, D, E, F, w\w[ 2], $B5C0FBCFEC4D3B2F )
P( F, G, H, A, B, C, D, E, w\w[ 3], $E9B5DBA58189DBBC )
P( E, F, G, H, A, B, C, D, w\w[ 4], $3956C25BF348B538 )
P( D, E, F, G, H, A, B, C, w\w[ 5], $59F111F1B605D019 )
P( C, D, E, F, G, H, A, B, w\w[ 6], $923F82A4AF194F9B )
P( B, C, D, E, F, G, H, A, w\w[ 7], $AB1C5ED5DA6D8118 )
P( A, B, C, D, E, F, G, H, w\w[ 8], $D807AA98A3030242 )
P( H, A, B, C, D, E, F, G, w\w[ 9], $12835B0145706FBE )
P( G, H, A, B, C, D, E, F, w\w[10], $243185BE4EE4B28C )
P( F, G, H, A, B, C, D, E, w\w[11], $550C7DC3D5FFB4E2 )
P( E, F, G, H, A, B, C, D, w\w[12], $72BE5D74F27B896F )
P( D, E, F, G, H, A, B, C, w\w[13], $80DEB1FE3B1696B1 )
P( C, D, E, F, G, H, A, B, w\w[14], $9BDC06A725C71235 )
P( B, C, D, E, F, G, H, A, w\w[15], $C19BF174CF692694 )
P( A, B, C, D, E, F, G, H, w\w[16], $E49B69C19EF14AD2 )
P( H, A, B, C, D, E, F, G, w\w[17], $EFBE4786384F25E3 )
P( G, H, A, B, C, D, E, F, w\w[18], $0FC19DC68B8CD5B5 )
P( F, G, H, A, B, C, D, E, w\w[19], $240CA1CC77AC9C65 )
P( E, F, G, H, A, B, C, D, w\w[20], $2DE92C6F592B0275 )
P( D, E, F, G, H, A, B, C, w\w[21], $4A7484AA6EA6E483 )
P( C, D, E, F, G, H, A, B, w\w[22], $5CB0A9DCBD41FBD4 )
P( B, C, D, E, F, G, H, A, w\w[23], $76F988DA831153B5 )
P( A, B, C, D, E, F, G, H, w\w[24], $983E5152EE66DFAB )
P( H, A, B, C, D, E, F, G, w\w[25], $A831C66D2DB43210 )
P( G, H, A, B, C, D, E, F, w\w[26], $B00327C898FB213F )
P( F, G, H, A, B, C, D, E, w\w[27], $BF597FC7BEEF0EE4 )
P( E, F, G, H, A, B, C, D, w\w[28], $C6E00BF33DA88FC2 )
P( D, E, F, G, H, A, B, C, w\w[29], $D5A79147930AA725 )
P( C, D, E, F, G, H, A, B, w\w[30], $06CA6351E003826F )
P( B, C, D, E, F, G, H, A, w\w[31], $142929670A0E6E70 )
P( A, B, C, D, E, F, G, H, w\w[32], $27B70A8546D22FFC )
P( H, A, B, C, D, E, F, G, w\w[33], $2E1B21385C26C926 )
P( G, H, A, B, C, D, E, F, w\w[34], $4D2C6DFC5AC42AED )
P( F, G, H, A, B, C, D, E, w\w[35], $53380D139D95B3DF )
P( E, F, G, H, A, B, C, D, w\w[36], $650A73548BAF63DE )
P( D, E, F, G, H, A, B, C, w\w[37], $766A0ABB3C77B2A8 )
P( C, D, E, F, G, H, A, B, w\w[38], $81C2C92E47EDAEE6 )
P( B, C, D, E, F, G, H, A, w\w[39], $92722C851482353B )
P( A, B, C, D, E, F, G, H, w\w[40], $A2BFE8A14CF10364 )
P( H, A, B, C, D, E, F, G, w\w[41], $A81A664BBC423001 )
P( G, H, A, B, C, D, E, F, w\w[42], $C24B8B70D0F89791 )
P( F, G, H, A, B, C, D, E, w\w[43], $C76C51A30654BE30 )
P( E, F, G, H, A, B, C, D, w\w[44], $D192E819D6EF5218 )
P( D, E, F, G, H, A, B, C, w\w[45], $D69906245565A910 )
P( C, D, E, F, G, H, A, B, w\w[46], $F40E35855771202A )
P( B, C, D, E, F, G, H, A, w\w[47], $106AA07032BBD1B8 )
P( A, B, C, D, E, F, G, H, w\w[48], $19A4C116B8D2D0C8 )
P( H, A, B, C, D, E, F, G, w\w[49], $1E376C085141AB53 )
P( G, H, A, B, C, D, E, F, w\w[50], $2748774CDF8EEB99 )
P( F, G, H, A, B, C, D, E, w\w[51], $34B0BCB5E19B48A8 )
P( E, F, G, H, A, B, C, D, w\w[52], $391C0CB3C5C95A63 )
P( D, E, F, G, H, A, B, C, w\w[53], $4ED8AA4AE3418ACB )
P( C, D, E, F, G, H, A, B, w\w[54], $5B9CCA4F7763E373 )
P( B, C, D, E, F, G, H, A, w\w[55], $682E6FF3D6B2B8A3 )
P( A, B, C, D, E, F, G, H, w\w[56], $748F82EE5DEFB2FC )
P( H, A, B, C, D, E, F, G, w\w[57], $78A5636F43172F60 )
P( G, H, A, B, C, D, E, F, w\w[58], $84C87814A1F0AB72 )
P( F, G, H, A, B, C, D, E, w\w[59], $8CC702081A6439EC )
P( E, F, G, H, A, B, C, D, w\w[60], $90BEFFFA23631E28 )
P( D, E, F, G, H, A, B, C, w\w[61], $A4506CEBDE82BDE9 )
P( C, D, E, F, G, H, A, B, w\w[62], $BEF9A3F7B2C67915 )
P( B, C, D, E, F, G, H, A, w\w[63], $C67178F2E372532B )
P( A, B, C, D, E, F, G, H, w\w[64], $CA273ECEEA26619C )
P( H, A, B, C, D, E, F, G, w\w[65], $D186B8C721C0C207 )
P( G, H, A, B, C, D, E, F, w\w[66], $EADA7DD6CDE0EB1E )
P( F, G, H, A, B, C, D, E, w\w[67], $F57D4F7FEE6ED178 )
P( E, F, G, H, A, B, C, D, w\w[68], $06F067AA72176FBA )
P( D, E, F, G, H, A, B, C, w\w[69], $0A637DC5A2C898A6 )
P( C, D, E, F, G, H, A, B, w\w[70], $113F9804BEF90DAE )
P( B, C, D, E, F, G, H, A, w\w[71], $1B710B35131C471B )
P( A, B, C, D, E, F, G, H, w\w[72], $28DB77F523047D84 )
P( H, A, B, C, D, E, F, G, w\w[73], $32CAAB7B40C72493 )
P( G, H, A, B, C, D, E, F, w\w[74], $3C9EBE0A15C9BEBC )
P( F, G, H, A, B, C, D, E, w\w[75], $431D67C49C100D4C )
P( E, F, G, H, A, B, C, D, w\w[76], $4CC5D4BECB3E42B6 )
P( D, E, F, G, H, A, B, C, w\w[77], $597F299CFC657E2A )
P( C, D, E, F, G, H, A, B, w\w[78], $5FCB6FAB3AD6FAEC )
P( B, C, D, E, F, G, H, A, w\w[79], $6C44198C4A475817 )
*ctx\state[0] + A
*ctx\state[1] + B
*ctx\state[2] + C
*ctx\state[3] + D
*ctx\state[4] + E
*ctx\state[5] + F
*ctx\state[6] + G
*ctx\state[7] + H
EndProcedure
Procedure shaQuad_update( *ctx.sha512_context, *input, length, jobsize.q, *callback )
Static totalprocessed.d=0
Define.l left, fill
If length=0 : ProcedureReturn : EndIf
left = *ctx\total & 127
fill = 128-left
*ctx\total + length
If left And (length >= fill)
CopyMemory( *input, @*ctx\buffer[0]+left, fill )
sha512_process( *ctx, @*ctx\buffer[0] )
length - fill
*input + fill
left = 0
EndIf
While length >= 128
sha512_process( *ctx, *input )
length - 128
*input + 128
totalprocessed+128
If *callback
progress = Int(totalprocessed/jobsize*100)
CallFunctionFast(*callback, progress )
EndIf
Wend
If length
CopyMemory( *input, @*ctx\buffer[0]+left, length )
EndIf
EndProcedure
Procedure shaQuad_finish( *ctx.sha512_context, *digest.UINT64_BUFFER, jobsize.q, full, *callback )
Define.l last, padn
msglen.msglen
sha512_padding.UINT8_BUFFER
sha512_padding\b[0]=$80
msglen\highpart = ChangeEndian64(jobsize<<3)
last = *ctx\total & 127
If last<112
padn = 112-last
Else
padn = 240-last
EndIf
shaQuad_update( *ctx, @sha512_padding, padn, padn, *callback )
shaQuad_update( *ctx, @msglen, 16, 16, *callback )
For i=0 To 5
*digest\w[i] = ChangeEndian64(*ctx\state[i])
Next
If full
For i=6 To 7
*digest\w[i] = ChangeEndian64(*ctx\state[i])
Next
EndIf
EndProcedure
;================================================================
; EXPORTED FUNCTIONS
;================================================================
Procedure.s shaQuadFingerprint(*datapointer, Length, full, *callback=0) ; Data address, data size, [ ,<procaddress> ]
*ctx.sha512_context = AllocateMemory(SizeOf(sha512_context))
*sha512sum = AllocateMemory(64)
Protected digest.s
If full
sha512_starts( *ctx )
Else
sha384_starts( *ctx )
EndIf
shaQuad_update( *ctx, *datapointer, Length, Length, *callback )
shaQuad_finish( *ctx, *sha512sum, Length, full, *callback )
*output = AllocateMemory(129)
If full
iterations = 63
Else
iterations = 47
EndIf
For j = 0 To iterations
PokeS( *output+j*2, LCase( RSet( Hex( PeekA(*sha512sum+j) ), 2, "0" ) ) )
Next
digest = PeekS(*output)
FreeMemory(*output)
FreeMemory(*sha512sum)
FreeMemory(*ctx)
ProcedureReturn digest
EndProcedure
ProcedureDLL.s sha512Fingerprint(*datapointer, Length, *callback=0) ; Data address, data size, [ ,<procaddress> ]
ProcedureReturn shaQuadFingerprint(*datapointer, Length, 1, *callback)
EndProcedure
ProcedureDLL.s sha384Fingerprint(*datapointer, Length, *callback=0) ; Data address, data size, [ ,<procaddress> ]
ProcedureReturn shaQuadFingerprint(*datapointer, Length, 0, *callback)
EndProcedure
Procedure.s shaQuadFileFingerprint(filename.s, full, *callback=0) ; filename$, [ ,<procaddress> ]
Protected digest.s
Protected *datapointer, *output, *ctx.sha512_context
Protected bytesread, fresult
Protected jobsize.q
*ctx = AllocateMemory(SizeOf(sha512_context))
*sha512sum = AllocateMemory(64)
*output = AllocateMemory(129)
*datapointer = AllocateMemory(4096)
fresult = OpenFile(#PB_Any, filename)
If fresult
jobsize = Lof(fresult)
If full
sha512_starts( *ctx )
Else
sha384_starts( *ctx )
EndIf
While Not Eof(fresult)
bytesread = ReadData(fresult, *datapointer, 4096)
shaQuad_update( *ctx, *datapointer, bytesread, jobsize, *callback )
Wend
shaQuad_finish( *ctx, *sha512sum, jobsize, full, *callback )
If full
iterations = 63
Else
iterations = 47
EndIf
For j = 0 To iterations
PokeS( *output+j*2, LCase( RSet( Hex( PeekA(*sha512sum+j) ), 2, "0" ) ) )
Next
digest = PeekS(*output)
CloseFile(fresult)
EndIf
FreeMemory(*datapointer)
FreeMemory(*output)
FreeMemory(*sha512sum)
FreeMemory(*ctx)
ProcedureReturn digest
EndProcedure
ProcedureDLL.s sha512FileFingerprint(filename.s, *callback=0) ; filename$, [ ,<procaddress> ]
ProcedureReturn shaQuadFileFingerprint(filename.s, 1, *callback)
EndProcedure
ProcedureDLL.s sha384FileFingerprint(filename.s, *callback=0) ; filename$, [ ,<procaddress> ]
ProcedureReturn shaQuadFileFingerprint(filename.s, 0, *callback)
EndProcedure
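The header of the listing suggests comparing results against published test data. As an alternative cross-check, independent reference digests can be produced with Python's standard `hashlib` module; this is a sketch for verification only, and the helper name `reference_digests` is made up here.

```python
import hashlib


def reference_digests(data):
    """Return lowercase hex SHA-384 and SHA-512 digests of a bytes object."""
    return {
        "sha384": hashlib.sha384(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


if __name__ == "__main__":
    # "abc" is one of the classic short test messages.
    for name, value in reference_digests(b"abc").items():
        print(name, value)
```

The strings printed for a given input should match what sha384Fingerprint() and sha512Fingerprint() return for the same bytes, character for character, since both sides emit lowercase hex.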
Last edited by wilbert on Wed Aug 17, 2011 7:00 am, edited 1 time in total.
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
Thanks wilbert, seems something needs fixed:


BERESHEIT
Re: SHA512/384 Fingerprint and FileFingerprint
It didn't complain on OS X.
Apparently there are small differences.
Does it help to specify the operand size like this, or not at all?
Code: Select all
Macro SIGMA01() ; By wilbert
!movd xmm0, ebx
!mov eax,dword [v_SigmaTemp02]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp02 + 4]
; rotate right 1
!shrd eax, edx, 1
; rotate right 8
!shrd ebx, edx, 8
; shift right 7
!shrd ecx, edx, 7
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp02], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 1
!shrd ebx, eax, 8
!shr ecx, 7
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp02 + 4], edx
!mov eax,dword [v_SigmaTemp13]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp13 + 4]
; rotate right 19
!shrd eax, edx, 19
; rotate right 61 = left 3
!shld ebx, edx, 3
; shift right 6
!shrd ecx, edx, 6
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp13], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 19
!shld ebx, eax, 3
!shr ecx, 6
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp13 + 4], edx
!movd ebx, xmm0
EndMacro
Global.q SigmaTemp02, SigmaTemp13
Macro SIGMA23() ; By wilbert
!movd xmm0, ebx
!mov eax,dword [v_SigmaTemp02]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp02 + 4]
; rotate right 28
!shrd eax, edx, 28
; rotate right 34 = left 30
!shld ebx, edx, 30
; rotate right 39 = left 25
!shld ecx, edx, 25
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp02], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 28
!shld ebx, eax, 30
!shld ecx, eax, 25
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp02 + 4], edx
!mov eax,dword [v_SigmaTemp13]
!mov ebx, eax
!mov ecx, eax
!movd xmm1, eax
!mov edx,dword [v_SigmaTemp13 + 4]
; rotate right 14
!shrd eax, edx, 14
; rotate right 18
!shrd ebx, edx, 18
; rotate right 41 = left 23
!shld ecx, edx, 23
!xor eax, ebx
!xor eax, ecx
!mov dword [v_SigmaTemp13], eax
!movd eax, xmm1
!mov ebx, edx
!mov ecx, edx
!shrd edx, eax, 14
!shrd ebx, eax, 18
!shld ecx, eax, 23
!xor edx, ebx
!xor edx, ecx
!mov dword [v_SigmaTemp13 + 4], edx
!movd ebx, xmm0
EndMacro
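The shrd/shld pairs in these macros build 64-bit rotates and shifts out of two 32-bit halves. A rough model of that technique in Python (not PureBasic; the helper names are invented, and this only imitates the arithmetic, not register usage) can be used to convince yourself that "rotate right 61 = left 3" and the other comments hold:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def shrd(dst, src, n):
    """Model of x86 SHRD: shift dst right by n, filling from the low bits of src."""
    return ((dst >> n) | (src << (32 - n))) & MASK32


def shld(dst, src, n):
    """Model of x86 SHLD: shift dst left by n, filling from the high bits of src."""
    return ((dst << n) | (src >> (32 - n))) & MASK32


def ror64_halves(lo, hi, n):
    """Rotate the quad hi:lo right by n (0 < n < 64); returns (lo, hi)."""
    if n < 32:
        return shrd(lo, hi, n), shrd(hi, lo, n)
    k = 64 - n  # rotating right by n is rotating left by 64 - n
    return shld(lo, hi, k), shld(hi, lo, k)


def shr64_halves(lo, hi, n):
    """Logical shift right of hi:lo by n (0 < n < 32); returns (lo, hi)."""
    return shrd(lo, hi, n), hi >> n


def small_sigma0_64(x):
    """First half of SIGMA01: ror 1 ^ ror 8 ^ shr 7, done on 32-bit halves."""
    lo, hi = x & MASK32, x >> 32
    a = ror64_halves(lo, hi, 1)
    b = ror64_halves(lo, hi, 8)
    c = shr64_halves(lo, hi, 7)
    return ((a[1] ^ b[1] ^ c[1]) << 32) | (a[0] ^ b[0] ^ c[0])


def ror64(x, n):
    """Plain 64-bit rotate right, used as the reference."""
    return ((x >> n) | (x << (64 - n))) & MASK64
```

Comparing `ror64_halves` against `ror64` for every rotation amount used in SIGMA01 and SIGMA23 is a quick sanity check before touching the asm.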
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
Thanks, it works fine now. The previous version took 17.2 seconds to process 50 MB; this one takes 15.2 seconds. Not a huge improvement, but a significant one to be sure. Thanks again, this will be implemented (until you think up something better).
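To put those two timings in perspective, here is the arithmetic as a small Python sketch, assuming "50mb" means 50 megabytes of input (the function name is made up for the example). It works out to roughly 11.6% less run time, or about 13.2% more throughput.

```python
def compare_runs(megabytes, old_seconds, new_seconds):
    """Summarize two timings of the same job as rates and percentage changes."""
    old_rate = megabytes / old_seconds
    new_rate = megabytes / new_seconds
    return {
        "old_mb_per_s": old_rate,
        "new_mb_per_s": new_rate,
        # How much shorter the new run is, relative to the old run time.
        "time_saved_pct": (old_seconds - new_seconds) / old_seconds * 100,
        # How much more data per second the new version handles.
        "throughput_gain_pct": (new_rate / old_rate - 1) * 100,
    }


if __name__ == "__main__":
    for key, value in compare_runs(50, 17.2, 15.2).items():
        print(key, round(value, 2))
```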

BERESHEIT
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
I did gain some speed by writing inlined asm for the SHA256 (32bit wordsize) F0 and F1 macros as well. But 64bit has me stumped. Can you do those? I appreciate your help, some of this stuff I just can't manage on my own just yet. Posted code is updated to your latest speed gain.
BERESHEIT
Re: SHA512/384 Fingerprint and FileFingerprint
I can't check such macros with local variables like that on OS X because of the PB bug I mentioned earlier, but the approach is not very different from that for SHA256.
Logic functions like XOR and AND don't carry information from one dword to another, so you can simply process the low dword and high dword of a quad in two phases.
As for a quicker approach, what would probably be faster is to use registers for A, B, C, D, E, F, G, H instead of memory locations, but that would require writing almost the entire sha512_process procedure in ASM.
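The "two phases" point can be illustrated with a short Python sketch (not PureBasic; helper names are invented). F0 and F1 are the same expressions as the macros in the listing; applying them to the low and high dwords separately gives the same quad as applying them to the full 64 bits, whereas addition needs the carry passed from the low half to the high half, as an add/adc pair would do.

```python
MASK32 = 0xFFFFFFFF


def f0(x, y, z):
    # Same expression as the F0 macro (majority)
    return (x & y) | (z & (x | y))


def f1(x, y, z):
    # Same expression as the F1 macro (choose)
    return z ^ (x & (y ^ z))


def split(q):
    """Split a 64-bit value into (low dword, high dword)."""
    return q & MASK32, q >> 32


def join(lo, hi):
    return (hi << 32) | lo


def by_halves(func, x, y, z):
    """Apply a bitwise function to the low and high dwords independently."""
    (xl, xh), (yl, yh), (zl, zh) = split(x), split(y), split(z)
    return join(func(xl, yl, zl), func(xh, yh, zh))


def add_with_carry(x, y):
    """64-bit add on dwords: unlike AND/XOR, the carry must cross halves."""
    (xl, xh), (yl, yh) = split(x), split(y)
    lo = xl + yl
    carry = lo >> 32
    hi = (xh + yh + carry) & MASK32
    return join(lo & MASK32, hi)
```

So F0 and F1 can reuse the 32-bit SHA256 code twice per quad, while the additions in P() and R() are the places where the two halves interact.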
- netmaestro
Re: SHA512/384 Fingerprint and FileFingerprint
Not really such a big undertaking since it's mostly assembler now. Suppose I start with the 32bit wordsize version, what registers would I use for that? I need 8 at once, the Purebasic doc says I can only have eax, edx and ecx.
BERESHEIT
Re: SHA512/384 Fingerprint and FileFingerprint
You have 8:
xmm0 - xmm7
Since you don't seem to mind using SSE, you can use them.
movd xmm0, eax
movd eax, xmm0
That's how you can store and retrieve the normal registers like eax, edx, etc.
You can use the normal registers to work with, like you do now, and use the SSE registers for storage.

Re: SHA512/384 Fingerprint and FileFingerprint
On x64 you actually have double the amount of registers. ^^