Faster MD5 File Hashes.
Posted: Wed May 01, 2013 3:05 am
by jassing
I almost posted this in tips/tricks -- but I'm not sure if it's faster 'everywhere'. Here at the office I tested files from 1 MB to 800 MB, and the chunked approach was faster; not by a great margin, but faster -- and when doing MD5s on lots of files, every little bit helps.
I even tested with updating a progress bar in a window and it was still faster...
Could this be checked on Linux/Mac by some folks (and on Windows) to see if it is universally faster? I tested WinXP, 2003 and 7, both 32- and 64-bit, on machines ranging from 7-year-old single-core boxes to a brand-new quad-core system.
It may have been my test files, which were just random files I grabbed off the server...
Code: Select all
Global gnChunkSize = (1024 * 32) ; 32 KB seemed to be the most universally usable; experiment at will.
; There is a 'sweet spot' -- it becomes slower if the buffer is too small or too big.
Procedure.s iMD5FileFingerprint( cFile.s )
  Protected cHash.s, nBytes, hFile, *pDataChunk, hMD5
  Shared gnChunkSize
  If FileSize(cFile) > -1
    hFile = ReadFile(#PB_Any, cFile, #PB_File_SharedRead)
    If hFile
      FileBuffersSize(hFile, gnChunkSize)
      *pDataChunk = AllocateMemory(gnChunkSize)
      If *pDataChunk
        hMD5 = ExamineMD5Fingerprint(#PB_Any)
        If hMD5
          While Not Eof(hFile)
            ; You could update a progress bar...
            nBytes = ReadData(hFile, *pDataChunk, gnChunkSize) ; read the next chunk into the allocated buffer
            NextFingerprint(hMD5, *pDataChunk, nBytes)         ; hash only the bytes actually read
          Wend
          cHash = FinishFingerprint(hMD5)
        Else
          Debug "Failed to init md5"
        EndIf
        FreeMemory(*pDataChunk)
      Else
        Debug "Failed to allocate "+Str(gnChunkSize)+" bytes of memory"
      EndIf
      CloseFile(hFile)
    Else
      Debug "Failed to open file"
    EndIf
  Else
    Debug "File does not exist"
  EndIf
  ProcedureReturn cHash
EndProcedure
Re: Faster MD5 File Hashes.
Posted: Wed May 01, 2013 2:02 pm
by infratec
Hi,
no time to test it yet,
but one point:
Your CloseFile(hfile) is at the wrong place.
Bernd
Re: Faster MD5 File Hashes.
Posted: Wed May 01, 2013 3:10 pm
by jassing
infratec wrote:Hi,
no time to test it yet,
but one point:
Your CloseFile(hfile) is at the wrong place.
Bernd
aye - it is indeed -- thanks -- I had re-ordered the things that could fail and missed that...
Re: Faster MD5 File Hashes.
Posted: Wed May 01, 2013 4:51 pm
by infratec
Hi,
just made a few tests:
You are right
With the following code
Code: Select all
#LargeFile$ = "t:\tmp\ISO-Images\UltimateBootCD\ubcd503.iso"
#gnChunkSize = 1024 * 32

Procedure.s iMD5FileFingerprint(cFile.s)
  Protected cHash.s, nBytes, hFile, *pDataChunk, hMD5
  hFile = ReadFile(#PB_Any, cFile, #PB_File_SharedRead)
  If hFile
    *pDataChunk = AllocateMemory(#gnChunkSize)
    If *pDataChunk
      hMD5 = ExamineMD5Fingerprint(#PB_Any)
      If hMD5
        While Not Eof(hFile)
          ; You could update a progress bar...
          nBytes = ReadData(hFile, *pDataChunk, #gnChunkSize)
          NextFingerprint(hMD5, *pDataChunk, nBytes)
        Wend
        cHash = FinishFingerprint(hMD5)
      Else
        Debug "Failed to init md5"
      EndIf
      FreeMemory(*pDataChunk)
    Else
      Debug "Failed to allocate "+Str(#gnChunkSize)+" bytes of memory"
    EndIf
    CloseFile(hFile)
  Else
    Debug "Failed to open file"
  EndIf
  ProcedureReturn cHash
EndProcedure

Define.i StartTime, EndTime, i
DisableDebugger

StartTime = ElapsedMilliseconds()
For i = 0 To 10
  iMD5FileFingerprint(#LargeFile$)
Next i
EndTime = ElapsedMilliseconds()
MessageRequester("Time", Str(EndTime - StartTime))

StartTime = ElapsedMilliseconds()
For i = 0 To 10
  MD5FileFingerprint(#LargeFile$)
Next i
EndTime = ElapsedMilliseconds()
MessageRequester("Time", Str(EndTime - StartTime))
and a 300 MB file, your 'home made' solution needs <27 s
and the PB one needs >31 s.
But in my case it is independent of the FileBuffersSize() command, so I removed it.
It also has no effect if I use FileBuffersSize(#PB_Default, ...) in front of the PB version.
Also, larger buffer sizes have either no effect or a negative one.
Maybe this depends on the hard disk used (internal cache).
Maybe Fred should use the 'handmade' version.
Bernd
P.S.: Your current listing does not work (*p should be *pDataChunk)
Re: Faster MD5 File Hashes.
Posted: Wed May 01, 2013 7:50 pm
by freak
The PB version does the same thing. Its buffer size is 1 MB.
Re: Faster MD5 File Hashes.
Posted: Wed May 01, 2013 8:04 pm
by infratec
If I use
Code: Select all
FileBuffersSize(#PB_Default, 1024 * 1024)
in front of everything, the 'home made' version needs >37 s and the PB version still needs >31 s.
With a small FileBuffersSize() or the default one, the 'home made' version is definitely faster.
Maybe it is not useful to use a large file buffer.
Bernd
Re: Faster MD5 File Hashes.
Posted: Thu May 02, 2013 10:02 am
by Fred
When using a larger buffer size, the data read goes into the file buffer first and is then copied to the supplied buffer. When the file buffer is small and the ReadData() request is larger, the data is put directly into the supplied buffer; that's why it's faster. Buffered reads are useful when doing a lot of small reads, such as with ReadByte() or ReadString().
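To make that concrete, here is a minimal sketch (not code from the thread) that times the same chunked MD5 loop twice: once with a file buffer smaller than the ReadData() request and once with a larger one. It assumes PureBasic 5.40 or later (StartFingerprint() API). The helper name ChunkedMD5, the constant #ChunkSize and the path in #TestFile$ are made up for illustration, so treat it as a starting point for your own measurements rather than a reference benchmark.

```purebasic
; Sketch only, not code from the thread. Assumes PureBasic 5.40 or later.
UseMD5Fingerprint()

#TestFile$ = "test.bin"   ; hypothetical path: point this at a large local file
#ChunkSize = 32 * 1024    ; size of each ReadData() request

; Hypothetical helper: hashes FileName in #ChunkSize pieces while PB's
; internal file buffer is set to FileBufferBytes.
Procedure.s ChunkedMD5(FileName.s, FileBufferBytes.i)
  Protected Hash.s, File.i, *Chunk, FP.i, BytesRead.i
  File = ReadFile(#PB_Any, FileName, #PB_File_SharedRead)
  If File
    FileBuffersSize(File, FileBufferBytes)
    *Chunk = AllocateMemory(#ChunkSize)
    If *Chunk
      FP = StartFingerprint(#PB_Any, #PB_Cipher_MD5)
      If FP
        Repeat
          BytesRead = ReadData(File, *Chunk, #ChunkSize)
          If BytesRead > 0
            AddFingerprintBuffer(FP, *Chunk, BytesRead)
          EndIf
        Until BytesRead <= 0
        Hash = FinishFingerprint(FP)
      EndIf
      FreeMemory(*Chunk)
    EndIf
    CloseFile(File)
  EndIf
  ProcedureReturn Hash
EndProcedure

Define.q StartTime
OpenConsole()

StartTime = ElapsedMilliseconds()
ChunkedMD5(#TestFile$, 4096)          ; file buffer smaller than the chunk
PrintN("4 KB file buffer: " + Str(ElapsedMilliseconds() - StartTime) + " ms")

StartTime = ElapsedMilliseconds()
ChunkedMD5(#TestFile$, 1024 * 1024)   ; file buffer larger than the chunk
PrintN("1 MB file buffer: " + Str(ElapsedMilliseconds() - StartTime) + " ms")

CloseConsole()
```

The second pass may be served from the operating system's file cache, so it is worth running the program more than once, or swapping the order of the two calls, before reading anything into the numbers.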
Re: Faster MD5 File Hashes.
Posted: Tue Dec 08, 2015 6:54 pm
by Karl-Uwe Frank
I just changed the code (below), and now it takes only 0,4600000083 seconds for the 100 MB file and 3,3949999809 seconds for the 680 MB file, which is faster than the md5 program of the OS. Is that possible?
Code: Select all
;-----------------------------------------------------------
;
; Calculate the MD5 digest of a file
;
;-----------------------------------------------------------
DisableDebugger
OpenConsole()

If (CountProgramParameters() < 1) ; Check If a Parameter is passed through
  PrintN("Please pass a file name.")
  End 1
EndIf

UseMD5Fingerprint()

#BufferSize = 16384
*Buffer = AllocateMemory(#BufferSize)

Define.s FileName = ProgramParameter(0)
Define.w readBufferSize = #BufferSize
Define.q readByteRemain = 0
Define readByte.w = 0
Define MD5digest.s{32}

Define.d t1 = ElapsedMilliseconds()

If (ReadFile(0, FileName))
  If (*Buffer) And (StartFingerprint(0, #PB_Cipher_MD5))
    readByteRemain = FileSize(FileName)
    While readByteRemain > 0
      If (readBufferSize > readByteRemain) : readBufferSize = readByteRemain : EndIf
      readByte = ReadData(0, *Buffer, readBufferSize)
      AddFingerprintBuffer(0, *Buffer, readByte)
      readByteRemain = readByteRemain - readByte
    Wend
    CloseFile(0)
    MD5digest = FinishFingerprint(0)
    FreeMemory(*Buffer)
  EndIf
Else
  PrintN("File: "+ FileName +" not found")
EndIf

Define.d t2 = ElapsedMilliseconds()
PrintN("Elapsed: "+ StrF((t2-t1)/1000) + " seconds")
PrintN(MD5digest)

CloseConsole()
End 0
; IDE Options = PureBasic 5.40 LTS (MacOS X - x64)
; ExecutableFormat = Console
; CursorPosition = 1
; EnableAsm
; EnableXP
; Executable = md5sum
; DisableDebugger
; Compiler = PureBasic 5.40 LTS (MacOS X - x64)
Cheers,
Karl-Uwe
Re: Faster MD5 File Hashes.
Posted: Wed Dec 09, 2015 1:46 am
by Karl-Uwe Frank
A bit of streamlining of the inner loop: eliminating the unnecessary calculation of the remaining bytes, etc., gives some fraction of a second more speed, depending on the file size.
Code: Select all
If (ReadFile(0, FileName))
  If (*Buffer) And (StartFingerprint(0, #PB_Cipher_MD5))
    readByte = ReadData(0, *Buffer, #BufferSize)
    While readByte > 0
      AddFingerprintBuffer(0, *Buffer, readByte)
      readByte = ReadData(0, *Buffer, #BufferSize)
    Wend
    CloseFile(0)
    MD5digest = FinishFingerprint(0)
    FreeMemory(*Buffer)
  EndIf
Else
  PrintN("File: "+ FileName +" not found")
EndIf
Cheers,
Karl-Uwe
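When a hand-rolled loop turns out surprisingly fast, a quick sanity check is to confirm that it still produces the same digest as the built-in routine. The sketch below is not from the thread: it wraps the streamlined read loop in a hypothetical procedure, ChunkedMD5, and compares its result with FileFingerprint(). It assumes PureBasic 5.40 or later and, unlike the snippets above, uses #PB_Any handles and releases the file and the buffer on every path.

```purebasic
; Sketch only, not code from the thread. Assumes PureBasic 5.40 or later.
UseMD5Fingerprint()

#BufferSize = 16384

; Hypothetical wrapper around the streamlined read loop.
Procedure.s ChunkedMD5(FileName.s)
  Protected Digest.s, File.i, *Buffer, FP.i, readByte.i
  File = ReadFile(#PB_Any, FileName)
  If File
    *Buffer = AllocateMemory(#BufferSize)
    If *Buffer
      FP = StartFingerprint(#PB_Any, #PB_Cipher_MD5)
      If FP
        readByte = ReadData(File, *Buffer, #BufferSize)
        While readByte > 0
          AddFingerprintBuffer(FP, *Buffer, readByte)
          readByte = ReadData(File, *Buffer, #BufferSize)
        Wend
        Digest = FinishFingerprint(FP)
      EndIf
      FreeMemory(*Buffer)
    EndIf
    CloseFile(File)
  EndIf
  ProcedureReturn Digest
EndProcedure

OpenConsole()
If CountProgramParameters() < 1
  PrintN("Please pass a file name.")
  End 1
EndIf

Define.s FileName = ProgramParameter(0)
Define.s Chunked  = ChunkedMD5(FileName)
Define.s BuiltIn  = FileFingerprint(FileName, #PB_Cipher_MD5)

PrintN("Chunked : " + Chunked)
PrintN("Built-in: " + BuiltIn)
If Chunked <> "" And Chunked = BuiltIn
  PrintN("Digests match")
Else
  PrintN("Digests differ or the file could not be read")
EndIf

CloseConsole()
End 0
```

If the two digests match, the speed difference comes from how the data is read, not from skipped bytes.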