Fast MD5 file routine?
Posted: Thu Jul 15, 2010 2:55 pm
by PB
I was sure there was a procedure somewhere here that did a blazingly fast MD5 hash on files,
which was MUCH faster than using PureBasic's own MD5FileFingerprint() command. A search
doesn't find it though? Did anyone copy it when it was here? I need to MD5 some large video
files and they take about 5 seconds with MD5FileFingerprint(), and I remember the other code
did those in about 1 second. I don't know why I didn't save the code at the time.

Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 3:09 pm
by KJ67
Could it be this one you were looking for?
p=236525#p236525
Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 3:23 pm
by PB
No, tried that one and it only saves a second or two on the PureBasic command.
This other code had a real big difference, it was amazing. Thanks anyway!
Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 3:36 pm
by freak
I think you are looking for FastFileMD5() from Rings's FastFile lib. I don't think there is a version around that works in recent PB versions.
Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 4:22 pm
by PB
Ah, if it were a lib, then that's why I no longer have it.
@Rings: Feel like sharing the source?

Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 4:29 pm
by netmaestro
That was written in 2003, and Fred has made some serious overhauls of almost everything since then. It's possible that the native command could compete now. Using MD5FileFingerprint() in 4.50 final on a 130 MB file (debugger off) takes 1.5 seconds on average here. That seems pretty decent for a file that size, especially since it has to come off the HD. @Rings: can you still beat it today?
Re: Fast MD5 file routine?
Posted: Thu Jul 15, 2010 6:10 pm
by netmaestro
Ok, so much for that idea. I got 3.81 from the museum, which was the current version at the time, and tested it against the same file. Same result: 1.5 seconds for the 130 MB file. So one could conclude that the native function hasn't gotten any faster since 2003.
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 8:37 am
by gnozal
IIRC, FastFile used file mapping, so this code should be equivalent:
Code:
Procedure.s FastMD5(Filename.s)
  Protected FastMD5.s = "", FileHandle, MapFileHandle, ViewFileHandle, FileLength.q
  FileLength = FileSize(Filename)
  If FileLength > 0
    FileHandle = CreateFile_(Filename, #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)
    ; CreateFile_() signals failure with #INVALID_HANDLE_VALUE (-1), not 0
    If FileHandle <> #INVALID_HANDLE_VALUE
      MapFileHandle = CreateFileMapping_(FileHandle, 0, #PAGE_READONLY, 0, 0, 0)
      If MapFileHandle
        ; map the whole file at once (only works if it fits in the address space)
        ViewFileHandle = MapViewOfFile_(MapFileHandle, #FILE_MAP_READ, 0, 0, 0)
        If ViewFileHandle
          FastMD5 = MD5Fingerprint(ViewFileHandle, FileLength)
          UnmapViewOfFile_(ViewFileHandle)
        EndIf
        CloseHandle_(MapFileHandle)
      EndIf
      CloseHandle_(FileHandle)
    EndIf
  EndIf
  ProcedureReturn FastMD5.s
EndProcedure
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 10:16 am
by PB
Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
But that's still not an amazing difference. I wonder if my memory is wrong?
I'm sure I used something that was blisteringly fast, because I distinctly
remember wondering why PureBasic didn't use that code instead. Dunno.
(Oh, and in the code you gave, FileLength should be a quad, for big files).
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 10:50 am
by gnozal
PB wrote:Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
But that's still not an amazing difference. I wonder if my memory is wrong?
I'm sure I used something that was blisteringly fast, because I distinctly
remember wondering why PureBasic didn't use that code instead. Dunno.
Did it use some C library to perform the MD5 calculation?
I did find some MD5 code in the forum, but it all seems slower than the genuine MD5Fingerprint() PB function.
PB wrote:(Oh, and in the code you gave, FileLength should be a quad, for big files).
Code edited, thanks.
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 1:41 pm
by freak
PB wrote:Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
But that's still not an amazing difference. I wonder if my memory is wrong?
I'm sure I used something that was blisteringly fast, because I distinctly
remember wondering why PureBasic didn't use that code instead. Dunno.
How are you testing that? Keep in mind that file system access is cached by the OS, so after you run the test once, further runs may be faster just because the data is already available. If you have lots of free RAM, the entire file could have been cached.
Anyway, I don't think you can get any faster than a file mapping.
PB wrote:(Oh, and in the code you gave, FileLength should be a quad, for big files).
That won't make a difference. MapViewOfFile_() can only map chunks that fit into your program's address space, so you are limited to something under 2 GB anyway in a 32-bit program. If you want to work with larger files, you have to map them in several parts (or create a 64-bit application).
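To take the OS file cache that freak mentions out of the comparison, one option is to read the file once with each method before timing anything, so both timed runs start from the same warm cache. The PureBasic snippet below is a minimal sketch of that idea, not code from this thread: it assumes one of the FastMD5() procedures posted here is defined above it, and File$ is a hypothetical path. It uses ElapsedMilliseconds() instead of the GetTickCount_() API call.

```purebasic
; Sketch only: assumes a FastMD5() procedure from this thread is defined above.
; File$ is a hypothetical path; point it at a real large file before running.
File$ = "D:\test\large-file.mp4"

; Warm-up pass: read the file once with each method, so the timed runs
; below both see a warm OS file cache instead of one paying for disk I/O.
FastMD5(File$)
MD5FileFingerprint(File$)

DisableDebugger
Start = ElapsedMilliseconds()
Fast$ = FastMD5(File$)
FastTime = ElapsedMilliseconds() - Start

Start = ElapsedMilliseconds()
Slow$ = MD5FileFingerprint(File$)
SlowTime = ElapsedMilliseconds() - Start
EnableDebugger

Debug "FastMD5():            " + Fast$ + " in " + Str(FastTime) + " ms"
Debug "MD5FileFingerprint(): " + Slow$ + " in " + Str(SlowTime) + " ms"
If Fast$ <> Slow$
  Debug "Warning: the two hashes differ"
EndIf
```

This only evens things out when the file fits into free RAM. For a file larger than that, the cache cannot hold it, and both timings mostly measure disk throughput.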
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 2:21 pm
by PB
> How are you testing that?
I was doing it like this:
Code:
; Gnozal's FastMD5() procedure would go here.
DisableDebugger
file$="D:\TV Show (1 hr 24 min).mp4" ; About 443 MB.
s=GetTickCount_() : fast$=FastMD5(file$) : fast=GetTickCount_()-s
s=GetTickCount_() : slow$=MD5FileFingerprint(file$) : slow=GetTickCount_()-s
EnableDebugger
Debug "fast"
Debug fast$
Debug fast ; 1750 ms
Debug ""
Debug "slow"
Debug slow$
Debug slow ; 1954 ms
> [Quad for FileSize] won't make a difference
But FileSize returns a negative value for large files (I just tried it). Doesn't the
MD5Fingerprint(ViewFileHandle, FileLength) bit need a positive value for FileLength?
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 2:27 pm
by PB
@Gnozal: I just tried your code on an 8.5 GB movie and it failed completely.
Here's the Debug output using my code snippet from my post above:
Code:
fast
0
slow
2ae6adb757539c197f4637335b27d114
108578
So it looks like I'll have to stick with PureBasic's version, for reliability.
Maybe that's why I didn't end up keeping the other fast snippet, too.
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 2:36 pm
by Rings
freak is totally right: you cannot use FastFile (file mapping) on files larger
than 2 GB in a Win32 environment.
Gnozal's source is also right; that is how I did it.
I would also stick with the PureBasic version.
Re: Fast MD5 file routine?
Posted: Fri Jul 16, 2010 3:02 pm
by freak
Not much tested, but this should work with any file size: (ViewSize gives the amount of memory you want to map at once)
Code:
Procedure.s FastMD5(Filename$, ViewSize = (1024*1024*1024))
  Protected Result$ = ""
  Protected File, Fingerprint, Mapping, Failure = #False
  Protected Size.q, Offset.q = 0
  Protected *View
  Protected Info.SYSTEM_INFO
  File = ReadFile(#PB_Any, Filename$)
  If File
    Fingerprint = ExamineMD5Fingerprint(#PB_Any)
    If Fingerprint
      ; cannot create a 0-size view (and don't need to anyway)
      Size = Lof(File)
      If Size <> 0
        Mapping = CreateFileMapping_(FileID(File), #Null, #PAGE_READONLY|#SEC_COMMIT, 0, 0, #Null)
        If Mapping
          If ViewSize > Size
            ; will map the entire file at once
            ViewSize = Size
          Else
            ; offsets must fit the allocation granularity, so round the ViewSize to it
            GetSystemInfo_(@Info)
            ViewSize - (ViewSize % Info\dwAllocationGranularity)
            ; a ViewSize below the granularity would round to 0 and loop forever
            If ViewSize = 0
              ViewSize = Info\dwAllocationGranularity
            EndIf
          EndIf
          While Size > 0
            ; the last view must not extend past the end of the file,
            ; so shrink it to the number of bytes that are left
            If ViewSize > Size
              ViewSize = Size
            EndIf
            ; the 64-bit offset is passed as its high and low 32-bit halves
            *View = MapViewOfFile_(Mapping, #FILE_MAP_READ, PeekL(@Offset + 4), PeekL(@Offset), ViewSize)
            If *View
              NextFingerprint(Fingerprint, *View, ViewSize)
              UnmapViewOfFile_(*View)
              Size - ViewSize
              Offset + ViewSize
            Else
              Failure = #True
              Break
            EndIf
          Wend
          CloseHandle_(Mapping)
        Else
          Failure = #True
        EndIf
      EndIf
      Result$ = FinishFingerprint(Fingerprint) ; call even on failure to free resources
    EndIf
    CloseFile(File)
  EndIf
  If Failure
    ProcedureReturn ""
  Else
    ProcedureReturn Result$
  EndIf
EndProcedure
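One quick way to check a chunked version like this against the native command is to hash a test file whose size is not a multiple of ViewSize, so the last view is a partial one. The snippet below is a hedged sketch, not part of freak's post: it assumes the FastMD5(Filename$, ViewSize) procedure above is defined, and the temp file name fastmd5_test.bin is hypothetical. 1 MB is a multiple of the usual 64 KB allocation granularity, so the 3.5 MB test file should be hashed as three full views plus one half view.

```purebasic
; Sketch only: assumes freak's FastMD5(Filename$, ViewSize) is defined above.
; Writes a hypothetical 3.5 MB temp file so that 1 MB views end in a partial view.
TestFile$ = GetTemporaryDirectory() + "fastmd5_test.bin"

If CreateFile(0, TestFile$)
  *Buffer = AllocateMemory(512 * 1024)
  If *Buffer
    For i = 0 To 512 * 1024 - 1
      PokeB(*Buffer + i, i % 251) ; arbitrary non-uniform byte pattern
    Next
    For Block = 1 To 7 ; 7 x 512 KB = 3.5 MB
      WriteData(0, *Buffer, 512 * 1024)
    Next
    FreeMemory(*Buffer)
  EndIf
  CloseFile(0)

  Chunked$ = FastMD5(TestFile$, 1024 * 1024) ; 1 MB views
  Native$  = MD5FileFingerprint(TestFile$)
  If Chunked$ = Native$
    Debug "OK: " + Native$
  Else
    Debug "Mismatch: '" + Chunked$ + "' vs '" + Native$ + "'"
  EndIf
  DeleteFile(TestFile$)
EndIf
```

If the final partial view were mapped at full ViewSize, MapViewOfFile_() would be asked for bytes past the end of the mapping and FastMD5() would return an empty string, which this comparison would report as a mismatch.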