Fast MD5 file routine?

PB
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Fast MD5 file routine?

Post by PB »

I was sure there was a procedure somewhere here that did a blazingly fast MD5 hash on files,
which was MUCH faster than using PureBasic's own MD5FileFingerprint() command. A search
doesn't find it though? Did anyone copy it when it was here? I need to MD5 some large video
files and they take about 5 seconds with MD5FileFingerprint(), and I remember the other code
did those in about 1 second. I don't know why I didn't save the code at the time. :(
I compile using 5.31 (x86) on Win 7 Ultimate (64-bit).
"PureBasic won't be object oriented, period" - Fred.
KJ67
Enthusiast
Posts: 218
Joined: Fri Jun 26, 2009 3:51 pm
Location: Westernmost tip of Norway

Re: Fast MD5 file routine?

Post by KJ67 »

Could it be this one you were looking for?
p=236525#p236525
The best preparation for tomorrow is doing your best today.

Re: Fast MD5 file routine?

Post by PB »

No, I tried that one and it only saves a second or two over the PureBasic command.
The other code made a really big difference; it was amazing. Thanks anyway!
freak
PureBasic Team
Posts: 5962
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Re: Fast MD5 file routine?

Post by freak »

I think you are looking for FastFileMD5() from Rings's FastFile lib. I don't think there is a version around that works in recent PB versions.
quidquid Latine dictum sit altum videtur

Re: Fast MD5 file routine?

Post by PB »

Ah, if it were a lib, then that's why I no longer have it.

@Rings: Feel like sharing the source? ;)
netmaestro
PureBasic Bullfrog
Posts: 8453
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Fast MD5 file routine?

Post by netmaestro »

That was written in 2003 and Fred has made some serious overhauls of almost everything since then. It's possible that the native command could compete now. Using 4.50 final MD5FileFingerprint on a 130 megabyte file (no debugger) is taking an average 1.5 seconds here. That seems pretty decent for a file that size, especially since it has to come off the HD. @Rings - can you still beat it today?
BERESHEIT

Re: Fast MD5 file routine?

Post by netmaestro »

Ok, so much for that idea. I got 3.81 from the museum, which was the current version at the time, and tested it against the same file. I got the same result: 1.5 seconds for the 130 MB file. So one could conclude that the native function hasn't gotten any faster since 2003.
gnozal
PureBasic Expert
Posts: 4229
Joined: Sat Apr 26, 2003 8:27 am
Location: Strasbourg / France
Contact:

Re: Fast MD5 file routine?

Post by gnozal »

IIRC, FastFile used file mapping, so this code should be equivalent:

Code:

Procedure.s FastMD5(Filename.s)
  Protected FastMD5.s = "", FileHandle, MapFileHandle, ViewFileHandle, FileLength.q
  FileLength = FileSize(Filename)
  If FileLength > 0
    FileHandle = CreateFile_(Filename, #GENERIC_READ, #FILE_SHARE_READ, 0, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)
    If FileHandle <> #INVALID_HANDLE_VALUE ; CreateFile_() returns #INVALID_HANDLE_VALUE (-1), not 0, on failure
      MapFileHandle = CreateFileMapping_(FileHandle, 0, #PAGE_READONLY, 0, 0, 0)
      If MapFileHandle
        ViewFileHandle = MapViewOfFile_(MapFileHandle, #FILE_MAP_READ, 0, 0, 0)
        If ViewFileHandle
          FastMD5 = MD5Fingerprint(ViewFileHandle, FileLength)
          UnmapViewOfFile_(ViewFileHandle)
        EndIf
        CloseHandle_(MapFileHandle)
      EndIf
      CloseHandle_(FileHandle)
    EndIf
  EndIf
  ProcedureReturn FastMD5.s
EndProcedure
Last edited by gnozal on Fri Jul 16, 2010 10:47 am, edited 1 time in total.
For free libraries and tools, visit my web site (also home of jaPBe V3 and PureFORM).

Re: Fast MD5 file routine?

Post by PB »

Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
But that's still not an amazing difference. I wonder if my memory is wrong?
I'm sure I used something that was blisteringly fast, because I distinctly
remember wondering why PureBasic didn't use that code instead. Dunno.

(Oh, and in the code you gave, FileLength should be a quad, for big files).

Re: Fast MD5 file routine?

Post by gnozal »

PB wrote:
> Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
> But that's still not an amazing difference. I wonder if my memory is wrong?
> I'm sure I used something that was blisteringly fast, because I distinctly
> remember wondering why PureBasic didn't use that code instead. Dunno.

Did it use some C library to perform the MD5 calculation?
I did find some MD5 code in the forum, but it all seems slower than the native MD5Fingerprint() PB function.

PB wrote:
> (Oh, and in the code you gave, FileLength should be a quad, for big files).

Code edited, thanks.

Re: Fast MD5 file routine?

Post by freak »

PB wrote:
> Not bad gnozal, that gives me about 200ms faster than PureBasic's version.
> But that's still not an amazing difference. I wonder if my memory is wrong?
> I'm sure I used something that was blisteringly fast, because I distinctly
> remember wondering why PureBasic didn't use that code instead. Dunno.

How are you testing that? Keep in mind that file system access is cached by the OS, so after you run the test once, further runs may be faster just because the data is already in memory. If you have lots of free RAM, the entire file could have been cached.

Anyway, I don't think you can get any faster than a file mapping.
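
To see how much the OS file cache influences such timings, one option is to hash the same file twice in a row with the same command and compare the two runs. The following is only a sketch: the path in File$ is a placeholder, and it assumes the MD5FileFingerprint() command of the PureBasic versions discussed in this thread. Ideally the file has not been read since the last reboot, so the first run really comes from disk.

```purebasic
; Sketch only: File$ is a hypothetical path, change it to a large existing file.
File$ = "D:\big_video.mp4"

DisableDebugger ; skip debugger checks for the timed lines

; 1st run: data may have to come from the disk
t = ElapsedMilliseconds() : Hash1$ = MD5FileFingerprint(File$) : Run1 = ElapsedMilliseconds() - t

; 2nd run: same command, same file, but the data may now sit in the OS cache
t = ElapsedMilliseconds() : Hash2$ = MD5FileFingerprint(File$) : Run2 = ElapsedMilliseconds() - t

EnableDebugger

Debug "1st run: " + Str(Run1) + " ms"
Debug "2nd run: " + Str(Run2) + " ms"
```

If the second run is clearly faster, a comparison that always times one routine first and the other second is probably measuring the cache as much as the routines, so it may be worth swapping the order and testing again.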
PB wrote:
> (Oh, and in the code you gave, FileLength should be a quad, for big files).

That won't make a difference. MapViewOfFile_() can only map chunks that fit into your program's address space, so in a 32-bit program you are limited to something under 2 GB anyway. If you want to work with larger files, you have to map them in several parts (or create a 64-bit application).
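
Mapping a file in several parts means passing MapViewOfFile_() a 64-bit offset as two 32-bit halves, and that offset has to be a multiple of the system's allocation granularity. Here is a rough sketch of just that arithmetic. The 256 MB view size and the factor of 20 are made-up example values, and the PeekL() split assumes a little-endian (x86/x64) machine.

```purebasic
; Sketch only: view size and offset are example values, no file is opened here.
Define Info.SYSTEM_INFO
Define Granularity, ViewSize
Define Offset.q, OffsetHigh.l, OffsetLow.l

GetSystemInfo_(@Info)
Granularity = Info\dwAllocationGranularity ; usually 65536 on Windows

; Round the view size down to a multiple of the granularity, so every offset
; reached by adding ViewSize repeatedly stays aligned.
ViewSize = 256 * 1024 * 1024
ViewSize - (ViewSize % Granularity)

; Assign to the quad first so the multiplication is done in 64 bits.
Offset = ViewSize
Offset * 20 ; 20 views of 256 MB = 5 GB into the file

; Split the quad into the two 32-bit halves MapViewOfFile_() expects.
OffsetHigh = PeekL(@Offset + 4)
OffsetLow  = PeekL(@Offset)

Debug "Granularity: " + Str(Granularity)
Debug "Offset high dword: " + Str(OffsetHigh)
Debug "Offset low dword: " + Str(OffsetLow)
```

With the usual 64 KB granularity the view size is left unchanged, and the 5 GB offset splits into a high dword of 1 and a low dword of 1073741824.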

Re: Fast MD5 file routine?

Post by PB »

> How are you testing that?

I was doing it like this:

Code:

; Gnozal's FastMD5() procedure would go here.

DisableDebugger

file$="D:\TV Show (1 hr 24 min).mp4" ; About 443 MB.

s=GetTickCount_() : fast$=FastMD5(file$) : fast=GetTickCount_()-s
s=GetTickCount_() : slow$=MD5FileFingerprint(file$) : slow=GetTickCount_()-s

EnableDebugger

Debug "fast"
Debug fast$
Debug fast ; 1750 ms
Debug ""
Debug "slow"
Debug slow$
Debug slow ; 1954 ms

> [Quad for FileSize] won't make a difference

But FileSize returns a negative value for large files (I just tried it). Doesn't the
MD5Fingerprint(ViewFileHandle, FileLength) bit need a positive for FileLength?

Re: Fast MD5 file routine?

Post by PB »

@Gnozal: Just tried your code on an 8.5 GB movie and it failed totally.
Here's the Debug Output using my code snippet from my post above:

Code:

fast

0

slow
2ae6adb757539c197f4637335b27d114
108578
So it looks like I'll have to stick with PureBasic's version, for reliability.
Maybe that's why I didn't end up keeping the other fast snippet, too.
Rings
Moderator
Posts: 1435
Joined: Sat Apr 26, 2003 1:11 am

Re: Fast MD5 file routine?

Post by Rings »

freak is totally right: you cannot use FastFile (file mapping) on files larger
than 2 GB in a Win32 environment.
Gnozal is also right with his source; that is how I did it.

I would also stick with the PureBasic version.
SPAMINATOR NR.1

Re: Fast MD5 file routine?

Post by freak »

Not much tested, but this should work with any file size: (ViewSize gives the amount of memory you want to map at once)

Code:

Procedure.s FastMD5(Filename$, ViewSize = (1024*1024*1024))
  Protected Result$ = ""
  Protected File, Fingerprint, Mapping, Failure = #False
  Protected Size.q, Offset.q = 0
  Protected *View
  Protected Info.SYSTEM_INFO
  
  File = ReadFile(#PB_Any, Filename$)
  If File
    Fingerprint = ExamineMD5Fingerprint(#PB_Any)
    If Fingerprint

      ; cannot create a 0-size view (and don't need to anyway)
      Size = Lof(File)
      If Size <> 0
      
        Mapping = CreateFileMapping_(FileID(File), #Null, #PAGE_READONLY|#SEC_COMMIT, 0, 0, #Null)
        If Mapping
          
          If ViewSize > Size
            ; will map the entire file at once
            ViewSize = Size
          Else
            ; offsets must fit the allocation granularity, so round the ViewSize to it 
            GetSystemInfo_(@Info)
            ViewSize - (ViewSize % Info\dwAllocationGranularity)
          EndIf

          While Size > 0
            ; the last chunk is usually shorter: a view must not extend past the end of the file
            If ViewSize > Size
              ViewSize = Size
            EndIf
            *View = MapViewOfFile_(Mapping, #FILE_MAP_READ, PeekL(@Offset + 4), PeekL(@Offset), ViewSize)
            If *View
              NextFingerprint(Fingerprint, *View, ViewSize)
              UnmapViewOfFile_(*View)
              Size - ViewSize
              Offset + ViewSize
            Else
              Failure = #True
              Break
            EndIf
          Wend
        
          CloseHandle_(Mapping)
        Else
          Failure = #True
        EndIf
      
      EndIf
            
      Result$ = FinishFingerprint(Fingerprint) ; call even on failure to free resources
    EndIf  
    CloseFile(File)
  EndIf
  
  If Failure
    ProcedureReturn ""
  Else
    ProcedureReturn Result$
  EndIf
EndProcedure