ImageHash module

wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

A module to get a 64 bit hash from an image file.
See also http://www.purebasic.fr/english/viewtop ... 35#p436035
The bigger the Hamming distance between two hashes, the more likely it's a different image.
I tried to implement pHash ( http://www.phash.org ) but am not entirely sure if it is correct.

Hash1.q = ImageHash::pHash(Image1)
Hash2.q = ImageHash::pHash(Image2)
Debug ImageHash::HammingDistance(hash1, hash2)


Code:

DeclareModule ImageHash; v0.1.2
  
  Declare.i HammingDistance(hash1.q, hash2.q)
  Declare.q pHash(imagefile.s, useMedian = #True)
  Declare.q dHash(imagefile.s)
  Declare.q dHash3(imagefile.s, edge = 0)
  
EndDeclareModule

Module ImageHash
  
  UseJPEGImageDecoder()
  UsePNGImageDecoder()
  
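  ; *** Hamming distance ***
  ; Number of bits that differ between two hashes: popcount(hash1 XOR hash2).
  ; x86/x64 inline assembly; the low and high 32-bit halves are counted separately.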
  
  Procedure.i HammingDistance(hash1.q, hash2.q)
    !mov ecx, [p.v_hash1]
    !xor ecx, [p.v_hash2]
    !mov edx, ecx
    !shr edx, 1
    !and edx, 0x55555555
    !sub ecx, edx
    !mov edx, ecx
    !shr edx, 2
    !and edx, 0x33333333
    !and ecx, 0x33333333
    !add ecx, edx
    !mov edx, ecx
    !shr edx, 4
    !add ecx, edx
    !and ecx, 0x0f0f0f0f
    !imul eax, ecx, 0x01010101
    !shr eax, 24
    !mov ecx, [p.v_hash1 + 4]
    !xor ecx, [p.v_hash2 + 4]
    !mov edx, ecx
    !shr edx, 1
    !and edx, 0x55555555
    !sub ecx, edx
    !mov edx, ecx
    !shr edx, 2
    !and edx, 0x33333333
    !and ecx, 0x33333333
    !add ecx, edx
    !mov edx, ecx
    !shr edx, 4
    !add ecx, edx
    !and ecx, 0x0f0f0f0f
    !imul ecx, 0x01010101
    !shr ecx, 24
    !add eax, ecx
    ProcedureReturn
  EndProcedure
  
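  ; *** pHash ***
  ; *m0 = 32x32 DCT matrix, *m1 = image luma (later the 8x8 DCT result), *m2 = intermediate product.
  ; Each matrix holds 32*32 doubles (8192 bytes); the extra 256 bytes allow 256-byte alignment.
  ; The mutex guards these shared buffers when pHash is called from several threads.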
  
  Structure y32 : y.d[32] : EndStructure
  Structure m32 : x.y32[32] : EndStructure
  
  Global mutex = CreateMutex()
  Global *m0.m32 = AllocateMemory(24832, #PB_Memory_NoClear) & -256 + 256
  Global *m1.m32 = *m0 + 8192, *m2.m32 = *m0 + 16384
  
  Procedure GenerateDCTmatrix()
    Protected.i x, y, c.d = 1/Sqr(32)
    For x = 0 To 31 : *m0\x[x]\y[0] = c : Next
    For y = 1 To 31 : For x = 0 To 31
        *m0\x[x]\y[y] = 0.25*Cos(#PI*0.015625*y*(2*x+1))
    Next : Next
  EndProcedure
  GenerateDCTMatrix()
  
  Procedure.q pHash(imagefile.s, useMedian = #True)
    Protected.i img, c, x, y, i, sum.d, m.d, one.q, hash.q
    img = LoadImage(#PB_Any, imagefile)
    If img And ResizeImage(img, 32, 32)
      LockMutex(mutex)
      
      ; fill image matrix
      StartDrawing(ImageOutput(img))
      For y = 0 To 31
        For x = 0 To 31
          c = Point(x, y)
          *m1\x[x]\y[y] = 0.299*Red(c) + 0.587*Green(c) + 0.114*Blue(c)
        Next
      Next
      StopDrawing()
      FreeImage(img)
      
      ; multiply matrices and calculate median/mean value
      For y = 1 To 8
        For x = 0 To 31
          sum = 0
          For i = 0 To 31
            sum + *m0\x[i]\y[y] * *m1\x[x]\y[i]
          Next
          *m2\x[x]\y[y] = sum
        Next
      Next     
      
      m = 0
      For y = 1 To 8
        For x = 1 To 8
          sum = 0
          For i = 0 To 31
            sum + *m2\x[i]\y[y] * *m0\x[i]\y[x]
          Next
          *m1\x[x]\y[y] = sum
          m + sum
        Next
      Next
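      If useMedian
        ; middle two of the 64 coefficients in storage order (the block is not sorted first)
        m = 0.5 * (*m1\x[8]\y[4] + *m1\x[1]\y[5])
      Else
        m * 0.015625 ; mean of the 64 coefficients (sum / 64)
      EndIf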
        
      ; build hash
      one = 1
      For y = 1 To 8
        For x = 1 To 8
          If *m1\x[x]\y[y] > m : hash | one : EndIf
          one << 1  
        Next
      Next
      
      UnlockMutex(mutex)
    EndIf
    ProcedureReturn hash
  EndProcedure
  
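  ; *** dHash ***
  ; Difference hash: the image is scaled to 9x8 and each bit records
  ; whether a pixel is brighter than its left neighbour (8 rows x 8 comparisons).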
  
  Procedure.q dHash(imagefile.s)
    Protected.i img, x, y, c, l0, l1, one.q, hash.q
    img = LoadImage(#PB_Any, imagefile)
    If img And ResizeImage(img, 9, 8)
      StartDrawing(ImageOutput(img))
      one = 1
      For y = 0 To 7
        c = Point(0, y)
        l0 = 0.299*Red(c) + 0.587*Green(c) + 0.114*Blue(c)
        For x = 1 To 8
          c = Point(x, y)
          l1 = 0.299*Red(c) + 0.587*Green(c) + 0.114*Blue(c)
          If l0 < l1 : hash | one : EndIf
          l0 = l1
          one << 1
        Next
      Next
      StopDrawing()
      FreeImage(img)
    EndIf
    ProcedureReturn hash
  EndProcedure
  
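  ; *** dHash3 ***
  ; Colour variant: bits 0-31 compare green, bits 32-47 blue and bits 48-63 red
  ; between consecutive sample points; sample_x/sample_y hold the visiting order.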
  
  Global Dim sample_x(63)
  Global Dim sample_y(63)
  
  Procedure dHash3Init()
    Protected.i x, y, i
    For y = 0 To 7
      For x = 0 To 7
        i = (x & 4) >> 2 + (x & 2) << 1 + (x & 1) << 4 + (x!y & 4) >> 1 + (x!y & 2) << 2 + (x!y & 1) << 5
        sample_x(i) = x
        sample_y(i) = y
      Next
    Next
  EndProcedure
  dHash3Init()
  
  Procedure.q dHash3(imagefile.s, edge = 0)
    Protected.i img, b, c, r0, g0, b0, r1, g1, b1, hash.q
    
    img = LoadImage(#PB_Any, imagefile)
    If img And ResizeImage(img, 8 + edge << 1, 8 + edge << 1)
      StartDrawing(ImageOutput(img))
      g0 = Point(sample_x(31) + edge, sample_y(31) + edge) & $ff00
      b0 = Point(sample_x(47) + edge, sample_y(47) + edge) & $ff0000
      r0 = Point(sample_x(63) + edge, sample_y(63) + edge) & $ff
      For b = 0 To 63
        c = Point(sample_x(b) + edge, sample_y(b) + edge)
        If b < 32
          g1 = c & $ff00
          If g0 < g1 : hash | 1 << b : EndIf
          g0 = g1
        ElseIf b < 48
          b1 = c & $ff0000
          If b0 < b1 : hash | 1 << b : EndIf
          b0 = b1
        Else
          r1 = c & $ff
          If r0 < r1 : hash | 1 << b : EndIf
          r0 = r1
        EndIf
      Next
      StopDrawing()
      FreeImage(img)
    EndIf
    
    ProcedureReturn hash
  EndProcedure  
  
EndModule
Last edited by wilbert on Sun Jan 26, 2014 4:22 pm, edited 5 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
Kwai chang caine
Always Here
Posts: 5494
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Post by Kwai chang caine »

I'm happy to be your first "client" 8)
I hope, and I'm sure, that I won't be the only one :wink:

A thousand thanks for your great work and for sharing it
The happiness is a road...
Not a destination

Post by wilbert »

I updated the code to v0.1.1.
The HammingDistance procedure has been replaced with a faster version.
Kwaï chang caïne wrote: I'm happy to be your first "client" 8)
Looking forward to hearing if it works for you.
VB6_to_PBx
Enthusiast
Posts: 627
Joined: Mon May 09, 2011 9:36 am

Post by VB6_to_PBx »

Could you give an example of how to use your HammingDistance procedure?

Or is it simply:

Hash1.q = ImageHash::pHash(Image1)
Hash2.q = ImageHash::pHash(Image2)
Debug ImageHash::HammingDistance(hash1, hash2)

Code:

;        The bigger the Hamming distance between two hashes, the more likely it's a different image.
Hash1.q = ImageHash::pHash("C:\PureBASIC\_____New___Source_Codes\logo-ps.jpg")
Hash2.q = ImageHash::pHash("C:\PureBASIC\_____New___Source_Codes\PureBASIC.jpg")
Debug ImageHash::HammingDistance(hash1, hash2)
 
PureBasic .... making tiny electrons do what you want !

"With every mistake we must surely be learning" - George Harrison

Post by wilbert »

You usually compare the Hamming distance against a threshold you find acceptable.
If for example you have an image in jpg format and a smaller thumbnail of it in png format, the Hamming distance procedure should return a low value.
If you have a database with hash values of your image collection, you could compare the hash of a single image against the database to find the closest matches. It can help you find duplicate images that are a little bit different because they are scaled down or the brightness has been altered a little.
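As a rough sketch of that idea (not part of the module): the snippet below searches a small array of stored hashes for the closest match within a threshold. The names BitDistance and FindClosest, the threshold of 10 and the hash values are all made up for illustration. BitDistance is a plain PureBasic stand-in for ImageHash::HammingDistance so the snippet runs on its own.

```purebasic
; Hypothetical helper names; the hash values below are made up for illustration.

; Plain PureBasic bit count of (hash1 XOR hash2), a stand-in for ImageHash::HammingDistance.
Procedure.i BitDistance(hash1.q, hash2.q)
  Protected x.q = hash1 ! hash2      ; XOR leaves a 1 wherever the hashes differ
  Protected count.i
  While x <> 0
    x = x & (x - 1)                  ; clear the lowest set bit
    count + 1
  Wend
  ProcedureReturn count
EndProcedure

; Returns the index of the closest stored hash, or -1 if none is within maxDistance.
Procedure.i FindClosest(Array hashes.q(1), query.q, maxDistance.i)
  Protected i.i, d.i, best.i, bestDistance.i
  best = -1
  bestDistance = maxDistance + 1
  For i = 0 To ArraySize(hashes())
    d = BitDistance(hashes(i), query)
    If d < bestDistance
      bestDistance = d
      best = i
    EndIf
  Next
  ProcedureReturn best
EndProcedure

Dim stored.q(2)
stored(0) = $00FF00FF00FF00FF
stored(1) = $0123456789ABCDEF
stored(2) = $7FFFFFFFFFFFFFFF

Debug FindClosest(stored(), $0123456789ABCDEE, 10)   ; 1  (one bit away from stored(1))
Debug FindClosest(stored(), $0F0F0F0F0F0F0F0F, 10)   ; -1 (nothing within 10 bits)
```

Which threshold is acceptable depends on the image collection, so the value 10 here is only a placeholder to experiment with.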
applePi
Addict
Posts: 1404
Joined: Sun Jun 25, 2006 7:28 pm

Post by applePi »

Thanks wilbert. I first tried your code after deleting the Module keywords (since I haven't got to that subject yet) and then called the functions like this:
Hash1.q = pHash("cup0.jpg")
Hash2.q = pHash("chaos.jpg")
Debug HammingDistance(hash1, hash2)

When that worked, I was encouraged to try your original code using:
Hash1.q = ImageHash::pHash("cup0.jpg")
Hash2.q = ImageHash::pHash("cup1.jpg")
Debug ImageHash::HammingDistance(hash1, hash2)

Many thanks, it is a treasure, like your earlier color counting code.
The pictures I used are these:
Image cup0
Image cup1
Image cup2
Image cup3
Image apple
Image chaos

The results of comparing the pictures (note that comparing a picture with a copy of itself gives 0):

Image 1   Image 2      Hamming distance
cup0      cup0_copy    0
cup0      cup1         5
cup0      cup2         23
cup0      cup3         24
cup0      apple        28
cup0      chaos        43

Post by Kwai chang caine »

Wilbert wrote: Looking forward to hearing if it works for you.
Yes, no problem, I'm currently running the tests.
I have already hashed the 12000 pictures, so for the moment, if you agree, I'll keep searching with the first version of the code. If the only difference is speed, I'd prefer that.
Otherwise I'm forced to recreate the file with the 12000 analyses, and that operation still takes a long time, because I also run other tests along with your code.

Once that's finished, I'll replace your old code with the new one, for better speed :wink:
As promised, I'll keep you posted.
Thanks a lot 8)

Post by wilbert »

@applePi, thanks for testing :)

@KCC, the only difference between v0.1.0 and v0.1.1 should be the speed.
The updated HammingDistance procedure is about 5x faster. In real world situations however you probably won't notice it.
On my Core2Duo, the HammingDistance procedure itself can be run over 100 million times a second.
Querying a database probably has more impact on overall speed.

Post by Kwai chang caine »

OK... in fact my PC is like KCC, it has only one core :mrgreen:

I have started trying your splendid code, and apparently it works well about 80% of the time.
Here is an example where the result is "4" although the pictures are very different :shock:
It's between "Disques 0006.jpg" and "Charlelie couture (Local rock).jpg"
Then I took a blue picture to see whether your code would still make the error, and it doesn't: it returns 27, so a big difference 8)
http://erdsjb.free.fr/PureStorage/Provi ... 220958.zip

I'm continuing my tests :wink:

Post by wilbert »

It looks like it could be solved by using the mean instead of the median value to compare against.
This would mean however that you have to recalculate all hashes.

Post by Kwai chang caine »

Never mind... that's what tests are for :wink:
I can first test on all the pictures that don't work, approximately 20-30%, and if that works I'll recalculate all the hashes :D

Post by wilbert »

You can try if you like. I updated the code in the first post.
By default it still outputs the same hashes as before but there's an optional parameter now to the pHash procedure.
If you pass #False for useMedian, it will use the mean value instead. This will probably give you better results.
Be aware that the hashes from the two different options can't be compared with each other. The second argument to the pHash procedure has to be the same for both hashes you are going to compare.
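To illustrate why the two options don't mix, here is a small stand-alone sketch (not taken from the module). It thresholds the same eight made-up values once against their mean and once against their median and gets two different bit patterns. ThresholdBits is a hypothetical helper, and the median here is taken from a sorted copy, which is an assumption of this sketch rather than a description of what pHash does internally.

```purebasic
; Hypothetical helper; the coefficient values are made up for illustration.

; Sets bit i of the result when v(i) is above the threshold.
Procedure.i ThresholdBits(Array v.d(1), threshold.d)
  Protected i.i, mask.i = 1, bits.i
  For i = 0 To ArraySize(v())
    If v(i) > threshold : bits | mask : EndIf
    mask << 1
  Next
  ProcedureReturn bits
EndProcedure

Dim coeff.d(7)
Dim sorted.d(7)
Define i.i, mean.d, median.d

For i = 0 To 6 : coeff(i) = i + 1 : Next    ; 1, 2, ..., 7
coeff(7) = 100                               ; one large outlier

For i = 0 To 7 : mean + coeff(i) : Next
mean = mean / 8                              ; (1+2+...+7+100) / 8 = 16

CopyArray(coeff(), sorted())
SortArray(sorted(), #PB_Sort_Ascending)
median = 0.5 * (sorted(3) + sorted(4))       ; middle two sorted values: (4+5)/2 = 4.5

Debug ThresholdBits(coeff(), mean)           ; 128 (only the outlier is above the mean)
Debug ThresholdBits(coeff(), median)         ; 240 (the upper half is above the median)
```

The same input gives 128 with one threshold and 240 with the other, so a distance between a "mean" hash and a "median" hash says nothing about the images.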

Post by Kwai chang caine »

I have an idea: afterwards I can compare the pictures with both methods, and that way perhaps I'll get better results.
Thanks, I'll try it right away 8)

Post by Kwai chang caine »

Waoooouhh!!!! It's much, much better with the #False parameter!!!! 8) 8)
Now perhaps 5-10% errors :D
If you have another idea to get within touching distance of perfection :lol:

These images get a low number but they are very different...
http://erdsjb.free.fr/PureStorage/Provi ... 221553.zip

Post by wilbert »

So far I have no other idea how to improve this one. I'll try some other things.
Keep in mind that a 100% score probably won't happen with a 64 bit hash. There's not very much room to store information.
The imgSeek application I mentioned in another thread stores over 480 bytes of information per image as far as I can tell.