
Search for similarity of a picture in a folder of thousands of others

Posted: Thu Dec 19, 2013 10:40 pm
by Kwai chang caine
Hello everyone :D

I have several thousand pictures in a folder,
and I want to search for one picture in this folder...

I have found several programs that find duplicate pictures within the same folder, but nothing that searches for one picture among thousands of others :(
I also know about GOOGLE image search, but that is for searching for a picture on the web.

When I searched the web, I found something about "PQ codes", but sorry, it's a French web page :oops:
Apparently it's a super algorithm that converts a picture into only 20 bytes :shock:
That way, you can store it in a database and search it in a few milliseconds :shock: 8)
Probably the algorithm used by Google :?:

Does anyone know if it's possible to use it in PB ???

French links
http://www.futura-sciences.com/magazine ... ons-35914/
http://emergences.inria.fr/2011/inria_n ... 6/PQ-CODES

In English, I believe it's the same thing, but I'm not sure... I don't really understand spoken English :oops:
http://videolectures.net/cvpr2010_jegou_ald/
http://lear.inrialpes.fr/pubs/2010/JDSP10/

Otherwise, does anyone know software (preferably freeware) that does this job ???

Thanks to you all and have a good day

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 1:01 am
by netmaestro
You could iterate through the folder(s) and do a crc32 file fingerprint on each picture. This is very fast and if you start with the crc32 of the picture you're looking for and compare you'll find it quickly if it's in the folder. Even if it's called something else, you'll find it because you aren't looking at filenames.
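
A minimal sketch of that folder scan, written in Python rather than PB so that it can be run as-is. The helper names `file_crc32` and `find_exact_copies` are made up for this illustration; `zlib.crc32` and `os.listdir` are standard library calls. It only finds byte-for-byte copies, as described above.

```python
import os
import zlib


def file_crc32(path, chunk_size=65536):
    """Return the CRC32 of a file's bytes, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def find_exact_copies(folder, target_path):
    """List the files in `folder` whose CRC32 equals that of the target file."""
    wanted = file_crc32(target_path)
    matches = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # File names are ignored; only the content fingerprint is compared.
        if os.path.isfile(path) and file_crc32(path) == wanted:
            matches.append(name)
    return matches
```

Two different files can share a CRC32 by chance, so a hit is best confirmed with a full byte comparison.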

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 1:18 am
by Kwai chang caine
Yeees !!! That's a good idea !!! :D
It's already a good beginning, I hadn't thought of that 8)

But I was not precise enough in my question.
In fact, it's for finding a record.
I have a big collection of records, and sometimes I buy a duplicate of a record I already have :?
Over a long time I have accumulated numerous duplicate records, several hundred.
My goal is to find the title of a duplicate record without typing it with my fingers :mrgreen:

I'm sure each record I must find is in the folder... of course, it's a duplicate :mrgreen:
The problem is that it's the same picture, but not 100%, because it's not exactly the same record, it's a clone: sometimes there is an extra label, or pencil strokes, etc...
Also, sometimes the brightness is not quite the same. In this example the brightness of the left one is better, and I'm sure this also changes the CRC :cry:
See for yourself, in this example the duplicate is a little bit damaged (that's why I bought the same record again) :

Image Image

For the moment, only GOOGLE can find the record... but it's GOOGLE :mrgreen:

I'm afraid a CRC32 will see this difference, even if it's small :cry:
Or perhaps the CRCs will be similar if the difference is not big ????
I must try it to see what the result is ....
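
A quick way to try it, sketched in Python with made-up data standing in for the picture bytes (the `crc32_hex` helper is only for this illustration). CRC32 is built to detect any single changed bit, so even a one-level change in one byte gives a different checksum, and the two checksums say nothing about how close the inputs were.

```python
import zlib


def crc32_hex(data):
    """CRC32 of a bytes-like object as 8 hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


# Stand-in for the raw bytes of a scanned cover (not a real image file).
original = bytes(range(256)) * 4

# The "same" picture with a single byte nudged by one level.
altered = bytearray(original)
altered[100] += 1
altered = bytes(altered)

print(crc32_hex(original))
print(crc32_hex(altered))
print("same CRC:", crc32_hex(original) == crc32_hex(altered))
```

Two scans of two physical copies will differ in far more than one byte, so a CRC can only ever confirm exact file copies.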

Thanks a lot NETMAESTRO for your precious help 8)

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 9:19 am
by Kwai chang caine
A friend gave me a good idea :D
First I can convert the pictures to greyscale, so there's no need to compare colors... it's simpler. 8)
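
A minimal Python sketch of that conversion, assuming the picture is already available as rows of `(r, g, b)` tuples (loading the file is left out, and `to_gray` is a hypothetical helper name). It uses the plain channel average, the same choice the PB listing later in this thread makes with `(r+g+b) * 0.33333`.

```python
def to_gray(pixels):
    """Convert rows of (r, g, b) tuples into rows of 0..255 grey levels."""
    # Plain average of the three channels; a weighted luminance formula
    # would be another option, this is just the simplest one.
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in pixels]


picture = [
    [(255, 255, 255), (0, 0, 0)],
    [(30, 60, 90), (255, 0, 0)],
]
print(to_gray(picture))
```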

Sure, there remains the problem of brightness.
But there's also another problem that came to my little head, which I had forgotten :oops:
The alignment... because record 1 is not necessarily positioned on the same axis as record 2... even if it's only off by a few millimeters...
Obviously the CRC or a pixel compare will not return the same result :(

For the resolution there's normally no problem... so I believe :oops:
Because record 1 is scanned at the same resolution as all the records in the folder (100 dpi)

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 9:27 am
by netmaestro
When I saw your original question I thought you were looking for an exact copy of the picture. Looking for similar pictures is something else entirely. You would have to come up with some kind of routine that would look at maybe 100 pixels taken from different parts of the image and see if comparisons between them are close to comparisons done on the search image. For example, your grayscale idea is excellent as a start, then look at how the px at 5,5 compares to that at 50,50. Store the result of that comparison along with the results from several other comparisons and when you do the same comparisons on a searched image, if they all fall pretty close to those from the source image, you have a possible match. Just a shot in the dark really, I've never done this kind of thing before. There may be an algorithm for it searchable on the web.
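
A rough Python sketch of that idea, working on a greyscale picture given as rows of 0..255 values. All names are hypothetical, and so are the details: the sample points are stored as fractions of the width and height so that differently sized scans can be compared, the stored "comparison" is the grey-level difference between the two points of each pair, and a match only needs most pairs to agree (`min_fraction`) rather than all of them, to leave room for a label or pencil strokes.

```python
import random


def make_pairs(count=100, seed=1):
    """Pick `count` pairs of sample points as fractions of width and height.

    The fixed seed means every picture is probed at the same places.
    """
    rng = random.Random(seed)
    return [((rng.random(), rng.random()), (rng.random(), rng.random()))
            for _ in range(count)]


def sample(gray, fx, fy):
    """Grey level at fractional position (fx, fy), both in [0, 1)."""
    h, w = len(gray), len(gray[0])
    return gray[min(int(fy * h), h - 1)][min(int(fx * w), w - 1)]


def signature(gray, pairs):
    """One number per pair: how much brighter the first point is than the second."""
    return [sample(gray, *a) - sample(gray, *b) for a, b in pairs]


def is_possible_match(sig_a, sig_b, tolerance=20, min_fraction=0.9):
    """True when enough of the stored comparisons fall close to each other."""
    close = sum(1 for a, b in zip(sig_a, sig_b) if abs(a - b) <= tolerance)
    return close >= min_fraction * len(sig_a)
```

Because each entry is a difference between two pixels of the same picture, adding the same brightness offset to the whole picture leaves the signature unchanged. A shift or rotation of the scan still moves the sample points, so this does nothing for the alignment problem.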

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 10:01 am
by Little John
Hi KCC,

better change the title of this thread, so that people see what you really want.
This increases the chance that you'll get good answers. :-)

There are several ways to achieve your goal. The principle always is to greatly simplify the pictures, without losing too much characteristic information. See e.g.:
http://purebasic.fr/english/viewtopic.php?f=12&t=54021
http://stackoverflow.com/questions/5962 ... any-images
http://stackoverflow.com/questions/1034 ... -detection

The most effective methods, which will generate the fewest false positive and false negative results, are probably methods that are based on wavelets.
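
To illustrate the "greatly simplify" principle, here is a Python sketch of one common simplification, usually called an average hash. It is not a wavelet method and is not taken from the links above; the function names are made up. The greyscale picture (rows of 0..255 values) is shrunk to 8x8 block averages, each block becomes one bit depending on whether it is brighter than the overall mean, and two pictures are compared by counting differing bits.

```python
def shrink(gray, size=8):
    """Reduce a grey picture to size x size block averages."""
    h, w = len(gray), len(gray[0])
    small = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """Pack one bit per block: 1 if the block is brighter than the mean block."""
    cells = [v for row in shrink(gray, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

A small Hamming distance (a handful of bits out of 64) suggests a similar picture. The threshold has to be tuned on real scans, and like any grid-based method it tolerates brightness changes better than rotation.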

Re: Search one picture in thousand others

Posted: Fri Dec 20, 2013 10:08 am
by Kwai chang caine
> When I saw your original question I thought you were looking for an exact copy of the picture.
Yes, it's my fault !!
How can I expect to find a solution when I'm not able to ask the right question :cry: :oops: :oops:
> For example, your grayscale idea is excellent as a start
Of course... the idea is not mine :mrgreen: :oops:
> For example, your grayscale idea is excellent as a start, then look at how the px at 5,5 compares to that at 50,50. Store the result of that comparison along with the results from several other comparisons and when you do the same comparisons on a searched image, if they all fall pretty close to those from the source image, you have a possible match. Just a shot in the dark really, I've never done this kind of thing before
Yes, but the alignment problem will surely be the biggest one :cry:
Because the brightness is perhaps easier to get around... at least I hope :oops:
> There may be an algorithm for it searchable on the web.
I'll continue to search for software, because if code exists on the web, it's surely in C... and the only C I know is the two Cs in KCC :oops:
It surely exists, because GOOGLE has been doing that for a long time...
But I'm not GOOGLE... I'm "GOGOLE" :lol: :lol: (in case it doesn't exist in English... GOGOLE means crazy, an idiot, ... in French)
Image

All the same... the genius who found this algorithm... is surely as strong as
Image

@Little John: thanks
I'll do as you say and try to change the title to something more explicit :oops:
And I'll read the links you gave me 8)

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 20, 2013 10:39 am
by wilbert
From what I've read online, you might try the SURF algorithm.
http://en.wikipedia.org/wiki/SURF
Other algorithms that are mentioned are SIFT and PCA-SIFT.
I haven't got a clue how complicated it would be to implement.

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 20, 2013 10:55 am
by Kwai chang caine
Thanks a lot WILBERT for your link
I'll try to understand what you gave me...
Apparently several algorithms already exist... perhaps that's lucky for me 8)

Some time ago I found a super great code in this forum that searches for a picture inside another one
It's another possible approach: like NETMAESTRO says, searching for one or several small parts of the picture of record 1 could work...
Perhaps use several methods at the same time !!!

It's strange how something so simple for us humans is so hard for a machine :shock:
I'm happy... for once, I'm more intelligent than a PC... that doesn't happen every day... :lol: :lol:

For several days now I've been talking to my machine, and I can laugh at my PC :
"I'm stronger than you, little brain computer !!! I'm stronger than you, na na neeeeereeee !!!"
Image

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 20, 2013 11:58 am
by PB
KCC, see this image comparer too: http://www.purebasic.fr/english/viewtop ... 12&t=50905

> It's strange like a simple thing for us the human is so hard for a machine

That's because we can think and adapt; computers can't. They only follow steps.

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 20, 2013 12:17 pm
by Kwai chang caine
Thanks PB for your interest 8)
You're right, I had completely forgotten this code :oops:

But like we all said above, it's very difficult to compare
I tried this code with the two pictures above, and even with one point of comparison it says: "it's not the same picture" :cry:
I even converted the 2 pictures to greyscale, but it's the same result... so the problem is not the color.
Surely it's the fault of the brightness, perhaps also the alignment...

Thanks all the same for your advice 8)

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 27, 2013 5:27 am
by idle
After a bit of messing about, this appears to work,
but in the case of the two pictures presented the results may vary too much to make a match.

The problems as I see them are

1) color shifts
2) artefacts
3) scale
4) rotation

Problems 3 and 4 can be solved by using the 3rd moment of inertia, which is scale and rotation invariant,
but it's not as robust when it comes to dealing with artefacts and colour shifts.

In the code below, if the same picture is compared to a scaled or rotated version, the result should agree to within 2 decimal places,
and it will provide some tolerance for small artefacts in the pictures.
To counter colour shifts, e.g. brightness / contrast, I've added a pass to obtain a colour scale factor
csf = (0-255) / (min-max)
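
As a small worked example of that scale factor, here is a Python sketch on a greyscale picture given as rows of 0..255 values (`stretch_contrast` is a hypothetical name, and the guard for a completely flat picture is an addition of this sketch). Note that `(0-255) / (min-max)` is the same number as `255 / (max-min)`.

```python
def stretch_contrast(gray):
    """Rescale grey levels so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # Flat picture: there is no range to stretch, avoid dividing by zero.
        return [[0.0 for _ in row] for row in gray]
    csf = 255.0 / (hi - lo)  # same value as (0-255) / (min-max)
    return [[(v - lo) * csf for v in row] for row in gray]


dull = [[60, 80], [70, 100]]    # low-contrast scan
crisp = [[20, 60], [40, 100]]   # same picture with twice the contrast
print(stretch_contrast(dull))
print(stretch_contrast(crisp))
```

Both pictures stretch to the same values, so a uniform brightness or contrast change no longer affects what is fed to the moment sums. A single stray black or white pixel changes `min` or `max`, though, which is one reason artefacts still hurt.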

Feel free to improve it.

Code:

UseJPEGImageDecoder()
 UsePNGImageDecoder() 
 
 Procedure.d GetImageID(img,bColorScale=1)
    ;moments of inertia 
    ;author Idle 27/12/13 
    
    Protected x,y,px,r,g,b,col.d,csf.d,min,max 
    Protected x2,y2,sumX.d,sumY.d,sumXX.d,sumYY.d,sumXY.d 
    Protected RA.d,RB.d,ix.d,iy.d,ixy.d,Area.d,cx.d,cy.d,ta.d 
    If IsImage(img) 
       
     x2 = ImageWidth(img)
     y2 = ImageHeight(img) 
            
    If StartDrawing(ImageOutput(img)) 
       min=255 
       max = 0 
       
       If bColorScale 
       For x = 0 To x2 -1
          For y = 0 To y2 -1 
             px = Point(x,y)  
             r = Red(px)
             g= Green(px)
             b =Blue(px) 
             col = ((r+g+b) * 0.33333) 
             If col > max 
                max = col 
             EndIf 
             If col < min 
                min = col 
             EndIf 
          Next 
        Next   
        csf = (0-255) / (min-max) 
      Else 
         csf=1.0 
         min=0
      EndIf 
              
       For x = 0 To x2 -1
          For y = 0 To y2 -1
             
             px = Point(x,y)  
             r = Red(px)
             g= Green(px)
             b =Blue(px) 
             col = (((r+g+b) * 0.33333) - min) * csf 
             sumX = sumX + (x  * col)
             sumY = sumy + (y  * col) 
             Area + col   
             sumXX = sumxx + ((x*x)   * col) 
             sumYY = sumyy + ((y*y)  * col) 
             sumXY = sumxy + ((x*y)  * col) 
                         
          Next 
       Next      
       
       StopDrawing()
          
      
       If area 
          ;1st moments cx = centerX cy=centerY 
          cx = sumx / Area
          cy = sumy /Area
         ;2nd moments Ix Iy Ixy  
          ix = SumXX - (Area * (cx * cx))
          iY = SumYY - (Area * (cy * cy))
          ixY = Sumxy - (Area * (cx * cy))
          
          ;3rd moment shape parameter invariant to scale and rotation 
          ta= Sqr((2.0 * ix * iY) - (4.0 *(ixy * ixy)) * 0.5) 
          
          Ra = (((ix + iy)) + ta ) 
          Rb = (((ix + iy)) - ta ) 
          
          ProcedureReturn (Ra / Rb) 
          
       EndIf    
     EndIf   
  EndIf 
  
    EndProcedure    
 
Global  m1.d, m2.d
 
  Pattern.s = "JPG (*.jpg)|*.jpg;PNG (*.png)|*.png;"
  
  imgfile.s = OpenFileRequester("Please choose file to load", "", Pattern, 0)
  
  img1 = LoadImage(#PB_Any,imgfile)
  If img1 
     m1 = GetImageID(img1) 
      Debug "ID " + StrF(m1)
  EndIf 
   
  imgfile.s = OpenFileRequester("Please choose file to load", "", Pattern, 0)
  
  img2 = LoadImage(#PB_Any,imgfile)
  If img2 
     m2 = GetImageID(img2)
     Debug "ID " + StrF(m2)
  EndIf 
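
The listing above prints the two IDs but leaves the comparison to the reader. Below is a Python sketch of that last step, together with a moment-based ID to try it on. It is an illustration under assumptions, not a port of the PB code: `moment_ratio` uses the textbook principal-axis formula for the two principal second moments, which is not necessarily the same expression as `ta` in the listing, and the default tolerance of 0.01 is just a reading of "within 2 decimal places". Both function names are made up.

```python
import math


def moment_ratio(gray):
    """Ratio of the two principal second moments, with grey level as mass."""
    area = sx = sy = sxx = syy = sxy = 0.0
    for y, row in enumerate(gray):
        for x, v in enumerate(row):
            area += v
            sx += x * v
            sy += y * v
            sxx += x * x * v
            syy += y * y * v
            sxy += x * y * v
    if area == 0:
        return None  # completely black picture: no mass, no ID
    # First moments: centre of mass.
    cx, cy = sx / area, sy / area
    # Second moments about the centre of mass.
    ix = sxx - area * cx * cx
    iy = syy - area * cy * cy
    ixy = sxy - area * cx * cy
    # Eigenvalues of [[ix, ixy], [ixy, iy]] are half_sum +/- root.
    half_sum = (ix + iy) / 2.0
    root = math.sqrt(((ix - iy) / 2.0) ** 2 + ixy * ixy)
    smaller = half_sum - root
    if smaller <= 0:
        return None  # all mass on one line: ratio undefined
    return (half_sum + root) / smaller


def ids_match(id_a, id_b, tolerance=0.01):
    """True when both IDs exist and agree to within `tolerance`."""
    if id_a is None or id_b is None:
        return False
    return abs(id_a - id_b) <= tolerance
```

With this formula a 90 degree rotation gives the same ratio up to rounding. For scaled scans the agreement is only approximate because of resampling, so the tolerance needs tuning on real pictures.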
    
    


Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 27, 2013 11:19 pm
by heartbone
If anyone figures out a solution to this problem, the NSA will love you to death. :twisted:

Re: Search similitary of a picture in folder of thousand oth

Posted: Fri Dec 27, 2013 11:47 pm
by TheMexican
Wow this is really exciting!!!
Come on guys you can do it!!!

Who will be the expert at cracking this challenge!!!

Re: Search similitary of a picture in folder of thousand oth

Posted: Sat Dec 28, 2013 1:06 am
by IdeasVacuum
For that particular image, I think OCR could do it.