Fast recognize big file [Resolved]

Kwai chang caine
Always Here
Posts: 5494
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Post by Kwai chang caine »

Hello everyone

I have a big list of files (like everyone, you will tell me :mrgreen: )
I want to recognize each one quickly, without using its name.
For that, I will surely tag the files with the splendid code INFRATEC and MIJIKAÏ gave me 8) , but I still looked for a way to recognize them without the patch, as an extra safety.

I told myself it would be easy: I put as much information that never changes about each file as possible into one string, for example: "Size + Extension + #PB_Date_Created + image size (if it's a movie)"
And I was surprised, because even with all this information in one string, there are numerous duplicates :shock:

So never mind, I told myself that with a fingerprint of the first 10 000 bytes the problem would surely be solved :D ....no, duplicates :|
Then I thought I could read the file, take a series of bytes in hex and build a string from them....duplicates :cry:
The first time, I took 100 bytes at the beginning, and still numerous duplicates :?
Then I tried reading one byte every 20 000 bytes and concatenating the hex values into one string:

Code: Select all

Canal = ReadFile(#PB_Any, Fichier$)
         
  If Canal
   
   TailleFichier.l = FileSize(Fichier$)
   TailleSaut = Int(TailleFichier / 30)   ; distance between two sampled bytes
   PhraseHexa$ = ""
       
   For b = 100000 To TailleFichier
   
    b + TailleSaut                        ; jump to the next sample position
    
    If b < TailleFichier                  ; b = TailleFichier would already be past the last byte
     FileSeek(Canal, b)
     Bit = ReadByte(Canal)
     Hexa$ = Right(Hex(Bit), 2)
     PhraseHexa$ + RSet(Hexa$, 2, "0")    ; always two hex digits per byte
    EndIf
     
   Next
      
   WriteStringN(CanalLog, Fichier$ + "|" + PhraseHexa$, #PB_UTF8)
   CloseFile(Canal)
   Delay(30)
   
  EndIf
      
but duplicates again :x

And worst of all, if I combine the information string from the beginning with the byte string or the fingerprint, the result is always the same... duplicates, duplicates, duplicates :?

Obviously, I also tried to create a fingerprint of the whole file, but for the ones of 1 GB, this time ...

So if you have an idea for creating a unique key for each file, or for creating and reading a fingerprint relatively quickly, I would be really interested.

Have a good day
Last edited by Kwai chang caine on Tue Jul 12, 2022 8:25 pm, edited 1 time in total.
The happiness is a road...
Not a destination
BarryG
Addict
Posts: 4121
Joined: Thu Apr 18, 2019 8:17 am

Re: Fast recognize big file

Post by BarryG »

I've used FileFingerprint() with #PB_Cipher_CRC32 and the first 10240 bytes (10 KB) in the past with good results. For better results, use #PB_Cipher_MD5 and a longer length (maybe 1 MB?). I don't fingerprint the entire file because large sizes can take quite a while to fingerprint (over a minute for 6 GB files).
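
A minimal sketch of these two calls, assuming the optional Bits, Offset and Length parameters of FileFingerprint() are available in your PureBasic version. The path is only an example, and the 10 KB and 1 MB lengths are the values suggested above:

```purebasic
UseCRC32Fingerprint()
UseMD5Fingerprint()

Fichier$ = "D:\Video\TheMovieAtKcc.mkv"   ; example path, replace with a real file

; Parameters after the plugin: Bits (not used by CRC32 or MD5), Offset, Length
Debug FileFingerprint(Fichier$, #PB_Cipher_CRC32, 0, 0, 10240)    ; first 10 KB
Debug FileFingerprint(Fichier$, #PB_Cipher_MD5, 0, 0, 1048576)    ; first 1 MB
```

Only the requested bytes are read, so the time no longer depends on the total file size.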

Re: Fast recognize big file

Post by Kwai chang caine »

Hello BarryG :D

Thanks for your interest and your answer 8)

Strange !!!! :shock:
I also fingerprinted 1000 bytes with #PB_Cipher_CRC32 and even #PB_Cipher_MD5, and I still get duplicate results with 18 000 files :|
Approximately one hundred duplicates out of 18 000 :oops:
Perhaps (surely... as usual, you don't change a losing "team" :mrgreen:) I made an error :oops:

The first time I took the first 1000 bytes, and the second time I took 1000 bytes in the middle of the file, because I thought that perhaps a movie, for example, always begins with the same bytes (a kind of header), and perhaps every MP4 has the same beginning... I don't know :oops:
And no need to ask what color Henry IV's (French king) white horse is :mrgreen:

Code: Select all

For i = 1 To MaxFichiers
 
 Fichier$ = TabloFichiers(i)
 Canal = ReadFile(#PB_Any, Fichier$)
 
 If Canal
  
  TailleFichier.l = FileSize(Fichier$)
  TailleFichier = Abs(TailleFichier)
  DebutTampon = Int(TailleFichier / 2)   ; start reading in the middle of the file
  *Ptr = AllocateMemory(TailleTampon)
  FileSeek(Canal, DebutTampon)
  ReadData(Canal, *Ptr, TailleTampon)
  PhraseHexa$ = ""
  
  For b = 0 To TailleTampon - 1          ; the buffer holds bytes 0 to TailleTampon - 1
   Bit = PeekB(*Ptr + b)
   Hexa$ = Right(Hex(Bit), 2)
   PhraseHexa$ + RSet(Hexa$, 2, "0")
  Next
  
  FreeMemory(*Ptr)
  CloseFile(Canal)
  ; ...
  
 EndIf
 
Next
However, I have not understood why, sometimes... I get a negative FileSize :shock:
I added Abs() to fix that... but I must have done something stupid again :oops:
NicTheQuick
Addict
Posts: 1502
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: Fast recognize big file

Post by NicTheQuick »

Long variables (TailleFichier.l) are limited to 32 bits and are signed, so they do not work for file sizes bigger than 2 GiB. Just use Quads or Integers. It makes no sense to use Long variables outside of structures.

This may also explain why you've got so many duplicates. If TailleFichier is negative, the For loop will simply be skipped entirely.
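
A small sketch of that effect, assuming a 3 GiB file (the size is an invented example value, and the variable names are only illustrative):

```purebasic
TailleReelle.q = 3221225472      ; 3 GiB stored in a quad
TailleLong.l   = TailleReelle    ; only the low 32 bits are kept, read as a signed value

Debug TailleLong                 ; -1073741824

Compteur = 0
For b = 100000 To TailleLong     ; start value is above the end value
  Compteur + 1
Next

Debug Compteur                   ; 0, the loop body never runs
```

With a quad (.q) or a plain integer on a 64-bit compiler, the size stays positive and the loop runs as intended.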
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Mijikai
Addict
Posts: 1517
Joined: Sun Sep 11, 2016 2:17 pm

Re: Fast recognize big file

Post by Mijikai »

Also, assuming you are using Windows, you can give FileMapping a try;
it's usually a lot faster than the normal Open/Read file stuff.

Re: Fast recognize big file

Post by Kwai chang caine »

Hello to you both :D

@NicTheQuick
Thanks for your precious help 8)
I had not understood that about the long :shock:
Stupidly, I used a long because I tried to play the "master" and I thought it was a LONGER value :oops:
Thanks to you, I now use a quad or an integer (the default PB variable type) :wink:
That'll teach me to try to be like you, the "MASTERS" of this site :mrgreen:

@Mijikai
Yes, I use WINDOWS; it's already hard enough for little KCC without trying LINUX or anything else :oops:
"u can give FileMapping a try"
Incredible :shock:
I never would have believed that I could use file mapping to open a file.
I believed this function was only for sharing data between two executables :oops:

Thanks a lot to you both 8)

Re: Fast recognize big file

Post by Kwai chang caine »

@MASTER NicTheQuick
You are right, it's really better when KCC doesn't play the sorcerer's apprentice with things FRED has seriously set up by default :|
I no longer get negative values.
Thanks again, MASTER, for your precious help 8)

@MASTER MIJIKAÏ
I can't believe what happened :shock:
KCC has been struck by the MASTER MIJIKAÏ effect
WonderMijikaï wrote:u can give FileMapping a try
I searched for PB code to load the file the way you suggested, and found nothing :|
So KCC, listening only to his courage, dared to go out alone on the NET :?

So... KCC took, like a grown-up, real C code from a MASTER of C and dared to try to translate it into his beloved PB language, with his one and only neuron :oops:
It must be said that fools dare everything; that's even how you recognize them :mrgreen:
https://stackoverflow.com/questions/683 ... on-windows

Code: Select all

#include <stdio.h>
#include <Windows.h>

int main(int argc, char *argv[])
{
    TCHAR *lpFileName = TEXT("hello.txt");
    HANDLE hFile;
    HANDLE hMap;
    LPVOID lpBasePtr;
    LARGE_INTEGER liFileSize;

    hFile = CreateFile(lpFileName, 
        GENERIC_READ,                          // dwDesiredAccess
        0,                                     // dwShareMode
        NULL,                                  // lpSecurityAttributes
        OPEN_EXISTING,                         // dwCreationDisposition
        FILE_ATTRIBUTE_NORMAL,                 // dwFlagsAndAttributes
        0);                                    // hTemplateFile
    if (hFile == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed with error %d\n", GetLastError());
        return 1;
    }

    if (!GetFileSizeEx(hFile, &liFileSize)) {
        fprintf(stderr, "GetFileSize failed with error %d\n", GetLastError());
        CloseHandle(hFile);
        return 1;
    }

    if (liFileSize.QuadPart == 0) {
        fprintf(stderr, "File is empty\n");
        CloseHandle(hFile);
        return 1;
    }

    hMap = CreateFileMapping(
        hFile,
        NULL,                          // Mapping attributes
        PAGE_READONLY,                 // Protection flags
        0,                             // MaximumSizeHigh
        0,                             // MaximumSizeLow
        NULL);                         // Name
    if (hMap == 0) {
        fprintf(stderr, "CreateFileMapping failed with error %d\n", GetLastError());
        CloseHandle(hFile);
        return 1;
    }

    lpBasePtr = MapViewOfFile(
        hMap,
        FILE_MAP_READ,         // dwDesiredAccess
        0,                     // dwFileOffsetHigh
        0,                     // dwFileOffsetLow
        0);                    // dwNumberOfBytesToMap
    if (lpBasePtr == NULL) {
        fprintf(stderr, "MapViewOfFile failed with error %d\n", GetLastError());
        CloseHandle(hMap);
        CloseHandle(hFile);
        return 1;
    }

    // Display file content as ASCII characters
    char *ptr = (char *)lpBasePtr;
    LONGLONG i = liFileSize.QuadPart;
    while (i-- > 0) {
        fputc(*ptr++, stdout);
    }

    UnmapViewOfFile(lpBasePtr);
    CloseHandle(hMap);
    CloseHandle(hFile);

    printf("\nDone\n");
}
And guess what happened? :shock:
WonderMijikaï wrote:Yes yes !!! Tell me what happened !!! :lol:
OK.. OK... I'll tell you.... I'll tell you.... MASTER :mrgreen:

For the first time in my life.... after nearly 40 years of bad programming,
the code works !!!!!

Probably like a potato in a washing machine... but it spins !!! :mrgreen:

Code: Select all

File$ = "D:\Video\TheMovieAtKcc.mkv"
UseMD5Fingerprint()
MapSize.q = FileSize(File$)   ; quad: GetFileSize_(hFile, 0) would only return the low 32 bits
hFile = CreateFile_(@File$, #GENERIC_READ, 0, #Null, #OPEN_EXISTING, #FILE_ATTRIBUTE_NORMAL, 0)

If hFile <> #INVALID_HANDLE_VALUE   ; CreateFile_() returns -1 on failure, not 0
  
 hMap = CreateFileMapping_(hFile, #Null, #PAGE_READONLY, 0, 0, #Null)
 
 If hMap
 
  Debug MapSize
 
  *lpBasePtr = MapViewOfFile_(hMap, #FILE_MAP_READ, 0, 0, 0)   ; map the whole file
  
  If *lpBasePtr
   
   Debug *lpBasePtr
   HeureDebut = ElapsedMilliseconds()
   Debug Fingerprint(*lpBasePtr, MapSize, #PB_Cipher_MD5)
   Debug StrF((ElapsedMilliseconds() - HeureDebut) / 1000) + " secondes"
   UnmapViewOfFile_(*lpBasePtr)
   
  EndIf
  
  CloseHandle_(hMap)   ; released even when the view could not be created
  
 EndIf  

 CloseHandle_(hFile)
 
EndIf   

; Same fingerprint with the built-in function, for comparison
HeureDebut = ElapsedMilliseconds()
Debug FileFingerprint(File$, #PB_Cipher_MD5)
Debug StrF((ElapsedMilliseconds() - HeureDebut) / 1000) + " secondes"
The bad news is (because with KCC, as you will see, there is always bad news at the end :oops:):
it works faster than the previous KCC cowpat 8)
but a little bit slower than FRED's FileFingerprint() function, which I only discovered just now while looking for information about Fingerprint() :oops:
Debugger wrote:
MapSize = 812503724
*lpBasePtr = 59637760

Fingerprint = 36dbf8bce3dfa90a161d639b7091624c
in 1.7879999876 secondes

FileFingerprint = 36dbf8bce3dfa90a161d639b7091624c
in 1.7589999437 secondes
I believe there is no way to do that faster :wink:
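
If hashing every file completely is still too slow for a big collection, one possible compromise is to hash only a few chunks and add the file size to the key. This is only a sketch: the name CleRapide, the 64 KB chunk size and the choice of three chunks are assumptions, and it relies on the optional Bits, Offset and Length parameters of FileFingerprint().

```purebasic
UseMD5Fingerprint()

#TailleBloc = 65536   ; assumed chunk size (64 KB), to be tuned

; Hypothetical helper: builds a key from the file size and the MD5
; of three chunks taken at the start, the middle and the end of the file.
Procedure.s CleRapide(Fichier$)
  Protected Taille.q = FileSize(Fichier$)
  Protected Cle$

  If Taille < 0
    ProcedureReturn ""                    ; -1 = file not found, -2 = directory
  EndIf

  Cle$ = Str(Taille) + "|"

  If Taille <= 3 * #TailleBloc
    ; Small file: hashing everything is cheap
    Cle$ + FileFingerprint(Fichier$, #PB_Cipher_MD5)
  Else
    ; Parameters after the plugin: Bits (unused for MD5), Offset, Length
    Cle$ + FileFingerprint(Fichier$, #PB_Cipher_MD5, 0, 0, #TailleBloc)
    Cle$ + FileFingerprint(Fichier$, #PB_Cipher_MD5, 0, Taille / 2, #TailleBloc)
    Cle$ + FileFingerprint(Fichier$, #PB_Cipher_MD5, 0, Taille - #TailleBloc, #TailleBloc)
  EndIf

  ProcedureReturn Cle$
EndProcedure

Debug CleRapide("D:\Video\TheMovieAtKcc.mkv")
```

Two files with the same key are then only candidates for being identical; a full FileFingerprint() on those few candidates can confirm it.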

Then.... a thousand thanks to you both 8)
Last edited by Kwai chang caine on Tue Jul 12, 2022 8:28 pm, edited 6 times in total.

Re: Fast recognize big file

Post by NicTheQuick »

Can you please stop adding all these images and GIFs to your posts? They are a bit annoying in my opinion. :?

I have another question: What is your mother tongue and which translator do you use? Sometimes it is very complicated for me to understand what you want to say. I think it also has to do with the fact that English is not my first language.

Re: Fast recognize big file [Resolved]

Post by Kwai chang caine »

I'm French, and I use Google Translate for parts of my sentences,
because translating the whole text with Google Translate does not make it more understandable; I have already tested that with several members.

The problem is that, even for an Englishman, there are apparently several kinds of English.
And a member whose native language is English can still have difficulty understanding other people's English :shock:
I believe it's complicated for everybody to understand the others, unless of course it's two Englishmen talking to each other :wink:
ar-s
Enthusiast
Posts: 344
Joined: Sat Oct 06, 2007 11:20 pm
Location: France

Re: Fast recognize big file [Resolved]

Post by ar-s »

Hey KCC, you should try the Deepl.com translator instead of Google's "random" translator :)
~Ar-S~
My Image Hoster for PB users
My webSite (french) with PB apps : LDVMULTIMEDIA
PB - 3.x / 5.7x / 6 - W11 x64 - Ryzen 7 3700x / #Rpi4

Code: Select all

r3p347 : 7ry : un71l d0n3 = 1

Re: Fast recognize big file [Resolved]

Post by Kwai chang caine »

Hello ARS :D
Yes, it is perhaps better, but I have often seen that it's complicated for a machine to translate a full text that is not really simple and convey the same emotion :|
Have you seen how a web page is automatically translated, like on the Amazon site, for example?
The result is far from perfect, and I often ask myself what the translation really means :|
In fact, the longer the text...
And KCC and SMS... it's not really a love story, you know me :wink:
And besides, the members of this forum never see my French posts .......
That's surely what you must have said to yourself... right? :wink: :mrgreen:

Re: Fast recognize big file [Resolved]

Post by NicTheQuick »

DeepL is the best free translation service I know. Way better than Google. Try it out. :D
I sometimes use it too. Btw I am from Germany so we are neighbors. In fact I am living just a few kilometers from the border to France.

Re: Fast recognize big file [Resolved]

Post by Kwai chang caine »

In fact, my problem is sometimes that I am too faithful :|
About my wife, it's not really a problem... Although sometimes I think so!!! :lol:
But for other things, old objects, cars, etc... as long as it works (even partially) I keep it.
And it's very hard for me to change my ways :oops:
"In fact I am living just a few kilometers from the border to France."
Incredible !!! :shock:
I'm four hours from the German border.
I always wanted to go back to Germany like in my youth, but it never happened. :|
Because I adored your country, where I lived for a year. 8)
I even tried to learn your language, which I find as beautiful as it is difficult.
And all I have left today of this splendid language is: "Ein Apfelsaft und die Rechnung bitte" and also "Kommen ze mit mir tzu bet" :mrgreen: