Search for duplicate files
Posted: Thu Jun 23, 2022 3:51 am
by AZJIO
Search duplicates
Download: yandex, upload.ee
screenshot on linux
1. In the context menu, you can select deletion priority levels.
2. You can save the results to CSV and then use the CSV list.
updated
Added unlimited priority level selection
Added removal of an item from the search box
Added group item color (in ini)
Added PseudoHashSize parameter to ini
Added saving results to a file (to compare Linux and Windows results)
Re: Search for duplicate files
Posted: Thu Jun 23, 2022 4:42 pm
by Kwai chang caine
Thanks for sharing

Re: Search for duplicate files
Posted: Mon Jun 27, 2022 7:33 pm
by AZJIO
To speed up the preliminary comparison of files, I divide the file length into 32 sections and read one byte from each, 32 reads in total. Now, if a group consists of 200 files of the same size, then instead of calculating the MD5 of large files with a total size of 100 GB, I read 32 bytes from each file. This is 10 times faster. Only after that do I calculate the MD5, and only if the preliminary comparison still suggests that the files are the same.
I added the source code with the prefix PseudoHash.
Code:
DisableDebugger
EnableExplicit
UseMD5Fingerprint()

Define Path$, StartTime, Res.s, md5$

; Builds a cheap "pseudo-hash" from bytes sampled at regular intervals.
; Shift is the distance, in bytes, skipped after each sampled byte.
Procedure.s GetPseudoHash(Path$, Shift.q)
  Protected res$, length.q, file_id ; length is a quad so that files over 2 GB also work in 32-bit builds
  file_id = ReadFile(#PB_Any, Path$)
  If file_id
    length = Lof(file_id)
    If length > 0 ; an empty file has no bytes to sample
      FileSeek(file_id, 4, #PB_Relative) ; skip the first 4 bytes
      While Eof(file_id) = 0
        res$ + Hex(ReadByte(file_id), #PB_Byte)
        FileSeek(file_id, Shift, #PB_Relative)
      Wend
      ; always include the last byte of the file
      FileSeek(file_id, length - 1, #PB_Absolute)
      res$ + Hex(ReadByte(file_id), #PB_Byte)
    EndIf
    CloseFile(file_id)
  EndIf
  ProcedureReturn res$
EndProcedure

; Test 1: time the pseudo-hash (md5$ is reused here to hold the pseudo-hash)
Path$ = "path_to_video"
StartTime = ElapsedMilliseconds()
md5$ = GetPseudoHash(Path$, FileSize(Path$) / 31)
Res = "hash time = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("hash_0", md5$ + #LF$ + #LF$ + Res)

; Test 2: time a full MD5 of a file of the same size, for comparison
Path$ = "path_to_movie_of_the_same_size_but_different_hash"
StartTime = ElapsedMilliseconds()
md5$ = FileFingerprint(Path$, #PB_Cipher_MD5)
Res = "hash time md5 = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("md5", md5$ + #LF$ + #LF$ + Res)
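For illustration, the staged check described above (same size first, then sampled bytes, then a full MD5) could be wired together roughly as follows. This is only a minimal sketch and not the code from the archive: SampleBytes and IsDuplicate are hypothetical names, the file paths are placeholders, and the sampling offsets are simplified compared to GetPseudoHash.

```purebasic
EnableExplicit
UseMD5Fingerprint()

; Hypothetical helper: reads up to Samples bytes spread evenly over the file
; and returns them as fixed-width hex (two digits per byte).
Procedure.s SampleBytes(Path$, Samples = 32)
  Protected res$, length.q, stepSize.q, i, file_id
  file_id = ReadFile(#PB_Any, Path$)
  If file_id
    length = Lof(file_id)
    If length > 0
      If Samples > length : Samples = length : EndIf
      If Samples > 1
        stepSize = (length - 1) / (Samples - 1) ; last sample lands at or before the final byte
      EndIf
      For i = 0 To Samples - 1
        FileSeek(file_id, i * stepSize, #PB_Absolute)
        res$ + RSet(Hex(ReadByte(file_id), #PB_Byte), 2, "0")
      Next
    EndIf
    CloseFile(file_id)
  EndIf
  ProcedureReturn res$
EndProcedure

; Hypothetical helper: returns #True only if the pair passes every stage.
Procedure IsDuplicate(PathA$, PathB$)
  ; Stage 1: files that are missing or differ in size cannot be duplicates
  If FileSize(PathA$) < 0 Or FileSize(PathA$) <> FileSize(PathB$)
    ProcedureReturn #False
  EndIf
  ; Stage 2: cheap comparison of a few sampled bytes
  If SampleBytes(PathA$) <> SampleBytes(PathB$)
    ProcedureReturn #False
  EndIf
  ; Stage 3: full MD5, reached only by pairs that survived stages 1 and 2
  If FileFingerprint(PathA$, #PB_Cipher_MD5) = FileFingerprint(PathB$, #PB_Cipher_MD5)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

; Placeholder paths, replace with real files
If IsDuplicate("path_to_first_file", "path_to_second_file")
  MessageRequester("Result", "The files are duplicates")
Else
  MessageRequester("Result", "The files are different")
EndIf
```

The order of the stages is the point here: each stage is cheaper than the next, so the expensive MD5 pass runs only on the pairs that the size check and the sampled bytes could not tell apart.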
Re: Search for duplicate files
Posted: Tue Jun 28, 2022 5:55 am
by IceSoft
I got the warning:
Couldn't download - Virus detected
Can you provide the source only too?
Re: Search for duplicate files
Posted: Tue Jun 28, 2022 3:59 pm
by AZJIO
https://disk.yandex.ru/d/QvQ5oqebC69uZA
Will the antivirus allow you to compile?
Re: Search for duplicate files
Posted: Tue Jun 28, 2022 6:37 pm
by IceSoft
Sure. I can stop it.
I see the source and can trust it.
Re: Search for duplicate files
Posted: Tue Jun 28, 2022 9:29 pm
by AZJIO
IceSoft wrote: Tue Jun 28, 2022 6:37 pm
I see the source and can trust it
At the moment, all my projects include the source, even the archive in which you saw the virus. My free Kaspersky antivirus says that there is no virus in the file.
Re: Search for duplicate files
Posted: Wed Jun 29, 2022 6:28 am
by IceSoft
AZJIO wrote: Tue Jun 28, 2022 9:29 pm
IceSoft wrote: Tue Jun 28, 2022 6:37 pm
I see the source and can trust it
At the moment, all my projects include the source, even the archive in which you saw the virus. My free Kaspersky antivirus says that there is no virus in the file.
Avast, Trend Micro, Defender
Re: Search for duplicate files
Posted: Wed Jun 29, 2022 8:05 am
by AZJIO
Re: Search for duplicate files
Posted: Fri Jul 01, 2022 7:38 pm
by AZJIO
Update
Added filter/mask for files.
The Windows version does not show checkboxes for groups.