To speed up the pre-comparison of files, I divide the file length into 32 sections and read one byte from each section. Now, if a TV series consists of 200 episodes of the same size, then instead of calculating the MD5 of large files with a total size of 100 GB, I read just 32 bytes from each file, which is about 10 times faster. Only if this preliminary comparison still suggests the files might be identical do I calculate the MD5.
I added the source code with the prefix PseudoHash.
Code:
DisableDebugger
EnableExplicit

UseMD5Fingerprint()

Define Path$, StartTime, Res.s, md5$

; Sample one byte at a time across the file, skipping Shift bytes
; between reads, and finish with the file's last byte.
Procedure.s GetPseudoHash(Path$, Shift.q)
  Protected res$, file_id
  Protected length.q                      ; quad, so files over 2 GB work on 32-bit builds too

  file_id = ReadFile(#PB_Any, Path$)
  If file_id
    length = Lof(file_id)
    FileSeek(file_id, 4, #PB_Relative)    ; start 4 bytes into the file
    While Eof(file_id) = 0
      res$ + Hex(ReadByte(file_id), #PB_Byte)
      FileSeek(file_id, Shift, #PB_Relative)
    Wend
    If length > 0                         ; always include the last byte (guard against empty files)
      FileSeek(file_id, length - 1, #PB_Absolute)
      res$ + Hex(ReadByte(file_id), #PB_Byte)
    EndIf
    CloseFile(file_id)
  EndIf

  ProcedureReturn res$
EndProcedure
; Time the pseudo-hash
Path$ = "path_to_video"
StartTime = ElapsedMilliseconds()
md5$ = GetPseudoHash(Path$, FileSize(Path$) / 31)
Res = "hash time = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("hash_0", md5$ + #LF$ + #LF$ + Res)

; Time a full MD5 of a same-size file, for comparison
Path$ = "path_to_movie_of_the_same_size_but_different_hash"
StartTime = ElapsedMilliseconds()
md5$ = FileFingerprint(Path$, #PB_Cipher_MD5)
Res = "hash time md5 = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("md5", md5$ + #LF$ + #LF$ + Res)
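For anyone who wants to see the whole pipeline the post describes (group files by size, compare sampled bytes, and only then fall back to a full hash), here is a sketch of the same idea in Python, since the PureBasic code above only shows the sampling step. The function names (`pseudo_hash`, `full_md5`, `find_duplicates`) are my own illustrative choices, not from the original; the 4-byte starting offset and the last-byte sample mirror the PureBasic version.

```python
import hashlib
import os

def pseudo_hash(path, sections=32):
    """Sample one byte from evenly spaced offsets, plus the last byte,
    as a cheap pre-comparison key (mirrors GetPseudoHash above)."""
    size = os.path.getsize(path)
    if size == 0:
        return ""
    shift = max(size // (sections - 1), 1)   # same as FileSize / 31
    samples = bytearray()
    with open(path, "rb") as f:
        # read 1 byte, then skip `shift` bytes, starting 4 bytes in
        for offset in range(4, size, shift + 1):
            f.seek(offset)
            samples += f.read(1)
        f.seek(size - 1)                     # always include the last byte
        samples += f.read(1)
    return samples.hex()

def full_md5(path):
    """Full MD5, computed in chunks so huge files do not fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Group by size, then by pseudo-hash; only MD5 the survivors."""
    by_size = {}
    for p in paths:
        by_size.setdefault(os.path.getsize(p), []).append(p)
    duplicates = []
    for group in by_size.values():
        if len(group) < 2:
            continue                          # unique size: cannot be a duplicate
        by_pseudo = {}
        for p in group:
            by_pseudo.setdefault(pseudo_hash(p), []).append(p)
        for candidates in by_pseudo.values():
            if len(candidates) < 2:
                continue                      # pseudo-hash already ruled it out
            by_md5 = {}
            for p in candidates:
                by_md5.setdefault(full_md5(p), []).append(p)
            duplicates += [g for g in by_md5.values() if len(g) > 1]
    return duplicates
```

The point of the layering is the same as in the post: the expensive `full_md5` only ever runs on files that survived both the size check and the 32-byte sample, so a folder of 200 same-size episodes costs 200 tiny sampled reads instead of 100 GB of hashing.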