Bytessence DuplicateFinder (v1.3, 07Oct2011)

Inf0Byt3 · Post by **Inf0Byt3** » Mon Dec 20, 2010 9:32 pm

Bytessence DuplicateFinder is a lightweight duplicate file detection program written with PureBasic. It's pretty straightforward to use as the UI is in the form of a wizard. You can see the basic features below:

Features

* 100% spyware and malware free
* Very easy to use
* Wizard-like, multi-language user interface
* Unicode filename support
* Scans folders or file groups
* Optimized detection algorithms (file size, CRC32 and bytewise comparison)
* Duplicate detection by contents, dates, name, size
* Employs file filters (date, attributes, size) to minimize scan time
* Automatic duplicate file selector
* The duplicates can be copied, moved, recycled or passed as parameters to external programs
* Free e-mail support

Download links
For the portable (ZIP) version:
http://bytessence.com/download/bdf/BDF.zip

For the installer version:
http://bytessence.com/download/bdf/BDF.exe

Please let me know if you have any suggestions or if you find any bugs. Also, if you have time, I'd really appreciate some translations (Unicode supported).

ts-soft · Post by **ts-soft** » Tue Dec 21, 2010 12:16 am

thx, looks good!

Greetings - Thomas

IdeasVacuum · Post by **IdeasVacuum** » Tue Dec 21, 2010 12:54 am

Thanks Inf0Byt3. The GUI design is very nice indeed.

IdeasVacuum · Post by **IdeasVacuum** » Tue Dec 21, 2010 1:59 am

...ran the program and the results are excellent! Runs very nicely in the background without disrupting other apps.

Some comments:

Action 'Recycle' should be 'Recycle Bin';
After completing stage 5 (Results), there does not seem to be a way to return to Stage 1;
It would be nice to be able to re-size the window full-screen when examining the results list.

Inf0Byt3 · Post by **Inf0Byt3** » Tue Dec 21, 2010 11:30 am

Thanks for having a look at it. I'll try to implement the suggestions in the next version of the program.

Vera · Post by **Vera** » Tue Dec 21, 2010 1:06 pm

Hello Inf0Byt3,

thanks for
- sharing this neat program
- ++ allowing portable mode
- allowing personal languages*
- the safe mode of not having to delete duplicates, or just saving a list

Here's what I came across so far:

- irritating that top-buttons are no buttons
- unresizable result pane is too tiny to check the list comfortably, free resizing would be fine

- very frustrating that you can't go back from the result pane to do the next scan (especially if 0 results are found in 0 seconds, so one of the settings is missing) or change any setting...
- when restarting, it would have been nice if the selected folders had been remembered
- scan of file types cannot be unchecked, but it doesn't matter if there's no entry

tip: on Scan zones \ File types
it's more natural to have the positive (including) question first, before a negative one (excluding). I would switch these two rows.
- additional popups on the scan-filetypes strings showing an example of the insert pattern would enhance the 'intuitive' usage (e.g.: bat,cad,chm,...)

- the next-button on the options pane should be labeled 'scan' because one thinks you'll see the scan pane first before submitting (found out later: only a half-good idea, as it depends on the menu design)

- it would be nice to see as well how many files are within one item. e.g.: +mypic.png (4.67kb / 7)
- export to text: total duplicates doesn't correspond to found results, it's much much higher
- idea: export selected as text (?)

- automatic selector: you can't insert an exact value (1.02) to the size string-input, as dots or commas aren't allowed
- match-option: contents: do you really mean the contents or is this an MD5-check (value) ?

* I enjoyed translating the program guide to German and have reached about 75% already. I'll hand it over as soon as it's in alpha stage

greetings ~ Vera

Inf0Byt3 · Post by **Inf0Byt3** » Wed Dec 22, 2010 9:04 am

Hi Vera,

Vera wrote:
- irritating that top-buttons are no buttons
- unresizable result pane is too tiny to check the list comfortably, free resizing would be fine

I will try to see if I can make it autoresize... The problem is with the other gadgets in the other pages but there has to be a solution for this.

Vera wrote:
- very frustrating that you can't go back from the result pane to do the next scan (especially if 0 results are found in 0 seconds, so one of the settings is missing) or change any setting...
- when restarting, it would have been nice if the selected folders had been remembered
- scan of file types cannot be unchecked, but it doesn't matter if there's no entry

tip: on Scan zones \ File types
it's more natural to have the positive (including) question first, before a negative one (excluding). I would switch these two rows.
- additional popups on the scan-filetypes strings showing an example of the insert pattern would enhance the 'intuitive' usage (e.g.: bat,cad,chm,...)

These are easy to add.

Vera wrote:
- export to text: total duplicates doesn't correspond to found results, it's much much higher

I've tested this, seems to work well. Note that the list contains all the files, including the original and all the clones (we can have more than 2).

Vera wrote:
- match-option: contents: do you really mean the contents or is this an MD5-check (value) ?

The match is done by contents, MD5 is not involved. Basically the matcher works like this:
-The scan is first performed at size level.
-The files that have the same size are hashed with CRC32.
-If the obtained hashes match, the files are compared at byte level.
This way should be 100% safe, false positives cannot occur.
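
The three stages described above can be sketched roughly as follows. This is only an illustration in Python, not BDF's actual PureBasic code; the function names `crc32_of` and `find_duplicates` and the 1 MiB chunk size are assumptions made for the example.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Stage 1: only files of equal size can be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Stage 2: hash the same-sized files with CRC32.
        by_crc = defaultdict(list)
        for path in same_size:
            by_crc[crc32_of(path)].append(path)

        for candidates in by_crc.values():
            # Stage 3: confirm every CRC32 match with a bytewise comparison,
            # so a hash collision can never produce a false positive.
            remaining = candidates
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [
                    p for p in rest if filecmp.cmp(first, p, shallow=False)
                ]
                if len(same) > 1:
                    groups.append(same)
                remaining = [p for p in rest if p not in same]
    return groups
```

Calling `find_duplicates` with a list of file paths returns one list per set of identical files; files with a unique size or a unique CRC32 are never compared bytewise, which is where the time saving comes from.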

Vera wrote:
* I enjoyed translating the program guide to German and have reached about 75% already. I'll hand it over as soon as it's in alpha stage

That's very nice, thank you for taking the time to translate BDF.

I'll start implementing the suggestions for the next version.

Vera · Post by **Vera** » Thu Dec 23, 2010 1:02 am

Hi,

thanks for the explanation about 'contents'. And sorry for your extra effort - I simply got carried away by the word's meaning and totally forgot to look it up in your rich help file.

There's another concern about 'Match duplicates by':
there are 6 items: 3 short/single words and 3 double words. In German, two of the doubles don't fit (and I haven't found alternative words yet). As this might easily happen with other languages as well, I came up with the idea of changing the arrangement: put the single words in the first column, which might then even be narrowed a bit, leaving more space for the second column. [even 'modification date' doesn't fit on my PC with a slightly increased dpi]

Vera wrote:total duplicates doesn't correspond to found results
Inf0Byt3 wrote:... Note that the list contains all the files, including the original and all the clones (we can have more than 2).

It really took me an hour to get straight what you're calculating and where the trouble arises: it lies in the wording. If you read the following, wouldn't you think both mean the same?
RESULT: Found XX duplicate files
EXPORT: Total duplicates found YY
But: the export doesn't sum up the found duplicates but all processed files. The information about the number of 'doubles' gets lost, so I would suggest giving both values in the export file, e.g.: 'found XX duplicates (result) in YY files (total)'
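
To make the difference between the two numbers concrete, here is a small Python sketch. It assumes that "duplicates" means the extra copies beyond one original per group, which the thread does not confirm; `summarize` and `export_header` are hypothetical names, not part of BDF.

```python
def summarize(groups):
    """Return (redundant_copies, total_files) for a list of duplicate groups."""
    total_files = sum(len(group) for group in groups)
    # Assumption: each group has one original, the rest are duplicates.
    redundant_copies = total_files - len(groups)
    return redundant_copies, total_files


def export_header(groups):
    """Build a header line that states both values, as suggested above."""
    redundant, total = summarize(groups)
    return f"Found {redundant} duplicates (result) in {total} files (total)"
```

With one group of three identical files and one group of two, the list holds five files, but only three of them are redundant copies.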

~~~~~~~~~~~~~~~
I found a word that can't be translated: 'at' (in the export file)
l46=File export date: $P
=> File export date: 12/22/10 at 21:55:42
~~~~~~~~~~~~~~~

While scanning, there's an information GUI that I can't really see, as I only check smaller regions (I don't want to stress my hard drive too much with each test): it pops up and vanishes too quickly. (I can't check the translation! Even starting a big search and stopping it won't keep it open.)
Independent of this, wouldn't it be of interest to have enough time to study the results on this pane and exit it manually when 'satisfied'?
In case this would be an uncomfortable feature for some, could it be made an option like: 'autoclose process-gui on check-finish' ?

So much for tonight - Christmas is demanding a lot of time as it's creeping closer

greetings ~ Vera

Inf0Byt3 · Post by **Inf0Byt3** » Thu Dec 23, 2010 9:42 am

Vera wrote:thanks for the explanation about 'contents'. And sorry for your extra effort - I simply got carried away by the word's meaning and totally forgot to look it up in your rich help file.

Truth is the label wasn't too helpful because it's too short. I have to find a way to make the labels a bit bigger.

Vera wrote:There's another concern about 'Match duplicates by':
there are 6 items: 3 short/single words and 3 double words. In German, two of the doubles don't fit (and I haven't found alternative words yet). As this might easily happen with other languages as well, I came up with the idea of changing the arrangement: put the single words in the first column, which might then even be narrowed a bit, leaving more space for the second column. [even 'modification date' doesn't fit on my PC with a slightly increased dpi]

I was afraid this would happen. Will try to make space for the items.

Vera wrote:It really took me an hour to get straight what you're calculating and where the trouble arises: it lies in the wording. If you read the following, wouldn't you think both mean the same?
RESULT: Found XX duplicate files
EXPORT: Total duplicates found YY
But: the export doesn't sum up the found duplicates but all processed files. The information about the number of 'doubles' gets lost, so I would suggest giving both values in the export file, e.g.: 'found XX duplicates (result) in YY files (total)'

I see, you're definitely right. The expressions are ambiguous and don't refer to the same thing.

Vera wrote:I found a word that can't be translated: 'at' (in the export file)
l46=File export date: $P
=> File export date: 12/22/10 at 21:55:42

Nice find, it will be fixed.

Vera wrote:While scanning, there's an information GUI that I can't really see, as I only check smaller regions (I don't want to stress my hard drive too much with each test): it pops up and vanishes too quickly. (I can't check the translation! Even starting a big search and stopping it won't keep it open.)
Independent of this, wouldn't it be of interest to have enough time to study the results on this pane and exit it manually when 'satisfied'?
In case this would be an uncomfortable feature for some, could it be made an option like: 'autoclose process-gui on check-finish' ?

I'll see what I can do about this... Maybe make it dependent on the next button and add the option you mentioned.

Vera wrote:So much for tonight - Christmas is demanding a lot of time as it's creeping closer

Hehe it's indeed close. Thank you very much for all your time, it's much appreciated.

Vera · Post by **Vera** » Tue Dec 28, 2010 1:58 pm

Hello,

as announced above, I translated the program interface to German, and the help file as well: BDF_de_chm.zip

Updated language files: BDF_1.1_de_chm.zip

It would be nice if some German members could have a look at it and let me know if it's suitable or if things should be expressed differently.

greetings ~ Vera

rsts · Post by **rsts** » Tue Dec 28, 2010 7:23 pm

Very nice Mr. Inf0Byt3.

Now I can clean up my music folders

cheers

Inf0Byt3 · Post by **Inf0Byt3** » Wed Dec 29, 2010 9:32 pm

Vera wrote:Hello,

as announced above, I translated the program interface to German, and the help file as well: BDF_de_chm.zip

Thank you Vera, I'll have it uploaded to the site as soon as I can.

rsts wrote: Very nice Mr. Inf0Byt3.

Now I can clean up my music folders

cheers

Thanks rsts, I'm glad it's useful.

Inf0Byt3 · Post by **Inf0Byt3** » Wed Jan 05, 2011 2:51 pm

Version 1.1 of BDF is now ready. Thanks to Vera for all the suggestions and the testing.

Changes in this version:

-Added a 'New Scan' button on the results page
-Added tooltips to some controls
-Added the possibility for the language files to open their own CHM help files
-Fixed a hardcoded translation label
-Fixed the translation manager not verifying the files before loading them
-Fixed the portable mode not using the default configuration values
-Moved the controls on the results page to make more space for the duplicates list
-Changed some ambiguous translation strings
-The duplicate group headers now also show the number of files they contain
-The statistics page will remain on the screen after the scan ends
-Both the scan and excluded files/folders are now remembered
-Made minor interface changes so translations can fit better
-Fixed other minor bugs

Vera · Post by **Vera** » Wed Jan 05, 2011 8:42 pm

Hello Inf0Byt3,

thanks for all these enhancements

especially 're-scanning' is very helpful (though I would rather jump to page 2) and the saving of preset paths.
Also, moving the controls is a good idea to give more space to the listview, but resizing will still be needed in the long run to comfortably cross-check whether all selections are correct.
... more via PM.

Now both German language files (ini & chm) are ready for use - download above ~ enjoy.

cheers ~ Vera

Rescator · Post by **Rescator** » Sat Jan 08, 2011 6:36 am

Inf0Byt3 wrote:Basically the matcher works like this:
-The scan is first performed at size level.
-The files that have the same size are hashed with CRC32.
-If the obtained hashes match, the files are compared at byte level.
This way should be 100% safe, false positives cannot occur.

Here's a tip!

If you are able to detect that the two files to be compared are on different devices (two different HDs, or DVD and HD, etc.), then you could just do a byte compare, as that should be slightly faster than doing CRC32 + byte compare, since CRC32 reads every byte anyway.

However, if the files are on the same device, then doing CRC32 first is faster, as you do a sequential read of one file and then a sequential read of the other.
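
One possible way to make that decision is sketched below in Python. It assumes that comparing `st_dev` from `os.stat` is a good enough test; note that `st_dev` identifies a filesystem or volume rather than a physical drive, so two partitions on the same disk would be treated as different devices. The names `on_same_device` and `pick_strategy` are made up for the example.

```python
import os


def on_same_device(path_a, path_b):
    """True if both files live on the same filesystem/volume (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def pick_strategy(path_a, path_b):
    """Choose a comparison strategy following the tip above."""
    if on_same_device(path_a, path_b):
        # Same device: hash each file sequentially, then confirm bytewise.
        return "crc32-then-bytes"
    # Different devices: both can be read at once, so compare bytes directly.
    return "bytes-only"
```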

Another tip is that you could also do CRC32 using two threads and use large file buffers.
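
A rough Python sketch of that idea follows, hashing two files at the same time with a large read buffer. The 4 MiB buffer size and the names `crc32_of` and `crc32_pair` are assumptions for illustration; whether two threads actually help depends on where the files are stored.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def crc32_of(path, buffer_size=4 * 1024 * 1024):
    """Return the CRC32 of a file, read through a large buffer."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(buffer_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def crc32_pair(path_a, path_b):
    """Hash two files in two worker threads and return both CRC32 values."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(crc32_of, path_a)
        future_b = pool.submit(crc32_of, path_b)
        return future_a.result(), future_b.result()
```

File reads release Python's global interpreter lock, so the two reads can overlap; this is most likely to pay off when the files are on different devices.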
