Bytessence DuplicateFinder (v1.3, 07Oct2011)

Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Post by Inf0Byt3 »

Bytessence DuplicateFinder is a lightweight duplicate file detection program written in PureBasic. It's pretty straightforward to use, as the UI takes the form of a wizard. The basic features are listed below:

Features
* 100% spyware and malware free
* Very easy to use
* Wizard-like, multi-language user interface
* Unicode filename support
* Scans folders or file groups
* Optimized detection algorithms (file size, CRC32 and bytewise comparison)
* Duplicate detection by contents, dates, name, size
* Employs file filters (date, attributes, size) to minimize scan time
* Automatic duplicate file selector
* The duplicates can be copied, moved, recycled or passed as parameters to external programs
* Free e-mail support

Download links
For the portable (ZIP) version:
http://bytessence.com/download/bdf/BDF.zip

For the installer version:
http://bytessence.com/download/bdf/BDF.exe

Please let me know if you have any suggestions or if you find any bugs. Also, if you have time, I'd really appreciate some translations (Unicode is supported).
Last edited by Inf0Byt3 on Fri Oct 07, 2011 8:47 pm, edited 6 times in total.
None are more hopelessly enslaved than those who falsely believe they are free. (Goethe)
ts-soft
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by ts-soft »

:D
thx, looks good!

Greetings - Thomas
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by IdeasVacuum »

Thanks Inf0Byt3. The GUI design is very nice indeed.
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by IdeasVacuum »

...ran the program and the results are excellent! Runs very nicely in the background without disrupting other apps.

Some comments:

Action 'Recycle' should be 'Recycle Bin';
After completing stage 5 (Results), there does not seem to be a way to return to Stage 1;
It would be nice to be able to re-size the window full-screen when examining the results list.
Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Inf0Byt3 »

Thanks for having a look at it. I'll try to implement the suggestions in the next version of the program :D.
Vera
Addict
Posts: 858
Joined: Tue Aug 11, 2009 1:56 pm
Location: Essen (Germany)

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Vera »

Hello Inf0Byt3,

thanks for
- sharing this neat program :)
- ++ allowing portable mode
- allowing personal languages*
- the safe mode of not needing to delete duplicates, or just saving a list

Here's what I came across so far:

- irritating that top-buttons are no buttons
- unresizable result pane is too tiny to check the list comfortably, free resizing would be fine

- very frustrating that you can't go back from the result pane to do the next scan (especially if 0 results were found in 0 seconds because one of the settings is missing :lol: ) or change any setting...
- when restarting, it would have been nice if the selected folders had been remembered
- scan of file types cannot be unchecked, but it doesn't matter if there's no entry

tip: on Scan zones \ File types
it's more natural to have the positive (including) question first, before the negative (excluding) one. I would switch these two rows.
- additional popups on the scan-filetypes strings showing an example of the input pattern would enhance the 'intuitive' usage (e.g.: bat,cad,chm,...)

- the Next button on the options pane should be labeled 'Scan', because one thinks you'll see the scan pane first before submitting (found out later: only a half-good idea, as it depends on the menu design)

- it would be nice to also see how many files are within one item, e.g.: +mypic.png (4.67kb / 7)
- export to text: total duplicates doesn't correspond to found results, it's much much higher
- idea: export selected as text (?)

- automatic selector: you can't enter an exact value (1.02) into the size input field, as dots or commas aren't allowed
- match-option: contents: do you really mean the contents or is this an MD5-check (value) ?

* I had joy translating the program guide to German and have reached about 75% already. I'll hand it over as soon as it's in alpha stage ;)

greetings ~ Vera
Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Inf0Byt3 »

Hi Vera,

Vera wrote:
- irritating that top-buttons are no buttons
- unresizable result pane is too tiny to check the list comfortably, free resizing would be fine

I will try to see if I can make it autoresize... The problem is with the other gadgets on the other pages, but there has to be a solution for this.

Vera wrote:
- very frustrating that you can't go back from the result pane to do the next scan (especially if 0 results were found in 0 seconds because one of the settings is missing :lol: ) or change any setting...
- when restarting, it would have been nice if the selected folders had been remembered
- scan of file types cannot be unchecked, but it doesn't matter if there's no entry
- tip: on Scan zones \ File types it's more natural to have the positive (including) question first, before the negative (excluding) one. I would switch these two rows.
- additional popups on the scan-filetypes strings showing an example of the input pattern would enhance the 'intuitive' usage (e.g.: bat,cad,chm,...)

These are easy to add :D.

Vera wrote:
- export to text: total duplicates doesn't correspond to found results, it's much much higher

I've tested this, and it seems to work well. Note that the list contains all the files, including the original and all the clones (we can have more than 2).

Vera wrote:
- match-option: contents: do you really mean the contents or is this an MD5-check (value) ?

The match is done by contents; MD5 is not involved. Basically the matcher works like this:
-The scan is first performed at size level.
-The files that have the same size are hashed with CRC32.
-If the obtained hashes match, the files are compared at byte level.
This approach should be 100% safe: because the final check is a byte-level comparison, false positives cannot occur.
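
The three stages above can be sketched roughly as follows. This is only an illustration in Python, not BDF's actual PureBasic code; the function names (`crc32_of`, `same_bytes`, `find_duplicates`) and the 1 MiB chunk size are assumptions made for the example.

```python
import os
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, read in chunks (chunk size is an assumption)."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def same_bytes(path_a, path_b, chunk_size=1 << 20):
    """Compare two files byte for byte, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files ended at the same point
                return True


def find_duplicates(paths):
    """Return groups (lists) of paths whose contents are identical."""
    # Stage 1: group by file size; a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Stage 2: within one size, group by CRC32.
        by_crc = defaultdict(list)
        for path in same_size:
            by_crc[crc32_of(path)].append(path)
        # Stage 3: confirm byte by byte, so a CRC32 collision
        # can never be reported as a duplicate.
        for candidates in by_crc.values():
            confirmed = []
            for path in candidates:
                for group in confirmed:
                    if same_bytes(group[0], path):
                        group.append(path)
                        break
                else:
                    confirmed.append([path])
            groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

Each returned group holds the original and all of its clones, so a group can contain more than two files.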

Vera wrote:
* I had joy translating the program guide to German and have reached about 75% already. I'll hand it over as soon as it's in alpha stage ;)

That's very nice, thank you for taking the time to translate BDF :D.

I'll start implementing the suggestions for the next version.
Vera
Addict
Posts: 858
Joined: Tue Aug 11, 2009 1:56 pm
Location: Essen (Germany)

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Vera »

Hi,

thanks for the explanation about 'contents'. And sorry for your extra effort - I simply got carried away by the meaning of the word and totally forgot to look for it in your rich help file.

There's another concern about 'Match duplicates by':
there are 6 items: 3 short/single words and 3 double words. In German, two of the doubles don't fit (and I haven't found alternative words yet). As this might easily happen with other languages as well, I came up with the idea of changing the arrangement: put the single words in the first column, which could then even be narrowed a bit to allow more space for the second column. [even 'modification date' doesn't fit on my PC with a slightly increased DPI]

Vera wrote:
total duplicates doesn't correspond to found results
Inf0Byt3 wrote:
... Note that the list contains all the files, including the original and all the clones (we can have more than 2).

It really took me an hour to get straight what you're calculating and where the trouble arises: it lies in the wording. If you read the following, wouldn't you think both mean the same?
RESULT: Found XX duplicate files
EXPORT: Total duplicates found YY

But: the export doesn't sum up all found duplicates but all processed files. The information about the number of 'doubles' gets lost, so I would suggest giving both values in the export file, e.g.: 'found XX duplicates (result) in YY files (total)'

~~~~~~~~~~~~~~~
I found a word that can't be translated: 'at' (in the export file)
l46=File export date: $P
=> File export date: 12/22/10 at 21:55:42
~~~~~~~~~~~~~~~

While scanning, there's an information GUI that I can't really see, as I only check smaller regions (I don't want to stress my hard drive too much with each test): it pops up and vanishes too quickly. (I can't check the translation! Even starting a big search and stopping it won't keep it open.)
Independent of this, wouldn't it be of interest to have enough time to study the results on this pane and exit it manually when 'satisfied'?
In case this would be an uncomfortable feature for some, could it be made an option, like 'autoclose process-gui on check-finish'?

So far for tonight - Christmas is demanding a lot of time as it's creeping closer ;)

greetings ~ Vera
Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Inf0Byt3 »

Vera wrote:
thanks for the explanation about 'contents'. And sorry for your extra effort - I simply got carried away by the meaning of the word and totally forgot to look for it in your rich help file.

Truth is, the label wasn't too helpful because it's too short. I have to find a way to make the labels a bit bigger.

Vera wrote:
There's another concern about 'Match duplicates by':
there are 6 items: 3 short/single words and 3 double words. In German, two of the doubles don't fit (and I haven't found alternative words yet). As this might easily happen with other languages as well, I came up with the idea of changing the arrangement: put the single words in the first column, which could then even be narrowed a bit to allow more space for the second column. [even 'modification date' doesn't fit on my PC with a slightly increased DPI]

I was afraid this would happen. I will try to make space for the items.

Vera wrote:
It really took me an hour to get straight what you're calculating and where the trouble arises: it lies in the wording. If you read the following, wouldn't you think both mean the same?
RESULT: Found XX duplicate files
EXPORT: Total duplicates found YY
But: the export doesn't sum up all found duplicates but all processed files. The information about the number of 'doubles' gets lost, so I would suggest giving both values in the export file, e.g.: 'found XX duplicates (result) in YY files (total)'

I see, you're definitely right. The expressions are ambiguous and don't refer to the same thing.

Vera wrote:
I found a word that can't be translated: 'at' (in the export file)
l46=File export date: $P
=> File export date: 12/22/10 at 21:55:42

Nice find, it will be fixed.

Vera wrote:
While scanning, there's an information GUI that I can't really see, as I only check smaller regions (I don't want to stress my hard drive too much with each test): it pops up and vanishes too quickly. (I can't check the translation! Even starting a big search and stopping it won't keep it open.)
Independent of this, wouldn't it be of interest to have enough time to study the results on this pane and exit it manually when 'satisfied'?
In case this would be an uncomfortable feature for some, could it be made an option, like 'autoclose process-gui on check-finish'?

I'll see what I can do about this... Maybe make it dependent on the Next button and add the option you mentioned.

Vera wrote:
So far for tonight - Christmas is demanding a lot of time as it's creeping closer ;)

Hehe, it's indeed close. Thank you very much for all your time, it's much appreciated.
Vera
Addict
Posts: 858
Joined: Tue Aug 11, 2009 1:56 pm
Location: Essen (Germany)

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Vera »

Hello,

as announced above, I translated the program interface and also the help file to German: BDF_de_chm.zip

Updated language files: BDF_1.1_de_chm.zip

It would be nice if some German members could have a look at it and let me know whether it's suitable or whether things should be expressed differently.

greetings ~ Vera
Last edited by Vera on Fri Jan 07, 2011 2:20 pm, edited 3 times in total.
rsts
Addict
Posts: 2736
Joined: Wed Aug 24, 2005 8:39 am
Location: Southwest OH - USA

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by rsts »

Very nice Mr. Inf0Byt3.

Now I can clean up my music folders :)

cheers
Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Inf0Byt3 »

Vera wrote:
Hello,
as announced above, I translated the program interface and also the help file to German: BDF_de_chm.zip

Thank you Vera, I'll have it uploaded to the site as soon as I can.

rsts wrote:
Very nice Mr. Inf0Byt3.
Now I can clean up my music folders :)
cheers

Thanks rsts, I'm glad it's useful :).
Inf0Byt3
PureBasic Fanatic
Posts: 2236
Joined: Fri Dec 09, 2005 12:15 pm
Location: Elbonia

Re: Bytessence DuplicateFinder (v1.1, 05Jan2010)

Post by Inf0Byt3 »

Version 1.1 of BDF is now ready. Thanks to Vera for all the suggestions and the testing.

Changes in this version:
-Added a 'New Scan' button on the results page
-Added tooltips to some controls
-Added the possibility for the language files to open their own CHM help files
-Fixed a hardcoded translation label
-Fixed the translation manager not verifying the files before loading them
-Fixed the portable mode not using the default configuration values
-Moved the controls on the results page to make more space for the duplicates list
-Changed some ambiguous translation strings
-The duplicate group headers now also show the number of files they contain
-The statistics page will remain on the screen after the scan ends
-Both the scan and excluded files/folders are now remembered
-Made minor interface changes so translations can fit better
-Fixed other minor bugs
Vera
Addict
Posts: 858
Joined: Tue Aug 11, 2009 1:56 pm
Location: Essen (Germany)

Re: Bytessence DuplicateFinder (v1.1, 05Jan2010)

Post by Vera »

Hello Inf0Byt3,

thanks for all these enhancements :)

especially 're-scanning' is very helpful (though I would rather jump to page 2) and the saving of preset paths.
Also, moving the controls is a good idea to give more space to the listview, but resizing will still be needed in the long run to comfortably cross-check whether all selections are proper.
... more via PM.

Now both German language files (ini & chm) are ready for use - download above ~ enjoy.

cheers ~ Vera
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Re: Bytessence DuplicateFinder (v1.0, 20Dec2010)

Post by Rescator »

Inf0Byt3 wrote:
Basically the matcher works like this:
-The scan is first performed at size level.
-The files that have the same size are hashed with CRC32.
-If the obtained hashes match, the files are compared at byte level.
This way should be 100% safe, false positives cannot occur.

Here's a tip!

If you are able to detect that the two files to be compared are on different devices (two different HDs, or a DVD and an HD, etc.), then you could just do a byte compare, as that should be slightly faster than doing CRC32 + byte compare, since CRC32 reads every byte anyway.

However, if the files are on the same device, then doing CRC32 first is faster, since you do a sequential read of one file and then a sequential read of the other.
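
One possible way to make that decision, sketched in Python rather than PureBasic. It assumes that the `st_dev` value from `os.stat` is a good enough stand-in for "same device"; that is only an approximation, since two partitions on one physical disk report different values. The names `on_same_device` and `choose_strategy` are made up for this example.

```python
import os


def on_same_device(path_a, path_b):
    """True if both files report the same device/volume ID (an approximation)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def choose_strategy(path_a, path_b):
    """Pick a comparison strategy following the tip above."""
    if on_same_device(path_a, path_b):
        # Same device: hash each file sequentially first, then byte compare.
        return "crc32-then-bytes"
    # Different devices: both can be read in parallel, so compare bytes directly.
    return "bytes-only"
```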

Another tip is that you could also do the CRC32 using two threads and large file buffers.
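
A rough sketch of that idea, again in Python as a stand-in for PureBasic. The 4 MiB buffer size and the names `crc32_of` and `crc32_pair` are assumptions for illustration, not measured or recommended values.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BUFFER_SIZE = 4 * 1024 * 1024  # assumed "large" buffer; tune for the hardware


def crc32_of(path, buffer_size=BUFFER_SIZE):
    """CRC32 of one file, read sequentially in large chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def crc32_pair(path_a, path_b, buffer_size=BUFFER_SIZE):
    """Hash two files at the same time, one worker thread per file."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(crc32_of, path_a, buffer_size)
        future_b = pool.submit(crc32_of, path_b, buffer_size)
        return future_a.result(), future_b.result()
```

Whether two threads actually help probably depends on where the files live: on a single spinning disk, two concurrent readers may cause extra seeking, so this fits best when the files are on different devices.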