Duplicate Files


OFSO
6th May 2013, 18:14
I'm sure I'm not the only one with duplicate files - mostly .jpg - stored here and there on my XP PC. Can anyone recommend any good software to search* for such duplicates?

* and ask "DELETE THIS DUPLICATE FILE?"

mixture
6th May 2013, 20:30
OFSO,

Depends on how you define searching for duplicates.

Personally, I would look for software that computes a cryptographic hash of each file's contents and shows you duplicates based on that.

Software that just goes by file names is asking for trouble.
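
For anyone who would rather script it, the hash approach can be sketched in a few lines of Python. This is only a rough sketch and not what any particular product does: the function names are made up for illustration, and SHA-256 is simply my pick from Python's hashlib.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files don't fill memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return {digest: [paths]} for every digest shared by two or more files."""
    by_digest = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)
    # Keep only the groups that actually contain duplicates.
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Point find_duplicates at a picture folder and review each group by eye before deleting anything. File names play no part in the grouping, only the bytes do.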

Saab Dastard
6th May 2013, 22:48
Agree with mixture re: filename only.

Also you need to see date & time of files and file sizes to make any sort of meaningful comparison. Even then you will often have to open the different versions to decide which to keep - or possibly keep both.

And don't touch system files, there are duplicates for a reason! ;)

SD

Mac the Knife
7th May 2013, 18:07
VisiPics is good (and free) - VisiPics (http://www.visipics.info/index.php?title=Main_Page)

"VisiPics does more than just look for identical files, it goes beyond checksums to look for similar pictures and does it all with a simple user interface. First, you select the root folder or folders to find and catalogue all of your pictures. It then applies five image comparison filters in order to measure how close pairs of images on the hard drive are."

"Visipics....will detect two different resolution files of the same picture as a duplicate, or the same picture saved in different formats, or duplicates where only minor cosmetic changes have taken place"

"All detected duplicates are shown side by side with pertinent information such as file name, type and size being displayed. Its auto-select mode let you choose if you want to keep the higher resolution picture, space-saving filetype, smaller filesize or all of the above."

Another good one is dupeGuru Picture Edition - dupeGuru Picture Edition - JPG, PNG, TIFF, GIF, BMP duplicate scanner (http://www.hardcoded.net/dupeguru_pe/)

Again, it compares the actual images rather than CRCs - depending on how "hard" you set the filter it will find anything from vague similarities to exact matches (independent of file format).
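
How VisiPics or dupeGuru match images internally isn't stated here, so the following is NOT their method. It is just a sketch of one generic technique (an "average hash") to show how a similarity threshold can work at all. It assumes the picture has already been decoded and shrunk to a small grayscale grid, which a real tool would do with an imaging library; all names are made up for illustration.

```python
def average_hash(grid):
    """Turn a small grayscale grid (rows of 0-255 ints) into a list of bits.

    Each bit is 1 where the pixel is brighter than the grid's mean.
    """
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def similarity(hash_a, hash_b):
    """Fraction of matching bits: 1.0 means the two hashes are identical."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must come from grids of the same size")
    same = sum(a == b for a, b in zip(hash_a, hash_b))
    return same / len(hash_a)


def looks_similar(grid_a, grid_b, threshold=0.9):
    """The threshold plays the role of the 'hardness' setting (assumed value)."""
    return similarity(average_hash(grid_a), average_hash(grid_b)) >= threshold
```

Because the hash only records which areas are brighter than average, a uniformly brightened copy produces the same bits, whereas a file checksum would change completely.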

If you find them useful then donate a few $ (I did) to keep free/open software going.

Mac

:ok:

mixture
7th May 2013, 19:28
Mac the Knife

As the various image search engines have demonstrated, anyone trying to tell you their software will find similar images is just talking through their backside.

The only thing you can do accurately is detect identical images, and you do that by using proper crypto hashes with sufficiently low collision rate such as SHA1. Not CRC as you are advocating.

Loose rivets
7th May 2013, 19:35
Gosh, while looking at this link I reached this:

Download.com wraps downloads in bloatware, lies about motivations | ExtremeTech (http://www.extremetech.com/computing/93504-download-com-wraps-downloads-in-bloatware-lies-about-motivations)

Mac the Knife
7th May 2013, 20:03
Gee mixture, I'm confused.

First of all I've never advocated using CRC for finding either identical or similar images.

Secondly, despite your cries of incredulity, the two apps I suggested certainly DO find similar images, certainly on my PCs and in the case of dupeGuru, on my Macs as well.

I, and the other users of these apps, are evidently insane to suggest such a thing (and suffering from aniloquy).

Why not confound yourself and try 'em?

Mac

:cool:

FullOppositeRudder
8th May 2013, 00:07
Seems to have a fairly good writeup:

Auslogics Duplicate File Finder 2.5.0.0 free download - Downloads - freeware, shareware, software trials, evaluations - PC & Tech Authority Downloads (http://downloads.pcauthority.com.au/article/22823-auslogics_duplicate_file_finder)

I have used it here occasionally with success.

F
O
R

Mac the Knife
8th May 2013, 22:41
I was a bit taken aback by ol' mixture's rebuttal so I loaded up an old image directory that I know has a lot of dupes and semi-dupes (images that have been resized, edited, or saved in a different format).

Some of my dupes are deliberate, in that they're copied to more than one folder for reference - yes, I know it's wasteful, but I'm not short of disk space.

On the Basic setting (as opposed to Strict or Loose) VisiPics looked at 34953 images in 31 minutes and found 11834 "duplicates" - on review, all are visually similar across a range of file formats - either real exact dupes or edits.

The results are easy to see as they are compared visually and you can choose which ones to move, ignore or discard.

"Autoselect" could have picked the uncompressed filetype, lower resolution and/or smaller filesize copies for deletion to the Recycle Bin - careful there! There is no option for creating links.

On the same folder dupeGuru took more than twice as long but found rather more matches, and more accurately - it seems to be using a different technique for matching images and it verifies results. Results are more fine-grained and you can see the delta better - dupeGuru is more "tweakable", can use regexes, and can create symlinks or hardlinks as well as copy or move dupes to a new directory.

Both allow you to find similar images across a single or multiple folders and operate on the results visually in a reasonable GUI.

These are sharp tools - don't cut yourself!

What mix says.

"As the various image search engines have demonstrated, anyone trying to tell you their software will find similar images is just talking through their backside.

The only thing you can do accurately is detect identical images, and you do that by using proper crypto hashes with sufficiently low collision rate such as SHA1."

is wrong.

Mac



:cool:

seanbean
8th May 2013, 23:31
Have a look here - take your pick: Best Free Duplicate File Detector | Gizmo's Freeware Reviews (http://www.techsupportalert.com/best-free-duplicate-file-detector.htm)

mixture
9th May 2013, 06:59
'What mix says.

"As the various image search engines have demonstrated, anyone trying to tell you their software will find similar images is just talking through their backside.

The only thing you can do accurately is detect identical images, and you do that by using proper crypto hashes with sufficiently low collision rate such as SHA1."

is wrong.'



I don't have much spare time, so I can't test at the moment, but will try to find a few minutes this weekend.

In the interim, pending that, I will temporarily withdraw the first statement above. But my second statement remains accurate.

Crypto hashes with low collision rates are a proven guaranteed way to detect identical files. If you want the belt and braces approach you can also build a hash tree and compare that too.
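
To make the hash tree idea concrete, here is a minimal Python sketch. The block size, SHA-256 and the function names are my own choices for illustration, not taken from any particular tool: each block of the file is hashed, then the hashes are combined pairwise until a single root is left.

```python
import hashlib


def leaf_hashes(path, block_size=1 << 20):
    """Hash each fixed-size block of the file; these are the tree's leaves."""
    leaves = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            leaves.append(hashlib.sha256(block).digest())
    return leaves


def merkle_root(leaves):
    """Combine hashes pairwise, level by level, until one root remains."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                nxt.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                nxt.append(pair[0])  # an unpaired last node is carried up as-is
        level = nxt
    return level[0]


def first_differing_block(leaves_a, leaves_b):
    """Index of the first block that differs, or None if all leaves match."""
    for i, (a, b) in enumerate(zip(leaves_a, leaves_b)):
        if a != b:
            return i
    if len(leaves_a) != len(leaves_b):
        return min(len(leaves_a), len(leaves_b))
    return None
```

Two files built with the same block size are identical when their roots match. When they don't, the leaves show which block differs, which a single whole-file hash cannot tell you.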

Mac the Knife
9th May 2013, 21:33
"Crypto hashes with low collision rates are a proven guaranteed way to detect identical files. If you want the belt and braces approach you can also build a hash tree and compare that too."

Absolutely. Won't argue with that.

Mac

:cool:

Albert Square
19th May 2013, 15:44
I found Awesome Duplicate File Finder (find it on Google, free) to be useful. My wife kept making copies of pics to send for printing. This program will show duplicate images side by side so you can decide which to delete. It also shows the percentage similarity, so you can delete unwanted pics when, say, you have taken two shots of the same thing.
AS

Mac the Knife
19th May 2013, 17:07
Good find Albert!

ADFF (in spite of its silly name) is quick, with a simple interface.

On my fairly average system it scanned 3334 mixed image-files in 00:02:15 and found 237 similar files. 217 were pretty similar (>20% similarity) and 20 were not (<20% similarity).

Cons:

It's not very tweakable and there is little ability to automate moves or deletes.
But still a useful addition to the toolkit.

mixture is very evidently mistaken but he's keeping schtum....

Mac

:cool:

mixture
19th May 2013, 21:24
"mixture is very evidently mistaken but he's keeping schtum...."


Some of us lead lives in the real world outside of PPRuNe. End of.

Mac the Knife
19th May 2013, 22:19
We all do.

Some of us are able to acknowledge having been mistaken and some of us can't.

Hey ho!

lomapaseo
20th May 2013, 00:03
My duplicate JPGs are truly duplicate files (similar name but identical size).

I would just search on size and cull by date. Of course I have about 20,000 JPGs so it might take some time. My solution is to just use an external hard drive.
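
A rough Python sketch of that size-then-date idea, with one addition that is my own assumption rather than part of the plan above: identical size alone doesn't prove two files are the same, so each candidate is also compared byte for byte before being listed. The function name is made up, and nothing gets deleted, it only returns a list.

```python
import filecmp
import os
from collections import defaultdict


def cull_candidates(root):
    """List newer byte-identical copies, keeping the oldest file of each set."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    to_cull = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        paths.sort(key=os.path.getmtime)  # oldest first
        keepers = []
        for path in paths:
            # shallow=False forces a real content comparison.
            if any(filecmp.cmp(k, path, shallow=False) for k in keepers):
                to_cull.append(path)
            else:
                keepers.append(path)
    return sorted(to_cull)
```

Grouping by size first means most of 20,000 files never need to be read at all, since only files that share a size with another file get compared.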

mixture
20th May 2013, 06:47
"Some of us are able to acknowledge having been mistaken and some of us can't."


Jeez Mac.... do I have to spell it out to you ....

I HAVE NOT HAD TIME TO TEST ANYTHING


Got it ?

Feline
20th May 2013, 08:15
I'm sure as hell not getting in between Mac and mixture - that could easily become a gory and/or explosive experience!

For my money: Digital Volcano : Duplicate Cleaner Version 2.1.0

Works for me - and it's free!

And you can compare on same content, same file name, same create date, and/or same modified date ...