On the topic of duplicate files.....

Old 14th June 2013 | 09:42
  #1 (permalink)  
Thread Starter
 
Joined: Aug 2002
Posts: 3,663
Likes: 0
From: Earth
On the topic of duplicate files.....

Mr Mac sent me a PM chasing me for a follow-up.

Evidently he is still stewing in his own juices about my comments on a historical thread (here).

Background

Mr OSFO said he was looking for software that would search for duplicate images on his computer and highlight these for deletion.

Scope

Mr OSFO specifically used the word duplicate.

I'm rather partial to any of the definitions of the word duplicate provided by the Oxford Dictionary, for example "make or be an exact copy of".

It is widespread practice in the IT industry to use cryptographic hashes to identify duplicate files. There is no debate that using cryptographic hashes is the fastest and most efficient way to do this. Therefore using "fuzzy logic" or whatever when the goal is to identify DUPLICATE files is a waste of time: it takes longer to process and reinvents a wheel that really does not need reinventing.
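For anyone who wants to see what the hash approach looks like in practice, here is a minimal Python sketch. It is an illustration only, not the method of any tool named in this thread: the function names `file_digest` and `find_duplicates` are made up for the example, SHA-256 is an assumed choice of hash, and grouping by file size first is just a common shortcut to avoid hashing files that cannot possibly match.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents share a SHA-256 digest."""
    # Step 1: bucket by size; files of different sizes cannot be exact copies.
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    # Keep only digests seen more than once.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("/path/to/photos")` (a placeholder path) would return one list per set of identical files. Note that this only catches exact copies: a re-saved or resized image has different bytes and therefore a different digest.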

Mac then came back suggesting his tools were useful for detecting similar files.

I expressed some doubt in that statement. Given that organisations such as Google with deep pockets and large development teams are unable to build an algorithm that actually works at detecting similar files, there is good reason to doubt the ability of a one-man-band shareware/freeware peddler to achieve this.

Evaluation

However, Mac chased me.... so I picked four images, two pairs of visually similar images (see below), and downloaded Mac's toys.

I've done a fair amount of beta testing and product evaluation work on stuff more complex than this, so I know how to challenge a piece of software.

I deliberately picked challenging images. Both images in each pair were taken on a tripod at the exact same location and within a few minutes of each other. The only thing that changed slightly was zoom or exposure.

Obviously for my tests I used the real high-res images without the watermark etc.

(1) Dupeguru

In its default setting, Dupeguru picked up no similar files.

I turned down the detection settings to the loosest settings. Dupeguru correctly identified the forest scenes as being similar (26%), but failed to identify the coastal scene as being similar.

This was an unscientific test on a small number of samples, but I would suggest Dupeguru is probably not to be relied upon as a surefire way to identify similar files. However, as a first-stage tool to filter out some similar files before resorting to manual review, I can see it would have its uses.

I would give it 2.5/5.

(2) Visipics

No matter how loose I made the settings, Visipics came up with no matches for similar files.

0/5.
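Neither tool's internal algorithm is described in this thread, so the following is not a claim about how Dupeguru or Visipics work. As a generic illustration of why "similar" matching depends on a threshold setting in a way that exact hashing does not, here is a minimal Python sketch of a simple "average hash" comparison. The names `average_hash` and `similarity` are hypothetical, and the tiny 4x4 grayscale grids stand in for thumbnails that a real tool would first have to produce by decoding and downscaling the image.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a grayscale pixel is above the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def similarity(a, b):
    """Return the fraction of matching bits between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical 4x4 grayscale thumbnails (values 0-255), invented for this example.
scene = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 198, 208],
    [11, 21, 202, 212],
]
# The same scene with every pixel 30 levels brighter (a crude exposure change).
brighter = [[min(255, p + 30) for p in row] for row in scene]
# An unrelated layout: bright top half, dark bottom half.
different = [[200] * 4, [210] * 4, [10] * 4, [20] * 4]

score_exposure = similarity(average_hash(scene), average_hash(brighter))
score_unrelated = similarity(average_hash(scene), average_hash(different))
```

In this toy case the uniform brightness shift leaves the hash unchanged, while the unrelated layout matches on only half the bits. Where the cut-off between "similar" and "not similar" is drawn is a tuning decision, and a change such as zoom moves content between cells and can lower the score, which is one plausible reason different settings give different results.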



Happy now Mac ?




Last edited by mixture; 14th June 2013 at 09:45.
Old 14th June 2013 | 19:30
  #2 (permalink)  

Plastic PPRuNer
25 Anniversary
 
Joined: Sep 2000
Posts: 1,902
Likes: 0
From: Rochechouart, France
Aie mix!

Poor stupid old Mac here (a one-man-band alas), whom life has surprisingly succeeded in teaching the difference between duplicate and similar.

I really think you should let Google know that their reverse image search doesn't work
- https://support.google.com/images/answer/1325808?hl=en

As for your lack of success with VisiPics and dupeGuru Picture Edition (not dupeGuru tout court), all that I can say is that, happily, your disappointing experience is not mine (nor that of most users).

These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.

Anyway, I promise not to "chase" you anymore, and am able to reassure you that the only juices that I am currently stewing in is an excellent glass of Cederberg David Nieuwoudt Ghost Corner Semillon of 2011 which is making me very happy.

Mac

Old 14th June 2013 | 20:47
  #3 (permalink)  
Thread Starter
 
Joined: Aug 2002
Posts: 3,663
Likes: 0
From: Earth
Quote:
These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.

Although had OSFO (or indeed, your good self) meta tagged their images upon import, such issues might not have arisen in the first place?

Quote:
David Nieuwoudt Ghost Corner Semillon

Never heard of it.... "Belondrade y Lutron", however, was a recent discovery!

Quote:
I promise not to "chase" you anymore

Only people permitted to chase me are members of the opposite sex for reasons better suited to Jet Blast....

But thank you for the prompt. I had not forgotten, however; I was keeping it as a rainy day thing to keep me occupied on a long train journey.

Last edited by mixture; 14th June 2013 at 20:47.