View Full Version : On the topic of duplicate files.....


mixture
14th Jun 2013, 09:42
Mr Mac sent me a PM chasing me for a follow-up.

Evidently he is still stewing in his own juices about my comments on a historical thread (here (www.pprune.org/archive/index.php/t-514180.html)).

Background

Mr OSFO said he was looking for software that would search for duplicate images on his computer and highlight these for deletion.

Scope

Mr OSFO specifically used the word duplicate.

I'm rather partial to any of the definitions of the word duplicate provided by the Oxford Dictionary, for example: make or be an exact copy of.

It is widespread practice in the IT industry to use cryptographic hashes to identify duplicate files. There is no debate that using cryptographic hashes is the fastest and most efficient way to do this. Therefore using "fuzzy logic" or whatever when the goal is to identify DUPLICATE files is a waste of time, takes longer to process and is generally a wheel that really does not need to be reinvented.
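For anyone who wants to see what the hash approach looks like in practice, here is a minimal sketch in Python. It assumes SHA-256 from the standard library as the hash; the function names `file_digest` and `find_duplicates` are made up for illustration and do not come from any particular tool. It groups files by size first, since files of different sizes cannot be exact copies, and only hashes the candidates that remain.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of byte-identical files found under root (hypothetical helper)."""
    # Cheap first pass: files of different sizes cannot be exact copies.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    # Keep only the groups that actually contain more than one file.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("some/folder")` returns one list per set of identical files. Note that this only ever finds exact copies: re-saving or resizing an image changes its bytes, and therefore its hash.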

Mac then came back suggesting his tools were useful for detecting similar files.

I expressed some doubt in that statement. Given that organisations such as Google with deep pockets and large development teams are unable to build an algorithm that actually works at detecting similar files, there is good reason to doubt the ability of a one-man-band shareware/freeware peddler to achieve this.

Evaluation

However, Mac chased me.... so I picked four images, two pairs of visually similar images (see below), and downloaded Mac's toys.

I've done a fair amount of beta testing and product evaluation work on stuff more complex than this, so I know how to challenge a piece of software. :E

I deliberately picked challenging images. Both images in each pair were taken on a tripod at exactly the same location and within a few minutes of each other. The only thing that changed slightly was zoom or exposure.

Obviously for my tests I used the real high-res images without the watermark etc.

(1) Dupeguru

In its default setting, Dupeguru picked up no similar files.

I turned down the detection settings to the loosest settings. Dupeguru correctly identified the forest scenes as being similar (26%), but failed to identify the coastal scene as being similar.
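To illustrate what a similarity score and a threshold mean in this kind of tool, here is a rough sketch of one well-known technique, the "average hash". This is an assumption for illustration only: I do not know what Dupeguru or Visipics do internally, and the 26% figure above may be calculated quite differently. The sketch takes a small grid of greyscale values (a real tool would first shrink the image to something like 8x8), and the names `average_hash`, `similarity` and `is_similar` are hypothetical.

```python
def average_hash(pixels):
    """Turn a 2D grid of greyscale values into a list of bits.

    Each bit is 1 where the pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(hash_a, hash_b):
    """Return the fraction of bit positions that agree (0.0 to 1.0)."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    matching = sum(a == b for a, b in zip(hash_a, hash_b))
    return matching / len(hash_a)


def is_similar(pixels_a, pixels_b, threshold=0.9):
    """Loosening the settings corresponds to lowering the threshold."""
    score = similarity(average_hash(pixels_a), average_hash(pixels_b))
    return score >= threshold
```

With a scheme like this, a uniform change in exposure leaves most bits alone because every pixel moves relative to the same mean, whereas a change in zoom shifts content between cells and tends to flip bits. That is one plausible reason why two shots of the same scene can score low.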

This was an unscientific test on a small number of samples, but I would suggest Dupeguru is probably not to be relied upon as a surefire way to identify similar files. However, as a first-stage tool to filter out some similar files before resorting to manual review, I can see it would have its uses.

I would give it 2.5/5.

(2) Visipics

No matter how loose I made the settings, Visipics came up with no matches for similar files.

0/5.



Happy now Mac ? :E



http://s7.postimg.org/5ncegapzf/Filtest2.jpg

Mac the Knife
14th Jun 2013, 19:30
Aie mix!

Poor stupid old Mac here (a one-man-band alas), whom life has surprisingly succeeded in teaching the difference between duplicate and similar.

I really think you should let Google know that their reverse image search doesn't work
- https://support.google.com/images/answer/1325808?hl=en

As for your lack of success with VisiPics and dupeGuru Picture Edition (not dupeGuru tout court); all that I can say is that happily, your disappointing experience is not mine (nor that of most users).

These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.

Anyway, I promise not to "chase" you anymore, and am able to reassure you that the only juice that I am currently stewing in is an excellent glass of Cederberg David Nieuwoudt Ghost Corner Semillon of 2011 which is making me very happy.

Mac

:ok:

mixture
14th Jun 2013, 20:47
These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.

Although had OSFO (or indeed, your good self) meta-tagged their images upon import, such issues might not have arisen in the first place ?

David Nieuwoudt Ghost Corner Semillon

Never heard of it.... "Belondrade y Lurton" however, was a recent discovery !

I promise not to "chase" you anymore

Only people permitted to chase me are members of the opposite sex for reasons better suited to Jet Blast.... :E

But thank you for the prompt. I had not forgotten, however; I was keeping it as a rainy day thing to keep me occupied on a long train journey.