On the topic of duplicate files.....
Mr Mac sent me a PM chasing me for a follow-up.
Evidently he is still stewing in his own juices about my comments on a historical thread (here).

Background
Mr OSFO said he was looking for software that would search for duplicate images on his computer and highlight these for deletion.

Scope
Mr OSFO specifically used the word duplicate. I'm rather partial to any of the definitions provided by the Oxford Dictionary of the word duplicate, for example "make or be an exact copy of". It is widespread practice in the IT industry to use cryptographic hashes to identify duplicate files. There is no debate that using cryptographic hashes is the fastest and most efficient way to do this. Therefore using "fuzzy logic" or whatever when the goal is to identify DUPLICATE files is a waste of time, takes longer to process and is generally a wheel that really does not need to be reinvented.

Mac then came back suggesting his tools were useful for detecting similar files. I expressed some doubt in that statement. Given that organisations such as Google with deep pockets and large development teams are unable to build an algorithm that actually works at detecting similar files, there is good reason to doubt the ability of a one-man-band shareware/freeware peddler to achieve this.

Evaluation
However, Mac chased me.... so I picked four images, two pairs of visually similar images (see below), and downloaded Mac's toys. I've done a fair amount of beta testing and product evaluation work on stuff more complex than this, so I know how to challenge a piece of software. :E

I deliberately picked challenging images. Both images in each pair were taken on a tripod at the exact same location and within a few minutes of each other. The only thing that changed slightly was zoom or exposure. Obviously for my tests I used the real high-res images without the watermark etc.

(1) Dupeguru
In its default setting, Dupeguru picked up no similar files. I turned the detection down to its loosest setting. Dupeguru then correctly identified the forest scenes as being similar (26%), but failed to identify the coastal scenes as being similar. An unscientific test on a small number of samples, but I would suggest Dupeguru is probably not to be relied upon as a surefire way to identify similar files. However, as a first-stage tool to filter out some similar files before resorting to manual review, I can see it would have its uses. I would give it 2.5/5.

(2) Visipics
No matter how loose I made the settings, Visipics came up with no matches for similar files. 0/5.

Happy now Mac ? :E

http://s7.postimg.org/5ncegapzf/Filtest2.jpg
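For anyone wondering what the hash approach from the Scope section looks like in practice, here is a minimal sketch in Python. It is only an illustration, not how Dupeguru or Visipics work internally. The function names `file_digest` and `find_duplicates` are made up for this example, and SHA-256 is just one reasonable choice of cryptographic hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of byte-identical files under root."""
    # Files of different sizes cannot be identical, so only hash size collisions.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_digest[file_digest(p)].append(p)

    # Keep only digests shared by two or more files.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Calling `find_duplicates("some/photo/folder")` (a hypothetical path) returns a list of groups, each group being paths whose contents are byte-for-byte identical. Note that two shots of the same scene with a different zoom or exposure, or the same photo re-saved, produce different bytes and therefore different hashes, so this finds duplicates only, not similar images.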
Aie mix!
Poor stupid old Mac here (a one-man-band alas), whom life has surprisingly succeeded in teaching the difference between duplicate and similar.

I really think you should let Google know that their reverse image search doesn't work - https://support.google.com/images/answer/1325808?hl=en

As for your lack of success with VisiPics and dupeGuru Picture Edition (not dupeGuru tout court), all that I can say is that happily, your disappointing experience is not mine (nor that of most users). These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.

Anyway, I promise not to "chase" you anymore, and am able to reassure you that the only juices that I am currently stewing in is an excellent glass of Cederberg David Nieuwoudt Ghost Corner Semillon of 2011 which is making me very happy.

Mac :ok:
These scorned "toys" (sniff!), while imperfect, sure beat going through a bunch of large folders by eye, as OFSO may have found.
David Nieuwoudt Ghost Corner Semillon
I promise not to "chase" you anymore

But thank you for the prompt. I had not forgotten, however; I was keeping it as a rainy day thing to keep me occupied on a long train journey.