mixture
14th Jun 2013, 09:42
Mr Mac sent me a PM chasing me for a follow-up.
Evidently he is still stewing in his own juices about my comments on a historical thread (here (www.pprune.org/archive/index.php/t-514180.html)).
Background
Mr OSFO said he was looking for software that would search for duplicate images on his computer and highlight these for deletion.
Scope
Mr OSFO specifically used the word duplicate.
I'm rather partial to any of the definitions of the word duplicate provided by the Oxford Dictionary, for example "make or be an exact copy of".
It is widespread practice in the IT industry to use cryptographic hashes to identify duplicate files. There is no debate that using cryptographic hashes is the fastest and most efficient way to do this. Therefore using "fuzzy logic" or whatever when the goal is to identify DUPLICATE files is a waste of time, takes longer to process and is generally a wheel that really does not need to be reinvented.
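For anyone wondering what the hash approach looks like in practice, here is a minimal sketch in Python. It is an illustration only, not how any particular product does it: the function names (file_digest, find_duplicates) and the choice of SHA-256 are my own assumptions. It groups files by size first, so only files that could possibly be identical get hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Cheap first pass: files of different sizes cannot be exact copies.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_digest[file_digest(p)].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Calling find_duplicates on a picture folder returns one list per set of exact copies. Anything resized, re-saved or re-exposed gets a different digest, which is exactly why this only answers the "duplicate" question and not the "similar" one.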
Mac then came back suggesting his tools were useful for detecting similar files.
I expressed some doubt in that statement. Given that organisations such as Google with deep pockets and large development teams are unable to build an algorithm that actually works at detecting similar files, there is good reason to doubt the ability of a one-man-band shareware/freeware peddler to achieve this.
Evaluation
However, Mac chased me, so I picked four images (two pairs of visually similar images, see below) and downloaded Mac's toys.
I've done a fair amount of beta testing and product evaluation work on stuff more complex than this, so I know how to challenge a piece of software. :E
I deliberately picked challenging images. The two images in each pair were taken on a tripod at exactly the same location and within a few minutes of each other. The only thing that changed slightly was zoom or exposure.
Obviously for my tests I used the real high-res images without the watermark etc.
(1) Dupeguru
In its default setting, Dupeguru picked up no similar files.
I then turned the detection threshold down to its loosest setting. Dupeguru correctly identified the forest scenes as being similar (26%), but failed to identify the coastal scenes as being similar.
This was an unscientific test on a small number of samples, but I would suggest Dupeguru is probably not to be relied upon as a surefire way to identify similar files. As a first-stage tool to filter out some similar files before resorting to manual review, however, I can see it would have its uses.
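To show what a similarity percentage and a threshold mean in general terms, here is a toy sketch in Python. To be clear, this is NOT Dupeguru's algorithm, which I have not inspected; it is a simple "average hash" comparison, and the names (average_hash, similarity, is_similar) and the small grids of grayscale values are made up for illustration.

```python
def average_hash(pixels):
    """Turn a grid of grayscale values into bits: 1 if above the mean, else 0."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def similarity(a, b):
    """Percentage of matching bits between the hashes of two same-sized grids."""
    ha, hb = average_hash(a), average_hash(b)
    if len(ha) != len(hb):
        raise ValueError("grids must have the same number of pixels")
    matches = sum(x == y for x, y in zip(ha, hb))
    return 100.0 * matches / len(ha)


def is_similar(a, b, threshold):
    """Report a match only when the similarity reaches the threshold."""
    return similarity(a, b) >= threshold


# Hypothetical 2x2 "images": the second is the first with exposure raised.
base = [[10, 200], [30, 220]]
brighter = [[30, 220], [50, 240]]
print(similarity(base, brighter))
```

A uniform exposure shift leaves this kind of hash unchanged, whereas a change of zoom moves content between cells and flips bits. Loosening the threshold catches more such pairs at the cost of more false matches, which is the trade-off you are adjusting with the detection slider.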
I would give it 2.5/5.
(2) Visipics
No matter how loose I made the settings, Visipics came up with no matches for similar files.
0/5.
Happy now, Mac? :E
http://s7.postimg.org/5ncegapzf/Filtest2.jpg