Here is my question for OP. It seems like the image-shrinking step, coupled with the transformation to +/- values, loses so much information that the hash would suffer from a serious false-positive problem. I would have loved to see some data on this from their own dataset.
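For reference, the steps I'm questioning look roughly like this. It's a minimal Pillow sketch assuming the usual 9x8 layout described in write-ups of dHash, not necessarily the author's exact code:

```python
from PIL import Image

def dhash(path, hash_size=8):
    # Shrink to (hash_size+1) x hash_size grayscale pixels; this is the
    # step that throws away fine detail such as texture and rounded corners.
    img = Image.open(path).convert('L').resize(
        (hash_size + 1, hash_size), Image.LANCZOS)
    pixels = list(img.getdata())

    # For each row, compare adjacent pixels: 1 if the left pixel is
    # brighter than its right neighbor, else 0. Only the sign of the
    # gradient survives, not its magnitude.
    bits = []
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits.append(1 if left > right else 0)

    # Pack the 64 bits into a single integer.
    return sum(bit << i for i, bit in enumerate(bits))
```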
To give a concrete example, I noticed recently that a lot of App Store icons within a category look pretty similar. See for example this:
https://twitter.com/acslater00/status/450127865682489344/pho...
To my naked eye, all of those checkmarks look like the binary diff transformation would give them nearly identical hashes. It seems like the rounded corners in 'Finish' would blur out, the texture in 'OmniFocus 2' would blur out, and the gradient in 'Clear' would end up looking like a flat fill on the right side of the checkmark.
Anyway, clever algorithm, but I'm curious how it works in practice on small icons.
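If someone wants to test this, the check is cheap: hash a few of those icons with the sketch above and count differing bits. The filenames here are made up, and the ~10-bit "same image" threshold is the usual rule of thumb rather than anything from the article:

```python
def hamming(h1, h2):
    # Number of bit positions where the two 64-bit hashes differ.
    return bin(h1 ^ h2).count('1')

# Hypothetical filenames for the icons in the linked screenshot.
icons = ['finish.png', 'omnifocus2.png', 'clear.png']
hashes = {name: dhash(name) for name in icons}

for a in icons:
    for b in icons:
        if a < b:
            # A distance under ~10 bits is commonly treated as "same image",
            # which is exactly where these checkmarks might all land.
            print(a, b, hamming(hashes[a], hashes[b]))
```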
Ideally we would like to move to a content-based image retrieval system, where we could search based on features derived from the image itself (color, shape, and texture, for example) and fine-tune our results.
Yes, the presented example is a curious case. If we take the first three icons there and compare them based on shape and color, we can see that their shape is identical but the background is different. Based on that, should we consider them different or identical? You can't have too many variations on a simple shape like a checkmark or a Facebook logo, so which variations should you allow, and which would you consider copying previous work?
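For what it's worth, even a crude per-channel color histogram would separate the "same shape, different background" cases in that example, while a shape-only descriptor would lump them together. A minimal sketch, again with Pillow, made-up filenames, and a plain histogram-intersection score:

```python
from PIL import Image

def color_histogram(path, bins_per_channel=8):
    # Reduce each RGB channel to a few bins and count pixels per bin,
    # normalized so images of different sizes are comparable.
    img = Image.open(path).convert('RGB').resize((64, 64))
    counts = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in img.getdata():
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        counts[idx] += 1
    total = float(64 * 64)
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    # 1.0 means identical color distributions, 0.0 means disjoint.
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two icons sharing a shape but differing in background color should
# score noticeably lower here than two copies of the same icon.
score = histogram_intersection(color_histogram('icon_a.png'),
                               color_histogram('icon_b.png'))
print(score)
```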
dHash seems like a fast algorithm. So, one could search for potential collisions quickly with dHash, and then run a more complex and expensive algorithm on those matches to refine the results.
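Something like this, reusing the dhash()/hamming() sketches above as the cheap stage; the 10-bit prefilter threshold and the pixel-difference "expensive" stage are placeholder assumptions, not anything from the article:

```python
from PIL import Image

def pixel_distance(path_a, path_b, size=64):
    # Placeholder "expensive" stage: mean absolute pixel difference at a
    # larger resolution. Swap in whatever slower, more accurate comparison
    # you actually trust.
    a = Image.open(path_a).convert('L').resize((size, size))
    b = Image.open(path_b).convert('L').resize((size, size))
    return sum(abs(x - y) for x, y in zip(a.getdata(), b.getdata())) / (size * size)

def find_similar(query_path, candidate_paths, prefilter_bits=10):
    # Stage 1: cheap 64-bit dHash comparison over the whole collection.
    query_hash = dhash(query_path)
    shortlist = [p for p in candidate_paths
                 if hamming(query_hash, dhash(p)) <= prefilter_bits]
    # Stage 2: the slower comparison runs only on the survivors.
    shortlist.sort(key=lambda p: pixel_distance(query_path, p))
    return shortlist
```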
While this algorithm could suffer from a large false-positive problem, that issue could also work to his advantage when it is used as a "find similar images to this" feature, which was addressed later in the article.