So you have a before and an after image, say a webpage mockup where the logo moved from top left to top right.
The blob finder finds the interesting pieces of each version, and the perceptual hash picks which blobs match each other. Once a blob in the before image is paired with a blob in the after image, the software can say with reasonable certainty that the piece in the top left was moved to the top right.
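
To make that concrete, here is a rough toy sketch of the idea in plain Python. It is not any particular library's method: connected-component labelling stands in for the blob finder, a simple average hash stands in for the perceptual hash, and every name in it (`find_blobs`, `average_hash`, `match_blobs`, `max_distance`, the tiny `LOGO` pattern) is made up for illustration. Images are just 2D lists of grayscale values with 0 as the background.

```python
from collections import deque

def find_blobs(img, background=0):
    """Bounding boxes (top, left, bottom, right) of 4-connected non-background regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] == background or seen[y][x]:
                continue
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # flood fill one blob, growing its bounding box
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] != background):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def average_hash(img, box, size=8):
    """Resample the blob to size x size (nearest neighbour), then threshold at the mean."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    pixels = [img[top + (r * bh) // size][left + (c * bw) // size]
              for r in range(size) for c in range(size)]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

def match_blobs(before, after, max_distance=10):
    """Greedily pair each 'before' blob with the closest unused 'after' blob."""
    after_blobs = [(box, average_hash(after, box)) for box in find_blobs(after)]
    used, moves = set(), []
    for box_a in find_blobs(before):
        hash_a = average_hash(before, box_a)
        best = None
        for i, (box_b, hash_b) in enumerate(after_blobs):
            d = hamming(hash_a, hash_b)
            if i not in used and d <= max_distance and (best is None or d < best[0]):
                best = (d, i, box_b)
        if best is not None:
            used.add(best[1])
            moves.append((box_a, best[2]))
    return moves

# Hypothetical 4x4 "logo" pasted into two otherwise empty 20x40 mockups.
LOGO = [[200, 50, 200, 50], [50, 200, 50, 200], [200, 200, 50, 50], [50, 50, 200, 200]]

def blank_with_logo(top, left, height=20, width=40):
    img = [[0] * width for _ in range(height)]
    for r, row in enumerate(LOGO):
        for c, value in enumerate(row):
            img[top + r][left + c] = value
    return img

before = blank_with_logo(1, 1)    # logo in the top left
after = blank_with_logo(1, 30)    # logo in the top right
moves = match_blobs(before, after)
print(moves)
```

Each entry in `moves` pairs a bounding box from the before image with the box it matched in the after image, so a pair whose boxes differ is a "this piece moved" report. A real page would need a sturdier blob finder and hash (a flat, single-colour blob hashes to all zeros here and would match any other flat blob), so treat the threshold and the greedy pairing as placeholders.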