Hacker News

As a fan of this blog for longer than I can remember, it's refreshing to hear this particular author's take on this issue, especially considering their background.

I'm glad these issues were addressed in a much more elegant way than I would have put them:

> Apple's technical whitepaper is overly technical -- and yet doesn't give enough information for someone to confirm the implementation. (I cover this type of paper in my blog entry, "Oh Baby, Talk Technical To Me" under "Over-Talk".) In effect, it is a proof by cumbersome notation. This plays to a common fallacy: if it looks really technical, then it must be really good. Similarly, one of Apple's reviewers wrote an entire paper full of mathematical symbols and complex variables. (But the paper looks impressive. Remember kids: a mathematical proof is not the same as a code review.)

> Apple claims that there is a "one in one trillion chance per year of incorrectly flagging a given account". I'm calling bullshit on this.



Reading carefully through the paper, an important part of their calculation for the "one in a trillion" claim seems to rest on the cryptographic threshold approach they are using. In particular, it seems likely to me that the number of matches required for your account to be flagged is relatively high (perhaps a dozen). If that is the case, the per-image hash collision likelihood could be "only" 1 in a million, yet it would still be vanishingly unlikely for a typical iCloud user to hit a dozen false positives. A rate of 1e-6 is _much_ more testable than 1e-12 for the perceptual hashing, and the cryptographic parts of the secret sharing are easy to analyze mathematically.

As a disclaimer, I haven't done the actual math here. This also implies that the risk of your account getting flagged falsely is tightly related to how many images you upload.
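For what it's worth, that math is easy to sketch. Assuming, purely hypothetically, a per-image false-positive rate of 1e-6, a threshold of 12 matches, and a 10,000-photo library (none of these parameters are confirmed by Apple), the account-level flagging probability is a binomial tail:

```python
from math import comb

def p_at_least(p, n, t, terms=50):
    """Binomial tail P(X >= t) for n independent trials with success
    probability p, truncated after `terms` terms (they decay extremely
    fast when n * p << t)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(t, min(n, t + terms) + 1))

# Hypothetical numbers: 1e-6 per-image collision rate, 12-match
# threshold, 10,000 uploaded photos.
print(p_at_least(1e-6, 10_000, 12))  # ~2e-33: vanishingly unlikely
```

With these made-up parameters the account-level false-positive rate comes out around 10^-33, which is the point above: a modest per-image error rate combined with a match threshold can still yield an astronomically small account-level rate.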


You're assuming that perceptual hashes are uniformly distributed, but that's not the case. If I post a picture of my kid at the beach I'm far, far more likely to generate perceptual hashes closer to the threshold. Not to mention intimate photos of/with my partner.
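To illustrate that non-uniformity: here is a toy difference hash (dHash), one of the simplest perceptual hashes and emphatically not Apple's NeuralHash. A slightly brightened copy of an "image" hashes identically, while unrelated content lands far away in Hamming distance, so collisions cluster around visually similar photos rather than being spread uniformly:

```python
def dhash(pixels):
    """Toy difference hash: one bit per adjacent-pixel comparison per
    row. `pixels` is a list of rows of grayscale values."""
    bits = 0
    for row in pixels:
        for a, b in zip(row, row[1:]):
            bits = (bits << 1) | (a < b)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two "photos" of the same scene; the second is uniformly brighter.
img1 = [[10, 20, 30, 25], [40, 35, 50, 60], [5, 15, 10, 20]]
img2 = [[v + 3 for v in row] for row in img1]  # brightening preserves all comparisons
img3 = [[60, 5, 40, 10], [5, 60, 5, 60], [50, 10, 55, 5]]  # unrelated content

print(hamming(dhash(img1), dhash(img2)))  # 0: identical hash
print(hamming(dhash(img1), dhash(img3)))  # 6: far apart
```

Real perceptual hashes (aHash, pHash, NeuralHash) are far more sophisticated, but the clustering property is the same by design, which is what makes bit-counting collision estimates misleading.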


Good point about the possibility of capturing a bunch of distinct photos with the same perceptual hash, either by taking a burst of photos or by editing one photo a bunch of times. I guess a better implementation would never upload two different encryption keys for the same perceptual hash and just send dummy data instead, but I haven't seen any indication that they actually do that.
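A client-side version of that mitigation could be as simple as remembering which perceptual hashes have already contributed a real secret share and uploading random dummy data for repeats. A hypothetical sketch, with names and structure invented for illustration (nothing here comes from Apple's actual design):

```python
import os

class VoucherUploader:
    """Hypothetical client-side dedup: each distinct perceptual hash
    contributes at most one real secret share toward the threshold."""

    def __init__(self):
        self.seen_hashes = set()

    def voucher_for(self, perceptual_hash, secret_share):
        if perceptual_hash in self.seen_hashes:
            # Same hash seen before (burst shot, re-edited photo):
            # upload indistinguishable dummy bytes instead of a share.
            return os.urandom(len(secret_share))
        self.seen_hashes.add(perceptual_hash)
        return secret_share

uploader = VoucherUploader()
share = b"\x01" * 16
v1 = uploader.voucher_for(0xABCD, share)  # first sighting: real share
v2 = uploader.voucher_for(0xABCD, share)  # duplicate hash: dummy data
print(v1 == share, len(v2) == len(share))  # True True
```

The dummy voucher must be the same length and distribution as a real one so the server can't tell which uploads were suppressed, which is why random bytes are used rather than simply skipping the upload.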


Yep. What if I take a burst of 12 photos that all register as false positives in NeuralHash (which is an ML black box), and an Apple reviewer is now invading my privacy by looking through my photo library?


The technical paper Apple put out, linked in the post, acknowledges this risk, but isn't very helpful:

“Several solutions to this were considered, but ultimately, this issue is addressed by a mechanism outside of the cryptographic protocol.”


Not acceptable for a technology being deployed to hundreds of millions of people.


There was a huge brawl of sorts over "a mathematical proof is not the same as a code review" between Neal Koblitz, Alfred Menezes, et al. on one side and the theoretical crypto community on the other, regarding "provable security". Here is a site: http://anotherlook.ca/


Regarding that rate, I'm no expert, but my guess is that it's the result of math, not actual testing of 1+ trillion images. This sounds like calling bullshit on "You have a one in a trillion chance of winning the lottery."


The author addresses this point:

> Perhaps Apple is basing their "1 in 1 trillion" estimate on the number of bits in their hash? With cryptographic hashes (MD5, SHA1, etc.), we can use the number of bits to identify the likelihood of a collision. If the odds are "1 in 1 trillion", then it means the algorithm has about 40 bits for the hash. However, counting the bit size for a hash does not work with perceptual hashes.

> With perceptual hashes, the real question is how often do those specific attributes appear in a photo. This isn't the same as looking at the number of bits in the hash. (Two different pictures of cars will have different perceptual hashes. Two different pictures of similar dogs taken at similar angles will have similar hashes. And two different pictures of white walls will be almost identical.)

> With AI-driven perceptual hashes, including algorithms like Apple's NeuralHash, you don't even know the attributes so you cannot directly test the likelihood. The only real solution is to test by passing through a large number of visually different images. But as I mentioned, I don't think Apple has access to 1 trillion pictures.

> What is the real error rate? We don't know. Apple doesn't seem to know. And since they don't know, they appear to have just thrown out a really big number. As far as I can tell, Apple's claim of "1 in 1 trillion" is a baseless estimate. In this regard, Apple has provided misleading support for their algorithm and misleading accuracy rates.
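The 40-bit arithmetic in the quote checks out: a hash with b uniformly distributed bits collides on a single comparison with probability 2^-b, so a 1-in-10^12 collision rate would correspond to roughly log2(10^12) bits.

```python
import math

# For a uniformly distributed b-bit hash, a single comparison collides
# with probability 2**-b, so odds of 1 in 1e12 imply:
print(math.log2(1e12))  # ~39.86, i.e. roughly a 40-bit hash
```

As the author notes, though, perceptual hash outputs are not uniformly distributed over their bit space, so this bound says little about the real-world false-positive rate.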



