Hacker News

As a fan of this blog for longer than I can remember, it's refreshing to hear this particular author's take on this issue, especially considering their background.

I'm glad these issues were addressed in a much more elegant way than I would have put them:

> Apple's technical whitepaper is overly technical -- and yet doesn't give enough information for someone to confirm the implementation. (I cover this type of paper in my blog entry, "Oh Baby, Talk Technical To Me" under "Over-Talk".) In effect, it is a proof by cumbersome notation. This plays to a common fallacy: if it looks really technical, then it must be really good. Similarly, one of Apple's reviewers wrote an entire paper full of mathematical symbols and complex variables. (But the paper looks impressive. Remember kids: a mathematical proof is not the same as a code review.)

> Apple claims that there is a "one in one trillion chance per year of incorrectly flagging a given account". I'm calling bullshit on this.



Reading carefully through the paper, an important part of their calculation for the "one in a trillion" claim seems to rest on the cryptographic threshold approach they are using. In particular, it seems likely to me that the number of matches required for your account to be flagged is relatively high (perhaps a dozen). If that is the case, the per-image hash collision likelihood could be "only" 1 in a million, yet it would still be vanishingly unlikely for a typical iCloud user to hit a dozen false positives. A rate of 1e-6 is _much_ more testable than 1e-12 for the perceptual hashing, and the cryptographic parts of the secret sharing are easy to analyze mathematically.

As a disclaimer, I haven't done the actual math here. This also implies that the risk of your account getting flagged falsely is tightly related to how many images you upload.
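For what it's worth, that math is easy to sketch. Assuming, purely hypothetically, a per-image false-positive rate of 1e-6, a threshold of 12 matches, and a 10,000-photo library (none of these parameters are confirmed by Apple), the account-level flagging probability is a binomial tail:

```python
from math import comb

def p_at_least(p, n, t, terms=50):
    """Binomial tail P(X >= t) for n independent trials with success
    probability p, truncated after `terms` terms (they decay extremely
    fast when n * p << t)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(t, min(n, t + terms) + 1))

# Hypothetical numbers: 1e-6 per-image collision rate, 12-match
# threshold, 10,000 uploaded photos.
print(p_at_least(1e-6, 10_000, 12))  # ~2e-33: vanishingly unlikely
```

With these made-up parameters the account-level false-positive rate comes out around 10^-33, which is the point above: a modest per-image error rate combined with a match threshold can still yield an astronomically small account-level rate.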


You're assuming that perceptual hashes are uniformly distributed, but that's not the case. If I post a picture of my kid at the beach I'm far, far more likely to generate perceptual hashes closer to the threshold. Not to mention intimate photos of/with my partner.
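To illustrate that non-uniformity: here is a toy difference hash (dHash), one of the simplest perceptual hashes and emphatically not Apple's NeuralHash. A slightly brightened copy of an "image" hashes identically, while unrelated content lands far away in Hamming distance, so collisions cluster around visually similar photos rather than being spread uniformly:

```python
def dhash(pixels):
    """Toy difference hash: one bit per adjacent-pixel comparison per
    row. `pixels` is a list of rows of grayscale values."""
    bits = 0
    for row in pixels:
        for a, b in zip(row, row[1:]):
            bits = (bits << 1) | (a < b)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two "photos" of the same scene; the second is uniformly brighter.
img1 = [[10, 20, 30, 25], [40, 35, 50, 60], [5, 15, 10, 20]]
img2 = [[v + 3 for v in row] for row in img1]  # brightening preserves all comparisons
img3 = [[60, 5, 40, 10], [5, 60, 5, 60], [50, 10, 55, 5]]  # unrelated content

print(hamming(dhash(img1), dhash(img2)))  # 0: identical hash
print(hamming(dhash(img1), dhash(img3)))  # 6: far apart
```

Real perceptual hashes (aHash, pHash, NeuralHash) are far more sophisticated, but the clustering property is the same by design, which is what makes bit-counting collision estimates misleading.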


Good point about the possibility of capturing a bunch of distinct photos with the same perceptual hash, either by taking a burst of photos or by editing one photo a bunch of times. I guess a better implementation would never upload two different encryption keys for the same perceptual hash and just send dummy data instead, but I haven't seen any indication that they actually do that.
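A client-side version of that mitigation could be as simple as remembering which perceptual hashes have already contributed a real secret share and uploading random dummy data for repeats. A hypothetical sketch, with names and structure invented for illustration (nothing here comes from Apple's actual design):

```python
import os

class VoucherUploader:
    """Hypothetical client-side dedup: each distinct perceptual hash
    contributes at most one real secret share toward the threshold."""

    def __init__(self):
        self.seen_hashes = set()

    def voucher_for(self, perceptual_hash, secret_share):
        if perceptual_hash in self.seen_hashes:
            # Same hash seen before (burst shot, re-edited photo):
            # upload indistinguishable dummy bytes instead of a share.
            return os.urandom(len(secret_share))
        self.seen_hashes.add(perceptual_hash)
        return secret_share

uploader = VoucherUploader()
share = b"\x01" * 16
v1 = uploader.voucher_for(0xABCD, share)  # first sighting: real share
v2 = uploader.voucher_for(0xABCD, share)  # duplicate hash: dummy data
print(v1 == share, len(v2) == len(share))  # True True
```

The dummy voucher must be the same length and distribution as a real one so the server can't tell which uploads were suppressed, which is why random bytes are used rather than simply skipping the upload.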


Yep. What if I take a burst of 12 photos that all register as false positives in NeuralHash (which is an ML black box), and an Apple reviewer is now invading my privacy by looking through my photo library?


The technical paper Apple put out, linked in the post, acknowledges this risk, but isn't very helpful:

“Several solutions to this were considered, but ultimately, this issue is addressed by a mechanism outside of the cryptographic protocol.”


Not acceptable for a technology being deployed to hundreds of millions of people.


There was a huge brawl of sorts over "a mathematical proof is not the same as a code review" between Neal Koblitz, Alfred Menezes, et al. on one side and the theoretical crypto community on the other, regarding "provable security". Here is a site: http://anotherlook.ca/


Regarding that rate, I'm no expert, but my guess is that it's the result of math, not actual testing of 1+ trillion images. This sounds like calling bullshit on "You have a one in a trillion chance of winning the lottery."


The author addresses this point:

> Perhaps Apple is basing their "1 in 1 trillion" estimate on the number of bits in their hash? With cryptographic hashes (MD5, SHA1, etc.), we can use the number of bits to identify the likelihood of a collision. If the odds are "1 in 1 trillion", then it means the algorithm has about 40 bits for the hash. However, counting the bit size for a hash does not work with perceptual hashes.

> With perceptual hashes, the real question is how often do those specific attributes appear in a photo. This isn't the same as looking at the number of bits in the hash. (Two different pictures of cars will have different perceptual hashes. Two different pictures of similar dogs taken at similar angles will have similar hashes. And two different pictures of white walls will be almost identical.)

> With AI-driven perceptual hashes, including algorithms like Apple's NeuralHash, you don't even know the attributes so you cannot directly test the likelihood. The only real solution is to test by passing through a large number of visually different images. But as I mentioned, I don't think Apple has access to 1 trillion pictures.

> What is the real error rate? We don't know. Apple doesn't seem to know. And since they don't know, they appear to have just thrown out a really big number. As far as I can tell, Apple's claim of "1 in 1 trillion" is a baseless estimate. In this regard, Apple has provided misleading support for their algorithm and misleading accuracy rates.
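The 40-bit arithmetic in the quote checks out: a hash with b uniformly distributed bits collides on a single comparison with probability 2^-b, so a 1-in-10^12 collision rate would correspond to roughly log2(10^12) bits.

```python
import math

# For a uniformly distributed b-bit hash, a single comparison collides
# with probability 2**-b, so odds of 1 in 1e12 imply:
print(math.log2(1e12))  # ~39.86, i.e. roughly a 40-bit hash
```

As the author notes, though, perceptual hash outputs are not uniformly distributed over their bit space, so this bound says little about the real-world false-positive rate.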



