Breaking the Zyzzyva encryption

Hello71 · on June 21, 2015

> The real hackers will know that as soon as I found evidence of sqlite3_key_v2 in the Zyzzyva dylib file that getting the key was inevitable. I don’t actually know the steps for removing debug symbols from compiled code off the top of my head, but I bet if this had been done, this would have made my job much, much harder.

I'm not entirely sure about OS X, but at least on Linux, system-assisted dynamic linking (i.e. not mmap(PROT_EXEC)) requires that all required symbols are exposed so that relocation can be done in the original executable; in other words, the OS needs to know where the functions in the library are so that it can tell the program how to call them.

Of course, you could obfuscate the function names, but then tracebacks wouldn't work properly and at that point you'd be better off just statically linking the whole program.

Debug symbols are completely different; if you have those, you can simply run "frame variable" in LLDB, which shows the arguments along with their names.

> Yesss. Time to get out the x86 assembly hats.

You don't even really need to do that. Since you know the function signature, you can assume (since it is in a separate library) that the function uses the standard System V AMD64 ABI where "the first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9" [0], meaning that the pKey pointer is probably in RDX. I know that the author said that it was in RAX, but since that is caller-saved, there must have been some copying or processing done to it inside the function.

[0] https://en.wikipedia.org/wiki/X86_calling_conventions#System...
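To make the register argument concrete, here is a minimal sketch that pairs each parameter of sqlite3_key_v2 with the register the System V AMD64 ABI would assign to it. The parameter names come from the documented prototype; the helper `register_map` is a made-up name for illustration, and the sketch assumes every argument is an integer or a pointer (no floating-point or struct arguments).

```python
# System V AMD64: the first six integer/pointer arguments, in order.
INT_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Parameter names from the documented prototype:
# int sqlite3_key_v2(sqlite3 *db, const char *zDbName,
#                    const void *pKey, int nKey);
SQLITE3_KEY_V2_PARAMS = ["db", "zDbName", "pKey", "nKey"]


def register_map(params):
    """Pair each integer/pointer parameter with its argument register."""
    if len(params) > len(INT_ARG_REGISTERS):
        raise ValueError("arguments beyond the sixth are passed on the stack")
    return dict(zip(params, INT_ARG_REGISTERS))


mapping = register_map(SQLITE3_KEY_V2_PARAMS)
for name, reg in mapping.items():
    print(f"{name:8} -> {reg}")
```

So at the moment of the call, a breakpoint on the function entry should find the key pointer in RDX and its length in the low 32 bits of RCX, before the function body has had a chance to move anything around.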

jtchang · on June 21, 2015

I have a pet project trying to reverse engineer an android app if you are interested :)

miles · on June 20, 2015

From Who Owns Scrabble’s Word List?[1]:

"Dictionaries enjoy copyright protection for two main reasons: Their creators make judgments about what words to include, and entries feature definitions and other original material. (Just last week, a federal court in Massachusetts ruled against[2] a plaintiff who wanted to copy and repurpose the bulk of Merriam-Webster’s Collegiate, including definitions, for his own dictionary.) But in 1991, in Feist Publications Inc. v. Rural Telephone Service Co.[3], the Supreme Court decided that a phone company wasn’t entitled to a copyright on its white pages. That’s because the list of names and numbers lacked an important requirement: originality."

[1] http://www.slate.com/articles/life/gaming/2014/09/major_scra...

[2] http://www.scribd.com/doc/241384392/Richards-v-Webster

[3] http://scholar.google.com/scholar_case?case=1195336269698056...

leecb · on June 20, 2015

Isn't this a violation of the DMCA's anti-circumvention section? This seems to be explicitly describing how to circumvent protection measures for a copyrighted work.

https://www.law.cornell.edu/uscode/text/17/1201

DannyBee · on June 20, 2015

This assumes it's validly copyrighted. I wonder if the wordlist is even registered with the copyright office (I can't imagine it is, they are pretty good about not accepting stuff like this).

Additionally, to the degree that Hasbro (or whoever it is) claims a copyright on the work of other people, they are themselves violating various parts of the DMCA dealing with rights management info, etc.

Hasbro/whoever should know that it is not possible to effect a transfer of copyright without an explicit signed agreement. Thus, if all these people contributed, and then they slapped a copyright on it, they own exactly nothing.

(There is such a thing as a compilation copyright, but it is a very minimalistic copyright, and assumes they actually did anything creative or original to the compiled list)

If someone were to press this point against the Scrabble players, they would A. likely lose, as the list will be considered non-copyrightable subject matter, or B. if the list was somehow found copyrightable, and this story is accurate, be opening themselves up to copyright infringement lawsuits from the Scrabble players who contributed to the wordlist.

So they kinda lose either way.

skywhopper · on June 20, 2015

The fact that the DMCA could criminalize the act of inspecting the contents of an executable file acquired legally and running on your personal computer and then telling other people about it is pretty good evidence that the DMCA is an immoral law that should be violated as much as possible. Kudos to the article's author.

userbinator · on June 21, 2015

Indeed, I think we should be doing everything we can to stop things from going further in the direction of the dystopia in Stallman's famous story: http://www.gnu.org/philosophy/right-to-read.en.html

black_puppydog · on June 21, 2015

It is sometimes surprising how accurate some of Stallman's dystopian visions were, and it is frightening because some of them have not come true. Yet.

thothamon · on June 20, 2015

The article probably does violate that section of the DMCA. But it is also a research/scientific piece subject to the protection of the First Amendment. I could be mistaken, but I suspect if one or the other had to go -- the First Amendment or this law -- the First Amendment would win.

Said another way, even as a very pro-copyright judge, I would have a hard time saying the author did not have a First Amendment right to publish his research. Now if he wrote a program to make it easy to crack these databases and sold it for $5 each, that would be a different matter.

_3u10 · on June 20, 2015

Probably...

09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Perhaps if they want more people to know they can file a takedown request.

mukyu · on June 20, 2015

That would presume that the dictionary in question was a copyrightable work. The US has weak database copyright protection due to Feist. There is also Assessment Technologies to consider, but I don't believe that involved the DMCA.

madars · on June 20, 2015

It seems that the Linux version has a version of the database as plain text (including the words' meanings); wc -l tells me that OSPD4.1.txt has 178378 entries, which seems about right. edit: seems like that's only true for v4 and below, while the article could be about v5 (it doesn't say)

Matt3o12_ · on June 20, 2015

How can you copyright a wordlist anyways? I'm really not familiar with Scrabble, but doesn't it just contain plain English words without any context? They could of course copyright the order, but couldn't you just shuffle the list and publish it?

An explanation how copyright works in this case would be great.

Ded7xSEoPKYNsDd · on June 20, 2015

Some countries have a database right besides normal copyright. Copyright seems like a stretch, but I can understand why hobbyists don't want to take the risk of getting sued even if the law should be on their side.

https://en.wikipedia.org/wiki/Sui_generis_database_right

Matt3o12_ · on June 20, 2015

Thank you very much for the link, but this sounds really stupid. Couldn't I just create a database with all possible additions ranging from 1,000,000 to 10,000,000 and sue everyone who publishes a book/paper etc. because of the "investment that is made in compiling a database"? It wouldn't matter if they calculated the result themselves, because they could've just used my database.

This is really stupid. I mean I get copyright but I think copyright should only apply to "the 'creative' aspect[s]" if the author wants that.

braythwayt · on June 21, 2015

We have had a similar discussion on Hacker News about a company that claims to be mechanically generating all possible arrangements of words of a certain length and copyrighting those, as well as all possible images of a certain size, and all possible musical melodies of a certain length, and so on.

The reason this is not considered copyrightable is that there must be some evidence of creative effort. Owning an infinite number of monkeys and typewriters does not entitle you to copyright everything they generate.

detaro · on June 21, 2015

No, you couldn't, because (at least for most of the implementations listed in the article) it DOES matter if they actually used your database or not. Also note that database rights are not necessarily copyrights.

fit2rule · on June 20, 2015

Excellent work .. and of course, a salient reminder of why we all, individually, should copyright our own works, even if it is something done for free and/or on a volunteer basis with no commercial interest. A right not exercised is one lost.

I think it's preposterous that someone is able to trademark a word list. I bet it's not even complete.

icebraining · on June 20, 2015

All works that can be copyrighted are automatically copyrighted from the moment of creation, unless you live in one of the dozen countries which have ratified neither the Berne Convention nor TRIPS. The US signed Berne in '89, by the way.

dublinben · on June 20, 2015

This is actually a better argument for copyleft licensing for any community effort like this. It would not have been possible for any company to lock down the efforts of this community if all contributions to the wordlist at question had been under a license like Creative Commons or the GPL.

nawitus · on June 20, 2015

What's your source for being able to trademark a word list?

zxc1234 · on June 20, 2015

Sick system. In other developed countries you have copyright if you do it. You don't need to register that right anywhere. Of course it's more difficult to prove, but that's another story...

CrystalGamma · on June 20, 2015

When I found out you had to register your stuff for copyright in certain countries, I was actually surprised ... I just thought having copyright for your work by default was normal.

thristian · on June 20, 2015

I believe it's normal in all countries that are signatories to the Berne Convention, which is... apparently ~160 of the ~190 UN member states.

llamaimperative · on June 20, 2015

In the US you don't have to register anything either.

cool-RR · on June 20, 2015

I wonder whether you can bypass copyright for a word list if you feed it into a bloom filter[1] and then save just the bloom filter.

[1] https://en.wikipedia.org/wiki/Bloom_filter
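For anyone who hasn't used one, here is a minimal sketch of what "save just the bloom filter" would look like. The three words are a hypothetical stand-in for the real list, and the sizes (1024 bits, 3 hashes) are arbitrary illustration values, not tuned for a ~178k-word list. Deriving the bit positions from salted SHA-256 is just one convenient choice.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: membership test with no stored words."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, word):
        # Derive one bit position per hash by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


words = ["ZYZZYVA", "QI", "ZA"]  # hypothetical stand-in for the word list
bf = BloomFilter()
for w in words:
    bf.add(w)

print("QI" in bf)  # every added word is always reported as present
```

Only `bf.bits` would need to be published. A word that was added is never rejected, but a word that was never added can occasionally be accepted.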

opcvx · on June 20, 2015

No, that wouldn't make sense because of the false positives.

You could instead store only the hashes of the words, using a sha256 or something similar.
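A minimal sketch of that idea, with a hypothetical three-word list standing in for the real one; `word_hash`, `build_hash_set` and `is_valid` are made-up names, and normalising to upper case is my own assumption about how lookups would work.

```python
import hashlib


def word_hash(word):
    """SHA-256 hex digest of the upper-cased word."""
    return hashlib.sha256(word.upper().encode("utf-8")).hexdigest()


def build_hash_set(words):
    """Keep only the digests; the words themselves are not stored."""
    return {word_hash(w) for w in words}


def is_valid(candidate, hashes):
    """Exact membership test, with no false positives in practice."""
    return word_hash(candidate) in hashes


hashes = build_hash_set(["ZYZZYVA", "QI", "ZA"])  # hypothetical word list
print(is_valid("qi", hashes), is_valid("QQ", hashes))
```

Keep in mind that short words come from a small candidate space, so anyone can hash every plausible letter string and recover the list from the digests.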

lisper · on June 20, 2015

If I were inclined to twist the copyright tiger's tail, the way I would do it would be to encrypt the plaintext with a one-time pad and then publish the ciphertext and the pad anonymously in two different locations (preferably on two different domains). The key and the ciphertext in a one-time pad are mathematically indistinguishable, so both publishing parties have plausible deniability that what they published was the key, i.e. just a string of random bits, which of course they have every right to do.
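A minimal sketch of the symmetry, assuming a pad drawn from the OS random source and a made-up three-word plaintext; `xor_bytes` is a hypothetical helper name.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("a one-time pad must be exactly as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"ZYZZYVA\nQI\nZA\n"             # hypothetical stand-in for the list
pad = secrets.token_bytes(len(plaintext))    # uniformly random, used only once
ciphertext = xor_bytes(plaintext, pad)

# XOR is symmetric: either file can play the role of "key" for the other.
assert xor_bytes(ciphertext, pad) == plaintext
assert xor_bytes(pad, ciphertext) == plaintext
```

Taken alone, each of the two published files is a uniformly random byte string, and nothing in the bytes says which one was generated first.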

An even more interesting experiment would be to copyright the resulting key and the ciphertext, and put in the TOS for getting either one that you will not sue the publisher for any copyright violations.

That is, if I were so inclined :-)

Hello71 · on June 21, 2015

http://ansuz.sooke.bc.ca/entry/23: What Colour are your bits?

> Treating Colour as a function is almost the same as attaching tags to the bits - the difference is that when the Colour is a function of the bits, we don't have to worry about the tags being detached; on the other hand, when the Colour is a function of the bits, we can never have more than one possible Colour for a given sequence of bits. Monolith depends on exploiting this problem: it assumes that one file can only ever have one Colour, asserts that the Colour of its output file is the "you may copy this" Colour because of the (correct) claim that fixing any other single unchangeable Colour would raise legal problems, and then follows the logic to a claim that it can produce what would otherwise be an illegal copy of the copyrighted input, without breaking copyright law. One Colour per file was never one of the lawyers' rules of Colour; it's merely a consequence of "Colour is a function", and Colour being a function is just something we computer people decided to believe because functions make sense to our training and Colour doesn't. Colour is not actually a function at all.

kevin_thibedeau · on June 20, 2015

You can't copyright a random number although I suppose you could insert some deadbeefs for "artistic" effect without compromising the pad.

lisper · on June 20, 2015

> You can't copyright a random number

Why not?

mastax · on June 20, 2015

Sony certainly tried.

Dylan16807 · on June 20, 2015

No creative effort.

lisper · on June 20, 2015

So write a haiku, xor your random key with repeated copies of your haiku, call the result modern art, and copyright that.

Dylan16807 · on June 21, 2015

And when it serves a functional purpose as key you get to use it anyway.

x0x0 · on June 20, 2015

Scribd was sued over essentially this: putting hashes of copyrighted works into an internal filter so they could never be uploaded again

http://www.wired.com/2010/07/copyrightfiltering-scribd/

cosmicexplorer · on June 20, 2015

so like.......did they put this decrypted database online, or something? true, we could just perform the same operations they did, but if you're going to go through the trouble of putting your crack in public, might as well spread its fruits too

Buge · on June 20, 2015

I'm pretty sure Cesar wants to avoid such a direct copyright violation. Sure just breaking the encryption might be considered a violation of the DMCA anti-DRM stuff, but that is a much more controversial law that many people oppose. A lawsuit over the anti-DRM would likely pull the EFF in, and make large news, while a lawsuit over plain spreading copyrighted information would be much more straightforward and likely for him to lose.

By only publishing the steps, he gets the benefit of the publicity of breaking the encryption. Then anonymous people can easily break it themselves and spread the actual list, free from worry of being sued.

userbinator · on June 21, 2015

My guess about the origins of the key, upon seeing its length, is that it is the SHA256 of something - could it be one of the words in the list?
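Once the list is decrypted, that guess is cheap to check. A minimal sketch, assuming the key is the hex form of a SHA-256 digest and that the hashed input was a single word encoded as UTF-8 in some letter case; `find_key_preimage` is a made-up name, and the demo key below is generated on the spot, not the real one.

```python
import hashlib


def find_key_preimage(words, key_hex):
    """Return the word variant whose SHA-256 equals key_hex, or None."""
    target = key_hex.lower()
    for word in words:
        # Try the word as given, then lower-cased, then upper-cased.
        for variant in (word, word.lower(), word.upper()):
            digest = hashlib.sha256(variant.encode("utf-8")).hexdigest()
            if digest == target:
                return variant
    return None


# Made-up key for demonstration only.
demo_key = hashlib.sha256(b"zyzzyva").hexdigest()
found = find_key_preimage(["QI", "ZYZZYVA", "ZA"], demo_key)
print(found)
```

If nothing matches, that only rules out these particular encodings; a salted or concatenated input would need a different search.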

opcvx · on June 20, 2015

Wouldn't it be easier to just dump the entire process after the database is loaded and decrypted in memory, and pick out the words?

hit8run · on June 20, 2015

Rest In Protein Zyzz.

n3on_net · on June 20, 2015

we all gonna make it brah