Google will label fake images created with its A.I.

polotics · on May 11, 2023

Am I the only one who saw the first picture in the article and immediately thought, "Hell yeah, that's a fake CEO!"?

esafak · on May 10, 2023

We need this for all ML-generated content. Especially for text.

There must be some industry labeling standard?

armchairhacker · on May 10, 2023

I want to see a classifier that takes text and checks its sources to see whether the information is accurate, and another model that takes text and rates how well-written it is, with bonus points for terseness and clarity.

That way either the AI-generated text gets flagged by the classifiers, or the AI-generated text is genuinely high-quality and I want it.

cloudking · on May 11, 2023

Not sure why you are getting downvoted; the idea makes sense.

mc32 · on May 10, 2023

How do you enforce that with a copy-paste of ASCII text? I'm not against disclosure/attribution, but it'd be voluntary.

kibibu · on May 10, 2023

https://arxiv.org/abs/2301.10226

> We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters.

Alpaca uses this already.
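For anyone curious what detection "without access to the language model API or parameters" can look like, here is a rough toy sketch of the green-list idea as I understand it from that paper: the previous token seeds a pseudo-random split of the vocabulary, and the detector counts how often tokens land on the "green" side and turns that into a z-score. The SHA-256 seeding, GAMMA = 0.5, the integer token IDs, and all function names here are my own assumptions for illustration; this is not the paper's reference implementation, and the toy generator always picks green tokens instead of softly biasing logits.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of the vocabulary on the "green list"


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> set[int]:
    """Pseudo-randomly pick a green list, seeded only by the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermark_z_score(tokens: list[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """z-score of the green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens")
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], vocab_size, gamma)
        for i in range(1, len(tokens))
    )
    return (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def toy_watermarked_tokens(n: int, vocab_size: int, seed: int = 0) -> list[int]:
    """Hypothetical stand-in for a model: always emits a green token."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < n:
        greens = sorted(green_list(tokens[-1], vocab_size))
        tokens.append(rng.choice(greens))
    return tokens
```

A large positive z-score (say above 4) suggests the text was sampled with the green-list bias; ordinary text should hover near zero. Note the detector only needs the hashing scheme, not the model.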

dullcrisp · on May 10, 2023

Zero-width spaces :)

pixl97 · on May 10, 2023

So what you're saying is the second tool in everyone's pipeline will be one that removes hidden characters?
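A minimal sketch of how small that second tool could be, assuming the "watermark" is nothing more than a handful of zero-width code points (the set below is my own pick, not an exhaustive list):

```python
# Code points that render with zero width; an assumed, non-exhaustive set.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def strip_zero_width(text: str) -> str:
    """Return text with the zero-width characters above removed."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```

One caveat: the joiner characters have legitimate uses (emoji sequences, some scripts), so blindly stripping them can change how real text renders.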

stuckkeys · on May 11, 2023

And then use AI to remove the watermark. The perfect circle.

westurner · on May 11, 2023

> Google’s approach is to label the images when they come out of the AI system, instead of trying to determine whether they’re real later on. Google said Shutterstock and Midjourney would support this new markup approach. Google developer documentation says the markup will be able to categorize images as trained algorithmic media, which was made by an AI model; a composite image that was partially made with an AI model; or algorithmic media, which was created by a computer but isn’t based on training data.

Can the markup store at least (1) the prompt and (2) the model with which the image was purportedly generated by a Turing robot? Is it schema.org JSON-LD?

It's IPTC: https://developers.google.com/search/docs/appearance/structu...

If the IPTC-to-RDF mappings, e.g. to schema:ImageObject (schema:CreativeWork > https://schema.org/ImageObject), are complete, it would be possible to sign IPTC metadata with W3C Verifiable Credentials (and e.g. W3C DIDs) just like any other JSON-LD or RDF. But is there an IPTC schema extension for appending signatures, and is there an IPTC graph normalization step that generates output equivalent to a (web-standardized) JSON-LD normalization function?

/? IPTC jsonschema: https://github.com/ihsn/nada/blob/master/api-documentation/s...

/? IPTC schema.org RDFS

IPTC extension schema: https://exiv2.org/tags-xmp-iptcExt.html

[ Examples of input parameters & hyperparameters: see e.g. the screenshot in the README.md of stable-diffusion-webui or text-generation-webui: https://github.com/AUTOMATIC1111/stable-diffusion-webui ]

How should input parameters, plus e.g. the LLM model version, a signed checksum of the model, and the model hyperparameters, be stored next to a generated CreativeWork? In filename.png.meta.jsonld.json or similar?
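A rough sketch of what such a sidecar could look like, purely as a strawman: schema.org's ImageObject type and contentUrl property are real, but every `ex:` property below lives in a made-up example.org namespace, the function names are hypothetical, and the digital source type URI is the IPTC term as I understand it. Nothing here is signed; signing would be a separate step over the normalized document.

```python
import hashlib
import json
from pathlib import Path


def sidecar_path(image_path: str) -> Path:
    """filename.png -> filename.png.meta.jsonld.json, next to the image."""
    p = Path(image_path)
    return p.with_name(p.name + ".meta.jsonld.json")


def build_sidecar_doc(image_bytes: bytes, image_name: str, prompt: str,
                      model: str, model_sha256: str, params: dict) -> dict:
    """Build a JSON-LD document describing how the image was generated."""
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "ex": "https://example.org/genai#",  # hypothetical namespace
        },
        "@type": "ImageObject",
        "contentUrl": image_name,
        # Checksum ties the sidecar to one exact image file.
        "ex:imageSha256": hashlib.sha256(image_bytes).hexdigest(),
        "ex:prompt": prompt,
        "ex:model": model,
        "ex:modelSha256": model_sha256,
        "ex:parameters": params,
        # IPTC digital source type for media made by a trained model (assumed URI).
        "ex:digitalSourceType":
            "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
    }


def write_sidecar(image_path: str, **generation_info) -> Path:
    """Write the sidecar file next to an existing image and return its path."""
    image = Path(image_path)
    doc = build_sidecar_doc(image.read_bytes(), image.name, **generation_info)
    out = sidecar_path(image_path)
    out.write_text(json.dumps(doc, indent=2, sort_keys=True))
    return out
```

Something like `write_sidecar("cat.png", prompt="a cat", model="example-model-v1", model_sha256="...", params={"steps": 20})` would then drop cat.png.meta.jsonld.json beside the image. A sidecar is easy to lose on copy, which is an argument for embedding the same fields in the image's own metadata instead.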

westurner · on May 11, 2023

If an LLM passes the Turing test ("The Imitation Game") - i.e. has output indistinguishable from a human's output - does that imply that it is not possible to stylometrically fingerprint its outputs without intentional watermarking?

https://en.wikipedia.org/wiki/Turing_test

kadoban · on May 11, 2023

Implicit in the Turing test is the entity doing the evaluation. It's quite possible that a human evaluator could be tricked, but a tool-assisted human, or an AI itself, could not be. Or even just that some humans are better at not being tricked than others.