The limits of OpenAI’s attempts to watermark AI text

Posted On: December 10, 2022

Did a human write that, or ChatGPT? It can be hard to tell, which is why OpenAI is trying to figure out how to “watermark” AI-generated material to make it easier to identify.

Scott Aaronson, a computer science professor and current guest researcher at OpenAI, recently announced in a presentation at the University of Texas at Austin that the company is working on a tool for “statistically watermarking the outputs of a text [AI] system.” An “unnoticeable secret signal” would be embedded in text generated by a system like ChatGPT to reveal its origin.

OpenAI engineer Hendrik Kirchner has built a prototype and, according to Aaronson, the hope is to eventually incorporate it into other systems developed by OpenAI.

With these new measures, “we want it to be much tougher to take [an AI system’s] output and pass it off as if it originated from a person,” Aaronson added. “This may be useful for eliminating academic plagiarism, of course, but also, for instance, mass creation of misinformation, like, say, flooding every site with supposedly on-topic comments supporting Russia’s invasion of Ukraine without having a building full of trolls in Moscow. Or using a person’s own literary voice against them.”

Taking advantage of randomness

Why is a watermark needed? ChatGPT is a strong illustration. The OpenAI chatbot has taken the internet by storm, demonstrating an ability not just to answer difficult questions, but also to write poetry, solve programming puzzles, and wax philosophic on a wide range of topics.

ChatGPT is highly entertaining and genuinely useful, but it also raises serious ethical concerns. Like many text-generating systems before it, ChatGPT could be used to write high-quality phishing emails and harmful malware, or to cheat on school assignments. And as a question-answering tool it is factually inconsistent, which is why programming Q&A site Stack Overflow has temporarily banned answers originating from ChatGPT.

Understanding how systems like ChatGPT work helps explain the technical underpinnings of OpenAI’s watermarking tool. These systems process input and output text as strings of “tokens,” which can be words, parts of words, punctuation marks, or even individual letters. At each step, a system generates a mathematical function called a probability distribution over the possible next tokens (e.g., words), taking into account all previously generated tokens.

Once the distribution has been generated, a server samples the next token at random according to it; for OpenAI-hosted systems like ChatGPT, OpenAI’s server does this job. Because the selection involves an element of chance, the same text prompt can yield a different response each time.
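The sampling step can be illustrated with a minimal sketch. The prompt, the candidate tokens, and their probabilities below are invented for illustration; a real system computes a distribution over tens of thousands of tokens with a neural network.

```python
import random

# Hypothetical distribution over the next token for the prompt "The sky is".
# These numbers are made up; a real model computes them.
next_token_probs = {"blue": 0.6, "clear": 0.25, "falling": 0.1, "green": 0.05}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# An unseeded generator gives different draws on each run,
# which is why the same prompt can produce different text.
rng = random.Random()
print([sample_next_token(next_token_probs, rng) for _ in range(10)])
```

Running the script twice will usually print two different lists, with "blue" appearing most often in both.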

Aaronson said in his presentation that OpenAI’s watermarking tool acts as a “wrapper” over existing text-generating systems, using a cryptographic function running on the server to “pseudorandomly” choose the next token. In principle, the text the system produces would still look random to you or me, but anyone in possession of the “key” to the cryptographic function could uncover the watermark.
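One way such a wrapper could work is sketched below. This is an assumption-laden illustration, not OpenAI’s confirmed implementation: the choice of HMAC-SHA256 as the keyed function, the helper names `prf_unit` and `pick_token`, and the selection rule (pick the token that maximizes r^(1/p), where r is a keyed pseudorandom number and p is the token’s probability) are all chosen here for illustration. That rule has the convenient property that, averaged over contexts, each token is still chosen with its original probability.

```python
import hashlib
import hmac
import math


def prf_unit(key, context, token):
    """Keyed pseudorandom number strictly between 0 and 1.

    Depends only on the secret key, the preceding tokens, and the candidate.
    HMAC-SHA256 is an assumed choice of keyed function.
    """
    msg = "\x1f".join(tuple(context) + (token,)).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52


def pick_token(key, context, probs):
    """Replace random sampling with a keyed, deterministic choice.

    Maximizing log(r) / p is equivalent to maximizing r ** (1 / p).
    """
    return max(probs, key=lambda t: math.log(prf_unit(key, context, t)) / probs[t])


secret_key = b"demo-key"  # hypothetical key held on the server
probs = {"blue": 0.6, "clear": 0.25, "falling": 0.1, "green": 0.05}
print(pick_token(secret_key, ("The", "sky", "is"), probs))
```

Without the key, the choices are indistinguishable from ordinary sampling; with it, every choice can be recomputed and checked.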

According to Aaronson, a few hundred tokens appear, empirically, to be enough to give a reasonable signal that a text came from an AI system. In principle, he said, the algorithm could even be used to determine which parts of a long document probably came from the system and which did not. The tool uses the same secret key both to embed the watermark and to check for it.
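A toy version of the detection step is sketched here, with the helper functions included so it runs on its own. Everything in it is an assumption made for illustration: the 50-word vocabulary, the fixed token distribution, the four-token context window, HMAC-SHA256 as the keyed function, and the score (the average of ln(1/(1 - r)) over the text). For text written without the key, that average should sit near 1; for watermarked text it should be noticeably higher.

```python
import hashlib
import hmac
import math
import random

VOCAB = [f"w{i}" for i in range(50)]           # toy vocabulary (hypothetical)
WEIGHTS = [1.0 / (i + 10) for i in range(50)]  # toy fixed distribution
PROBS = {t: w / sum(WEIGHTS) for t, w in zip(VOCAB, WEIGHTS)}
N = 4  # how many previous tokens feed the keyed function (assumed)


def prf_unit(key, context, token):
    """Keyed pseudorandom number strictly between 0 and 1."""
    msg = "\x1f".join(tuple(context) + (token,)).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52


def generate_watermarked(key, length):
    """Choose each token by maximizing log(r) / p, i.e. r ** (1 / p)."""
    text = []
    for _ in range(length):
        ctx = text[-N:]
        text.append(max(PROBS, key=lambda t: math.log(prf_unit(key, ctx, t)) / PROBS[t]))
    return text


def generate_plain(rng, length):
    """Ordinary random sampling, with no watermark."""
    return rng.choices(VOCAB, weights=WEIGHTS, k=length)


def watermark_score(key, text):
    """Average of ln(1 / (1 - r)) over the tokens actually present."""
    total = 0.0
    for i, tok in enumerate(text):
        r = prf_unit(key, text[max(0, i - N):i], tok)
        total += math.log(1.0 / (1.0 - r))
    return total / len(text)


key = b"secret"
marked = generate_watermarked(key, 300)
plain = generate_plain(random.Random(0), 300)
print("watermarked, right key:", round(watermark_score(key, marked), 2))
print("unwatermarked text:    ", round(watermark_score(key, plain), 2))
print("watermarked, wrong key:", round(watermark_score(b"other", marked), 2))
```

The same key is used for generation and for scoring, matching the single-key design described above. Note that rewording the text changes the tokens and their contexts, which pulls the score back toward 1.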

Major constraints

Watermarking AI-generated text is not a new idea. Earlier efforts, most of them rule-based, relied on techniques such as synonym substitution and syntax-specific word changes. However, OpenAI’s appears to be one of the first cryptography-based approaches to the problem, apart from theoretical research published by the German institute CISPA last March.

When asked for further details, Aaronson declined to comment, saying only that he plans to co-author a research paper on the watermarking prototype in the near future. OpenAI also declined to comment, saying only that watermarking is one of several “provenance approaches” it is exploring for identifying AI-generated outputs.

Independent academics and industry experts, however, offered mixed views. They point out that the tool is server-side, meaning it would not necessarily work with all text-generating systems. And they argue that adversaries could circumvent it easily.

“I think it would be very straightforward to get around that by rewording, using synonyms, etc.,” said Srini Devadas, an MIT professor of computer science. “There’s a little tug of war going on here.”

Jack Hessel, a research scientist at the Allen Institute for AI, pointed out that it would be difficult to imperceptibly fingerprint AI-generated text because each token is a discrete choice. Too obvious a fingerprint could force the system to pick odd words that degrade fluency, while too subtle a fingerprint would leave room for doubt when someone looks for it.

“You could worry that all this stuff about trying to be safe and responsible when scaling AI … as soon as it seriously hurts the bottom lines of Google and Meta and Alibaba and the other major players, a lot of it will go out the window,” Aaronson said. “On the other hand, we’ve seen over the past 30 years that the big Internet companies can agree on certain minimal standards, whether because of fear of getting sued, desire to be seen as a responsible player, or whatever else.”

Catherine A. Leal

