These days, it’s not particularly difficult to create synthetic content that looks realistic. We have plenty of consumer-friendly AI tools to thank for that. But as a result, it’s getting harder and harder to know whether we can trust what we see online. And that’s a problem that needs solving as soon as possible.
Why do we need to determine the authenticity of content? Well, the growing popularity of generative AI (ChatGPT, DALL·E, and Midjourney) has blurred the line between human- and machine-generated content. Without a doubt, AI offers huge creative potential, but unfortunately, it also raises concerns about malicious misuse. Convincing AI-generated images can fuel political disinformation or deepfake propaganda. AI text and audio might be used for phishing scams and fraud.
After all, there have already been cases where people were fooled into believing that an AI-generated piece of content was real. Take, for instance, the AI-generated image of the Pope in a stylish puffer jacket.
At the moment, one of the ways for us to fight back is to embed invisible, yet detectable “fingerprints” into AI outputs that’ll help to identify AI content. This is called AI watermarking.
Of course, it sounds good on paper, but is it really effective?
In this article, we’ll explore what AI watermarking is, how it works, and whether it has any benefits or limitations.
What Is AI Watermarking?
AI watermarking is the process of embedding a hidden, machine-detectable signal into content generated by artificial intelligence models to verify ownership and prove that a piece of content was (or wasn’t) generated by AI.
Unlike visible watermarks (e.g. logos on stock images), an effective AI watermark should be imperceptible to the human eye or ear, yet specialized software and algorithms should be able to detect it.
But that’s not all. To be truly robust, an AI watermark must meet three key criteria:
- The model’s performance or the quality of the output shouldn’t be impaired. A generative AI model with integrated watermarking should operate like any other model.
- It should be challenging to remove, alter, or forge an AI watermark.
- An AI watermark should be compatible with different model designs (e.g., LLMs, diffusion models).
How Does AI Watermarking Work?
The AI watermarking process has two stages:
- During training, an AI model learns to embed invisible identifiers into its outputs.
- After training and deployment, specialized tools scan the generated output, looking for these hidden signals.
AI watermarks can be applied to text, images, video, or audio, and each medium requires a different technical approach.
Text Watermarking
AI language models usually generate text one word (or, more precisely, one token) at a time, predicting the next likely word in a sentence. For instance, an AI model might finish the sentence “A cat in the…” with the word “hat” or “bag” because these are the most likely choices.
Here’s how text watermarking works: the AI model’s vocabulary is split into “green-list” words and “red-list” words. When generating output, the model is nudged to prefer words from the green list and avoid words from the red list. When the output is later analyzed, a detection algorithm will notice that the text consists mostly of green-list words. Since people tend to choose a more random mix of words, such a text gets flagged as AI-generated.
In most cases, readers won’t be able to notice the green-list pattern, but watermark detectors trained to look for it will.
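To make the green-list idea a bit more concrete, here is a minimal, illustrative Python sketch of the detection side. It is not any vendor’s actual scheme: real watermarks operate on model tokens and bias the sampling step itself, while this toy version hashes word pairs and simply measures how “green” a finished text looks.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed share of the vocabulary placed on the green list for each context

def is_green(prev_word: str, word: str) -> bool:
    """Pseudo-randomly assign `word` to the green or red list, seeded by the preceding word.
    An illustrative stand-in for the hash-based vocabulary split used in published schemes."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] / 255 < GREEN_FRACTION

def green_ratio(text: str) -> float:
    """Share of words that land on the green list, given the word right before them."""
    words = text.lower().split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
    return hits / (len(words) - 1)

def z_score(text: str) -> float:
    """How far the observed green count deviates from the ~50% expected of human text."""
    n = len(text.lower().split()) - 1
    if n <= 0:
        return 0.0
    observed = green_ratio(text) * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (observed - GREEN_FRACTION * n) / std

# A watermarked generator would bias its word choices toward the green list,
# so its output scores a high z-score, while human text hovers near zero.
print(z_score("A cat in the hat sat on the mat and looked out of the window"))
```

A detector like this only flags a text when the z-score is well above what chance could explain, which is why longer passages are much easier to test reliably than a single short sentence.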
Image & Video Watermarking
There are certain visible red flags that can help you identify an AI-generated image, such as a person with six fingers on one hand. However, as image generation models keep improving, AI watermarks are becoming necessary to determine whether an image is real or not.
One of the most common methods that you’ve probably already seen is adding a visible watermark. In other words, people add the text “AI-generated” or “Generated by AI” to synthetic images. Naturally, this is really easy to notice and recognize. It’s more of a traditional watermark than an AI watermark.
To add an AI watermark to an image means to embed a hidden signal in the pixel data or the frequency domain. For instance, some weights can be changed in the early layers of a CNN (convolutional neural network) to encode noise that is imperceptible to the human eye but detectable by trained algorithms.
Recently, Google introduced SynthID, and that’s exactly how it works – it adds some imperceptible noise into an image. The image quality isn’t compromised. At the same time, even if someone changes colors, adds filters, resizes or compresses this AI-generated image, the digital watermark will still be detectable.
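To illustrate only the general principle of hiding a signal in pixel data (this is emphatically not how SynthID itself works, and the simple pattern-and-correlation approach below would not survive the resizing or compression that production systems are designed to withstand), here is a toy sketch: a secret pseudo-random pattern is faintly added to the pixels and later detected by correlating the image against that same pattern.

```python
import numpy as np

def make_pattern(shape, key: int) -> np.ndarray:
    """Secret +/-1 pattern derived from a key that only the watermark owner knows."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=shape)

def embed(image: np.ndarray, pattern: np.ndarray, strength: float = 2.0) -> np.ndarray:
    """Add a faint copy of the pattern to the pixel values (imperceptible at low strength)."""
    marked = image.astype(np.float64) + strength * pattern
    return np.clip(marked, 0, 255).astype(np.uint8)

def detect(image: np.ndarray, pattern: np.ndarray) -> float:
    """Correlate the image with the secret pattern; marked images score well above zero."""
    pixels = image.astype(np.float64)
    return float(((pixels - pixels.mean()) * pattern).mean())

image = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)  # stand-in for a generated image
pattern = make_pattern(image.shape, key=1234)
print(round(detect(image, pattern), 3), round(detect(embed(image, pattern), pattern), 3))
# The unmarked image scores near 0; the marked one scores close to the embedding strength.
```

The key here plays the role of a secret held by the watermark owner: without it, the embedded pattern is indistinguishable from ordinary image noise.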
There isn’t much public information about adding AI watermarks to videos. However, the usual approaches are frame-based changes (making subtle modifications to each video frame) or specific encoding tweaks.
Audio Watermarking
Audio watermarking is conceptually simple. Tools like WaveVerify aim to detect AI-generated voices or synthetic audio. They add a signal outside the range of human hearing (below ~20 Hz or above ~20,000 Hz) to AI-generated audio. Humans can’t hear this signal, but specialized software can detect it.
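As a toy illustration of that idea, the sketch below mixes a single faint ~21 kHz tone into a signal and then looks for it in the spectrum. Real audio watermarks are far more sophisticated; a lone high-frequency tone like this would be wiped out by ordinary lossy compression, which throws away exactly the frequencies most people can’t hear.

```python
import numpy as np

SAMPLE_RATE = 48_000   # Hz; high enough to carry a 21 kHz tone (Nyquist limit is 24 kHz)
MARK_FREQ = 21_000     # Hz; above the ~20 kHz ceiling of human hearing

def embed(audio: np.ndarray, amplitude: float = 0.005) -> np.ndarray:
    """Mix a faint, inaudible high-frequency tone into the signal."""
    t = np.arange(len(audio)) / SAMPLE_RATE
    return audio + amplitude * np.sin(2 * np.pi * MARK_FREQ * t)

def detect(audio: np.ndarray, threshold: float = 0.001) -> bool:
    """Check the spectrum for unusual energy around the watermark frequency."""
    spectrum = np.abs(np.fft.rfft(audio)) / len(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1 / SAMPLE_RATE)
    band = (freqs > MARK_FREQ - 100) & (freqs < MARK_FREQ + 100)
    return bool(spectrum[band].max() > threshold)

# One second of a 440 Hz test tone as a stand-in for generated speech.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
voice = 0.3 * np.sin(2 * np.pi * 440 * t)
print(detect(voice), detect(embed(voice)))  # False, True
```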
Why Is AI Watermarking Necessary?
Here are the key reasons why it’s important for people to adopt AI watermarking:
- Combating Misinformation & Deepfakes. AI-generated media can be used to spread false information or impersonate real people. For example, someone could replicate your voice and then gain access to your bank accounts, since many banks use voice recognition. Watermarking can help determine whether you’re talking to an AI or a real human being.
- Promoting Responsible Use and Transparency. AI watermarking is proposed as a way to prevent casual misuse. For instance, when students submit essays fully written by AI without even reading them or doing any fact-checking. If people know that a piece of content can be flagged as AI-generated, they will be more likely to do at least some of the work themselves instead of relying solely on AI. Additionally, online platforms can label AI-generated content, making it clear that what users are seeing, reading or hearing is not real.
- Protecting Intellectual Property. Not so long ago, there was a surge of images on social media that looked like they had been drawn by Studio Ghibli. Those images were AI-generated, which means it’s not that hard for AI to replicate a style of drawing or photography. To put it simply, if you are a digital artist, someone can generate an image that looks like your work and pass it off as their own. AI watermarking can help to avoid this kind of copyright violation. Additionally, watermarking can help AI-assisted works to be properly attributed by linking them to their creator.
- Regulatory Compliance. The EU AI Act and other emerging policies require certain kinds of AI-generated content to be identifiable. AI watermarking is one way to comply with these regulations.
Limitations of AI Watermarking
AI watermarking does sound quite promising. But unfortunately, it still has some limitations and challenges.
It’s easily removed. Some AI models are open source, so their code can be found online and used by anyone. But that also means the code can be edited: people can simply strip the watermarking out and run the model without it.
Of course, some models are closed source, so you can’t easily access their code. But recent studies by ETH Zürich and the University of Maryland have shown that common watermarking techniques, especially for text, can be reverse-engineered and then removed. Using relatively simple attacks like paraphrasing or reformatting, researchers achieved:
- ~85% success in removing watermarks
- ~80% success in spoofing AI-generated text
Cropping, filters, color changes, resizing, or compression may erase the signals in AI-generated images. Hackers can also train models to bypass detection.
Implementing AI watermarks is mostly voluntary. Yes, the EU AI Act and similar regulations are being adopted, and major companies like Google, Microsoft, and Meta have built watermarking into their models. For the most part, though, it’s still up to each company to decide whether to do so, and as you’d expect, not all of them do.
There’s no standardization. On top of the previous point, there is no unified watermark standard, so a detector built for one provider’s models can’t read watermarks from another’s (e.g., OpenAI, Meta, Midjourney). Google’s SynthID, for example, only works on content made with Google’s own models like Gemini or Imagen. So, if you use SynthID to analyze a watermarked piece of content generated by a non-Google model, it won’t be flagged as AI-generated.
The problem of false positives exists. Human writing might accidentally trigger detection and be identified as the product of AI, because it’s possible for an image or text to unintentionally mimic a particular watermark. For example, when writing an essay, non-native English speakers might choose rarely used words or compose sentences the way an AI would, which may lead to unfair accusations of plagiarism or deceit. On top of that, bad actors can deliberately add a watermark to a real, human-made image to get it flagged as AI-generated, casting doubt on its authenticity.
AI Watermarking in a Broader Defense Strategy
Taking all of the limitations into account, experts agree that watermarking alone won’t stop the misuse of AI. Instead, it should be one piece in a multi-layered content authentication framework, alongside:
- Provenance metadata standards like C2PA (the Coalition for Content Provenance and Authenticity),
- Cryptographic signatures from model providers (sketched in the example below),
- Model fingerprinting to identify which AI created a file,
- AI-generated content detectors powered by machine learning.
By combining all of these tools and methods, it becomes possible to tell human-made content apart from AI-generated content.
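To make the cryptographic-signature item from the list above concrete, here is a minimal sketch of how a model provider could sign generated files so that anyone holding the provider’s public key can verify where a file came from and that it hasn’t been altered. It uses the third-party cryptography package and a bare Ed25519 signature; real provenance schemes such as C2PA go further and embed signed manifests inside the file itself.

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The model provider holds the private key; the public key is published for verifiers.
provider_key = Ed25519PrivateKey.generate()
public_key = provider_key.public_key()

def sign_content(content: bytes) -> bytes:
    """Provider side: sign a hash of the generated file at creation time."""
    return provider_key.sign(hashlib.sha256(content).digest())

def verify_content(content: bytes, signature: bytes) -> bool:
    """Verifier side: confirm the file is untouched and really came from the provider."""
    try:
        public_key.verify(signature, hashlib.sha256(content).digest())
        return True
    except InvalidSignature:
        return False

generated_image = b"...bytes of an AI-generated image..."
sig = sign_content(generated_image)
print(verify_content(generated_image, sig))            # True
print(verify_content(generated_image + b"edit", sig))  # False: any modification breaks the signature
```

The signature breaks the moment a single byte changes, which is both its strength (tamper evidence) and its limitation: unlike a watermark, it says nothing about a copy of the content that has been re-encoded or screenshotted.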
Conclusion
On paper, AI watermarking is a powerful tool for detecting AI-generated content that doesn’t impair a model’s performance or the quality of its output. It can help fight deepfakes and misinformation, as well as protect intellectual property.
However, this technology has some limitations. At the moment, AI watermarking is more of a work in progress than an ideal solution to the problem.
For watermarking to deliver on its promise, the future must include:
- Robust and tamper-resistant designs
- Global, cross-platform standards
- Broader adoption across platforms
- Transparency about watermarking methods
Nevertheless, it’s an important step against the misuse of AI and a starting point in building trust in today’s digital world flooded with synthetic content.