It's fair to say the subject of copyright comes up a lot in AI. Whether you're talking about a large language model (LLM) like the ones that power ChatGPT, or diffusion models that serve as the backbone of text-to-image creators like Midjourney, generative systems suck up massive amounts of training data, much of it from the open web.
This form of data harvesting has concerned many content creators and publishers, including The New York Times, which brought its concerns to the courts in late December. The outcome of that lawsuit won't be known for some time, but while the world waits for the law to catch up, the question remains: can authors, photographers, videographers, and anyone else in the business of creating content do anything to stay connected to, and in control of, the things they create?
There might be a way. What all these issues circle is the concept of provenance: embedding the identity of a piece of content's copyright holder within the content itself. One way to do that is through digital watermarking, which essentially creates an "invisible QR code" that travels with the document, image, or video, even if the file is copied and its metadata stripped.
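To ground the idea, here's a toy sketch of the simplest form of the technique, least-significant-bit (LSB) steganography, in Python. It hides an owner ID directly in the pixel values rather than in strippable metadata. To be clear, this is an illustration of the general concept, not Steg.AI's method, which relies on far more sophisticated encodings designed to survive compression, cropping, and screenshots:

```python
# Toy invisible watermark: hide an owner ID in the least significant bit
# of each pixel channel, so it survives metadata stripping (though not
# lossy recompression -- real systems are built to withstand that too).
import numpy as np
from PIL import Image

def embed_watermark(image_path: str, owner_id: str, out_path: str) -> None:
    """Write owner_id's bits into the LSBs of the image's pixel data."""
    pixels = np.array(Image.open(image_path).convert("RGB"), dtype=np.uint8)
    flat = pixels.flatten()
    # Payload = 16-bit length header followed by the UTF-8 encoded ID.
    payload = owner_id.encode("utf-8")
    bits = np.unpackbits(
        np.frombuffer(len(payload).to_bytes(2, "big") + payload, dtype=np.uint8)
    )
    if bits.size > flat.size:
        raise ValueError("Image too small to hold the payload")
    # Clear each target pixel's lowest bit, then set it to the payload bit.
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    # Save losslessly (PNG); JPEG would destroy the hidden bits.
    Image.fromarray(flat.reshape(pixels.shape)).save(out_path, format="PNG")

def extract_watermark(image_path: str) -> str:
    """Read the length header from the LSBs, then recover the hidden ID."""
    flat = np.array(Image.open(image_path).convert("RGB"), dtype=np.uint8).flatten()
    length = int.from_bytes(np.packbits(flat[:16] & 1).tobytes(), "big")
    bits = flat[16 : 16 + length * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")
```

A copy-paste of the watermarked PNG still carries the ID, because it lives in the pixels themselves. The catch is that naive LSB embedding is wiped out by JPEG recompression or resizing, which is exactly why commercial watermarking uses much more robust encodings.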
Steg.AI is a company that specializes in digital watermarking, and The Media Copilot spoke with its CEO, Eric Wengrowski, on our latest podcast. We fully explored the role of watermarking in a world where web crawlers are constantly hoovering up data, why it's important to label AI-generated content as synthetic, and the incredibly important question: can you still detect the watermark on a piece of training data in a model's output?
We hope you enjoy the conversation, and if you do, please give it a share.