A journalist's guide to AI privacy
What really happens to your data when you send it to an LLM? Why newsrooms need a balanced approach to data privacy.
If there's a topic that comes up without fail whenever I start teaching applied AI to a new group (whether they're journalists, communications professionals, or from some other field), it's data privacy. Everyone is suspicious of what AI companies might be doing with their data, and that suspicion often leads people and companies to underutilize the tools they have access to.
That's why today I'm providing what I hope is a helpful guide on how to think about privacy with respect to large language models. While there is good reason to have concerns, those concerns can be largely mitigated with the right precautions and a realistic mindset.
Speaking of being realistic, it's time publishers got real about what AI search engines are doing to referral traffic. The truth is AI summaries act as substitutes for publisher content, but not all content is affected the same way. I'm excited to host a webinar on April 10 that dives deep into the topic of AI substitution risk. Read on to learn more 👇
AI Risk, Unpacked: How Publishers Can Safeguard Content Strategy and Revenue
Are you truly aware of what's driving your Google traffic—and how vulnerable it is to AI-driven disruption?
Join Pete Pachal, founder of The Media Copilot, and David Buttle, former director of platform strategy at the Financial Times, for an essential webinar designed specifically for media leaders. Learn how to see what's on the AI horizon by decoding the insights behind your search traffic and understanding exactly where your content stands in the AI landscape.
In this session, you'll discover:
Which types of content are most at risk of AI replacement—and which offer defensible, lasting value.
A clear framework for assessing AI substitution risks to protect your traffic and revenue.
Actionable strategies to serve audience needs in ways that AI cannot replicate, ensuring long-term resilience and growth.
Equip your organization to navigate the realities of AI disruption with clarity and confidence.
Essential AI privacy strategies for newsrooms
As the use of AI tools for reporting and research rises in newsrooms, so too do concerns about data privacy, especially when it comes to handling sensitive source material. In the early days of generative AI, there were many stories about chatbots spitting back data from their training sets, so reporters are understandably cautious about what these AI tools might do with the data they're fed. Many assume that anything entered into AI software becomes training material for the system itself.
This assumption isn't entirely incorrect, though it's often overstated. Without a solid understanding of how different AI platforms handle privacy, journalists risk accidental leaks or, alternatively, might be forced into restrictive "no-AI" newsroom policies. Given the widespread adoption of AI in journalism, the risk of unintended disclosure is substantial.
Understanding AI privacy in journalism
AI is most effective when applied directly to specific sets of data. Asking general questions like "What were the causes of the Great Depression?" can result in vague or incorrect answers (a.k.a. "hallucinations"). A far better use of AI's natural-language processing abilities is to direct them at specific, detailed datasets. This will greatly reduce the chance of hallucinations and be of much more practical use to reporters and editors.
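For the more technically inclined, here is a minimal Python sketch of what "directing AI at a specific dataset" can look like in practice: the question is wrapped together with the source material it should be answered from. The function name, the prompt wording, and the sample budget text are all my own illustrative assumptions, not any vendor's API, and the sketch only builds the prompt text; it doesn't send anything anywhere.

```python
def build_grounded_prompt(source_text: str, question: str) -> str:
    """Wrap a question with the source material it should be answered from.

    Hypothetical helper for illustration only; the instruction wording
    is an assumption, not a recommended or official prompt format.
    """
    return (
        "Answer the question using only the source material below. "
        "If the answer is not in the material, say so.\n\n"
        "--- SOURCE MATERIAL ---\n"
        f"{source_text.strip()}\n"
        "--- END SOURCE MATERIAL ---\n\n"
        f"Question: {question.strip()}"
    )


# Sample document invented for this example (not real data).
budget_notes = """
City council budget notes (sample text):
Parks: $1.2 million
Road repair: $3.4 million
"""

prompt = build_grounded_prompt(
    budget_notes, "How much is allocated to road repair?"
)
print(prompt)
```

The resulting text is what you would paste or send to whichever AI tool you use. Because the material travels with the question, everything inside it is subject to that tool's data-handling terms.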
Typically, when you feed information into an AI system, it gets uploaded to the cloud, processed for analysis, and stored to allow future access or to enhance the model’s training. It's this last step—where data becomes part of the AI's training set—that's the most concerning. Once integrated into the model, there's a small risk the AI might later respond to a user query with this exact information, an event referred to as "regurgitation."