Mastering AI Data Scraping for Text and Video: A Guide for Journalists
How to use AI scraping to extract data from websites and videos to power better journalism.
For journalists today, working with data using AI and other analysis tools is crucial. Gathering that data in the first place is just as important, and it isn't always straightforward. You can't expect a corporation you're investigating to hand over a nicely formatted PDF of all its OSHA violations, can you?
If you're unfamiliar with web scraping, it's a valuable technique for data journalists. A skilled data journalist can code a scraper that crawls a website or a series of websites, collects data, and returns it in a usable format. Journalists can then use that large body of data to tell trend stories, conduct deep-dive investigations, verify data, and reach information that would otherwise be inaccessible.
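The traditional, hand-coded approach can be sketched in a few lines of Python. This is a minimal illustration that assumes only the standard library. The names `crawl`, `fetch`, and `LinkAndTextParser` are hypothetical, and collecting page titles stands in for whatever fields a real investigation would need. The crawler stays on the starting site's domain, visits pages breadth-first, and returns records that could be written to a spreadsheet.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects link targets and the <title> text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Download a page as text.

    Sketch only: no retries, rate limiting, or robots.txt checks.
    """
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=20):
    """Breadth-first crawl limited to the start URL's domain.

    Returns a list of {"url": ..., "title": ...} records (hypothetical fields).
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    records = []
    while queue and len(records) < max_pages:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(fetch_page(url))
        records.append({"url": url, "title": parser.title.strip()})
        for href in parser.links:
            # Resolve relative links and drop #fragments so pages aren't revisited.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return records
```

You can pass a different `fetch_page` function, such as one that reads saved HTML files, to test the crawler without touching a live site.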
That's the old way of doing it. Today, AI has entered the picture with AI web scrapers. These differ from traditional web scrapers in two ways. They take less setup time and skill (many can be used without writing a line of code), and they are more robust. Older web scrapers can struggle with websites that have a dynamic layout or that otherwise make data difficult to extract.
For truly scraper-hostile websites, I’ll introduce a new technique called “video scraping” that leverages the power of AI.
Here's a comprehensive guide to harnessing these powerful tools and diving into modern data-driven journalism.
Why AI Scraping Is Key in Data Journalism
A deep ocean of data is available online, but finding it in a usable, structured form is challenging. What good is a massive table of 911 call-response data for your area if you have to enter it all into a spreadsheet by hand?
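Here is a minimal sketch of how a script can skip that manual entry. It assumes Python's standard library and a page that keeps its data in ordinary HTML `<table>` markup. The hypothetical `table_to_csv` helper pulls every table row out of a page's HTML and returns it as CSV text that a spreadsheet can open. It only works on HTML you have already downloaded. Tables that a site builds with JavaScript after the page loads won't appear in that HTML, and that is the kind of dynamic layout that trips up older scrapers.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects cell text from every <tr> row of every table in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Convert the table rows found in an HTML string into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return buffer.getvalue()
```

The helper collects rows from every table on the page. If a page mixes a data table with layout tables, you may need to filter `parser.rows` before writing the file.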