The New York Times filed a lawsuit against OpenAI and Microsoft, accusing them of training AI models, including ChatGPT and Copilot, using millions of Times articles without consent, and is seeking billions in damages.
The lawsuit raises concerns about AI companies' use of copyrighted content and the potential harm to independent journalism.
OpenAI defends its use of the technology and says it respects content creators, but disputes persist over the scraping of web content for AI training.
According to the Copyright Alliance, to prove copyright infringement, the copying must be substantial and material, and protected expression must have been copied.
In November, a group of well-known authors, including John Grisham, George R.R. Martin, and Jonathan Franzen, sued OpenAI and Microsoft, accusing the companies of training their models on the authors’ copyrighted works.
Other lawsuits include notable suits by actress Sarah Silverman against Meta and OpenAI over use of her books and Getty Images against Stability AI over use of Getty’s image library.
The Times lawsuit highlights generative AI models’ tendency to reproduce news content, which can inadvertently allow users to bypass paywalls.
Publishers, including The Times, accuse AI companies of siphoning web traffic and ad revenue, raising concerns about AI’s impact on journalism.
Other news outlets, such as The Associated Press, have taken a different course from litigation, choosing instead to negotiate license fees for their content.
In July, The AP inked a deal with OpenAI, opening its huge archive of news articles to the company for training ChatGPT.
The Times in its complaint said, “The defendants seek to free-ride on The Times’s massive investment in its journalism.”
Ultimately, the dispute may come down to money, with the Times lawsuit likely ending in a payment for the use of its articles.
But that raises the question of what the long-term future of journalism looks like when AI models are trained on its past articles.