Meta may have illegally removed copyright info in AI corpus

A judge has ruled that Meta must answer a claim that it removed so-called copyright management information from material used to train its AI models.
The Friday ruling by Judge Vince Chhabria concerned the case Kadrey et al vs Meta Platforms, filed in July 2023 in a San Francisco federal court as a proposed class action by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, who reckon the Instagram titan’s use of their work to train its neural networks was illegal.
Their case burbled along until January 2025, when the plaintiffs made the explosive allegation that Meta knew it used copyrighted material for training, and that its AI models would therefore produce results that included copyright management information (CMI) – the fancy term for the details that accompany copyrighted material, such as the creator of a work, its license and terms of use, its date of creation, and so on.
The miffed scribes alleged Meta therefore removed all of this copyright info from the works it used to train its models so users wouldn’t be made aware the results they saw stemmed from copyrighted stuff.
Judge Chhabria last week allowed the plaintiffs' claim – that Meta violated the US Digital Millennium Copyright Act (DMCA) by removing copyright notices from works used to train the Facebook giant's Llama family of models – to continue. That decision makes it more likely the case will end in settlement or trial.
“[The plaintiffs’] allegations raise a ‘reasonable, if not particularly strong, inference’ that Meta removed CMI to try to prevent Llama from outputting CMI and thus revealing that it was trained on copyrighted material,” Judge Chhabria wrote in his order [PDF]. “This use of copyrighted material is clearly an identifiable (alleged) infringement.”
Meta has already admitted [PDF] it used a dataset named Books3 to train its Llama 1 large language model. The dataset has been found to include copyrighted works.
The news isn’t all bad for Meta, because Judge Chhabria tossed one of the plaintiffs’ claims – that Meta’s use of unlicensed books obtained from peer-to-peer torrents to train Llama violated California’s Comprehensive Computer Data Access and Fraud Act (CDAFA).
Edward Lee, a professor of law at Santa Clara University, told The Register we should not infer anything about fair use based on the authors’ DMCA 1202(b)(1) claim about the scrubbed CMI.
“At the hearing, Judge Chhabria also expressed some skepticism the plaintiffs would prove the DMCA [claim] and said it could be revisited on summary judgment,” Lee said. “What it does show is that the plaintiffs’ attorneys were able to find a more particularized factual basis for their DMCA claim, which had been dismissed earlier in the case.”
By allowing the CMI claim to advance, Chhabria has delivered a second ruling that suggests the indiscriminate ingestion of copyrighted material to train AI models may have financial consequences.
The first came last month when Thomson Reuters won a partial summary judgment against shuttered AI firm Ross Intelligence that prevents the defendant firm from avoiding liability by claiming fair use.
Legal scholars have argued that AI inference – apps that produce outputs based on AI models – is more likely to be deemed copyright infringement because it’s obvious when a model spits out an author’s work verbatim. Inputting copyrighted material into models for training has been viewed as more likely to qualify for fair use defenses.
However, the Thomson Reuters decision and the survival of the DMCA claim against Meta look likely to strengthen plaintiffs in other AI-related litigation.
For example, Tremblay et al vs OpenAI et al was amended [PDF] last week; the amended complaint seeks to revive a previously dismissed DMCA claim, based on new but redacted evidence supporting allegations of CMI removal.
Citing revelations that followed from discovery, the revised complaint argues, “As amended, the DMCA claim sufficiently alleges that OpenAI actually removed CMI for training its large language models.”
Meta did not respond to a request for comment. ®