AI scrapers would be forced to ask permission under bill • The Register

A bipartisan pair of US Senators introduced a bill this week that would protect copyrighted content from being used for AI training without the owner’s permission. Content creators from large media companies to individual bloggers could effectively block Google, Meta, OpenAI, Anthropic, and others from appropriating their work.
If passed into law, the AI Accountability and Personal Data Protection Act [PDF] from Senators Josh Hawley (R-MO) and Richard Blumenthal (D-CT) would add a new federal tort allowing individuals to sue companies that use copyrighted works or personally identifiable information to train AI without the owner’s express prior consent.
Arguably the most important question in the media industry today is whether AI companies’ use of copyrighted training materials constitutes “fair use,” a legal shield against infringement claims. Fair use allows third parties to use copyrighted works for criticism, news reporting, commentary, and research. AI makers claim that training their models is protected by this doctrine and some courts have agreed.
Last month, a group of authors lost in court when a judge accepted Anthropic’s claim that the company has the right to use their books to train Claude AI, all without compensation or permission. That kind of thing doesn’t seem to sit well with Hawley.
“AI companies are robbing the American people blind while leaving artists, writers, and other creators with zero recourse,” the Republican Senator noted in a press release. “My bipartisan legislation would finally empower working Americans who now find their livelihoods in the crosshairs of Big Tech’s lawlessness.”
The AI Accountability and Personal Data Protection Act’s text does not mention fair use. However, it does present both personally identifiable information and copyrighted material as types of “covered data” that require the data owner’s prior consent to be used for training.
Blumenthal, a frequent legislative partner of Hawley’s, agreed with his take, noting that AI safeguards are urgently needed.
“Consumers must be given rights and remedies — and legal tools to make them real — not relying on government enforcement alone,” Blumenthal added in the press release.
The bill spells out what it considers to be express prior consent, and those rules are strict, too. AI vendors have to clearly inform individuals of what their data is being used for and who will have access to it.
Companies have to ask for consent explicitly, and can’t tie it to the usability of a product if said data collection isn’t reasonably necessary. Consent requests can’t be mixed into other agreements, and they can’t just link out to a full explanation, either – it’s all gotta be stated up front to meet the terms of this legislation.
The bill also proposes to make illegal any arbitration agreements that prevent individuals from suing companies who improperly collected or used their data, freeing victims up to lob sueballs at AI companies to their heart’s content.
Covered data includes unique identifiers such as device IDs, IP addresses, advertising IDs, geolocation data, biometric identifiers, behavioral data (e.g., browsing history and purchase patterns) and even information companies use to build profiles.
The end of AI scraping?
If this bill redefines fair use in favor of content creators, the entire information economy could change. At present, online publishers are suffering from a “traffic apocalypse” as Google’s AI Overviews compete with their content, depriving them of the ad impressions they need to stay in business. AI Overviews, ChatGPT, and almost every other LLM have been built by scraping huge portions of the web without permission.
Major AI companies like Google have long argued that AI scraping of websites constitutes fair use, but the matter is hardly settled, as demonstrated by a recent research paper commissioned by the EU Parliament that concluded AI scraping does not, in fact, constitute fair use, because AIs don’t learn like humans do.
The head of the US Copyright Office similarly said last month that AI scraping went beyond the limits of fair use, and while the opinion may have cost her her job, it seems that elected officials have been paying attention.
Introduced Monday and referred to committee, the bill may be a hard sell. There’s no indication when it could be up for review by the Senate Judiciary Committee, nor if it would pass muster for a full Senate vote after that. Neither Hawley’s nor Blumenthal’s office responded to our questions. ®