Google’s AI cites web pages written by AI, study says

Welcome to the age of ouroboros. Google’s AI Overviews (AIOs), which now often appear at the top of organic search results, are drawing around 10 percent of their sources from documents written by … other AIs, according to a recent report.
Originality.ai, a company that makes AI detection software, recently studied 29,000 Your Money or Your Life (YMYL) Google queries, those covering life-changing topics such as health, finance, law, and politics. The company then evaluated the AIOs that appeared at the top of the page, the links they cited, and the first 100 organic search results for each query.
Running the AIO citations through its AI Detection Lite 1.0.1 model, the company found that 10.4 percent of them were likely generated by an LLM. This means one AI is drawing on output from another, which could contribute to an echo chamber of recycled ideas and biases.
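For a sense of what that classification step involves, here is a minimal Python sketch of our own; the detector() function is a hypothetical stand-in, since Originality.ai's actual model and threshold are proprietary, and the flag-if-above-threshold rule is an assumption about how such scores are typically used.

```python
def detector(text: str) -> float:
    """Toy stand-in returning P(text is AI-written); a real study would
    call a trained detection model here, not a string heuristic."""
    return 0.9 if "as an ai language model" in text.lower() else 0.1

def share_flagged_ai(citations: list[str], threshold: float = 0.5) -> float:
    """Fraction of citation texts whose detector score clears the threshold."""
    flagged = sum(1 for text in citations if detector(text) >= threshold)
    return flagged / len(citations)

print(share_flagged_ai(["Mortgage rates explained by our expert...",
                        "As an AI language model, I can say..."]))  # 0.5
```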
“Even a small proportion of AI-generated citations in high-stakes areas raises trust and reliability concerns,” Originality.ai Director of Marketing and Sales Madeleine Lambert told The Register via email. “And while AI Summaries aren’t directly used in training data, over-sampling AI-written content makes it more likely those outputs are recycled into future models. This can then become a recursive loop.”
When AI models learn from other AI models, the result can eventually be model collapse, in which output quality progressively degrades. According to a 2024 Nature paper on the topic:
Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation. Being trained on polluted data, they then mis-perceive reality.
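The dynamic is easy to reproduce in miniature. The following Python toy is our own construction, not the paper's code: each "generation" fits a Gaussian to samples drawn from the previous generation's fit, rather than from the original data, and the estimated spread tends to collapse over time.

```python
import numpy as np

# Each generation trains only on its parent's output, never on real data.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0                 # generation 0: the "real" distribution
for gen in range(1, 101):
    samples = rng.normal(mu, sigma, size=20)   # sample the parent model
    mu, sigma = samples.mean(), samples.std()  # refit the next generation
    if gen % 20 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f} sigma={sigma:.3f}")
# sigma tends toward zero across generations: the fitted model loses the
# original distribution's tails first, then most of its spread.
```

The tails, which carry the rare and unusual cases, disappear first, which is exactly the "mis-perceiving reality" the paper warns about.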
When we asked Google for its response, the company told us that it takes issue with the accuracy of Originality.ai’s AI detector itself.
“This is a flawed study relying on partial data and unreliable technology,” a spokesperson said. “AI detectors have not proven their effectiveness at detecting AI generated content – in fact, many have demonstrated they are error-prone. As in Search more broadly, the links that are included in AI Overviews are dynamic and change based on the information that is most relevant, helpful, and timely for a given search.”
While definitely not perfect, Originality.ai has elsewhere received high marks for accuracy. According to a University of Florida study [PDF] from 2024, the tool consistently rated GPT-4-generated abstracts as AI (mean score 0.975), suggesting it performed strongly in that setup. Another study [PDF], conducted by researchers from Arizona State University, showed the tool netting just a two percent false positive rate and a two percent false negative rate.
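For clarity on what those two headline numbers measure, here is a short sketch, ours rather than the researchers' code, of how false positive and false negative rates fall out of a labelled test set:

```python
def error_rates(labels: list[bool], flags: list[bool]) -> tuple[float, float]:
    """Compute (false positive rate, false negative rate) for a detector.
    labels: True where the text really was AI-generated
    flags:  True where the detector flagged it as AI-generated
    """
    human_flags = [f for l, f in zip(labels, flags) if not l]
    ai_flags    = [f for l, f in zip(labels, flags) if l]
    fpr = sum(human_flags) / len(human_flags)           # humans wrongly flagged
    fnr = sum(not f for f in ai_flags) / len(ai_flags)  # AI that slipped through
    return fpr, fnr
```

At two percent each, roughly one human-written page in 50 gets wrongly flagged, and one AI-written page in 50 slips past the detector.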
To be fair, the search giant has also never promised to bar AI-generated content from its listings or citations. In fact, in a 2023 blog post, the company explicitly stated that it would judge AI articles by their quality rather than by their lack of human authorship.
“AI has the ability to power new levels of expression and creativity, and to serve as a critical tool to help people create great content for the web,” Google wrote.
Whatever the case, Google’s AIOs should probably get used to citing and learning from other AI-generated content, because the Chocolate Factory is driving out human publishers by taking away the traffic they need to stay in business. A July study from Pew Research Center showed that users who encountered an AI overview were nearly half as likely to click through to a web result as those who did not.
Google disputed the results of the Pew study at the time it came out. However, other studies have indicated similar trends. For example, research from Ahrefs, a search tool company, in April showed a 34.5 percent lower click-through rate for the top result when an AIO was present above it.
Good enough to cite, but not in the top 100
Of the AIO citations Originality.ai tracked, 74.4 percent were written by humans. Researchers filed the remaining 15.2 percent into the “unclassifiable” category, which includes citations that were too short to analyze, appeared as videos, were in PDF format, or were broken links.
Lambert noted that some of those broken links, which made up 20 percent of the unclassifiable citations, may have been blocked only for Originality.ai’s crawler, while others were inaccessible to human users as well.
Most interestingly, 52 percent of the working links in AIO citations were not among the top 100 pages Google showed in its organic search results for the same term. Of those, Originality.ai flagged 12.8 percent (higher than the overall 10.4 percent) as AI-generated.
On the other hand, Originality.ai’s experience with rankings is far different from what Ahrefs found in a July study. For that research, Content Marketer Louise Linehan and Data Scientist Xibeijia Guan analyzed 1.9 million citations from 1 million AIOs and discovered that 76 percent were in the top 10 results, another 9.5 percent were also in the top 100, and only 14.4 percent of cited pages did not rank.
Sam Robson, founder and CEO of The Better Web Co., a Search Engine Optimization (SEO) firm, said that he also usually sees a strong correlation between web pages appearing in the top 10 and the same links showing up as AIO citations. However, he posited that Originality.ai may be seeing different results because it focused exclusively on YMYL queries.
“AI Overviews are powered by Gemini, and being Google’s own LLM, it’s designed to parse deeper and more varied training materials than Googlebot / Google Search ever have been,” Robson told The Register. “This might mean that, in the YMYL space, where some great information is contained in PDFs, whitepapers, and other formats that aren’t optimized for traditional search, AI overviews is doing a better job of highlighting these more varied resources.”
For its part, Google countered that placement in the top 100 search results for a term doesn’t necessarily mean a link should show up as a citation for the AI overview. A company spokesperson noted that AIOs use a “query fan out technique” [PDF] that conducts a number of different and related searches to find content for the AIOs’ response. So, even though you queried one thing, the AI tool may have conducted many similar but slightly different queries to bring you its response.
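As a rough sketch of how such a fan-out might be wired up, here is a Python toy; expand_query() and search() are hypothetical stand-ins of our own, not Google APIs, and the paper describes the real technique in far more depth.

```python
def expand_query(query: str) -> list[str]:
    """Hypothetical LLM call rewriting one query into related sub-queries."""
    return [query, f"{query} risks", f"{query} explained", f"how does {query} work"]

def search(query: str, limit: int = 10) -> list[str]:
    """Hypothetical search backend returning result URLs for one sub-query."""
    return []  # stand-in: a real system would hit a live index here

def fan_out(query: str) -> list[str]:
    """Pool and de-duplicate results across every expanded sub-query."""
    seen: set[str] = set()
    pooled: list[str] = []
    for sub in expand_query(query):
        for url in search(sub):
            if url not in seen:
                seen.add(url)
                pooled.append(url)
    # Pages pooled here become citation candidates even if they never ranked
    # in the top 100 for the original query.
    return pooled
```

Under that model, a page can surface as an AIO citation via a sibling query without ever ranking for the one the user actually typed. ®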