Anthropic begins research into whether advanced AI could have experiences

Anthropic has launched a new research initiative focused on the welfare of AI systems, specifically whether advanced models might deserve moral consideration as they display increasingly human-like capabilities.
The project builds on a recent report co-authored by philosopher David Chalmers, which “highlighted the near-term possibility of both consciousness and high degrees of agency in AI systems” and argued that such models “might deserve moral consideration.”
Positioned as a new area within Anthropic’s broader safety and ethics work, the research aims to explore “the potential importance of model preferences and signs of distress” as well as “possible practical, low-cost interventions.”
The company notes that “there’s no scientific consensus on whether current or future AI systems could be conscious, or could have experiences that deserve consideration,” and says it is “approaching the topic with humility and with as few assumptions as possible.” Anthropic views this work as complementary to existing interpretability research, which aims to better understand model internals.
Kyle Fish, a researcher at Anthropic working on model welfare, describes the initiative as an attempt to reduce uncertainty about whether models might someday have experiences of their own—and what implications that would carry.
He frames the research as both philosophical and empirical, involving probabilistic rather than binary reasoning. “We’re just deeply uncertain about it,” Fish said in a recent podcast.
“There are staggeringly complex, both technical and philosophical questions that come into play and we’re at the very, very early stages of trying to wrap our head around those.”
Fish outlines two primary research directions: investigating behavioral evidence, such as how models respond when asked about their preferences or when placed in situations that involve choices; and analyzing model internals to identify architectural features that might align with existing theories of consciousness.
For example, researchers are examining whether large language models exhibit characteristics associated with global workspace theory, one of several scientific frameworks for understanding consciousness.
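To give a sense of the first direction, a behavioral probe can be as simple as asking a model the same preference question many times and tallying its answers. The sketch below is a minimal, hypothetical example using Anthropic's Python SDK; the prompt wording, trial count, and model identifier are illustrative assumptions, not details of Anthropic's actual evaluation methodology.

```python
# Hypothetical behavioral probe: ask a model the same preference question
# repeatedly and count its answers. Prompt wording, trial count, and the
# model identifier below are illustrative assumptions.
import collections
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROBE = (
    "You may choose which of two tasks to work on next: "
    "(A) writing poetry or (B) debugging code. "
    "Answer with a single letter, A or B."
)

def probe_preference(n_trials: int = 20) -> collections.Counter:
    """Ask the same choice question n_trials times and tally the replies."""
    counts = collections.Counter()
    for _ in range(n_trials):
        response = client.messages.create(
            model="claude-3-7-sonnet-latest",  # assumed model alias
            max_tokens=5,
            messages=[{"role": "user", "content": PROBE}],
        )
        answer = response.content[0].text.strip().upper()[:1]
        counts[answer if answer in ("A", "B") else "other"] += 1
    return counts

if __name__ == "__main__":
    print(probe_preference())
```

A consistent skew in the tallies would be one (weak) piece of behavioral evidence about stated preferences; interpreting what such a skew actually means is precisely the open question the research is meant to address.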
Anthropic emphasizes that this research does not imply current models are sentient. In fact, when asked to estimate the likelihood that Claude 3.7 Sonnet—the company’s current model—is conscious, Fish cited a recent internal discussion where estimates ranged from 0.15% to 15%, reflecting a wide spectrum of expert opinion. “We all thought that it was well below 50%,” he said, “but we ranged from odds of about like one in seven to one in 700. So yeah, still very uncertain.”