Researchers unveil LLM-based system that designs and runs social experiments on its own

Editorial Team

April 22, 2024

MIT and Harvard researchers have developed a new approach that uses large language models (LLMs) to automatically generate and test social science hypotheses.

Key to this approach are Structural Causal Models (SCMs), mathematical models for formulating hypotheses that provide a blueprint for constructing high-quality LLM-based agents, designing experiments, and analyzing data.
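To make the idea concrete, here is a minimal Python sketch of how an SCM could serve as a blueprint for an experiment. It is not the researchers' implementation: the class names, the `treatment_grid` helper, the variable levels, and the choice of a full factorial design are all illustrative assumptions. The SCM names one outcome and its hypothesized causes, and the experiment plan is simply every combination of cause levels.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Cause:
    """A hypothesized cause and the levels to test (hypothetical values)."""
    name: str
    levels: tuple


@dataclass(frozen=True)
class SCM:
    """A structural causal model reduced to its bare blueprint:
    one outcome variable and the causes hypothesized to affect it."""
    outcome: str
    causes: tuple

    def edges(self):
        # Each hypothesis is a directed edge: cause -> outcome.
        return [(c.name, self.outcome) for c in self.causes]


def treatment_grid(scm):
    """Full factorial design (an assumption): one run per combination of levels."""
    names = [c.name for c in scm.causes]
    level_sets = [c.levels for c in scm.causes]
    return [dict(zip(names, combo)) for combo in product(*level_sets)]


# Hypothetical negotiation scenario; all levels are made up for illustration.
negotiation = SCM(
    outcome="deal_reached",
    causes=(
        Cause("buyer_reservation_price", (10, 20, 30)),
        Cause("seller_reservation_price", (10, 20, 30)),
        Cause("seller_attachment", ("low", "high")),
    ),
)

grid = treatment_grid(negotiation)
print(len(grid))  # 18 treatment conditions (3 * 3 * 2)
print(grid[0])
```

Each dictionary in `grid` describes one simulated interaction to run, and the edge list fixes in advance which effects are to be estimated from the results.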

The system can generate hypotheses, design experiments, run them with LLM-driven agents that simulate humans, and analyze the results without human intervention. This makes the language model both researcher and research object, the researchers say.

According to the researchers, each step in the system corresponds to an analogous step in the social science process as performed by humans. Hypothesis development guides experimental design, execution, and model estimation. Researchers can edit the system’s decisions at any step in the process. | Image: Manning, Zhu et al.

The researchers demonstrate the approach in several scenarios: a negotiation, a bail hearing, a job interview, and an auction. In each case, the system proposes and tests causal relationships, finding evidence for some hypotheses and not for others.

For example, in the negotiation scenario, the likelihood of reaching an agreement increased as the seller’s emotional attachment to the item decreased. Both the buyer’s and the seller’s reservation prices mattered. In the bail hearing, a remorseful defendant was granted lower bail, but not if the defendant had an extensive criminal record.

The researchers note that the insights from these simulated social interactions are not available by directly querying the LLM. However, when the LLM was equipped with the proposed SCM for each scenario, it could reliably predict the direction of the estimated effects, but not their strength.
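The difference between getting the direction right and getting the strength right can be shown with a small Python sketch. The function names and all coefficients below are invented for illustration and are not results from the study; the sketch only shows how a set of predictions can match every sign while still missing the magnitudes.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def compare_effects(predicted, estimated):
    """Return (share of effects with matching sign, mean absolute relative error)."""
    keys = list(estimated)
    sign_match = sum(sign(predicted[k]) == sign(estimated[k]) for k in keys) / len(keys)
    rel_error = sum(
        abs(predicted[k] - estimated[k]) / abs(estimated[k]) for k in keys
    ) / len(keys)
    return sign_match, rel_error


# Made-up effect sizes, NOT taken from the paper.
estimated = {"buyer_reservation_price": 0.04,
             "seller_reservation_price": -0.03,
             "seller_attachment": -0.20}
predicted = {"buyer_reservation_price": 0.10,
             "seller_reservation_price": -0.01,
             "seller_attachment": -0.05}

sign_match, rel_error = compare_effects(predicted, estimated)
print(f"signs matched: {sign_match:.0%}, mean relative error: {rel_error:.0%}")
```

In this toy case every predicted effect points in the right direction, yet the predicted sizes are off by roughly the size of the effects themselves.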

In the auction experiment, the simulation results closely matched the prediction of auction theory that the final price would be close to the second-highest bid. The LLM’s own predictions of auction prices were inaccurate, but improved dramatically when the model could condition on the fitted SCM.
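The auction-theory benchmark can be illustrated in a few lines of Python. The sketch below assumes an ascending open auction in which each bidder stays in until the price exceeds their private valuation. The function name, the bid increment, and the valuations are illustrative assumptions, not details from the study.

```python
def ascending_auction(valuations, increment=1):
    """Raise the price step by step until at most one bidder remains.

    Simplifying assumption: a bidder stays in as long as the price
    does not exceed their private valuation.
    """
    if len(valuations) < 2:
        raise ValueError("need at least two bidders")
    price = 0
    while True:
        next_price = price + increment
        # Bidders still willing to pay the next price.
        active = [v for v in valuations if v >= next_price]
        if len(active) <= 1:
            # Nobody is left to push the price higher, so bidding stops.
            return price
        price = next_price


valuations = [42, 57, 73]  # made-up private valuations
final_price = ascending_auction(valuations)
second_highest = sorted(valuations)[-2]
print(final_price, second_highest)
```

Under these assumptions the bidding stops within one increment of the second-highest valuation, because that is the point at which the last competing bidder drops out.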

The research team believes that this SCM-based LLM approach is a promising new method for studying simulated behavior on a large scale, offering advantages such as controlled experiments, interactivity, customization, and high repeatability of results. They suggest that this method could be a breakthrough for the social sciences, similar to the impact of AlphaFold on protein research and GNoME on materials research.

“The system presented in this paper can generate these controlled experimental simulations en masse with prespecified plans for data collection and analysis. That contrasts most academic social science research as currently practiced,” the researchers write.

Unlike open social simulations, where it can be difficult to select and analyze outcomes, the SCM framework describes exactly what is to be measured as a downstream outcome. This avoids the need to infer causal structure from observational data after the fact, which can be problematic.

However, the challenge of translating the results generated in the simulation to actual human behavior remains.

Future research areas include optimizing the assignment of attributes to LLM agents, designing social interactions between agents, and exploring how the approach could be used for automated research programs.

The study highlights the potential of generative AI to accelerate scientific research in various fields.