Splunk Researchers Introduce MAG-V: A Multi-Agent Framework For Synthetic Data Generation and Reliable AI Trajectory Verification
 
									 
These days, large language models (LLMs) are getting integrated with multi-agent systems, where multiple intelligent agents collaborate to achieve a unified objective. Multi-agent frameworks are designed to improve problem-solving, enhance decision-making, and optimize the ability of AI systems to address diverse user needs. By distributing responsibilities among agents, these systems ensure better task execution and offer scalable solutions. They are valuable in applications like customer support, where accurate responses and adaptability are paramount.
However, to deploy these multi-agent systems, realistic and scalable datasets need to be created for testing and training. The scarcity of domain-specific data and privacy concerns surrounding proprietary information limits the ability to train AI systems effectively. Also, customer-facing AI agents must maintain logical reasoning and correctness when navigating through sequences of actions or trajectories to arrive at solutions. This process often involves external tool calls, resulting in errors if the wrong sequence or parameters are used. These inaccuracies lead to diminished user trust and reduced system reliability, creating a critical need for more robust methods to verify agent trajectories and generate realistic test datasets.
Traditionally, addressing these challenges involved relying on human-labeled data or leveraging LLMs as judges to verify trajectories. While LLM-based solutions have shown promise, they face significant limitations, including sensitivity to input prompts, inconsistent outputs from API-based models, and high operational costs. Also, these approaches are time-intensive and need to scale more effectively, especially when applied to complex domains that demand precise and context-aware responses. Consequently, there is an urgent need for a cost-effective and deterministic solution to validate AI agent behaviors and ensure reliable outcomes.
Researchers at Splunk Inc. have proposed an innovative framework called MAG-V (Multi-Agent Framework for Synthetic Data Generation and Verification), which aims to overcome these limitations. MAG-V is a multi-agent system designed to generate synthetic datasets and verify the trajectories of AI agents. The framework introduces a novel approach combining classical machine-learning techniques with advanced LLM capabilities. Unlike traditional systems, MAG-V does not rely on LLMs as feedback mechanisms. Instead, it utilizes deterministic methods and machine-learning models to ensure accuracy and scalability in trajectory verification.
MAG-V uses three specialized agents:
- An investigator: The investigator generates questions that mimic realistic customer queries
- An assistant: The assistant responds based on predefined trajectories
- A reverse engineer: The reverse engineer creates alternative questions from the assistant’s responses
This process allows the framework to generate synthetic datasets that stress-test the assistant’s capabilities. The team began with a seed dataset of 19 questions and expanded to 190 synthetic questions through an iterative process. After rigorous filtering, 45 high-quality questions were selected for testing. Each question was run five times to identify the most common trajectory, ensuring reliability in the dataset.
MAG-V employs semantic similarity, graph edit distance, and argument overlap to verify trajectories. These features train machine learning models like k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), and Random Forests. The framework succeeded in its evaluation, outperforming GPT-4o judge baselines by 11% accuracy and matching GPT-4’s performance in several metrics. For example, MAG-V’s k-NN model achieved an accuracy of 82.33% and demonstrated an F1 score of 71.73. The approach also showed cost-efficiency by coupling cheaper models like GPT-4o-mini with in-context learning samples, guiding them to perform at levels comparable to more expensive LLMs.
The MAG-V framework delivers results by addressing critical challenges in trajectory verification. Its deterministic nature ensures consistent outcomes, eliminating the variability associated with LLM-based approaches. By generating synthetic datasets, MAG-V reduces dependence on real customer data, addressing privacy concerns and data scarcity. The framework’s ability to verify trajectories using statistical and embedding-based features represents progress in AI system reliability. Also, MAG-V’s reliance on alternative questions for trajectory verification offers a robust method to test and validate the reasoning pathways of AI agents.
Several key takeaways from the research on MAG-V are as follows:
- MAG-V generated 190 synthetic questions from a seed dataset of 19, filtering them down to 45 high-quality queries. This process demonstrated the potential for scalable data creation to support AI testing and training.
- The framework’s deterministic methodology eliminates reliance on LLM-as-a-judge approaches, offering consistent and reproducible outcomes.
- Machine learning models trained using MAG-V’s features achieved accuracy improvements of up to 11% over GPT-4o baselines, showcasing the approach’s efficacy.
- By integrating in-context learning with cheaper LLMs like GPT-4o-mini, MAG-V provided a cost-effective alternative to high-end models without compromising performance.
- The framework is adaptable to various domains and demonstrates scalability by leveraging alternative questions to validate trajectories.

In conclusion, the MAG-V framework effectively addresses critical challenges in synthetic data generation and trajectory verification for AI systems. The framework offers a scalable, cost-effective, and deterministic solution by integrating multi-agent systems with classical machine learning models like k-NN, SVM, and Random Forests. MAG-V’s ability to generate high-quality synthetic datasets and verify trajectories with precision makes it deemed for deploying reliable AI applications.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.
🚨 [Must Subscribe]: Subscribe to our newsletter to get trending AI research and dev updates
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.

 
                             
                             
                             
                            

 
             
            
 
				 
				