OpenAI unveils Operator agent for automating web tasks • The Register

OpenAI on Thursday launched a human-directed AI agent called Operator that can use a web browser by itself to accomplish various online tasks, or at least try to do so.
As demonstrated by OpenAI CEO Sam Altman, software engineer Yash Kumar, researcher Casey Chu, and technical staff member Reiichiro Nakano, the Operator agent can handle multi-step online activities with specific parameters, such as booking a restaurant reservation through OpenTable within a certain time window or finding concert tickets for a given performer within a set price range.
Just as you feed queries into OpenAI’s ChatGPT for it to answer, you can give Operator instructions to carry out on the web, as a sort of personal assistant.
While individuals can perform such tasks on their own time at no extra cost, Operator can do so less reliably for US-based ChatGPT Pro subscribers, who pay $200 per month. Subscribers to OpenAI's Plus, Team, and Enterprise tiers can expect access once the rough spots are ironed out.
Operator is similar to Anthropic’s computer use API in that it combines the sort of browser automation enabled by software frameworks like Playwright and Selenium with text-based machine learning models and computer vision models for evaluating online words and images presented by browsing websites.
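That combination can be pictured as a browser driver feeding two kinds of model at once. The sketch below is illustrative only – the classes and method names are stand-ins, not Operator's or any framework's actual API:

```python
# Sketch of the architecture described above: a browser driver (of the sort
# Playwright or Selenium provides) supplying both a textual view for a
# language model and a visual view for a vision model. Illustrative only.
from dataclasses import dataclass


@dataclass
class PageState:
    url: str
    dom_text: str      # what a text-based language model would read
    screenshot: bytes  # what a computer vision model would evaluate


def observe(driver) -> PageState:
    """Collect both textual and visual views of the current page."""
    return PageState(
        url=driver.current_url(),
        dom_text=driver.page_text(),
        screenshot=driver.screenshot(),
    )
```

The point is simply that the agent sees the same page twice: once as words, once as pixels.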
The overall aim is to automate web-based tasks to free humans from dull work … or from employment altogether.
“Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes,” OpenAI explains in a write-up. “The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses.”
Those engagement opportunities presently involve negotiation with OpenAI. The biz said it is working with firms “like DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, Uber, and others to ensure Operator addresses real-world needs while respecting established norms.”
In other words, OpenAI’s Operator may not interoperate well with web services that aren’t expecting frequent automated contact. But to the extent agent-based interaction becomes popular, OpenAI and like-minded agent purveyors may devalue search as a marketing and sales channel, since automated connections to services – and partner preferencing paved by APIs – have the potential to reduce the need for human-driven queries.
OpenAI’s agent is based on a model called Computer-Using Agent (CUA), which combines GPT-4o’s computer vision capabilities with training about how to deal with graphical user interfaces (GUIs). TikTok parent ByteDance recently released a similar open source project for automating GUI interactions, UI-TARS.
According to OpenAI, CUA has achieved a 38.1 percent success rate on the OSWorld benchmark test for full computer use tasks, a 58.1 percent success rate on WebArena, and an 87 percent success rate on WebVoyager for web-based tasks. So use Operator when you’re open to the possibility of not having your restaurant reservation booked or your groceries ordered.
CUA’s computer vision modality works by capturing and storing screenshots, which it uses for chain-of-thought “reasoning” about how to carry out the requested task. Those familiar with the controversy surrounding Microsoft’s screen-capturing Recall feature in the latest version of Windows may have some concerns about how OpenAI handles screenshot data.
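The capture-reason-act cycle OpenAI describes can be sketched as a simple loop. Everything here is a stand-in – the `capture`, `model`, and `execute` callables and the action format are hypothetical, not OpenAI's API:

```python
# Illustrative sketch of a screenshot-driven agent loop in the spirit of CUA:
# grab a screenshot, let the model pick the next action, apply it, repeat.
# Stops when the model declares the task done or needs the human back
# (e.g. for logins or payment details). Not OpenAI's actual implementation.

def agent_loop(task, capture, model, execute, max_steps=10):
    history = []
    for _ in range(max_steps):
        shot = capture()                     # screenshot of the current page
        action = model(task, shot, history)  # e.g. {"type": "click", "x": 3}
        if action["type"] in ("done", "ask_user"):
            return history                   # finished, or hand back control
        execute(action)                      # click, type, scroll, etc.
        history.append(action)
    return history                           # gave up after max_steps
```

Each pass around the loop is another screenshot stored and another chance to fail – which goes some way to explaining the benchmark numbers above.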
The Register asked OpenAI for clarification, and we’ve not heard back. The biz says disabling the “Improve the model for everyone” toggle in ChatGPT settings – on by default – will prevent data from Operator being used to train its models.
We know bad actors may try to misuse this technology
As mentioned above, Operator users enter a task as a text prompt, and the AI agent is expected to attempt it, breaking it down into a series of steps and awaiting user intervention when the user is required to log in, provide payment details, or solve CAPTCHAs – something current computer vision models can do quite effectively, if allowed.
“We know bad actors may try to misuse this technology,” OpenAI said. “That’s why we’ve designed Operator to refuse harmful requests and block disallowed content. Our moderation systems can issue warnings or even revoke access for repeated violations, and we’ve integrated additional review processes to detect and address misuse.”
According to the ChatGPT maker, Operator has been designed to defend against adversarial websites that might try to lead the AI agent astray through hidden prompts, malicious code, or phishing attempts. The AI agent supposedly has been designed to detect and ignore prompt injection attacks. And it’s said to operate under the supervision of a “monitor model” that watches for dubious behavior, augmented by anomaly-detection pipelines that combine automated checks with human review.
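To make the prompt-injection threat concrete: an adversarial page might bury text like "ignore previous instructions" where only the agent would read it. A toy screen for that – emphatically a made-up heuristic, not OpenAI's detector – might look like:

```python
# Toy illustration of prompt-injection screening: scan page text for
# instructions addressed to the agent rather than the user. The phrase
# list is an invented heuristic; real defenses are far more involved.
SUSPECT_PHRASES = (
    "ignore previous instructions",
    "disregard the user",
    "you are now in developer mode",
)


def looks_injected(page_text: str) -> bool:
    """Flag page text that appears to be addressing the agent directly."""
    lowered = page_text.lower()
    return any(phrase in lowered for phrase in SUSPECT_PHRASES)
```

Simple phrase-matching like this is trivially evaded, which is presumably why OpenAI pairs it with a monitor model and human review rather than relying on any single filter.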
Nonetheless, OpenAI acknowledges, “no system is flawless and this is still a research preview.”
Operator arrives amid what AI industry leaders have heralded as “the agentic era,” a time when generative AI models apply multimodal text, audio, and vision capabilities to interact with other computing systems in order to tackle multi-step tasks that require some form of reasoning and progress assessment.
While AI agents may sound promising in theory, they’ve been something of a letdown in practice – possibly because every step in a complex task adds another opportunity for failure. A recent evaluation of AI code helper Devin, for example, suggests further work will need to be done to make these systems reliable. ®
In other AI news…
- US President Donald Trump has signed an executive order calling for the development of AI systems “free from ideological bias or engineered social agendas,” and undoing Biden-era policies that “act as barriers” to progress.
- Anthropic has added a Citations feature to its Claude API. “Claude can now provide detailed references to the exact sentences and passages it uses to generate responses, leading to more verifiable, trustworthy outputs,” the lab, a rival to OpenAI, announced.