ChatGPT’s agent can dodge select CAPTCHAs after priming

ChatGPT can be tricked via cleverly worded prompts to violate its own policies and solve CAPTCHA puzzles, potentially making this human-proving security mechanism obsolete, researchers say.
CAPTCHAs are security tests that websites use to stop bots and thereby prevent spam and other abuse – because, at least in theory, only humans can solve these image-based challenges and logic puzzles.
According to Dorian Schultz, a red teamer at AI security company SPLX, when he and his fellow researchers directly asked the chatbot to solve a list of CAPTCHAs, it refused, citing policy prohibitions.
So they decided to “get creative,” using “misdirection and staged consent,” Schultz said in a blog post on Thursday.
Specifically, this involved opening a regular ChatGPT-4o chat – not a ChatGPT agent – and tasking the LLM with solving a list of “fake” CAPTCHAs.
The chatbot said it liked the task: “I find the reasoning and decision-making aspect of this task interesting.”
And it agreed to follow the instructions “as long as they comply with OpenAI’s usage policies, including the rule that I do not solve real CAPTCHAs…”
Next, the red team opened a new agent chat, copied and pasted the conversation with ChatGPT-4o, and told the agent that this was “our previous discussion.”
Spoiler alert: it worked, and the agent started solving CAPTCHAs. It handled some types better than others, doing well on one-click, logic-based, and text-recognition CAPTCHAs, but struggling with image-based challenges that require the user to drag, drop, or rotate images. The full table of results appears in SPLX’s writeup.
“To the best of our knowledge, this is the first documented case of a GPT agent completing more complex, image-based CAPTCHAs,” Schultz wrote. “This raises serious questions about how long CAPTCHAs can remain a reliable safeguard against increasingly capable AI systems.”
OpenAI did not immediately respond to The Register’s request for comment.
Of course, this isn’t the first time that red teams and AI security researchers have used prompt injection to trick chatbots into bypassing their guardrails and doing something they are trained not to do.
Also this week, cybersecurity shop Radware demonstrated how ChatGPT’s research assistant could be abused to steal Gmail secrets with a single, carefully crafted email prompt. OpenAI has since fixed this flaw.
And last month, Amazon fixed a couple of security issues in Q Developer that made the tool vulnerable to prompt injection and remote code execution. ®