“This step is necessary to prove I’m not a bot,” wrote the bot as it passed an anti-AI screening step.
Maybe they should change the button to say, “I am a robot”?
On Friday, OpenAI’s new ChatGPT Agent, which can perform multistep tasks for users, proved it can pass through one of the Internet’s most common security checkpoints by clicking Cloudflare’s anti-bot verification—the same checkbox that’s supposed to keep automated programs like itself at bay.
ChatGPT Agent is a feature that allows OpenAI’s AI assistant to control its own web browser, operating within a sandboxed environment with its own virtual operating system and browser that can access the real Internet. Users can watch the AI’s actions through a window in the ChatGPT interface, maintaining oversight while the agent completes tasks. The system requires user permission before taking actions with real-world consequences, such as making purchases. Recently, Reddit users discovered the agent could do something particularly ironic.
The evidence came from Reddit, where a user named “logkn” of the r/OpenAI community posted screenshots of the AI agent effortlessly clicking through the screening step that normally precedes a CAPTCHA (short for “Completely Automated Public Turing test to tell Computers and Humans Apart”) while completing a video conversion task, narrating its own process as it went.
A screenshot of ChatGPT Agent clicking through a Cloudflare bot screening test. Credit: logkn via Reddit
The screenshots shared on Reddit capture the agent navigating a two-step verification process: first clicking the “Verify you are human” checkbox, then proceeding to click a “Convert” button after the Cloudflare challenge succeeds. The agent provides real-time narration of its actions, stating “The link is inserted, so now I’ll click the ‘Verify you are human’ checkbox to complete the verification on Cloudflare. This step is necessary to prove I’m not a bot and proceed with the action.”
The absurdity of an AI agent declaring it needs to prove it’s “not a bot” while clicking through anti-bot measures has not been lost on observers. “In all fairness, it’s been trained on human data why would it identify as a bot? We should respect that choice,” joked one Reddit user in a reply.
The CAPTCHA arms race
While the agent didn’t face an actual CAPTCHA puzzle with images in this case, successfully passing Cloudflare’s behavioral screening that determines whether to present such challenges demonstrates sophisticated browser automation.
To understand the significance of this capability, it’s important to know that CAPTCHA systems have served as a security measure on the web for decades. Computer researchers invented the technique in the 1990s to screen bots from entering information into websites, originally using images with letters and numbers written in wiggly fonts, often obscured with lines or noise to foil computer vision algorithms. The assumption is that the task will be easy for humans but difficult for machines.
Cloudflare’s screening system, called Turnstile, often precedes actual CAPTCHA challenges and represents one of the most widely deployed bot-detection methods today. The checkbox analyzes multiple signals, including mouse movements, click timing, browser fingerprints, IP reputation, and JavaScript execution patterns, to determine if the user exhibits human-like behavior. If these checks pass, users proceed without seeing a CAPTCHA puzzle. If the system detects suspicious patterns, it escalates to visual challenges.
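To make that pass-or-escalate decision concrete, here is a minimal Python sketch of how several behavioral signals might be combined into a single score. It is an illustration only: Cloudflare does not publish how Turnstile weighs its signals, so the field names, the weights, the threshold, and the `screen` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VisitorSignals:
    """Hypothetical, simplified signals.

    Each field is scaled from 0.0 (bot-like) to 1.0 (human-like).
    """
    mouse_movement: float
    click_timing: float
    browser_fingerprint: float
    ip_reputation: float
    javascript_execution: float


# Illustrative weights only. These are assumptions, not Cloudflare's values.
WEIGHTS = {
    "mouse_movement": 0.25,
    "click_timing": 0.15,
    "browser_fingerprint": 0.20,
    "ip_reputation": 0.25,
    "javascript_execution": 0.15,
}

# Assumed cutoff: scores at or above this value skip the visual puzzle.
PASS_THRESHOLD = 0.7


def human_likeness(signals: VisitorSignals) -> float:
    """Return the weighted average of the signals, between 0.0 and 1.0."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def screen(signals: VisitorSignals) -> str:
    """Return "pass" for human-like visitors, else escalate to a visual challenge."""
    if human_likeness(signals) >= PASS_THRESHOLD:
        return "pass"
    return "visual_challenge"


# A visitor whose signals all look human-like, and one whose mostly do not.
likely_human = VisitorSignals(0.9, 0.8, 0.9, 0.9, 1.0)
likely_bot = VisitorSignals(0.1, 0.2, 0.5, 0.3, 1.0)

print(screen(likely_human))
print(screen(likely_bot))
```

In this toy model, any visitor whose combined signals clear the threshold passes without a puzzle, regardless of whether a person or a program produced them.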
The ability of an AI model to defeat a CAPTCHA isn’t entirely new (although having one narrate the process feels fairly novel). AI tools have been able to defeat certain CAPTCHAs for a while, which has led to an arms race between those who create them and those who defeat them. OpenAI’s Operator, an experimental web-browsing AI agent launched in January, faced difficulty clicking through some CAPTCHAs (and was also trained to stop and ask a human to complete them), but the latest ChatGPT Agent tool has seen a much wider release.
It’s tempting to say that the ability of AI agents to pass these tests puts the future effectiveness of CAPTCHAs into question, but for as long as there have been CAPTCHAs, there have been bots that could later defeat them. As a result, recent CAPTCHAs have become less a way to stop bot attacks entirely than a way to slow them down or make them more expensive. Some malefactors even hire farms of humans to defeat them in bulk.
CAPTCHAs also have unexpected benefits for those who run them. Starting in 2007, the reCAPTCHA project used its tests as a form of free labor for tasks like digitizing books and training machine-learning algorithms. Google acquired reCAPTCHA in 2009 and expanded its use to decode Google Street View addresses, extracting vision knowledge from human users solving challenges. Today’s reCAPTCHA challenges help Google train AI models for image recognition, creating an ironic cycle in which humans proving they’re not robots are actually helping to make AI better at defeating future CAPTCHAs.
In a way, that future may have arrived. ChatGPT Agent’s demonstration showcases the agent tool’s ability to process visual context and navigate multi-step processes that would typically require human judgment. In the screenshots, the agent recognizes when verification is needed and completes it as part of a larger workflow—behavior that goes beyond simple scripted automation.
Clicking through CAPTCHA screening is just one of the complex tasks ChatGPT Agent can handle. Another Reddit user, for instance, showed off a photo of a load of groceries that Agent apparently purchased. “I had agent mode order me some groceries from a local supermarket while I worked yesterday for pickup this morning,” the Reddit user wrote. “It actually worked without any issue and did an okay job making a grocery list that works for me. I gave it barely any detail in my instructions other than to avoid red meat, prioritize health and keep it under $150.”
But ChatGPT Agent isn’t perfect. Some terrible website user interfaces are apparently better than CAPTCHA checkpoints at foiling the new bot. “Your agent did way better than mine,” wrote one Reddit reply. “Mine couldn’t figure out how to get to the stop and shop website.”
Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.