Bugcrowd launches Reinforcement Learning environments to help AI models learn real-world security skills

Frontier AI teams can start training on real security environments in weeks, not years

/PRNewswire/ -- Bugcrowd, the leader in preemptive cybersecurity, today announced the launch of Reinforcement Learning (RL) Environments, a new offering designed to help AI developers build models that can find, exploit, and fix real software vulnerabilities. Built on technology from Bugcrowd’s acquisition of Mayhem Security, the product is available now and already being used by leading LLM providers to build more security-capable AI models.

AI models are increasingly being trained to perform security tasks, but building those models is harder than it looks. Most training tools rely on synthetic data that does not reflect how real vulnerabilities behave, so models that perform well in controlled tests often struggle when they encounter actual software flaws.

Security researchers know that identifying and exploiting vulnerabilities requires multiple specialized skills, including locating a bug, triggering it, and assessing its exploitability. The same complexity applies to defense: fixing a flaw without breaking an application is fundamentally different from finding it. Bugcrowd RL Environments train AI across all of these tasks using real software and objective scoring at every step.
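As an illustration of what objective scoring for the defensive task could look like, the sketch below grades a patch on two conditions: the original triggering input must no longer crash the program, and the application's functional tests must still pass. The names `PatchResult` and `score_patch` are hypothetical and chosen for this example; they are not Bugcrowd's graders or API, and the scoring rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Hypothetical outcome of running an agent's patch in a sandbox."""
    crash_reproduces: bool  # does the original triggering input still crash the program?
    tests_passed: int       # functional tests that pass after the patch
    tests_total: int        # functional tests run after the patch


def score_patch(result: PatchResult) -> float:
    """Return a score in [0, 1] for a patch attempt (assumed scoring rule)."""
    if result.tests_total <= 0:
        raise ValueError("need at least one functional test")
    if result.crash_reproduces:
        # The flaw is still present, so the patch earns nothing.
        return 0.0
    # The flaw is gone; credit is scaled by how much functionality survived.
    return result.tests_passed / result.tests_total


print(score_patch(PatchResult(crash_reproduces=False, tests_passed=10, tests_total=10)))
print(score_patch(PatchResult(crash_reproduces=False, tests_passed=5, tests_total=10)))
print(score_patch(PatchResult(crash_reproduces=True, tests_passed=10, tests_total=10)))
```

Under this rule, a patch that stops the crash by disabling the affected feature scores lower than one that preserves behavior, which reflects the distinction the paragraph above draws between fixing a flaw and breaking the application.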

For AI developers and frontier model builders, the immediate advantage is acceleration. Building training environments of this caliber typically requires years of engineering work. Bugcrowd RL Environments remove that build time, giving teams immediate access to enterprise-grade infrastructure so they can focus on model training and optimization rather than platform development.

“The gap between what AI agents are trained on and what they encounter in the real world is where security breaks down,” said Dave Gerry, Chief Executive Officer at Bugcrowd. “Our RL Environments give frontier teams the infrastructure to build AI that learns security from real vulnerabilities, not approximations of them.”

Bugcrowd RL Environments give AI agents real, vulnerable software to work with. Rather than reading about security problems, agents actually attempt to solve them by finding bugs, exploiting them, and fixing them. They then receive immediate, scored feedback on their performance. The model improves through that cycle of action and feedback, which is the core premise behind reinforcement learning.
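The cycle of action and feedback described above can be sketched with a toy example. Everything here is an assumption made for illustration: `run_target` stands in for vulnerable software with an 8-byte buffer, `grade` is a hypothetical scorer, and the agent is a simple epsilon-greedy learner rather than a language model. It does not represent how Bugcrowd's environments are implemented.

```python
import random


def run_target(input_length: int) -> bool:
    """Toy stand-in for vulnerable software: 'crashes' when input exceeds 8 bytes."""
    return input_length > 8


def grade(crashed: bool) -> float:
    """Hypothetical grader: full reward only when the bug is actually triggered."""
    return 1.0 if crashed else 0.0


def train(actions, episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning loop: act, receive a score, update the estimate."""
    rng = random.Random(seed)
    values = {a: 1.0 for a in actions}  # optimistic start so every action gets tried
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)                     # explore
        else:
            action = max(actions, key=lambda a: values[a])   # exploit best estimate
        reward = grade(run_target(action))                   # immediate, scored feedback
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train([4, 8, 16])
best = max(values, key=values.get)
print(best, values)
```

After training, the agent's highest-valued action is the input length that overflows the toy buffer. The point of the sketch is only that the reward comes from what the target actually did, not from a description of the bug.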

The platform includes hundreds of thousands of training environments, each built from authentic open-source vulnerabilities with real source code and verifiable outcomes, ready to use without any additional infrastructure setup. All environments are derived exclusively from open-source software; no customer data is used and no security researchers are involved at any stage of the training process.

“Most AI security training stops too early. Models learn to find bugs, but not to prove the bugs are real and exploitable. You cannot train a model to be good at security by showing it what security looks like, you have to give it real problems to solve and honest feedback on whether it solved them. At Bugcrowd, we have spent years building the environments, graders, and reward structures that take models further, from detection through exploitation, patching, and audit. That is what real security skill looks like, and it is what we are making available to frontier AI teams today,” said Dr. David Brumley, Chief AI and Science Officer at Bugcrowd.

Bugcrowd expanded into AI security infrastructure following its acquisition of Mayhem Security, which brought autonomous code and API testing capabilities into the platform. Bugcrowd RL Environments extend that foundation upstream, giving frontier AI labs the training infrastructure to build security-aware agents at scale.

The offering is designed for large language model providers and frontier AI research teams that need to develop agents capable of real-world security reasoning, without spending years building training infrastructure themselves. For more information on Bugcrowd RL Environments, go here.

ExploitBench is a framework for studying the exploit development capabilities of AI models.

About Bugcrowd

Bugcrowd is the preemptive security platform that unifies exposure discovery and assessment, offensive testing, and intelligence shaped by AI and human insight to help organizations discover, validate, and reduce real-world risk. Bugcrowd helps security teams move faster by identifying the exposures that matter most so they can act first and stay ahead of attackers. By combining the power of humans and AI, teams can preempt attack paths and prevent breaches.

Ingenuity Unleashed. Visit www.bugcrowd.com.

“Bugcrowd”, “CrowdMatch” and “Mayhem” are trademarks of Bugcrowd Inc. and its subsidiaries. All other trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

Contact
ICR for Bugcrowd

SOURCE Bugcrowd