Research shows malicious LLM prompts can be detected in real time

research-shows-malicious-llm-prompts-can-be-detected-in-real-time
Research shows malicious LLM prompts can be detected in real time
LLM

New research from cloud security platform Upwind and NVIDIA demonstrates how malicious LLM prompts can be detected with approximately 95 percent precision, while maintaining sub-millisecond inference for real-time traffic.

As Gartner predicts that more than 80 percent of enterprises will use generative AI APIs, models, or deployed enabled applications in production this year, application security is undergoing a fundamental shift. The interface itself, natural language, has become the attack surface.

“LLMs don’t just process input, they interpret intent,” says Mose Hassan, VP research and innovation, at Upwind. “That changes the security model entirely. Organizations aren’t just trying to block bad code anymore, they have to stop attempts that twist language and manipulate systems. Our research with NVIDIA shows you can do that effectively in live production environments, without slowing things down or driving up costs.”

Rather than relying on a single heavyweight model or static rules, Upwind has engineered a layered detection system designed around challenges including, latency, cost, false-positive tolerance and explainability.

This operates in three stages, first to identify if a request is for an LLM, thus ensuring analysis is only carried out when needed. Requests to an LLM are then analyzed using the NVIDIA nv-embedcode-7b-v1 model, deployed through NVIDIA NIM microservices. The model achieved 94.53 percent detection accuracy, while maintaining inference times well under 0.1 milliseconds.

The third and final step is to escalate high risk or ambiguous cases to the NVIDIA Nemotron-3-Nano-30B model, integrated with NVIDIA NeMo Guardrails, which acts as a reasoning layer to validate findings, reduce false positives and provide explanations aligned with security frameworks.

As LLM moves into enterprise workflows the models introduce new threat categories including prompt injection, jailbreaks, data exfiltration and social engineering. Traditional security controls are poorly suited to these threats. You can find out more about Upwind’s research into combating these threats on the company’s site.

Image Credit: Sascha Winter/Dreamstime.com