Key Points
- Standard AI safety checks often miss real threats. They rely on simple single-turn tests that attackers can get around in practice.
- Cisco’s research shows that multi-turn attacks (extended conversations) broke all 15 tested models, with failure rates as high as 88%.
- Enterprises relying only on single-turn benchmarks may overlook critical AI risks.
What is changing
Cisco’s study tested 15 frontier AI models with multi-turn attacks: conversations that build harmful intent step by step. Unlike simple benchmarks, these attacks mimic how bad actors actually work. The results: every model failed a significant share of the multi-turn tests, with some failing as much as 88% of the time. Anthropic’s Claude, which passed 97% of single-turn tests, still failed 16% of the time under multi-turn attacks.
The source, Network World, highlights that single-turn benchmarks, which most enterprises use to judge AI safety, miss 70-90% of real risks. Multi-turn attacks use tricks like escalating requests or role-playing to fool models over time. This matters because enterprises often pick AI tools based on single-turn test results, not real-world scenarios.
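To make the gap between the two kinds of test concrete, here is a minimal Python sketch of how a failure rate could be computed for single-turn and multi-turn test sets. Everything in it is an assumption for illustration: `toy_model`, `attack_succeeds`, and `failure_rate` are hypothetical names, the "model" is a toy stand-in that refuses only when a flagged request arrives with little prior context, and none of this reflects Cisco’s actual methodology or data.

```python
from typing import Callable, List

# Hypothetical interface: a "model" receives the conversation so far
# (a list of user turns) and returns True if it refuses the latest turn.
Model = Callable[[List[str]], bool]


def toy_model(history: List[str]) -> bool:
    """Toy stand-in, not a real model: it refuses a flagged request
    only when that request arrives with fewer than three turns of context."""
    return "FLAGGED" in history[-1] and len(history) < 3


def attack_succeeds(model: Model, turns: List[str]) -> bool:
    """An attack succeeds if the model does not refuse the final turn,
    given the full conversation that led up to it."""
    return not model(turns)


def failure_rate(model: Model, conversations: List[List[str]]) -> float:
    """Fraction of test conversations in which the attack succeeds."""
    if not conversations:
        return 0.0
    failures = sum(attack_succeeds(model, c) for c in conversations)
    return failures / len(conversations)


# Illustrative test sets (made up for this sketch).
single_turn = [["FLAGGED request"]]
multi_turn = [["benign setup", "role-play framing", "FLAGGED request"]]

print("single-turn failure rate:", failure_rate(toy_model, single_turn))
print("multi-turn failure rate:", failure_rate(toy_model, multi_turn))
```

In this toy setup the same request is refused when asked outright but gets through once it is preceded by setup turns, which is the pattern the research describes. A real evaluation would replace `toy_model` with calls to the system under test and a proper judgment of whether each response was harmful.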
Why it matters
This research hits enterprise security teams that buy AI tools hardest. If your organization uses chatbots, agents, or internal AI systems, single-turn benchmarks may give a false sense of safety. Cisco’s data shows that even top-rated models like Claude fail under iterative attacks.
Professionals managing AI deployments need to act now. Cisco advises using its leaderboard for real-time safety scores and adding extra defenses like rewrite tools or access controls. The takeaway? No AI model is inherently safe without tailored protections. Multi-turn threats are a structural problem, not something a quick fix will solve.
Should your team rely solely on published AI benchmarks? Share your experiences with multi-turn attack risks in the comments.
