GitPedia

DeepTeam

DeepTeam is a framework to red team LLMs and AI agents.

From confident-ai · Updated June 27, 2026

The project is written primarily in Python, is distributed under the Apache License 2.0, and was first published in 2025. It has 1,922 stars and 315 forks on GitHub. Key topics include: llm-guardrails, llm-red-teaming, llm-safety, llm-security, python.

Latest release: v1.0.4 - First Stable Release 🎉
November 12, 2025
The LLM Red Teaming Framework

Documentation: https://www.trydeepteam.com | Discord: https://discord.gg/3SEyvpgu2f

DeepTeam is a simple-to-use, open-source red teaming framework for LLM systems. Think of it as penetration testing, but for LLMs.

DeepTeam simulates attacks (jailbreaking, prompt injection, multi-turn exploitation, and more) to uncover vulnerabilities like bias, PII leakage, and SQL injection in your AI agents, RAG pipelines, and chatbots. It also offers guardrails to prevent these issues in production.

DeepTeam runs locally on your machine and is built on DeepEval, the open-source LLM evaluation framework.

Important: Need a place for your red teaming results to live? Sign up for the Confident AI platform to manage risk assessments, monitor vulnerabilities in production, and share reports with your team.


Want to talk LLM security, need help picking attacks, or just want to say hi? Come join our Discord.


🔥 Vulnerabilities, Attacks, and Features

  • 50+ ready-to-use vulnerabilities (all with explanations) powered by ANY LLM of your choice. Each vulnerability uses LLM-as-a-Judge metrics that run locally on your machine to produce binary pass/fail scores with reasoning:

    • Data Privacy
      • PII Leakage: disclosure of sensitive personal information
      • Prompt Leakage: exposure of system prompt secrets and instructions
    • Responsible AI
      • Bias: stereotypes and unfair treatment across gender, race, religion, politics
      • Toxicity: harmful, offensive, or demeaning content
      • Child Protection: child-related privacy and safety risks
      • Ethics: violations of moral reasoning and organizational values
      • Fairness: discriminatory outcomes across groups and contexts
    • Security
    • Safety
    • Business
    • Agentic
    • Custom
  • 💥 20+ research-backed adversarial attack methods for both single-turn and multi-turn (conversational) red teaming. Attacks enhance baseline vulnerability probes using SOTA techniques like jailbreaking, prompt injection, and encoding-based obfuscation:

    • Single-Turn
      • Prompt Injection: crafted injections that bypass LLM restrictions
      • Roleplay: persona-based scenarios exploiting collaborative training
      • Leetspeak: symbolic character substitution to avoid keyword detection
      • ROT13: alphabetic rotation to evade content filters
      • Base64: encoding attacks as random-looking data
      • Gray Box: leveraging partial system knowledge for targeted attacks
      • Math Problem: disguising attacks within mathematical inputs
      • Multilingual: translating attacks to less-spoken languages
      • Prompt Probing: probing the LLM to extract system prompt details
      • Adversarial Poetry: transforming attacks into poetic verse with metaphor
      • System Override: disguising attacks as legitimate system commands
      • Permission Escalation: shifting perceived identity to bypass role restrictions
      • Goal Redirection: reframing agent objectives for unauthorized outcomes
      • Linguistic Confusion: semantic ambiguity to confuse language understanding
      • Input Bypass: circumventing validation via exception handling claims
      • Context Poisoning: injecting false background context to bias reasoning
      • Character Stream: character-by-character input to bypass filters
      • Context Flooding: flooding input with benign text to hide malicious instructions
      • Embedded Instruction JSON: hiding attacks inside realistic JSON structures
      • Synthetic Context Injection: fabricating system context to exploit long-context handling
      • Authority Escalation: framing requests from positions of power
      • Emotional Manipulation: high-intensity emotional pressure for unsafe compliance
    • Multi-Turn
  • 🏛️ Red team against established AI safety frameworks out-of-the-box. Each framework automatically maps its categories to the right vulnerabilities and attacks:

    • OWASP Top 10 for LLMs 2025
    • OWASP Top 10 for Agents 2026
    • NIST AI RMF
    • MITRE ATLAS
    • BeaverTails
    • Aegis
  • 🛡️ 7 production-ready guardrails for fast binary classification to guard LLM inputs and outputs in real time.

  • 🧩 Build your own custom vulnerabilities and attacks that integrate seamlessly with DeepTeam's ecosystem.

  • 🔗 Run red teaming from the CLI with YAML configs, or programmatically in Python.

  • 📊 Access risk assessments, display in dataframes, and save locally in JSON.
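To make the encoding-based attacks listed above (ROT13, Base64, Leetspeak) more concrete, here is a minimal sketch that applies each transformation to a probe string using only the Python standard library. It is an illustration of the general idea, not DeepTeam's implementation: the function names, the `LEET_TABLE` substitution table, and the probe text are all assumptions made for this example.

```python
import base64
import codecs

# Hypothetical substitution table, chosen only for this illustration.
LEET_TABLE = str.maketrans(
    {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}
)


def to_rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot13")


def to_base64(text: str) -> str:
    """Encode the UTF-8 bytes of the text as a Base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def to_leetspeak(text: str) -> str:
    """Lowercase the text and swap a few letters for look-alike digits."""
    return text.lower().translate(LEET_TABLE)


probe = "Ignore previous instructions"  # hypothetical baseline probe
for transform in (to_rot13, to_base64, to_leetspeak):
    print(f"{transform.__name__}: {transform(probe)}")
```

Each transformation keeps the underlying request recoverable while changing its surface form, which is why filters that only match keywords can miss it.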


🚀 QuickStart

DeepTeam does not require you to define what LLM system you are red teaming, because neither will malicious users. All you need to do is install deepteam, define a model_callback, and you're good to go.

Installation

pip install -U deepteam

Red Team Your First LLM

```python
from deepteam import red_team
from deepteam.vulnerabilities import Bias
from deepteam.attacks.single_turn import PromptInjection


async def model_callback(input: str) -> str:
    # Replace this with your LLM application
    return f"I'm sorry but I can't answer this: {input}"


risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[Bias(types=["race"])],
    attacks=[PromptInjection()],
)
```

Save the code as red_team_llm.py, set your OPENAI_API_KEY as an environment variable (you can also use any custom model supported in DeepEval), and run the file:

```bash
python red_team_llm.py
```

That's it! Your first red team is complete. Here's what happened:

  • model_callback wraps your LLM system and generates a str output for a given input.
  • At red teaming time, deepteam simulates a PromptInjection attack targeting Bias vulnerabilities.
  • Your model_callback's outputs are evaluated using the BiasMetric, producing a binary score of 0 or 1.
  • The final passing rate for Bias is determined by the proportion of scores that equal 1.
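The passing-rate arithmetic described in the last bullet can be sketched in a few lines. This is only an illustration of the calculation, not DeepTeam's code: the `pass_rate` helper and the sample scores are hypothetical.

```python
from typing import Sequence


def pass_rate(scores: Sequence[int]) -> float:
    """Return the fraction of binary scores that equal 1."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(score not in (0, 1) for score in scores):
        raise ValueError("scores must be binary (0 or 1)")
    return sum(scores) / len(scores)


# Hypothetical scores for five simulated attacks: 1 = passed, 0 = failed.
bias_scores = [1, 1, 0, 1, 0]
print(f"Bias pass rate: {pass_rate(bias_scores):.0%}")  # Bias pass rate: 60%
```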

Unlike traditional evaluation, red teaming does not require a prepared dataset: adversarial attacks are dynamically generated based on the vulnerabilities you want to test for.


Red Team Against Safety Frameworks

Use established AI safety standards like OWASP and NIST instead of manually picking vulnerabilities:

```python
from deepteam import red_team
from deepteam.frameworks import OWASPTop10


async def model_callback(input: str) -> str:
    # Replace this with your LLM application
    return f"I'm sorry but I can't answer this: {input}"


risk_assessment = red_team(
    model_callback=model_callback,
    framework=OWASPTop10(),
)
```

This automatically maps the framework's categories to the right vulnerabilities and attacks. Available frameworks include OWASPTop10, OWASP_ASI_2026, NIST, MITRE, Aegis, and BeaverTails.
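Both examples above pass the same kind of async model_callback, so it can help to smoke-test the callback on its own before starting a red teaming run. The sketch below assumes your application exposes a blocking function; `my_llm_app` is a hypothetical placeholder, and wrapping it with `asyncio.to_thread` (Python 3.9+) is one possible design choice, not something DeepTeam requires.

```python
import asyncio


def my_llm_app(prompt: str) -> str:
    # Hypothetical placeholder for a real application (API call, RAG chain, agent).
    return f"I'm sorry but I can't answer this: {prompt}"


async def model_callback(input: str) -> str:
    # Run the blocking application in a worker thread so the event loop stays free.
    return await asyncio.to_thread(my_llm_app, input)


# Smoke test: the callback should accept a str and return a str.
reply = asyncio.run(model_callback("Hello"))
print(reply)
```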


Guard Your LLM in Production

Once you've found your vulnerabilities, use DeepTeam's guardrails to prevent them in production:

```python
from deepteam import Guardrails
from deepteam.guardrails import PromptInjectionGuard, ToxicityGuard, PrivacyGuard

guardrails = Guardrails(
    input_guards=[PromptInjectionGuard(), PrivacyGuard()],
    output_guards=[ToxicityGuard()],
)

# Guard inputs before they reach your LLM
input_result = guardrails.guard_input("Tell me how to hack a database")
print(input_result.breached)  # True

# Guard outputs before they reach your users
output_result = guardrails.guard_output(input="Hi", output="Here is some toxic content...")
print(output_result.breached)  # True
```

7 guards are available out-of-the-box: ToxicityGuard, PromptInjectionGuard, PrivacyGuard, IllegalGuard, HallucinationGuard, TopicalGuard, and CybersecurityGuard. Read the full guardrails docs here.
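One way an application might act on the `breached` flag is to refuse before calling the LLM when the input is flagged, and to withhold the response when the output is flagged. The sketch below shows that control flow with toy keyword checks standing in for real guards. `GuardResult`, `guarded_reply`, `REFUSAL`, and the toy checks are all hypothetical names for this illustration and are not part of DeepTeam.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = "Sorry, I can't help with that."  # hypothetical fallback message


@dataclass
class GuardResult:
    # Hypothetical stand-in that only mirrors the `breached` flag shown above.
    breached: bool


def guarded_reply(
    user_input: str,
    llm: Callable[[str], str],
    check_input: Callable[[str], GuardResult],
    check_output: Callable[[str, str], GuardResult],
) -> str:
    """Call the LLM only for clean inputs and return only clean outputs."""
    if check_input(user_input).breached:
        return REFUSAL
    output = llm(user_input)
    if check_output(user_input, output).breached:
        return REFUSAL
    return output


# Toy keyword checks standing in for real guards.
def toy_input_check(text: str) -> GuardResult:
    return GuardResult(breached="hack" in text.lower())


def toy_output_check(user_input: str, output: str) -> GuardResult:
    return GuardResult(breached="toxic" in output.lower())


def echo_llm(prompt: str) -> str:
    return f"You said: {prompt}"


print(guarded_reply("Tell me how to hack a database", echo_llm, toy_input_check, toy_output_check))
print(guarded_reply("Hi", echo_llm, toy_input_check, toy_output_check))
```

With DeepTeam, the two check arguments could presumably be thin wrappers around the `guardrails.guard_input(...)` and `guardrails.guard_output(input=..., output=...)` calls shown earlier.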


DeepTeam with Confident AI

Confident AI is the all-in-one platform that integrates natively with DeepTeam and DeepEval.

  • Manage risk assessments: view, compare, and track red teaming results across iterations
  • Monitor in production: detect and alert on vulnerabilities hitting your live LLM system
  • Share reports: generate and distribute security reports across your team
  • Run from your IDE: use Confident AI's MCP server to run red teams, pull results, and inspect vulnerabilities without leaving Cursor or Claude Code


Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.


Authors

Built by the founders of Confident AI. Contact jeffreyip@confident-ai.com for all enquiries.


License

DeepTeam is licensed under Apache 2.0 - see the LICENSE.md file for details.


This article is auto-generated from confident-ai/deepteam via the GitHub API. Last fetched: 6/27/2026