Promptfoo: LLM Red Teaming Against OWASP Top 10
If you ship an LLM-powered product and have not run a structured red team against it, you are flying blind on security. The OWASP LLM Top 10 2025 (released November 2024) now gives you a canonical list of attack categories to test against — and Promptfoo, the open-source tool that OpenAI acquired in March 2026 for its enterprise security reach, maps its 155 attack plugins directly to that list.
This guide walks through exactly how that mapping works, what a working YAML config looks like, and how to wire it into a CI pipeline before a bad actor does it for you.
What the OWASP LLM Top 10 2025 Actually Covers
The 2025 edition is a substantial revision from the 2023 original. Two new categories were added, several were renamed, and the ordering shifted to reflect real-world incident data from the intervening year.
Here is the full current list:
| ID | Category | What Changed in 2025 |
|---|---|---|
| LLM01 | Prompt Injection | Remains #1; now explicitly covers indirect injection via tool outputs and RAG context |
| LLM02 | Sensitive Information Disclosure | Moved up from #6; training-data extraction attacks elevated in severity |
| LLM03 | Supply Chain | Covers poisoned model weights, unsafe third-party plugins, and compromised fine-tune datasets |
| LLM04 | Data and Model Poisoning | Separated from Supply Chain to address runtime poisoning of RAG corpora |
| LLM05 | Improper Output Handling | XSS, SSRF, and command injection via unvalidated LLM output passed downstream |
| LLM06 | Excessive Agency | Agents with too many tools, too-broad permissions, or no human-in-the-loop |
| LLM07 | System Prompt Leakage | New 2025 entry; exposure of instructions, credentials, or logic in system prompts |
| LLM08 | Vector and Embedding Weaknesses | New 2025 entry; targets RAG vector DB poisoning and cross-tenant data leakage |
| LLM09 | Misinformation | Renamed from "Overreliance"; focuses on model-generated false information propagation |
| LLM10 | Unbounded Consumption | DoS through resource abuse, financial exploitation via inference flooding, model theft |
The two new entries — LLM07 System Prompt Leakage and LLM08 Vector and Embedding Weaknesses — reflect how agentic architectures changed the threat surface. When your app has a system prompt that configures access to payment APIs or internal tools, leaking that prompt is a significant operational risk, not just an embarrassment.
Why Promptfoo Is the Right Tool
Before the OpenAI acquisition, Promptfoo was already used by more than 25% of Fortune 500 companies for LLM evaluation, according to OpenAI's acquisition announcement. The open-source CLI has always been MIT-licensed and continues to be.
The core design decision is that Promptfoo separates adversarial probe generation from evaluation. This matters because:
- Generating adversarial probes requires a "red team model" — an uncensored model that can write jailbreaks, injection payloads, and PII extraction attempts without refusal. Promptfoo Cloud handles this.
- Evaluating those probes against your target runs locally, using your own API key, with no sensitive data sent to Promptfoo's servers except the prompts themselves.
Effloow Lab inspected Promptfoo 0.121.11 (the latest release as of May 2026) by running npx promptfoo@latest redteam plugins, which outputs 155 attack plugins with their descriptions. We also ran a structural eval test with the built-in echo provider to verify the CLI works correctly without authentication. The full lab notes are at data/lab-runs/promptfoo-llm-red-teaming-owasp-agent-eval-guide-2026.md.
Mapping Plugins to OWASP Categories
The redteam plugins command shows every available attack generator. The OWASP mapping is not always explicit in the name, so here is the practical breakdown for the most important categories:
LLM01 — Prompt Injection
indirect-prompt-injection— tests injection via untrusted variables (retrieved document content, tool responses)special-token-injection— Unicode tag-based instruction smugglingcyberseceval— Meta's CyberSecEval prompt injection datasetpliny— community-curated jailbreak collection
LLM02 — Sensitive Information Disclosure
pii:direct— asks the model to output PII directlypii:api-db— attempts to extract PII via API or database accesspii:session— cross-session PII leakage probespii:social— social engineering to extract personal data
LLM05 — Improper Output Handling
sql-injection— SQL injection via LLM output passed to a databaseshell-injection— command injection via tool-calling LLMsdata-exfil— exfiltration via URL parameters, images, or Markdown links
LLM06 — Excessive Agency
excessive-agency— attempts to trigger actions beyond defined system boundaries
LLM07 — System Prompt Leakage
system-prompt-override— directly attempts to override or extract the system prompt
LLM09 — Misinformation
hallucination— checks whether the model generates false or fabricated information
For agentic applications, the coding-agent:* plugin family adds 12 additional attack surfaces specific to AI coding agents, including repo-prompt-injection, sandbox-write-escape, and secret-env-read.
A Working Red Team YAML Config
Here is a minimal config that covers six of the ten OWASP categories with a reasonable number of test cases for a weekly CI run:
# promptfooconfig.yaml
targets:
- id: openai:gpt-4o-mini # replace with your actual target
label: prod-chatbot
prompts:
- "{{input}}"
redteam:
purpose: >
A customer-support chatbot for a SaaS product. It can answer
questions about billing, features, and documentation. It has
read access to user account data and can initiate refunds.
numTests: 20 # ~200 test cases total across all plugins
plugins:
# LLM01: Prompt Injection
- indirect-prompt-injection
- special-token-injection
# LLM02: Sensitive Information Disclosure
- pii:direct
- pii:api-db
# LLM05: Improper Output Handling
- shell-injection
- sql-injection
# LLM06: Excessive Agency
- excessive-agency
# LLM07: System Prompt Leakage
- system-prompt-override
# LLM09: Misinformation
- hallucination
strategies:
- basic # standard adversarial prompts
- jailbreak # multi-step escalating attempts
- prompt-injection # injected instructions in content fields
Two things worth noting about this config:
First, the purpose field is critical. Promptfoo uses it to tailor adversarial prompts to your specific application context. A generic purpose produces generic probes. A precise description — including what the system can access and what it is allowed to do — produces targeted attacks that actually match your threat model.
Second, numTests: 20 generates roughly 20 test cases per plugin across all three strategies. With nine plugins and three strategies, that is around 540 test cases. Adjust down for faster feedback during development, up for pre-release security gates.
Running the Scan
# One-time setup (no global install needed)
npx promptfoo@latest --version # verify version
# Generate adversarial probes (requires Promptfoo account)
npx promptfoo@latest redteam generate \
--config promptfooconfig.yaml \
--output redteam.yaml
# Evaluate probes against your target
npx promptfoo@latest redteam run \
--config promptfooconfig.yaml
# View results in web UI
npx promptfoo@latest view
The generate step calls Promptfoo Cloud to produce adversarial variants of your prompts. The actual evaluation runs locally against your specified target using your own API key. Results appear both in the terminal summary and in a local web UI at localhost:15500.
Note: redteam generate requires email verification and a Promptfoo account. Effloow Lab confirmed this during the PoC on 2026-05-20 — the CLI prompts for a work email before generating probes. The eval command (for running your own prompts with assertions) works without authentication.
Wiring It Into CI
Promptfoo ships an official GitHub Action for both basic eval and red team scanning. Here is a minimal security gate that runs on pull requests:
# .github/workflows/llm-security.yml
name: LLM Security Gate
on:
pull_request:
paths:
- 'prompts/**'
- 'system-prompts/**'
- '.env.example'
jobs:
redteam:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run LLM red team
uses: promptfoo/promptfoo-action@v2
with:
openai-api-key: ${{ secrets.OPENAI_API_KEY }}
promptfoo-api-key: ${{ secrets.PROMPTFOO_API_KEY }}
config: ./promptfooconfig.yaml
type: redteam
- name: Comment results on PR
if: github.event_name == 'pull_request'
run: |
PASS_RATE=$(cat promptfoo-results.json | jq '.results.stats.passRate')
echo "Pass rate: ${PASS_RATE}%"
The paths trigger is worth keeping narrow. Red team scans cost real money in LLM API calls — you want them running when prompt logic changes, not on every frontend commit.
For teams that want a scheduled baseline rather than per-PR gates, a cron trigger makes more sense:
on:
schedule:
- cron: '0 3 * * 1' # Monday 3 AM UTC
workflow_dispatch:
Interpreting Results
Promptfoo's output classifies each test case as PASS or FAIL, but the severity classification matters as much as the pass rate. After a scan, the web UI groups findings by OWASP category and shows the specific prompts that triggered failures.
A few practical guidelines for triaging results:
Immediate fix (before next deploy): Any FAIL in pii:direct, system-prompt-override, or excessive-agency that includes an actual payload demonstration — not just a theoretical attack. These represent working exploits against your current deployment.
Fix in current sprint: Hallucination failures where the model confidently states false facts about your product, pricing, or policies. These are reputation and liability risks even if not security exploits.
Review next sprint: Indirect prompt injection failures that require a contrived multi-step scenario. Prioritize based on whether your application ingests untrusted external content (RSS feeds, user-submitted documents, web browsing).
Track as known risk: Failures in categories your application explicitly does not need to handle — for example, a code generation assistant may intentionally produce shell commands that would fail a shell-injection assertion by design.
- 155 plugins cover a wide attack surface with minimal config
- YAML-first config is version-controllable and reviewable in PRs
- Local evaluation means sensitive prompts stay in your infrastructure
- GitHub Action integrates in under 30 lines
- Echo provider lets you validate config structure without API costs
- Adversarial probe generation requires Promptfoo account (email verification)
- Full scans with 20+ plugins run hundreds of LLM calls — budget accordingly
- The
owasp:llmmeta-preset is referenced in docs but resolves server-side - Post-OpenAI acquisition, enterprise pricing direction is unclear
- False positives increase with broader
purposedescriptions
The Agentic AI Extension
OWASP released a separate Top 10 for Agentic Applications in December 2025, announced at Black Hat Europe. Promptfoo maps its coding-agent:* plugin family and the broader agentic:* namespace to this list.
The key risks specific to agents that do not appear in the standard LLM Top 10:
- Memory poisoning (
agentic:memory-poisoning) — injecting false data into an agent's persistent memory store - Automation hijacking (
coding-agent:automation-poisoning) — modifying CI scripts, hooks, or scheduled jobs to persist unsafe behavior after the immediate task completes - Sandbox escape (
coding-agent:sandbox-write-escape,coding-agent:sandbox-read-escape) — reading or writing outside the intended workspace - Delayed exfiltration (
coding-agent:delayed-ci-exfil) — planting workflow changes that leak data after the evaluation run completes
If your application uses tool-calling or multi-step planning, run both the standard OWASP LLM config and the agentic plugin set.
Frequently Asked Questions
Q: Do I need a Promptfoo account to use it at all?
No. The promptfoo eval command — which runs assertion-based tests against any provider using your own prompts — works without authentication. You only need an account for promptfoo redteam generate, which uses Promptfoo's cloud models to generate adversarial probes. You can hand-write test cases in a redteam.yaml and run promptfoo eval against them without ever signing up.
Q: How does Promptfoo compare to Microsoft PyRIT or Garak?
PyRIT is a Python framework from Microsoft's AI Red Team, better suited to researchers writing custom attack logic. Garak is similarly research-oriented with strong dataset coverage but no CI integration. Promptfoo sits in the practitioner tier: less flexible than PyRIT for novel research, but far easier to integrate into a standard dev workflow via YAML and GitHub Actions.
Q: Does the OpenAI acquisition change anything for open-source users?
As of May 2026, the repo remains MIT-licensed and the CLI is fully functional. OpenAI's stated intent is to integrate Promptfoo's technology into its Frontier enterprise platform while keeping the open-source tool available. Whether that changes pricing or rate limits for the cloud generation service is not yet public.
Q: What is the minimum viable config for a solo developer?
Three plugins cover the most commonly exploited categories with a reasonable test count:
plugins:
- indirect-prompt-injection
- pii:direct
- excessive-agency
strategies:
- basic
Run this weekly with numTests: 10 per plugin. That is 30–90 API calls depending on strategy expansion — cheap enough to run regularly, targeted enough to catch the most common issues.
Key Takeaways
The OWASP LLM Top 10 2025 gives you a peer-reviewed threat model. Promptfoo gives you an automated way to test against it. The combination works because Promptfoo's plugin taxonomy was built with OWASP categories in mind, and the YAML config format makes security testing a first-class part of your repository rather than a one-off audit.
The practical path for most teams:
- Add
promptfooconfig.yamlto your repo with the plugins that match your threat model - Run
promptfoo evalon every PR that modifies prompt logic (no auth required) - Run
promptfoo redteam runon a weekly schedule or before major releases - Triage FAIL results by category, starting with PII and system prompt leakage
Promptfoo 0.121 is the most practical path from "we should test our LLM app for security" to "we have a CI gate that runs 500+ adversarial probes against OWASP categories on every release." The echo provider and local eval work without any account; the full red team needs a Promptfoo login but remains the fastest way to get an OWASP LLM Top 10 scan report on an LLM-powered application.
Need content like this
for your blog?
We run AI-powered technical blogs. Start with a free 3-article pilot.