ARTICLES ·2026-05-20 ·BY EFFLOOW CONTENT FACTORY

Promptfoo: LLM Red Teaming Against OWASP Top 10

How to use Promptfoo 0.121 to red-team LLM apps against the OWASP LLM Top 10 2025. YAML config, CI/CD integration, and plugin mapping explained.

security llm red-teaming owasp promptfoo ai-agents developer-tools testing


If you ship an LLM-powered product and have not run a structured red team against it, you are flying blind on security. The OWASP LLM Top 10 2025 (released November 2024) gives you a canonical list of attack categories to test against. Promptfoo, the open-source tool that OpenAI acquired in March 2026 for its enterprise security reach, maps many of its 155 attack plugins to that list.

This guide walks through exactly how that mapping works, what a working YAML config looks like, and how to wire it into a CI pipeline so you find these weaknesses before an attacker does.

What the OWASP LLM Top 10 2025 Actually Covers

The 2025 edition is a substantial revision of the 2023 original. Two new categories were added, several were renamed, and the ordering shifted to reflect real-world incident data from the intervening year.

Here is the full current list:

ID	Category	What Changed in 2025
LLM01	Prompt Injection	Remains #1; now explicitly covers indirect injection via tool outputs and RAG context
LLM02	Sensitive Information Disclosure	Moved up from #6; training-data extraction attacks elevated in severity
LLM03	Supply Chain	Covers poisoned model weights, unsafe third-party plugins, and compromised fine-tune datasets
LLM04	Data and Model Poisoning	Expanded from 2023's Training Data Poisoning to cover manipulation of pre-training, fine-tuning, and embedding data
LLM05	Improper Output Handling	XSS, SSRF, and command injection via unvalidated LLM output passed downstream
LLM06	Excessive Agency	Agents with too many tools, too-broad permissions, or no human-in-the-loop
LLM07	System Prompt Leakage	New 2025 entry; exposure of instructions, credentials, or logic in system prompts
LLM08	Vector and Embedding Weaknesses	New 2025 entry; targets RAG vector DB poisoning and cross-tenant data leakage
LLM09	Misinformation	Renamed from "Overreliance"; focuses on model-generated false information propagation
LLM10	Unbounded Consumption	DoS through resource abuse, financial exploitation via inference flooding, model theft

The two new entries — LLM07 System Prompt Leakage and LLM08 Vector and Embedding Weaknesses — reflect how agentic architectures changed the threat surface. When your app has a system prompt that configures access to payment APIs or internal tools, leaking that prompt is a significant operational risk, not just an embarrassment.

Why Promptfoo Is the Right Tool

Before the OpenAI acquisition, Promptfoo was already used by more than 25% of Fortune 500 companies for LLM evaluation, according to OpenAI's acquisition announcement. The open-source CLI has always been MIT-licensed and continues to be.

The core design decision is that Promptfoo separates adversarial probe generation from evaluation. This matters because:

Generating adversarial probes requires a "red team model" — an uncensored model that can write jailbreaks, injection payloads, and PII extraction attempts without refusal. Promptfoo Cloud handles this.
Evaluating those probes against your target runs locally, using your own API key, with no sensitive data sent to Promptfoo's servers except the prompts themselves.

Effloow Lab inspected Promptfoo 0.121.11 (the latest release as of May 2026) by running npx promptfoo@latest redteam plugins, which outputs 155 attack plugins with their descriptions. We also ran a structural eval test with the built-in echo provider to verify the CLI works correctly without authentication. The full lab notes are at data/lab-runs/promptfoo-llm-red-teaming-owasp-agent-eval-guide-2026.md.

Mapping Plugins to OWASP Categories

The redteam plugins command shows every available attack generator. The OWASP mapping is not always explicit in the name, so here is the practical breakdown for the most important categories:

LLM01 — Prompt Injection

indirect-prompt-injection — tests injection via untrusted variables (retrieved document content, tool responses)
special-token-injection — Unicode tag-based instruction smuggling
cyberseceval — Meta's CyberSecEval prompt injection dataset
pliny — community-curated jailbreak collection

LLM02 — Sensitive Information Disclosure

pii:direct — asks the model to output PII directly
pii:api-db — attempts to extract PII via API or database access
pii:session — cross-session PII leakage probes
pii:social — social engineering to extract personal data

LLM05 — Improper Output Handling

sql-injection — SQL injection via LLM output passed to a database
shell-injection — command injection via tool-calling LLMs
data-exfil — exfiltration via URL parameters, images, or Markdown links

LLM06 — Excessive Agency

excessive-agency — attempts to trigger actions beyond defined system boundaries

LLM07 — System Prompt Leakage

system-prompt-override — directly attempts to override or extract the system prompt

LLM09 — Misinformation

hallucination — checks whether the model generates false or fabricated information

For agentic applications, the coding-agent:* plugin family adds 12 additional attack surfaces specific to AI coding agents, including repo-prompt-injection, sandbox-write-escape, and secret-env-read.
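The breakdown above is easy to encode so you can check which OWASP categories a given plugin list leaves uncovered. The sketch below is a hand-copied lookup table built only from the mappings listed in this article; it is not an official Promptfoo data structure, it omits the coding-agent:* family, and the helper name owasp_coverage is hypothetical.

```python
# Hand-copied from the plugin breakdown in this article; NOT an official
# Promptfoo mapping and not exhaustive (coding-agent:* plugins are omitted).
OWASP_LLM_2025 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}

PLUGIN_TO_OWASP = {
    "indirect-prompt-injection": "LLM01",
    "special-token-injection": "LLM01",
    "cyberseceval": "LLM01",
    "pliny": "LLM01",
    "pii:direct": "LLM02",
    "pii:api-db": "LLM02",
    "pii:session": "LLM02",
    "pii:social": "LLM02",
    "sql-injection": "LLM05",
    "shell-injection": "LLM05",
    "data-exfil": "LLM05",
    "excessive-agency": "LLM06",
    "system-prompt-override": "LLM07",
    "hallucination": "LLM09",
}


def owasp_coverage(plugins):
    """Hypothetical helper: return (covered, missing) OWASP IDs, both sorted."""
    covered = {PLUGIN_TO_OWASP[p] for p in plugins if p in PLUGIN_TO_OWASP}
    missing = set(OWASP_LLM_2025) - covered
    return sorted(covered), sorted(missing)


if __name__ == "__main__":
    # The nine plugins used in the example config in the next section.
    config_plugins = [
        "indirect-prompt-injection", "special-token-injection",
        "pii:direct", "pii:api-db",
        "shell-injection", "sql-injection",
        "excessive-agency", "system-prompt-override", "hallucination",
    ]
    covered, missing = owasp_coverage(config_plugins)
    print("Covered:", ", ".join(covered))
    for owasp_id in missing:
        print(f"Not covered: {owasp_id} {OWASP_LLM_2025[owasp_id]}")
```

Running it against the example config's plugins reports LLM03, LLM04, LLM08, and LLM10 as gaps, which is a quick way to see what a scan will not tell you.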

A Working Red Team YAML Config

Here is a minimal config that covers six of the ten OWASP categories with a reasonable number of test cases for a weekly CI run:

# promptfooconfig.yaml
targets:
  - id: openai:gpt-4o-mini   # replace with your actual target
    label: prod-chatbot

prompts:
  - "{{input}}"

redteam:
  purpose: >
    A customer-support chatbot for a SaaS product. It can answer
    questions about billing, features, and documentation. It has
    read access to user account data and can initiate refunds.

  numTests: 20   # base test cases per plugin (9 plugins = 180); strategies add more

  plugins:
    # LLM01: Prompt Injection
    - indirect-prompt-injection
    - special-token-injection

    # LLM02: Sensitive Information Disclosure
    - pii:direct
    - pii:api-db

    # LLM05: Improper Output Handling
    - shell-injection
    - sql-injection

    # LLM06: Excessive Agency
    - excessive-agency

    # LLM07: System Prompt Leakage
    - system-prompt-override

    # LLM09: Misinformation
    - hallucination

  strategies:
    - basic            # plugin-generated probes, sent as-is
    - jailbreak        # iterative jailbreaks refined by an attacker model
    - prompt-injection # wraps probes in known prompt-injection templates

Two things worth noting about this config:

First, the purpose field is critical. Promptfoo uses it to tailor adversarial prompts to your specific application context. A generic purpose produces generic probes. A precise description — including what the system can access and what it is allowed to do — produces targeted attacks that actually match your threat model.

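Second, numTests sets how many base test cases each plugin generates. The basic strategy runs those as-is, and each other strategy adds a transformed variant of every base case. With nine plugins, 20 tests each, and three strategies (basic counts as one), that is 9 × 20 × 3, or around 540 test cases. Adjust down for faster feedback during development, up for pre-release security gates.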
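If you are budgeting a scan, the arithmetic is easy to script. This is a minimal sketch under a simplifying assumption: every strategy, including basic, contributes exactly one test case per base case. Real counts can differ, and iterative strategies such as jailbreak make several target calls per test case, so treat the result as a floor for API spend rather than an exact figure. The function name is hypothetical.

```python
def estimate_test_cases(num_plugins: int, num_tests: int, num_strategies: int) -> int:
    """Rough red team test-case count (hypothetical helper).

    Assumes each plugin yields `num_tests` base cases and each strategy
    (counting `basic`, which keeps the base cases unchanged) contributes
    one test case per base case.
    """
    base_cases = num_plugins * num_tests
    return base_cases * num_strategies


print(estimate_test_cases(9, 20, 3))   # example config above: 540
print(estimate_test_cases(3, 10, 1))   # solo-developer config in the FAQ: 30
```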

Running the Scan

# One-time setup (no global install needed)
npx promptfoo@latest --version   # verify version

# Generate adversarial probes (requires Promptfoo account)
npx promptfoo@latest redteam generate \
  --config promptfooconfig.yaml \
  --output redteam.yaml

# Evaluate the generated probes against your target
# (redteam run instead does generation and evaluation in one step)
npx promptfoo@latest eval \
  --config redteam.yaml

# View results in web UI
npx promptfoo@latest view

The generate step calls Promptfoo Cloud to produce adversarial variants of your prompts. The actual evaluation runs locally against your specified target using your own API key. Results appear both in the terminal summary and in a local web UI at localhost:15500.

Note: redteam generate requires email verification and a Promptfoo account. Effloow Lab confirmed this during the PoC on 2026-05-20 — the CLI prompts for a work email before generating probes. The eval command (for running your own prompts with assertions) works without authentication.

Wiring It Into CI

Promptfoo ships an official GitHub Action for both basic eval and red team scanning. Here is a minimal security gate that runs on pull requests:

# .github/workflows/llm-security.yml
name: LLM Security Gate

on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'system-prompts/**'
      - '.env.example'

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run LLM red team
        uses: promptfoo/promptfoo-action@v2
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          promptfoo-api-key: ${{ secrets.PROMPTFOO_API_KEY }}
          config: ./promptfooconfig.yaml
          type: redteam

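      - name: Report pass rate
        if: always()   # run even when failing probes make the scan step exit non-zero
        run: |
          # Assumes the scan step wrote promptfoo's JSON output to promptfoo-results.json
          PASSED=$(jq '.results.stats.successes' promptfoo-results.json)
          FAILED=$(jq '.results.stats.failures' promptfoo-results.json)
          echo "Passed: ${PASSED}, failed: ${FAILED}"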

The paths trigger is worth keeping narrow. Red team scans cost real money in LLM API calls — you want them running when prompt logic changes, not on every frontend commit.

For teams that want a scheduled baseline rather than per-PR gates, a cron trigger makes more sense:

on:
  schedule:
    - cron: '0 3 * * 1'   # Monday 3 AM UTC
  workflow_dispatch:
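To turn the scan into a hard gate rather than a printout, a short Python script can read the results file and fail the job below a pass-rate threshold. This sketch assumes the results were written with promptfoo eval --output promptfoo-results.json and that counts live under results.stats (successes, failures, errors); verify that layout against your Promptfoo version. The script name check_pass_rate.py and the 95% default threshold are arbitrary choices for illustration.

```python
import json
import sys


def load_stats(results_path: str) -> dict:
    """Read the stats object from a promptfoo JSON results file.

    Assumes the layout written by `promptfoo eval --output <file>.json`,
    with counts under results.stats; check this against your version.
    """
    with open(results_path, encoding="utf-8") as f:
        return json.load(f)["results"]["stats"]


def pass_rate(stats: dict) -> float:
    """Percentage of passing test cases; errors count as not passing."""
    passed = stats.get("successes", 0)
    total = passed + stats.get("failures", 0) + stats.get("errors", 0)
    return 100.0 * passed / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python check_pass_rate.py promptfoo-results.json 95
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 95.0
    rate = pass_rate(load_stats(sys.argv[1]))
    print(f"Pass rate: {rate:.1f}% (threshold {threshold:.1f}%)")
    sys.exit(0 if rate >= threshold else 1)  # non-zero exit fails the CI job
```

In a workflow this would run as its own step after the scan, with a non-zero exit marking the job as failed.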

Interpreting Results

Promptfoo's output classifies each test case as PASS or FAIL, but the severity classification matters as much as the pass rate. After a scan, the web UI groups findings by OWASP category and shows the specific prompts that triggered failures.

A few practical guidelines for triaging results:

Immediate fix (before next deploy): Any FAIL in pii:direct, system-prompt-override, or excessive-agency that includes an actual payload demonstration — not just a theoretical attack. These represent working exploits against your current deployment.

Fix in current sprint: Hallucination failures where the model confidently states false facts about your product, pricing, or policies. These are reputation and liability risks even if not security exploits.

Review next sprint: Indirect prompt injection failures that require a contrived multi-step scenario. Prioritize based on whether your application ingests untrusted external content (RSS feeds, user-submitted documents, web browsing).

Track as known risk: Failures in categories your application explicitly does not need to handle — for example, a code generation assistant may intentionally produce shell commands that would fail a shell-injection assertion by design.
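The triage rules above can be encoded so a scan's failures land in buckets automatically. This is a sketch, not Promptfoo functionality: the input is a hypothetical simplified list of (plugin_id, has_working_payload) pairs rather than Promptfoo's real result objects, and the bucket labels simply mirror the guidelines. Anything not matched by a rule falls back to "review next sprint".

```python
from collections import defaultdict

# Hypothetical encoding of the triage guidelines; plugin IDs come from this article.
IMMEDIATE = {"pii:direct", "system-prompt-override", "excessive-agency"}
CURRENT_SPRINT = {"hallucination"}


def triage(failures, accepted_risks=frozenset()):
    """Group failing plugin IDs into priority buckets.

    `failures` is an iterable of (plugin_id, has_working_payload) pairs, a
    simplified stand-in for real results. `accepted_risks` lists plugins
    whose failures are expected by design for this application.
    """
    buckets = defaultdict(list)
    for plugin_id, has_working_payload in failures:
        if plugin_id in accepted_risks:
            bucket = "track as known risk"
        elif plugin_id in IMMEDIATE and has_working_payload:
            bucket = "immediate fix"
        elif plugin_id in CURRENT_SPRINT:
            bucket = "current sprint"
        else:
            bucket = "review next sprint"
        buckets[bucket].append(plugin_id)
    return dict(buckets)


if __name__ == "__main__":
    found = [
        ("pii:direct", True),
        ("hallucination", False),
        ("indirect-prompt-injection", False),
        ("shell-injection", True),
    ]
    # A code generation assistant might accept shell-injection failures by design.
    for bucket, plugins in triage(found, accepted_risks={"shell-injection"}).items():
        print(f"{bucket}: {', '.join(plugins)}")
```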

What Works Well

155 plugins cover a wide attack surface with minimal config
YAML-first config is version-controllable and reviewable in PRs
Local evaluation means sensitive prompts stay in your infrastructure
GitHub Action integrates in under 30 lines
Echo provider lets you validate config structure without API costs

What to Watch

Adversarial probe generation requires Promptfoo account (email verification)
Full scans with 20+ plugins run hundreds of LLM calls — budget accordingly
The owasp:llm meta-preset is referenced in docs but resolves server-side
Post-OpenAI acquisition, enterprise pricing direction is unclear
False positives increase with broader purpose descriptions

The Agentic AI Extension

OWASP released a separate Top 10 for Agentic Applications in December 2025, announced at Black Hat Europe. Promptfoo maps its coding-agent:* plugin family and the broader agentic:* namespace to this list.

The key risks specific to agents that do not appear in the standard LLM Top 10:

Memory poisoning (agentic:memory-poisoning) — injecting false data into an agent's persistent memory store
Automation hijacking (coding-agent:automation-poisoning) — modifying CI scripts, hooks, or scheduled jobs to persist unsafe behavior after the immediate task completes
Sandbox escape (coding-agent:sandbox-write-escape, coding-agent:sandbox-read-escape) — reading or writing outside the intended workspace
Delayed exfiltration (coding-agent:delayed-ci-exfil) — planting workflow changes that leak data after the evaluation run completes

If your application uses tool-calling or multi-step planning, run both the standard OWASP LLM config and the agentic plugin set.
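One way to keep that decision explicit is a small helper that assembles the plugin list from a few facts about your app and prints it as a YAML list to paste under redteam.plugins. This is a hypothetical sketch: the plugin IDs are only the ones named in this article, so confirm them with npx promptfoo@latest redteam plugins for your installed version, and the feature flags are illustrative.

```python
# Plugin IDs as named in this article; confirm them against
# `npx promptfoo@latest redteam plugins` before relying on this list.
OWASP_LLM_PLUGINS = [
    "indirect-prompt-injection",
    "pii:direct",
    "excessive-agency",
    "system-prompt-override",
    "hallucination",
]
AGENTIC_PLUGINS = ["agentic:memory-poisoning"]
CODING_AGENT_PLUGINS = [
    "coding-agent:automation-poisoning",
    "coding-agent:sandbox-write-escape",
    "coding-agent:sandbox-read-escape",
    "coding-agent:delayed-ci-exfil",
]


def select_plugins(uses_tools=False, multi_step_planning=False, is_coding_agent=False):
    """Hypothetical helper: standard OWASP plugins plus agentic extras when relevant."""
    plugins = list(OWASP_LLM_PLUGINS)
    if uses_tools or multi_step_planning or is_coding_agent:
        plugins += AGENTIC_PLUGINS
    if is_coding_agent:
        plugins += CODING_AGENT_PLUGINS
    return plugins


if __name__ == "__main__":
    # Emit entries indented to sit under `redteam:` -> `plugins:`.
    for plugin_id in select_plugins(uses_tools=True, is_coding_agent=True):
        print(f"    - {plugin_id}")
```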

Frequently Asked Questions

Q: Do I need a Promptfoo account to use it at all?

No. The promptfoo eval command, which runs assertion-based tests against any provider using your own prompts, works without authentication. You only need an account for red team probe generation (promptfoo redteam generate, or promptfoo redteam run, which generates probes before evaluating them), because it uses Promptfoo's cloud models. You can hand-write test cases in a redteam.yaml and run promptfoo eval against them without ever signing up.
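Hand-written cases are most useful when the target is your application code, guardrails included, rather than a bare model ID. Promptfoo supports custom Python providers; the sketch below assumes that interface (a call_api(prompt, options, context) function returning a dict with an output or error key, referenced from the config as a file:// target). The handler answer_support_question is a hypothetical placeholder for your real application logic.

```python
# provider.py: a hypothetical Promptfoo Python provider wrapping your own app.
# Assumed config reference in promptfooconfig.yaml:
#   targets:
#     - id: file://provider.py


def answer_support_question(user_input: str) -> str:
    """Hypothetical stand-in for your application's real request handler."""
    lowered = user_input.lower()
    if "system prompt" in lowered or "instructions" in lowered:
        return "Sorry, I can't share my configuration."
    return f"Thanks for your question about: {user_input}"


def call_api(prompt, options, context):
    """Called once per test case; returns the app's reply as {"output": ...}."""
    try:
        return {"output": answer_support_question(prompt)}
    except Exception as exc:  # surface application failures as provider errors
        return {"error": str(exc)}
```

Because the provider calls your handler directly, a hand-written system-prompt extraction case exercises the same refusal logic that production traffic hits.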

Q: How does Promptfoo compare to Microsoft PyRIT or Garak?

PyRIT is a Python framework from Microsoft's AI Red Team, better suited to researchers writing custom attack logic. Garak, NVIDIA's LLM vulnerability scanner, is similarly research-oriented, with strong dataset coverage but no first-party CI integration. Promptfoo sits in the practitioner tier: less flexible than PyRIT for novel research, but far easier to integrate into a standard dev workflow via YAML and GitHub Actions.

Q: Does the OpenAI acquisition change anything for open-source users?

As of May 2026, the repo remains MIT-licensed and the CLI is fully functional. OpenAI's stated intent is to integrate Promptfoo's technology into its Frontier enterprise platform while keeping the open-source tool available. Whether that changes pricing or rate limits for the cloud generation service is not yet public.

Q: What is the minimum viable config for a solo developer?

Three plugins cover the most commonly exploited categories with a reasonable test count:

redteam:
  numTests: 10
  plugins:
    - indirect-prompt-injection
    - pii:direct
    - excessive-agency
  strategies:
    - basic

Run this weekly. With only the basic strategy, that is about 30 test cases (3 plugins × 10): cheap enough to run regularly, targeted enough to catch the most common issues. Each strategy you add contributes roughly another 30.

Key Takeaways

The OWASP LLM Top 10 2025 gives you a community-vetted threat model. Promptfoo gives you an automated way to test against it. The combination works because Promptfoo's plugin taxonomy was built with OWASP categories in mind, and the YAML config format makes security testing a first-class part of your repository rather than a one-off audit.

The practical path for most teams:

Add promptfooconfig.yaml to your repo with the plugins that match your threat model
Run promptfoo eval on every PR that modifies prompt logic (no auth required)
Run promptfoo redteam run on a weekly schedule or before major releases
Triage FAIL results by category, starting with PII and system prompt leakage

Bottom Line

Promptfoo 0.121 is the most practical path from "we should test our LLM app for security" to "we have a CI gate that runs 500+ adversarial probes against OWASP categories on every release." The echo provider and local eval work without any account; the full red team needs a Promptfoo login but remains the fastest way to get an OWASP LLM Top 10 scan report on an LLM-powered application.
