Abstract

Red-Agent presents a practical multi-provider framework for LLM red teaming focused on reliable cross-model execution, artifact completeness, and reproducible reporting. The empirical run uses a curated 11-model campaign over a fixed 20-probe ATT&CK-inspired taxonomy.

The paper pairs model outcomes with operational reachability and error accounting so readers can distinguish safety behavior from execution brittleness. StrongREJECT scoring is applied post hoc to archived trajectories, without rerunning probes. Campaign outputs include per-model reports, JSONL traces, summary tables, and publication-ready figures.
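As a rough illustration of post-hoc scoring over archived traces, the sketch below reads JSONL records and attaches a score to each one without issuing any new model calls. The record fields (`model`, `probe_id`, `prompt`, `response`, `error`) and the function names are assumptions made for this example, not Red-Agent's actual schema, and `toy_scorer` is a placeholder that does not implement StrongREJECT.

```python
import io
import json
from typing import Callable, Iterable, Iterator, Optional


def read_traces(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one trace record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def score_archived(
    lines: Iterable[str],
    scorer: Callable[[str, str], float],
) -> list:
    """Attach a score to each archived trace; no probe is rerun."""
    scored = []
    for trace in read_traces(lines):
        response: Optional[str] = trace.get("response")
        # A trace with no response is an execution failure. It stays
        # unscored instead of being counted as a refusal.
        score = scorer(trace["prompt"], response) if response is not None else None
        scored.append({**trace, "score": score})
    return scored


def toy_scorer(prompt: str, response: str) -> float:
    """Placeholder only (hypothetical): 0.0 for an apparent refusal, else 1.0."""
    return 0.0 if response.lower().startswith("i can't") else 1.0


# Two hypothetical archived records: one refusal, one provider timeout.
archive = io.StringIO(
    '{"model": "model-a", "probe_id": "P01", "prompt": "example probe", '
    '"response": "I can\'t help with that."}\n'
    '{"model": "model-a", "probe_id": "P02", "prompt": "example probe", '
    '"response": null, "error": "timeout"}\n'
)
scored = score_archived(archive, toy_scorer)
```

Because the scorer is passed in as a plain callable, a different evaluator could be applied to the same archive later, and unreached probes remain visibly unscored.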

Red-team operations console

From probe to audit trail

The point is not a single leaderboard. The point is preserving enough evidence to explain what happened, where execution failed, and what the campaign can support.

Fixed campaign design

Twenty probes mapped to an ATT&CK-inspired taxonomy.

The framework keeps the campaign stable across providers so model behavior can be compared without silently changing the test instrument.
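One way to make "the test instrument did not change" checkable is to fingerprint the probe set and store that fingerprint with every run. The sketch below is a minimal illustration under assumptions: the `Probe` fields, the taxonomy labels, and `campaign_fingerprint` are hypothetical names, not part of Red-Agent.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class Probe:
    probe_id: str
    tactic: str   # hypothetical taxonomy label
    prompt: str


def campaign_fingerprint(probes: Iterable[Probe]) -> str:
    """SHA-256 over a canonical JSON encoding of the probe set.

    Probes are sorted by probe_id so ordering does not affect the hash,
    while any change to a probe's content does.
    """
    canonical = json.dumps(
        [asdict(p) for p in sorted(probes, key=lambda p: p.probe_id)],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


probes = [
    Probe("P01", "tactic-a", "example probe text 1"),
    Probe("P02", "tactic-b", "example probe text 2"),
]
baseline = campaign_fingerprint(probes)
reordered = campaign_fingerprint(list(reversed(probes)))
edited = campaign_fingerprint(
    [probes[0], Probe("P02", "tactic-b", "example probe text 2 (edited)")]
)
```

Comparing the stored fingerprint across providers would flag a silently edited probe before results are compared.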

Why it matters

Most red-team reporting blurs together the model's safety behavior, provider availability, tool failures, and evaluator choices. Red-Agent treats those as separable layers. That makes the resulting evidence more useful for governance, procurement, model selection, and internal safety review.
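To show what keeping these layers separate could look like in a results table, the sketch below records reachability, error category, and safety outcome as distinct fields, then computes the refusal rate only over probes that actually reached the model. The `Attempt` fields, error labels, and `summarize` function are illustrative assumptions, not the paper's schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    model: str
    probe_id: str
    reached: bool             # did the request return a model response at all?
    error: Optional[str]      # execution error category when not reached
    refused: Optional[bool]   # safety outcome; None when not reached


def summarize(attempts: List[Attempt]) -> dict:
    """Report reachability and safety behavior as separate quantities."""
    reached = [a for a in attempts if a.reached]
    errors = Counter(a.error for a in attempts if not a.reached)
    return {
        "attempted": len(attempts),
        "reachability": len(reached) / len(attempts) if attempts else None,
        # Execution failures are excluded from the denominator, so they
        # cannot be mistaken for refusals or for compliance.
        "refusal_rate_among_reached": (
            sum(1 for a in reached if a.refused) / len(reached) if reached else None
        ),
        "errors": dict(errors),
    }


attempts = [
    Attempt("model-a", "P01", True, None, True),
    Attempt("model-a", "P02", True, None, False),
    Attempt("model-a", "P03", False, "timeout", None),
    Attempt("model-a", "P04", False, "rate_limit", None),
]
summary = summarize(attempts)
```

In this toy run the model is reachable on half of the probes and refuses half of those it reached, two facts that a single pass rate would merge.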

The paper is also a practical bridge between research and operations. It emphasizes reproducible campaign artifacts rather than benchmark novelty, which is exactly what institutional decision-makers need when they must explain why a system passed, failed, or requires more testing.

Citation

Heath, Nathan. Red-Agent: A Practical Multi-Provider Framework for LLM Red Teaming with Operational Reachability and Artifact-Complete Reporting. SSRN, 2026. DOI: 10.2139/ssrn.6570383.
