SuperagentLM

Overview

SuperagentLM is a SLM (Small Language Model) powers the Superagent's reasoning-driven detection of prompt injections, backdoors, and data leaks.

Attempts that override system policies.
Payloads such as reverse shells, ransomware droppers, or privilege escalation scripts.
Requests focused on secrets, credentials, or regulated PII.
Chains that try to coerce downstream models into unsafe behavior.

These accuracy numbers come from Superagent's internal detection eval suite; higher values mean fewer missed exploits during guard checks.

Architecture: GPT-OSS mixture-of-experts design with a 131k-token sliding attention context window, originally released as GPT-OSS 20B.
Finetuning: Instruction-tuned by Superagent on top of unsloth/gpt-oss-20b-unsloth-bnb-4bit via Unsloth's accelerated pipeline.
Parameters: 20.9B, exported as an 8-bit superagent_lm_finetue.Q8_0.gguf checkpoint for llama.cpp and compatible runtimes.
Package contents: Includes the Transformer config.json, chat template, recommended generation params, and the Q8_0 GGUF weights (~22.3 GB) for easy deployment across CPU/GPU setups.

Expect large downloads: the quantized GGUF export is ~19.5 GiB, while the full-precision shards weigh in at roughly 40 GiB for BF16 workflows.

Quantized guard rail tuned for llama.cpp, ~19.5 GiB download via huggingface-cli

BF16 safetensors across multiple shards (~40 GiB) for custom pipelines

Optimized for lighter inference: the Q8_0 GGUF package is ~1.7 GiB and the 16-bit checkpoint is ~1.5 GiB for users re-quantizing their own build.

Edge-friendly guard model in GGUF format (~1.7 GiB)

Reference FP16/BF16 weights (~1.5 GiB) for custom quantization

Superagent publishes the dataset behind its guard suite as a JSONL dataset (~39 MiB) so teams can reproduce benchmark checks locally.

Benchmark prompts, labels, and guard outcomes for regression testing