Superagent LogoSuperagent

Open-Weight Models

Run Guard on your own infrastructure with no API calls

Open-Weight Models

We publish Guard models on HuggingFace that you can run on your own infrastructure. No API calls, no data leaving your environment. 50-100ms latency on CPU or GPU.

Available Models

ModelParametersFormatUse Case
superagent-guard-0.6b0.6BSafetensorsFast inference, edge deployment
superagent-guard-1.7b1.7BSafetensorsBalanced speed and accuracy
superagent-guard-4b4BSafetensorsMaximum accuracy

GGUF (CPU Inference)

Quantized versions optimized for CPU inference with llama.cpp:

ModelParametersDownload
superagent-guard-0.6b-gguf0.6BHuggingFace
superagent-guard-1.7b-gguf1.7BHuggingFace
superagent-guard-4b-gguf4BHuggingFace

When to Self-Host

Self-host the models when you need:

  • Data residency: Keep all data on your infrastructure
  • Air-gapped environments: No external API calls
  • Cost control: No per-request pricing at scale
  • Latency guarantees: Predictable inference times

Getting Started

Download a model from HuggingFace and run it with your preferred inference framework (vLLM, llama.cpp, Transformers, etc.).

# Clone a model
git lfs install
git clone https://huggingface.co/superagent-ai/superagent-guard-1.7b

# Or download GGUF for CPU
huggingface-cli download superagent-ai/superagent-guard-1.7b-gguf

Browse all models: huggingface.co/superagent-ai