# Python SDK

Python SDK for AI agent safety: guard, redact, scan, and test methods.
The `safety-agent` package provides four methods for AI agent safety: `guard()` for detecting threats, `redact()` for removing sensitive data, `scan()` for analyzing repositories, and `test()` for running red-team scenarios.
## Installation

```shell
uv add safety-agent
```

Or with pip:

```shell
pip install safety-agent
```

## Quick Start
```python
from safety_agent import create_client

client = create_client()

# Guard: Detect threats (uses default superagent/guard-1.7b model)
result = await client.guard(input="user message to analyze")

# Redact: Remove PII
result = await client.redact(
    input="My email is john@example.com",
    model="openai/gpt-4o-mini"
)

# Scan: Analyze repository for AI agent attacks
result = await client.scan(repo="https://github.com/user/repo")
```

## Client Configuration
### Basic Configuration

```python
client = create_client(api_key="your-api-key")  # Or set SUPERAGENT_API_KEY env var
```

### Fallback Configuration
The SDK supports automatic fallback to an always-on endpoint when the primary Superagent model experiences cold starts. This is useful for production environments where latency consistency is critical.
```python
client = create_client(
    enable_fallback=True,       # Enable fallback (default: True)
    fallback_timeout=5.0,       # Timeout before fallback in seconds (default: 5.0)
    fallback_url="https://..."  # Custom fallback URL (optional)
)
```

| Option | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `SUPERAGENT_API_KEY` env | Your Superagent API key |
| `enable_fallback` | `bool` | `True` | Enable automatic fallback on timeout |
| `fallback_timeout` | `float` | `5.0` | Seconds to wait before falling back |
| `fallback_url` | `str` | Built-in URL | Custom fallback endpoint URL |
The fallback URL can also be set via the `SUPERAGENT_FALLBACK_URL` environment variable.
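The fallback behavior described above amounts to a timeout race. Here is an illustrative, self-contained sketch of the pattern; the `call_primary` and `call_fallback` coroutines are hypothetical stand-ins, not SDK functions:

```python
import asyncio

async def call_primary() -> str:
    await asyncio.sleep(10)  # simulate a cold start on the primary model
    return "primary"

async def call_fallback() -> str:
    return "fallback"

async def guard_with_fallback(timeout: float = 5.0) -> str:
    # Race the primary call against the timeout; on expiry, use the fallback.
    try:
        return await asyncio.wait_for(call_primary(), timeout=timeout)
    except asyncio.TimeoutError:
        return await call_fallback()

print(asyncio.run(guard_with_fallback(timeout=0.1)))  # primary too slow -> "fallback"
```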
## Guard

The `guard()` method classifies input content as `pass` or `block`. It detects prompt injections, malicious instructions, and other security threats.
### Basic Usage

```python
result = await client.guard(input="user message to analyze")

if result.classification == "block":
    print("Blocked:", result.violation_types)
    print("Reason:", result.reasoning)
```

### Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | `str \| bytes` | Yes | - | The input to analyze |
| `model` | `str` | No | `superagent/guard-1.7b` | Model in `provider/model` format |
| `system_prompt` | `str` | No | - | Custom system prompt |
| `chunk_size` | `int` | No | `8000` | Characters per chunk (`0` to disable) |
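The `chunk_size` option splits long inputs before analysis. A rough sketch of fixed-size, character-based chunking (an assumption for illustration; the SDK's actual splitting strategy may differ):

```python
def chunk_text(text: str, chunk_size: int = 8000) -> list[str]:
    # chunk_size <= 0 disables chunking: analyze the input as one piece.
    if chunk_size <= 0:
        return [text]
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text("a" * 20000, chunk_size=8000)
print([len(c) for c in chunks])  # [8000, 8000, 4000]
```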
### Response

| Field | Type | Description |
|---|---|---|
| `classification` | `"pass" \| "block"` | Whether content passed or should be blocked |
| `reasoning` | `str` | Explanation of why content was classified as pass or block |
| `violation_types` | `list[str]` | Types of violations detected |
| `cwe_codes` | `list[str]` | CWE codes associated with violations |
| `usage` | `TokenUsage` | Token usage information |
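A common pattern is to gate agent execution on `classification`. This sketch uses a hypothetical `GuardResult` dataclass that mirrors the fields above, not the SDK's own response class:

```python
from dataclasses import dataclass, field

@dataclass
class GuardResult:
    classification: str                               # "pass" | "block"
    reasoning: str = ""
    violation_types: list[str] = field(default_factory=list)
    cwe_codes: list[str] = field(default_factory=list)

def enforce(result: GuardResult) -> None:
    # Raise before the agent acts on blocked content.
    if result.classification == "block":
        raise PermissionError(
            f"blocked ({', '.join(result.violation_types)}): {result.reasoning}"
        )

enforce(GuardResult("pass"))  # no-op for passing content
```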
### Input Types
Guard supports multiple input types:
- Plain text: Analyzed directly
- URLs: Automatically fetched and analyzed
- Bytes: Analyzed based on content type
- PDFs: Text extracted and analyzed per page
- Images: Requires vision-capable model
```python
# URL input
result = await client.guard(input="https://example.com/document.pdf")

# File input
with open("document.pdf", "rb") as f:
    result = await client.guard(input=f.read())
```

## Redact

The `redact()` method removes sensitive content from text using placeholders or contextual rewriting.
### Basic Usage

```python
result = await client.redact(
    input="My email is john@example.com and SSN is 123-45-6789",
    model="openai/gpt-4o-mini"
)
print(result.redacted)
# "My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>"
```

### Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | `str` | Yes | - | The text to redact |
| `model` | `str` | Yes | - | Model in `provider/model` format |
| `entities` | `list[str]` | No | Default PII | Entity types to redact |
| `rewrite` | `bool` | No | `False` | Rewrite contextually instead of using placeholders |
### Response

| Field | Type | Description |
|---|---|---|
| `redacted` | `str` | Sanitized text with redactions |
| `findings` | `list[str]` | What was redacted |
| `usage` | `TokenUsage` | Token usage information |
### Rewrite Mode

With `rewrite=True`, sensitive content is rewritten contextually instead of being replaced with placeholders:
```python
result = await client.redact(
    input="My email is john@example.com",
    model="openai/gpt-4o-mini",
    rewrite=True
)
print(result.redacted)
# "My email is on file"
```

### Custom Entities
Specify which entity types to redact:
```python
result = await client.redact(
    input="Contact john@example.com or call 555-123-4567",
    model="openai/gpt-4o-mini",
    entities=["email addresses"]  # Only redact emails
)
```

### Default Entities
When `entities` is not specified, the following are redacted by default:
- SSNs, Driver's License, Passport Numbers
- API Keys, Secrets, Passwords
- Names, Addresses, Phone Numbers
- Emails, Credit Card Numbers
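For intuition, placeholder-style redaction for two of the default entity types can be sketched with regexes. This is a toy illustration only; the SDK uses a model, not regexes:

```python
import re

# Hypothetical patterns for two default entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def toy_redact(text: str) -> str:
    # Replace each match with a labeled placeholder.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}_REDACTED>", text)
    return text

print(toy_redact("My email is john@example.com and SSN is 123-45-6789"))
# My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>
```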
## Scan

The `scan()` method analyzes a repository for attacks that target AI agents. It clones the repository into a secure Daytona sandbox and uses OpenCode to detect threats such as repo poisoning, prompt injections, and malicious instructions.
Basic Usage
response = await client.scan(repo="https://github.com/user/repo")
print(response.result) # Security report
print(f"Cost: ${response.usage.cost:.4f}")Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `repo` | `str` | Yes | - | Git repository URL (`https://` or `git@`) |
| `branch` | `str` | No | Default branch | Branch, tag, or commit to check out |
| `model` | `str` | No | `anthropic/claude-sonnet-4-5` | Model for OpenCode analysis |
### Response

| Field | Type | Description |
|---|---|---|
| `result` | `str` | Security report from OpenCode |
| `usage` | `ScanUsage` | Token usage metrics |
### ScanUsage

| Field | Type | Description |
|---|---|---|
| `input_tokens` | `int` | Total input tokens used |
| `output_tokens` | `int` | Total output tokens used |
| `reasoning_tokens` | `int` | Reasoning tokens (if applicable) |
| `cost` | `float` | Total cost in USD |
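A small helper can roll these fields up into a one-line summary. The `ScanUsage` dataclass below is a hypothetical mirror of the fields above, not the SDK's own class:

```python
from dataclasses import dataclass

@dataclass
class ScanUsage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    cost: float

def summarize(usage: ScanUsage) -> str:
    # Total all token categories and report them against the USD cost.
    total = usage.input_tokens + usage.output_tokens + usage.reasoning_tokens
    return f"{total} tokens for ${usage.cost:.4f}"

print(summarize(ScanUsage(12000, 3000, 500, 0.42)))  # 15500 tokens for $0.4200
```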
### Example: Scanning a Branch

```python
response = await client.scan(
    repo="https://github.com/user/repo",
    branch="feature-branch",
    model="anthropic/claude-sonnet-4-5"
)
print("Security Report:")
print(response.result)
print(f"\nTokens: {response.usage.input_tokens} in, {response.usage.output_tokens} out")
print(f"Cost: ${response.usage.cost:.4f}")
```

## Environment Variables
Configure provider API keys:
```shell
export SUPERAGENT_API_KEY=your-superagent-key
export OPENAI_API_KEY=your-openai-key
export ANTHROPIC_API_KEY=your-anthropic-key
export GOOGLE_API_KEY=your-google-key
export GROQ_API_KEY=your-groq-key
export FIREWORKS_API_KEY=your-fireworks-key
export OPENROUTER_API_KEY=your-openrouter-key

# Required for scan()
export DAYTONA_API_KEY=your-daytona-api-key
```