# Python SDK

Python SDK for AI agent safety: guard, redact, scan, and test methods.
The `safety-agent` package provides four methods for AI agent safety: `guard()` for detecting threats, `redact()` for removing sensitive data, `scan()` for analyzing repositories, and `test()` for running red-team scenarios.
## Installation

```shell
uv add safety-agent
```

Or with pip:

```shell
pip install safety-agent
```

## Quick Start
```python
from safety_agent import create_client

client = create_client()

# Guard: Detect threats (uses default superagent/guard-1.7b model)
result = await client.guard(input="user message to analyze")

# Redact: Remove PII
result = await client.redact(
    input="My email is john@example.com",
    model="openai/gpt-4o-mini"
)

# Scan: Analyze repository for AI agent attacks
result = await client.scan(repo="https://github.com/user/repo")
```

## Client Configuration
### Basic Configuration

```python
client = create_client(api_key="your-api-key")  # Or set SUPERAGENT_API_KEY env var
```

### Fallback Configuration
The SDK supports automatic fallback to an always-on endpoint when the primary Superagent model experiences cold starts. This is useful for production environments where latency consistency is critical.
```python
client = create_client(
    enable_fallback=True,       # Enable fallback (default: True)
    fallback_timeout=5.0,       # Timeout before fallback in seconds (default: 5.0)
    fallback_url="https://..."  # Custom fallback URL (optional)
)
```

| Option | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `SUPERAGENT_API_KEY` env | Your Superagent API key |
| `enable_fallback` | `bool` | `True` | Enable automatic fallback on timeout |
| `fallback_timeout` | `float` | `5.0` | Seconds to wait before falling back |
| `fallback_url` | `str` | Built-in URL | Custom fallback endpoint URL |
The fallback URL can also be set via the `SUPERAGENT_FALLBACK_URL` environment variable.
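The fallback behavior described above amounts to a timeout race. Here is an illustrative, self-contained sketch of the pattern; the `call_primary` and `call_fallback` coroutines are hypothetical stand-ins, not SDK functions:

```python
import asyncio

async def call_primary() -> str:
    await asyncio.sleep(10)  # simulate a cold start on the primary model
    return "primary"

async def call_fallback() -> str:
    return "fallback"

async def guard_with_fallback(timeout: float = 5.0) -> str:
    # Race the primary call against the timeout; on expiry, use the fallback.
    try:
        return await asyncio.wait_for(call_primary(), timeout=timeout)
    except asyncio.TimeoutError:
        return await call_fallback()

print(asyncio.run(guard_with_fallback(timeout=0.1)))  # primary too slow -> "fallback"
```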
## Guard

The `guard()` method classifies input content as `pass` or `block`. It detects prompt injections, malicious instructions, and other security threats.
### Basic Usage

```python
result = await client.guard(input="user message to analyze")

if result.classification == "block":
    print("Blocked:", result.violation_types)
    print("Reason:", result.reasoning)
```

### Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | `str \| bytes` | Yes | - | The input to analyze |
| `model` | `str` | No | `superagent/guard-1.7b` | Model in `provider/model` format |
| `system_prompt` | `str` | No | - | Custom system prompt |
| `chunk_size` | `int` | No | `8000` | Characters per chunk (`0` to disable) |
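The `chunk_size` option splits long inputs before analysis. A rough sketch of fixed-size, character-based chunking (an assumption for illustration; the SDK's actual splitting strategy may differ):

```python
def chunk_text(text: str, chunk_size: int = 8000) -> list[str]:
    # chunk_size <= 0 disables chunking: analyze the input as one piece.
    if chunk_size <= 0:
        return [text]
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text("a" * 20000, chunk_size=8000)
print([len(c) for c in chunks])  # [8000, 8000, 4000]
```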
### Response

| Field | Type | Description |
|---|---|---|
| `classification` | `"pass" \| "block"` | Whether content passed or should be blocked |
| `reasoning` | `str` | Explanation of why content was classified as pass or block |
| `violation_types` | `list[str]` | Types of violations detected |
| `cwe_codes` | `list[str]` | CWE codes associated with violations |
| `usage` | `TokenUsage` | Token usage information |
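A common pattern is to gate agent execution on `classification`. This sketch uses a hypothetical `GuardResult` dataclass that mirrors the fields above, not the SDK's own response class:

```python
from dataclasses import dataclass, field

@dataclass
class GuardResult:
    classification: str                               # "pass" | "block"
    reasoning: str = ""
    violation_types: list[str] = field(default_factory=list)
    cwe_codes: list[str] = field(default_factory=list)

def enforce(result: GuardResult) -> None:
    # Raise before the agent acts on blocked content.
    if result.classification == "block":
        raise PermissionError(
            f"blocked ({', '.join(result.violation_types)}): {result.reasoning}"
        )

enforce(GuardResult("pass"))  # no-op for passing content
```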
### Input Types
Guard supports multiple input types:
- Plain text: Analyzed directly
- URLs: Automatically fetched and analyzed
- Bytes: Analyzed based on content type
- PDFs: Text extracted and analyzed per page
- Images: Requires vision-capable model
```python
# URL input
result = await client.guard(input="https://example.com/document.pdf")

# File input
with open("document.pdf", "rb") as f:
    result = await client.guard(input=f.read())
```

## Redact

The `redact()` method removes sensitive content from text using placeholders or contextual rewriting.
### Basic Usage

```python
result = await client.redact(
    input="My email is john@example.com and SSN is 123-45-6789",
    model="openai/gpt-4o-mini"
)
print(result.redacted)
# "My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>"
```

### Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `input` | `str` | Yes | - | The text to redact |
| `model` | `str` | Yes | - | Model in `provider/model` format |
| `entities` | `list[str]` | No | Default PII | Entity types to redact |
| `rewrite` | `bool` | No | `False` | Rewrite contextually instead of using placeholders |
### Response

| Field | Type | Description |
|---|---|---|
| `redacted` | `str` | Sanitized text with redactions |
| `findings` | `list[str]` | What was redacted |
| `usage` | `TokenUsage` | Token usage information |
### Rewrite Mode

With `rewrite=True`, sensitive content is rewritten contextually instead of being replaced with placeholders:
```python
result = await client.redact(
    input="My email is john@example.com",
    model="openai/gpt-4o-mini",
    rewrite=True
)
print(result.redacted)
# "My email is on file"
```

### Custom Entities
Specify which entity types to redact:
```python
result = await client.redact(
    input="Contact john@example.com or call 555-123-4567",
    model="openai/gpt-4o-mini",
    entities=["email addresses"]  # Only redact emails
)
```

### Default Entities
When `entities` is not specified, the following are redacted by default:
- SSNs, Driver's License, Passport Numbers
- API Keys, Secrets, Passwords
- Names, Addresses, Phone Numbers
- Emails, Credit Card Numbers
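For intuition, placeholder-style redaction for two of the default entity types can be sketched with regexes. This is a toy illustration only; the SDK uses a model, not regexes:

```python
import re

# Hypothetical patterns for two default entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def toy_redact(text: str) -> str:
    # Replace each match with a labeled placeholder.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}_REDACTED>", text)
    return text

print(toy_redact("My email is john@example.com and SSN is 123-45-6789"))
# My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>
```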
## Scan

The `scan()` method analyzes a repository for attacks that target AI agents. It clones the repository into a secure Daytona sandbox and uses OpenCode to detect threats such as repo poisoning, prompt injections, and malicious instructions.
Basic Usage
response = await client.scan(repo="https://github.com/user/repo")
print(response.result) # Security report
print(f"Cost: ${response.usage.cost:.4f}")Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| `repo` | `str` | Yes | - | Git repository URL (`https://` or `git@`) |
| `branch` | `str` | No | Default branch | Branch, tag, or commit to check out |
| `model` | `str` | No | `anthropic/claude-sonnet-4-5` | Model for OpenCode analysis |
### Response

| Field | Type | Description |
|---|---|---|
| `result` | `str` | Security report from OpenCode |
| `usage` | `ScanUsage` | Token usage metrics |
### ScanUsage

| Field | Type | Description |
|---|---|---|
| `input_tokens` | `int` | Total input tokens used |
| `output_tokens` | `int` | Total output tokens used |
| `reasoning_tokens` | `int` | Reasoning tokens (if applicable) |
| `cost` | `float` | Total cost in USD |
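A small helper can roll these fields up into a one-line summary. The `ScanUsage` dataclass below is a hypothetical mirror of the fields above, not the SDK's own class:

```python
from dataclasses import dataclass

@dataclass
class ScanUsage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    cost: float

def summarize(usage: ScanUsage) -> str:
    # Total all token categories and report them against the USD cost.
    total = usage.input_tokens + usage.output_tokens + usage.reasoning_tokens
    return f"{total} tokens for ${usage.cost:.4f}"

print(summarize(ScanUsage(12000, 3000, 500, 0.42)))  # 15500 tokens for $0.4200
```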
### Example: Scanning a Branch

```python
response = await client.scan(
    repo="https://github.com/user/repo",
    branch="feature-branch",
    model="anthropic/claude-sonnet-4-5"
)
print("Security Report:")
print(response.result)
print(f"\nTokens: {response.usage.input_tokens} in, {response.usage.output_tokens} out")
print(f"Cost: ${response.usage.cost:.4f}")
```

## Environment Variables
Configure provider API keys:
```shell
export SUPERAGENT_API_KEY=your-superagent-key
export OPENAI_API_KEY=your-openai-key
export ANTHROPIC_API_KEY=your-anthropic-key
export GOOGLE_API_KEY=your-google-key
export GROQ_API_KEY=your-groq-key
export FIREWORKS_API_KEY=your-fireworks-key
export OPENROUTER_API_KEY=your-openrouter-key

# Required for scan()
export DAYTONA_API_KEY=your-daytona-api-key
```