
TypeScript SDK

The safety-agent package provides methods for AI agent safety: guard() for detecting threats, redact() for removing sensitive data, scan() for analyzing repositories, and test() for running red team scenarios.

Installation

npm install safety-agent

Quick Start

import { createClient } from "safety-agent";

const client = createClient();

// Guard: Detect threats
const guardResult = await client.guard({
  input: "user message to analyze"
});

// Redact: Remove PII
const redactResult = await client.redact({
  input: "My email is john@example.com",
  model: "openai/gpt-4o-mini"
});

// Scan: Analyze repository for AI agent attacks
const scanResult = await client.scan({
  repo: "https://github.com/user/repo"
});

Client Configuration

Basic Configuration

const client = createClient({
  apiKey: "your-api-key" // Or set SUPERAGENT_API_KEY env var
});

Fallback Configuration

The SDK supports automatic fallback to an always-on endpoint when the primary Superagent model experiences cold starts. This is useful for production environments where latency consistency is critical.

const client = createClient({
  enableFallback: true,           // Enable fallback (default: true)
  fallbackTimeoutMs: 5000,        // Timeout before fallback (default: 5000ms)
  fallbackUrl: "https://..."      // Custom fallback URL (optional)
});

| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | SUPERAGENT_API_KEY env var | Your Superagent API key |
| enableFallback | boolean | true | Enable automatic fallback on timeout |
| fallbackTimeoutMs | number | 5000 | Milliseconds to wait before falling back |
| fallbackUrl | string | Built-in URL | Custom fallback endpoint URL |

The fallback URL can also be set via the SUPERAGENT_FALLBACK_URL environment variable.
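The precedence between an explicit option, the environment variable, and the built-in default can be sketched as follows. This is a minimal illustration, not the SDK's internals; resolveFallbackUrl and BUILT_IN_FALLBACK_URL are hypothetical names introduced here.

```typescript
// Hypothetical sketch of option resolution: an explicit fallbackUrl wins,
// then the SUPERAGENT_FALLBACK_URL environment variable, then the SDK's
// built-in default. Placeholder URL; not the SDK's actual endpoint.
const BUILT_IN_FALLBACK_URL = "https://fallback.example.invalid";

function resolveFallbackUrl(explicit?: string): string {
  if (explicit) return explicit;                        // createClient({ fallbackUrl })
  const fromEnv = process.env.SUPERAGENT_FALLBACK_URL;  // environment variable
  if (fromEnv) return fromEnv;
  return BUILT_IN_FALLBACK_URL;                         // built-in default
}
```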

Model Fallback

When using third-party providers (e.g., Google Gemini), transient errors like 503 (high demand) or 429 (rate limited) can cause requests to fail. The SDK supports automatic model fallback: if the primary model returns a retryable error, the request is re-issued to a backup model you specify.

const result = await client.guard({
  input: "user message to analyze",
  model: "google/gemini-2.5-flash-lite",
  fallbackModel: "google/gemini-2.5-pro"
});

If the primary model succeeds, fallbackModel is never called. If it returns a retryable status code (429, 500, 502, or 503), the SDK automatically retries with the fallback model. The fallback model can be from a different provider entirely:

const result = await client.guard({
  input: "user message to analyze",
  model: "google/gemini-2.5-flash-lite",
  fallbackModel: "openai/gpt-4o-mini"
});

The fallbackModel option is available on guard(), redact(), and scan(). The fallback model gets a single attempt; there is no recursive fallback chain.
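The retry rule described above can be sketched as a small wrapper. isRetryable, callWithModelFallback, and HttpError are illustrative helpers introduced here, not part of the SDK's public API.

```typescript
// Sketch of the model-fallback rule: retry once on a retryable status
// (429, 500, 502, 503), and never chain beyond the single fallback model.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503]);

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUSES.has(status);
}

class HttpError extends Error {
  constructor(public status: number) {
    super(`HTTP ${status}`);
  }
}

async function callWithModelFallback<T>(
  call: (model: string) => Promise<T>,
  model: string,
  fallbackModel?: string
): Promise<T> {
  try {
    return await call(model); // primary succeeds: fallback is never called
  } catch (err) {
    if (fallbackModel && err instanceof HttpError && isRetryable(err.status)) {
      return call(fallbackModel); // single attempt, no recursive chain
    }
    throw err;
  }
}
```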


Guard

The guard() method classifies input content as pass or block. It detects prompt injections, malicious instructions, and security threats.

Basic Usage

const result = await client.guard({
  input: "user message to analyze"
});

if (result.classification === "block") {
  console.log("Blocked:", result.violation_types);
  console.log("Reason:", result.reasoning);
}

Options

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string \| Blob \| URL | Yes | - | The input to analyze |
| model | string | No | superagent/guard-1.7b | Model in provider/model format |
| fallbackModel | string | No | - | Backup model used when primary returns 429/500/502/503 |
| systemPrompt | string | No | - | Custom system prompt |
| chunkSize | number | No | 8000 | Characters per chunk (0 to disable) |
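The chunkSize option splits long input into fixed-size pieces before analysis. The slicing can be sketched like this; chunkText is a hypothetical helper, and the SDK's real splitting strategy may use different boundaries.

```typescript
// Illustrative chunking: fixed-size character slices, where a chunkSize
// of 0 means "do not chunk". Not the SDK's actual implementation.
function chunkText(input: string, chunkSize = 8000): string[] {
  if (chunkSize <= 0) return [input]; // chunking disabled
  const chunks: string[] = [];
  for (let i = 0; i < input.length; i += chunkSize) {
    chunks.push(input.slice(i, i + chunkSize));
  }
  return chunks;
}
```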

Response

| Field | Type | Description |
|---|---|---|
| classification | "pass" \| "block" | Whether content passed or should be blocked |
| reasoning | string | Explanation of why content was classified as pass or block |
| violation_types | string[] | Types of violations detected |
| cwe_codes | string[] | CWE codes associated with violations |
| usage | TokenUsage | Token usage information |
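The response shape above can be written out as a TypeScript type for reference. The field names follow the table; TokenUsage's exact fields are not enumerated in this section, so it is left open here as an assumption.

```typescript
// Sketch of the documented guard() response shape. TokenUsage is left
// loosely typed because its fields are not listed in this section.
type TokenUsage = Record<string, number>;

interface GuardResponse {
  classification: "pass" | "block"; // whether content passed or should be blocked
  reasoning: string;                // explanation for the classification
  violation_types: string[];        // types of violations detected
  cwe_codes: string[];              // CWE codes associated with violations
  usage: TokenUsage;                // token usage information
}

// Narrowing on the classification:
function isBlocked(r: GuardResponse): boolean {
  return r.classification === "block";
}
```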

Input Types

Guard supports multiple input types:

  • Plain text: Analyzed directly
  • URLs: Automatically fetched and analyzed
  • Blob/File: Analyzed based on MIME type
  • PDFs: Text extracted and analyzed per page
  • Images: Requires vision-capable model

// URL input
const result = await client.guard({
  input: "https://example.com/document.pdf"
});

// Image input (requires vision model)
const result = await client.guard({
  input: imageBlob,
  model: "openai/gpt-4o"
});

Redact

The redact() method removes sensitive content from text using placeholders or contextual rewriting.

Basic Usage

const result = await client.redact({
  input: "My email is john@example.com and SSN is 123-45-6789",
  model: "openai/gpt-4o-mini"
});

console.log(result.redacted);
// "My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>"

Options

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Yes | - | The text to redact |
| model | string | Yes | - | Model in provider/model format |
| fallbackModel | string | No | - | Backup model used when primary returns 429/500/502/503 |
| entities | string[] | No | Default PII | Entity types to redact |
| rewrite | boolean | No | false | Rewrite contextually instead of placeholders |

Response

| Field | Type | Description |
|---|---|---|
| redacted | string | Sanitized text with redactions |
| findings | string[] | What was redacted |
| usage | TokenUsage | Token usage information |

Rewrite Mode

Contextually rewrites text instead of using placeholders:

const result = await client.redact({
  input: "My email is john@example.com",
  model: "openai/gpt-4o-mini",
  rewrite: true
});

console.log(result.redacted);
// "My email is on file"

Custom Entities

Specify which entity types to redact:

const result = await client.redact({
  input: "Contact john@example.com or call 555-123-4567",
  model: "openai/gpt-4o-mini",
  entities: ["email addresses"] // Only redact emails
});

Default Entities

When entities is not specified:

  • SSNs, Driver's License, Passport Numbers
  • API Keys, Secrets, Passwords
  • Names, Addresses, Phone Numbers
  • Emails, Credit Card Numbers
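Falling back to the default set when entities is omitted can be sketched as follows. DEFAULT_ENTITIES and resolveEntities are illustrative names mirroring the list above; the SDK's internal default list may be worded differently.

```typescript
// Illustrative default-entity resolution mirroring the documented list.
const DEFAULT_ENTITIES = [
  "SSNs", "Driver's License", "Passport Numbers",
  "API Keys", "Secrets", "Passwords",
  "Names", "Addresses", "Phone Numbers",
  "Emails", "Credit Card Numbers",
];

function resolveEntities(entities?: string[]): string[] {
  // An explicit, non-empty entities array wins; otherwise use the defaults.
  return entities && entities.length > 0 ? entities : DEFAULT_ENTITIES;
}
```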

Scan

The scan() method analyzes a repository for AI agent-targeted attacks. It clones the repository into a secure Daytona sandbox and uses OpenCode to detect threats like repo poisoning, prompt injections, and malicious instructions.

Basic Usage

const response = await client.scan({
  repo: "https://github.com/user/repo"
});

console.log(response.result);  // Security report
console.log(`Cost: $${response.usage.cost.toFixed(4)}`);

Options

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| repo | string | Yes | - | Git repository URL (https:// or git@) |
| branch | string | No | Default branch | Branch, tag, or commit to check out |
| model | string | No | anthropic/claude-sonnet-4-5 | Model for OpenCode analysis |
| fallbackModel | string | No | - | Backup model used when primary returns 429/500/502/503 |

Response

| Field | Type | Description |
|---|---|---|
| result | string | Security report from OpenCode |
| usage | ScanUsage | Token usage metrics |

ScanUsage

| Field | Type | Description |
|---|---|---|
| inputTokens | number | Total input tokens used |
| outputTokens | number | Total output tokens used |
| reasoningTokens | number | Reasoning tokens (if applicable) |
| cost | number | Total cost in USD |
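A small helper for reporting ScanUsage in logs might look like this. The ScanUsage interface follows the table above; formatScanUsage is an illustrative helper, not part of the SDK.

```typescript
// Illustrative formatter over the documented ScanUsage shape.
interface ScanUsage {
  inputTokens: number;     // total input tokens used
  outputTokens: number;    // total output tokens used
  reasoningTokens: number; // reasoning tokens (if applicable)
  cost: number;            // total cost in USD
}

function formatScanUsage(u: ScanUsage): string {
  return `${u.inputTokens} in, ${u.outputTokens} out, $${u.cost.toFixed(4)}`;
}
```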

Environment Variables

# Required for scan()
export DAYTONA_API_KEY=your-daytona-api-key

# Required for the model (at least one)
export ANTHROPIC_API_KEY=your-anthropic-key
export OPENAI_API_KEY=your-openai-key

Example: Scanning a Branch

const response = await client.scan({
  repo: "https://github.com/user/repo",
  branch: "feature-branch",
  model: "anthropic/claude-sonnet-4-5"
});

console.log("Security Report:");
console.log(response.result);
console.log(`\nTokens: ${response.usage.inputTokens} in, ${response.usage.outputTokens} out`);
console.log(`Cost: $${response.usage.cost.toFixed(4)}`);