TypeScript SDK for AI agent safety - guard, redact, scan, and test methods

TypeScript SDK

The safety-agent package provides methods for AI agent safety: guard() for detecting threats, redact() for removing sensitive data, scan() for analyzing repositories, and test() for running red team scenarios.

Installation

npm install safety-agent

Quick Start

import { createClient } from "safety-agent";

const client = createClient();

// Guard: Detect threats
const guardResult = await client.guard({
  input: "user message to analyze"
});

// Redact: Remove PII
const redactResult = await client.redact({
  input: "My email is john@example.com",
  model: "openai/gpt-4o-mini"
});

// Scan: Analyze repository for AI agent attacks
const scanResult = await client.scan({
  repo: "https://github.com/user/repo"
});

Client Configuration

Basic Configuration

const client = createClient({
  apiKey: "your-api-key" // Or set SUPERAGENT_API_KEY env var
});

Fallback Configuration

The SDK supports automatic fallback to an always-on endpoint when the primary Superagent model experiences cold starts. This is useful for production environments where latency consistency is critical.

const client = createClient({
  enableFallback: true,           // Enable fallback (default: true)
  fallbackTimeoutMs: 5000,        // Timeout before fallback (default: 5000ms)
  fallbackUrl: "https://..."      // Custom fallback URL (optional)
});

Option             Type     Default                 Description
-----------------  -------  ----------------------  ----------------------------------------
apiKey             string   SUPERAGENT_API_KEY env  Your Superagent API key
enableFallback     boolean  true                    Enable automatic fallback on timeout
fallbackTimeoutMs  number   5000                    Milliseconds to wait before falling back
fallbackUrl        string   Built-in URL            Custom fallback endpoint URL

The fallback URL can also be set via the SUPERAGENT_FALLBACK_URL environment variable.
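
To make the defaults and environment variables above concrete, here is a minimal sketch of how an options object could be assembled before calling createClient(). The helper resolveClientOptions and the ClientOptions interface are hypothetical names introduced for illustration, not part of safety-agent, and the precedence shown (explicit override first, then environment variable, then documented default) is an assumption.

```typescript
// Local mirror of the options listed in the table above (not imported from the SDK).
interface ClientOptions {
  apiKey?: string;
  enableFallback: boolean;
  fallbackTimeoutMs: number;
  fallbackUrl?: string;
}

// Hypothetical helper: merges explicit overrides, environment variables,
// and the documented defaults into one options object.
// Assumption: explicit overrides take precedence over environment variables.
function resolveClientOptions(
  env: Record<string, string | undefined>,
  overrides: Partial<ClientOptions> = {}
): ClientOptions {
  return {
    apiKey: overrides.apiKey ?? env.SUPERAGENT_API_KEY,
    enableFallback: overrides.enableFallback ?? true, // documented default
    fallbackTimeoutMs: overrides.fallbackTimeoutMs ?? 5000, // documented default
    fallbackUrl: overrides.fallbackUrl ?? env.SUPERAGENT_FALLBACK_URL,
  };
}
```

In an application this could be passed along as createClient(resolveClientOptions(process.env, { fallbackTimeoutMs: 2000 })). The SDK already reads both environment variables on its own, so a helper like this is only useful when you want the resolved values visible in one place, for example for logging.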


Guard

The guard() method classifies input content as pass or block. It detects prompt injections, malicious instructions, and security threats.

Basic Usage

const result = await client.guard({
  input: "user message to analyze"
});

if (result.classification === "block") {
  console.log("Blocked:", result.violation_types);
  console.log("Reason:", result.reasoning);
}

Options

Option        Type                 Required  Default                Description
------------  -------------------  --------  ---------------------  -----------------------------------
input         string | Blob | URL  Yes       -                      The input to analyze
model         string               No        superagent/guard-1.7b  Model in provider/model format
systemPrompt  string               No        -                      Custom system prompt
chunkSize     number               No        8000                   Characters per chunk (0 to disable)
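
The chunkSize option is described only as "characters per chunk", so the following is an illustrative sketch of fixed-size character chunking, not the SDK's actual implementation. The function name chunkByCharacters is hypothetical, and the real splitter may differ (for example by overlapping chunks or respecting word boundaries).

```typescript
// Illustrative only: split text into consecutive chunks of at most
// `chunkSize` characters. A chunkSize of 0 (or less) disables chunking,
// mirroring the "0 to disable" note in the options table.
function chunkByCharacters(text: string, chunkSize: number = 8000): string[] {
  if (chunkSize <= 0 || text.length <= chunkSize) {
    return [text];
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Under this reading, a 20,000-character input with the default chunkSize would be analyzed as three chunks of 8,000, 8,000, and 4,000 characters.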

Response

Field            Type              Description
---------------  ----------------  ----------------------------------------------------------
classification   "pass" | "block"  Whether content passed or should be blocked
reasoning        string            Explanation of why content was classified as pass or block
violation_types  string[]          Types of violations detected
cwe_codes        string[]          CWE codes associated with violations
usage            TokenUsage        Token usage information
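
As a sketch of how the response fields fit together, the helper below turns a guard result into a single log line. GuardResult is a local type that mirrors the table above rather than a type imported from safety-agent, describeGuardResult is a hypothetical name, and the fields of TokenUsage are not listed on this page, so usage is left untyped.

```typescript
// Local mirror of the documented guard() response (assumed shape).
interface GuardResult {
  classification: "pass" | "block";
  reasoning: string;
  violation_types: string[];
  cwe_codes: string[];
  usage?: unknown; // TokenUsage; its fields are not documented here
}

// Hypothetical helper: summarize a guard result for logging.
function describeGuardResult(result: GuardResult): string {
  if (result.classification === "pass") {
    return "pass";
  }
  const violations = result.violation_types.join(", ") || "unspecified";
  const cwes = result.cwe_codes.length > 0 ? ` [${result.cwe_codes.join(", ")}]` : "";
  return `block: ${violations}${cwes} (${result.reasoning})`;
}
```

A caller would typically log the returned string and stop processing the message whenever classification is "block".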

Input Types

Guard supports multiple input types:

  • Plain text: Analyzed directly
  • URLs: Automatically fetched and analyzed
  • Blob/File: Analyzed based on MIME type
  • PDFs: Text extracted and analyzed per page
  • Images: Requires vision-capable model
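
// URL input: the document is fetched, then analyzed
const urlResult = await client.guard({
  input: "https://example.com/document.pdf"
});

// Image input (requires a vision-capable model).
// imageBlob is a Blob or File you already hold, e.g. from an upload.
const imageResult = await client.guard({
  input: imageBlob,
  model: "openai/gpt-4o"
});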
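
Because image inputs need a vision-capable model while other inputs can use the default, one option is to pick the model from the input's MIME type. This is a minimal sketch under assumptions: guardModelFor is a hypothetical helper, "openai/gpt-4o" is simply the vision model used in the example above, and it assumes that omitting model leaves the SDK default in place.

```typescript
// Hypothetical helper: return a vision model for image Blobs and
// undefined otherwise, so the caller can omit `model` for non-image input.
function guardModelFor(
  input: string | Blob | URL,
  visionModel: string = "openai/gpt-4o"
): string | undefined {
  if (input instanceof Blob && input.type.indexOf("image/") === 0) {
    return visionModel;
  }
  return undefined;
}
```

The result can then be added to the guard() call only when it is defined, which keeps text, URL, and PDF inputs on the default model.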

Redact

The redact() method removes sensitive content from text using placeholders or contextual rewriting.

Basic Usage

const result = await client.redact({
  input: "My email is john@example.com and SSN is 123-45-6789",
  model: "openai/gpt-4o-mini"
});

console.log(result.redacted);
// "My email is <EMAIL_REDACTED> and SSN is <SSN_REDACTED>"

Options

Option    Type      Required  Default      Description
--------  --------  --------  -----------  --------------------------------------------
input     string    Yes       -            The text to redact
model     string    Yes       -            Model in provider/model format
entities  string[]  No        Default PII  Entity types to redact
rewrite   boolean   No        false        Rewrite contextually instead of placeholders

Response

Field     Type        Description
--------  ----------  ------------------------------
redacted  string      Sanitized text with redactions
findings  string[]    What was redacted
usage     TokenUsage  Token usage information
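
When placeholders are used, it can be handy to check which kinds of data were replaced by inspecting the redacted string itself. The sketch below assumes placeholders always follow the <TYPE_REDACTED> pattern seen in the Basic Usage output; that pattern is inferred from a single example, and listPlaceholders is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: collect the distinct placeholder types found in a
// redacted string, assuming the <TYPE_REDACTED> format from the example.
function listPlaceholders(redacted: string): string[] {
  const pattern = /<([A-Z][A-Z0-9_]*)_REDACTED>/g;
  const types: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(redacted)) !== null) {
    if (types.indexOf(match[1]) === -1) {
      types.push(match[1]);
    }
  }
  return types;
}
```

For authoritative information about what was redacted, prefer the findings field of the response; this string-based check is only a convenience for logs and tests.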

Rewrite Mode

Set rewrite: true to rewrite the text contextually instead of inserting placeholders:

const result = await client.redact({
  input: "My email is john@example.com",
  model: "openai/gpt-4o-mini",
  rewrite: true
});

console.log(result.redacted);
// "My email is on file"

Custom Entities

Specify which entity types to redact:

const result = await client.redact({
  input: "Contact john@example.com or call 555-123-4567",
  model: "openai/gpt-4o-mini",
  entities: ["email addresses"] // Only redact emails
});

Default Entities

When entities is not specified, the following types are redacted:

  • SSNs, Driver's License, Passport Numbers
  • API Keys, Secrets, Passwords
  • Names, Addresses, Phone Numbers
  • Emails, Credit Card Numbers

Scan

The scan() method analyzes a repository for AI agent-targeted attacks. It clones the repository into a secure Daytona sandbox and uses OpenCode to detect threats like repo poisoning, prompt injections, and malicious instructions.

Basic Usage

const response = await client.scan({
  repo: "https://github.com/user/repo"
});

console.log(response.result);  // Security report
console.log(`Cost: $${response.usage.cost.toFixed(4)}`);

Options

Option  Type    Required  Default                      Description
------  ------  --------  ---------------------------  -------------------------------------
repo    string  Yes       -                            Git repository URL (https:// or git@)
branch  string  No        Default branch               Branch, tag, or commit to checkout
model   string  No        anthropic/claude-sonnet-4-5  Model for OpenCode analysis
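
Since repo accepts only https:// or git@ URLs, a cheap client-side check can reject obviously unsupported values before a sandbox is started. This is a sketch only: isSupportedRepoUrl is a hypothetical helper, and the SDK may apply stricter or different validation of its own.

```typescript
// Hypothetical pre-check: accept only the two URL forms named in the
// options table (https:// and git@). It does not verify that the
// repository exists or is reachable.
function isSupportedRepoUrl(repo: string): boolean {
  return /^(https:\/\/|git@)\S+$/.test(repo.trim());
}
```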

Response

Field   Type       Description
------  ---------  -----------------------------
result  string     Security report from OpenCode
usage   ScanUsage  Token usage metrics

ScanUsage

Field            Type    Description
---------------  ------  --------------------------------
inputTokens      number  Total input tokens used
outputTokens     number  Total output tokens used
reasoningTokens  number  Reasoning tokens (if applicable)
cost             number  Total cost in USD
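
The usage fields can be folded into a one-line summary for logs. In this sketch the ScanUsage interface is a local mirror of the table above, reasoningTokens is treated as optional because the table marks it "if applicable", and formatScanUsage is a hypothetical helper.

```typescript
// Local mirror of the documented ScanUsage fields (assumed shape).
interface ScanUsage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens?: number; // "if applicable", so optional here
  cost: number; // USD
}

// Hypothetical helper: format token counts and cost as a single line.
function formatScanUsage(usage: ScanUsage): string {
  const reasoning = usage.reasoningTokens
    ? `, ${usage.reasoningTokens} reasoning`
    : "";
  return `${usage.inputTokens} in, ${usage.outputTokens} out${reasoning}, $${usage.cost.toFixed(4)}`;
}
```

Cost is printed with four decimal places, matching the toFixed(4) call used in the scan examples on this page.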

Environment Variables

# Required for scan()
export DAYTONA_API_KEY=your-daytona-api-key

# Required for the model (at least one)
export ANTHROPIC_API_KEY=your-anthropic-key
export OPENAI_API_KEY=your-openai-key
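
A preflight check can surface missing variables before scan() is called. The sketch below encodes the rule stated above (DAYTONA_API_KEY plus at least one model provider key); missingScanEnv is a hypothetical helper, and it does not check that the key you set matches the provider of the model you select.

```typescript
// Hypothetical preflight: list the environment variables that scan()
// needs but that are absent from the given environment map.
function missingScanEnv(env: Record<string, string | undefined>): string[] {
  const missing: string[] = [];
  if (!env.DAYTONA_API_KEY) {
    missing.push("DAYTONA_API_KEY");
  }
  // At least one model provider key is required.
  if (!env.ANTHROPIC_API_KEY && !env.OPENAI_API_KEY) {
    missing.push("ANTHROPIC_API_KEY or OPENAI_API_KEY");
  }
  return missing;
}
```

In a Node.js application this could be called as missingScanEnv(process.env), failing fast with a clear message when the returned list is not empty.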

Example: Scanning a Branch

const response = await client.scan({
  repo: "https://github.com/user/repo",
  branch: "feature-branch",
  model: "anthropic/claude-sonnet-4-5"
});

console.log("Security Report:");
console.log(response.result);
console.log(`\nTokens: ${response.usage.inputTokens} in, ${response.usage.outputTokens} out`);
console.log(`Cost: $${response.usage.cost.toFixed(4)}`);