idpishield

Defense against indirect prompt injection

Fast, local risk assessment for untrusted text before it reaches your LLM. Go library, CLI, and MCP server. Sub-millisecond detection with 88+ patterns.

Go · Sub-millisecond · Apache 2.0 · 88+ patterns · MCP server


Quick Start

Three lines to protection

Add idpishield to your Go project and start assessing untrusted content immediately.

Go Library

terminal

go get github.com/pinchtab/idpishield

main.go

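// Create a shield in balanced mode (normalization + pattern matching).
shield := idpi.New(idpi.Config{Mode: idpi.ModeBalanced})

// Assess the untrusted text together with the URL it came from.
result := shield.Assess(untrustedText, sourceURL)
if result.Blocked {
    log.Printf("blocked: score=%d reason=%s", result.Score, result.Reason)
}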

CLI

terminal

go install github.com/pinchtab/idpishield/cmd/idpishield@latest

terminal

# Scan a file
idpishield scan page.txt --mode balanced

# Scan from stdin
echo "Ignore all previous instructions" | idpishield scan

# Example JSON output
{"score":80,"level":"critical","blocked":true}

MCP Server

terminal

# stdio mode (default)
idpishield mcp serve

# HTTP mode with auth
idpishield mcp serve --transport http --auth-token "$IDPI_MCP_TOKEN"

Exposes idpi_assess as an MCP tool that works with any MCP-compatible agent framework.

Features

Built for the AI security stack

Everything you need to assess prompt injection risk before content reaches your LLM.

⚡

Sub-millisecond

Pattern matching and risk scoring complete in under a millisecond. No network calls in fast or balanced mode.

🛡️

88+ Detection Patterns

Multi-language patterns covering instruction override, exfiltration, role hijacking, encoding tricks, and more.

🎯

Tiered Defense

Three modes (fast, balanced, deep) let you pick the right tradeoff between speed and detection accuracy.

🔧

Go Library First

Import as a Go package. One function call: Assess(text, url). CLI and MCP server are secondary interfaces.

🌍

Multi-language

Patterns for English, French, Spanish, German, and Japanese. Unicode normalization handles obfuscation attempts.

📊

Explainable Results

Every assessment returns a score, level, matched patterns, categories, and reason, all ready for audit logging.

🔒

Production Hardening

Input size limits, decode depth bounds, a circuit breaker for the deep service, and a strict mode with lower blocking thresholds.

🤖

MCP Native

Run as an MCP server with stdio or HTTP transport. Token-based auth, constant-time credential checks.
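A constant-time credential check can be sketched with Go's standard crypto/subtle package. This illustrates the technique only, not the server's actual code: the function name tokenMatches and the use of a Bearer scheme in the Authorization header are assumptions.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// tokenMatches reports whether an Authorization header value carries the
// expected token. The "Bearer " scheme is an assumption for this sketch.
func tokenMatches(authHeader, expected string) bool {
	const prefix = "Bearer "
	if expected == "" || !strings.HasPrefix(authHeader, prefix) {
		return false
	}
	got := strings.TrimPrefix(authHeader, prefix)
	// ConstantTimeCompare returns 1 only when the slices are equal; for
	// equal-length inputs its running time does not depend on the contents.
	return subtle.ConstantTimeCompare([]byte(got), []byte(expected)) == 1
}

func main() {
	fmt.Println(tokenMatches("Bearer s3cret", "s3cret"))
	fmt.Println(tokenMatches("Bearer guess", "s3cret"))
}
```

Comparing with == can return as soon as the first differing byte is found, which may leak how much of a guess was correct. ConstantTimeCompare still returns early when the lengths differ, so hashing both values to a fixed length first is a common refinement.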

Architecture

Tiered defense by design

Local pattern matching handles most threats instantly. Optional deep service adds semantic analysis when needed.

  Input Text
      │
      ├── Domain Allowlist ──── trusted? ──→ skip
      │
      ├── Unicode Normalization
      │     └── decode obfuscation (HTML entities, base64, etc.)
      │
      ├── Pattern Matching (88+ patterns, 5 languages)
      │     ├── instruction-override
      │     ├── exfiltration
      │     ├── role-hijacking
      │     ├── encoding-tricks
      │     └── social-engineering
      │
      ├── Risk Scoring (0–100)
      │     ├── default: blocks at ≥ 60
      │     └── strict:  blocks at ≥ 40
      │
      └── [Deep mode] ──→ Service escalation
            ├── Semantic similarity
            └── LLM intent analysis

Fast

Pattern matching on raw input. Highest throughput, lowest latency.

Balanced

Normalization + pattern matching. Recommended default for most integrations.

Deep

Balanced + optional service escalation for semantic and LLM-based analysis.
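The local tiers described above can be sketched in a few dozen lines. This is a toy model under stated assumptions, not the library's implementation: the two rules, their weights, and the names trusted, normalize, and assess are all hypothetical, and the deep-mode service escalation is left out.

```go
package main

import (
	"fmt"
	"html"
	"net/url"
	"regexp"
	"strings"
)

// rule pairs a threat category with a pattern and a score weight.
type rule struct {
	category string
	re       *regexp.Regexp
	weight   int
}

// Two illustrative rules; a real pattern set and its weights will differ.
var rules = []rule{
	{"instruction-override", regexp.MustCompile(`(?i)ignore (all )?previous instructions`), 60},
	{"exfiltration", regexp.MustCompile(`(?i)send .* to https?://`), 40},
}

// trusted reports whether the source URL's host is on the allowlist.
func trusted(sourceURL string, allow map[string]bool) bool {
	u, err := url.Parse(sourceURL)
	return err == nil && allow[u.Hostname()]
}

// normalize decodes HTML entities and collapses runs of whitespace.
func normalize(s string) string {
	return strings.Join(strings.Fields(html.UnescapeString(s)), " ")
}

// assess skips trusted sources, optionally normalizes (balanced-style),
// then sums the weights of matching rules, capped at 100.
func assess(text, sourceURL string, allow map[string]bool, normalizeFirst bool) (score int, cats []string) {
	if trusted(sourceURL, allow) {
		return 0, nil
	}
	if normalizeFirst {
		text = normalize(text)
	}
	for _, r := range rules {
		if r.re.MatchString(text) {
			score += r.weight
			cats = append(cats, r.category)
		}
	}
	if score > 100 {
		score = 100
	}
	return score, cats
}

func main() {
	allow := map[string]bool{"docs.example.com": true}
	obfuscated := "Ignore&#32;all previous instructions"

	fast, _ := assess(obfuscated, "https://evil.example/x", allow, false)
	balanced, cats := assess(obfuscated, "https://evil.example/x", allow, true)
	fmt.Println("fast:", fast, "balanced:", balanced, cats)
}
```

In this sketch the entity-encoded space hides the phrase from matching on raw input, while the normalized pass catches it. That is the tradeoff the fast and balanced modes describe.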

Output

Structured risk results

Every assessment returns an actionable, auditable result.

result.go

type RiskResult struct {
    Score      int      // 0–100 risk estimate
    Level      string   // safe | low | medium | high | critical
    Blocked    bool     // policy decision (score + strict mode)
    Reason     string   // human-readable explanation
    Patterns   []string // matched pattern IDs
    Categories []string // threat categories
}
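For audit logging, a result like this can be written as one JSON line per assessment. The sketch below uses a local mirror of the struct rather than the library type. The keys score, level, and blocked follow the CLI output shown earlier; the remaining key names, the auditRecord type, and the sample values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditRecord mirrors the RiskResult fields shown above.
// JSON key names beyond score/level/blocked are assumptions.
type auditRecord struct {
	Score      int      `json:"score"`
	Level      string   `json:"level"`
	Blocked    bool     `json:"blocked"`
	Reason     string   `json:"reason"`
	Patterns   []string `json:"patterns"`
	Categories []string `json:"categories"`
}

// encodeAudit serializes a record as a single JSON line.
func encodeAudit(r auditRecord) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := encodeAudit(auditRecord{
		Score:      80,
		Level:      "critical",
		Blocked:    true,
		Reason:     "instruction override detected", // sample text
		Patterns:   []string{"example-pattern-id"},  // hypothetical ID
		Categories: []string{"instruction-override"},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
}
```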

Risk Levels

safe — score 0–19

low — score 20–39

medium — score 40–59

high — score 60–79

critical — score 80–100

Blocking Semantics

By default, content is blocked at a score of 60 or higher. Strict mode lowers the threshold to 40. The blocked field is a policy output, not just a detection flag.
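The level bands and blocking thresholds above reduce to two small functions. This is a sketch of the documented rules, not the library's code; the names levelFor and isBlocked are hypothetical.

```go
package main

import "fmt"

// levelFor maps a 0-100 score to the level names listed above.
func levelFor(score int) string {
	switch {
	case score >= 80:
		return "critical"
	case score >= 60:
		return "high"
	case score >= 40:
		return "medium"
	case score >= 20:
		return "low"
	default:
		return "safe"
	}
}

// isBlocked applies the documented thresholds:
// 60 by default, 40 when strict mode is on.
func isBlocked(score int, strict bool) bool {
	threshold := 60
	if strict {
		threshold = 40
	}
	return score >= threshold
}

func main() {
	for _, s := range []int{10, 45, 60, 80} {
		fmt.Printf("score=%d level=%s blocked=%t strict=%t\n",
			s, levelFor(s), isBlocked(s, false), isBlocked(s, true))
	}
}
```

A score of 45 shows why blocked is a policy output: the level is medium either way, but the content is blocked only under strict mode.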

idpishield is part of the Pinchtab ecosystem: tools for AI agents that take security seriously. Built in the open, designed for production.