idpishield
Fast, local risk assessment for untrusted text before it reaches your LLM. Go library, CLI, and MCP server. Sub-millisecond detection with 88+ patterns.
Quick Start
Three lines to protection
Add idpishield to your Go project and start assessing untrusted content immediately.
Go Library
go get github.com/pinchtab/idpishield
shield := idpi.New(idpi.Config{Mode: idpi.ModeBalanced})
result := shield.Assess(untrustedText, sourceURL)
if result.Blocked {
    log.Printf("blocked: score=%d reason=%s", result.Score, result.Reason)
}
CLI
go install github.com/pinchtab/idpishield/cmd/idpishield@latest
# Scan a file
idpishield scan page.txt --mode balanced
# Scan from stdin
echo "Ignore all previous instructions" | idpishield scan
# JSON output
{"score":80,"level":"critical","blocked":true}
MCP Server
# stdio mode (default)
idpishield mcp serve
# HTTP mode with auth
idpishield mcp serve --transport http --auth-token "$IDPI_MCP_TOKEN"
Exposes idpi_assess as an MCP tool — works with any MCP-compatible agent framework.
Features
Built for the AI security stack
Everything you need to assess prompt injection risk before content reaches your LLM.
Sub-millisecond
Pattern matching and risk scoring complete in under a millisecond. No network calls in fast or balanced mode.
88+ Detection Patterns
Multi-language patterns covering instruction override, exfiltration, role hijacking, encoding tricks, and more.
Tiered Defense
Three modes — fast, balanced, deep — so you pick the right tradeoff between speed and detection accuracy.
Go Library First
Import as a Go package. One function call: Assess(text, url). CLI and MCP server are secondary interfaces.
Multi-language
Patterns for English, French, Spanish, German, and Japanese. Unicode normalization handles obfuscation attempts.
Explainable Results
Every assessment returns score, level, matched patterns, categories, and reason — ready for audit logging.
Production Hardening
Input size limits, decode depth bounds, a circuit breaker for the deep service, and a strict mode with lower blocking thresholds.
MCP Native
Run as an MCP server with stdio or HTTP transport. Token-based auth, constant-time credential checks.
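As an illustration of what a constant-time credential check looks like, here is a minimal sketch using only the Go standard library. The function name tokenMatches is hypothetical and this is not idpishield's actual implementation; it only shows the general technique of comparing tokens without leaking where they differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// tokenMatches is a hypothetical helper, not part of idpishield's API.
// It hashes both values so that ConstantTimeCompare always receives
// equal-length inputs, then compares the digests in constant time.
func tokenMatches(presented, expected string) bool {
	if expected == "" {
		// Treat a missing server-side token as "auth not configured"
		// and reject, rather than accepting an empty credential.
		return false
	}
	p := sha256.Sum256([]byte(presented))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

func main() {
	fmt.Println(tokenMatches("s3cret", "s3cret"))
	fmt.Println(tokenMatches("guess", "s3cret"))
}
```

A plain == comparison can return as soon as the first differing byte is found, which is the timing signal this pattern avoids.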
Architecture
Tiered defense by design
Local pattern matching handles most threats instantly. Optional deep service adds semantic analysis when needed.
Input Text
│
├── Domain Allowlist ──── trusted? ──→ skip
│
├── Unicode Normalization
│   └── decode obfuscation (HTML entities, base64, etc.)
│
├── Pattern Matching (88+ patterns, 5 languages)
│   ├── instruction-override
│   ├── exfiltration
│   ├── role-hijacking
│   ├── encoding-tricks
│   └── social-engineering
│
├── Risk Scoring (0–100)
│   ├── default: blocks at ≥ 60
│   └── strict: blocks at ≥ 40
│
└── [Deep mode] ──→ Service escalation
    ├── Semantic similarity
    └── LLM intent analysis
Fast
Pattern matching on raw input. Highest throughput, lowest latency.
Balanced
Normalization + pattern matching. Recommended default for most integrations.
Deep
Balanced + optional service escalation for semantic and LLM-based analysis.
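To make the difference between matching on raw input and matching after normalization concrete, here is a rough, self-contained sketch. Everything in it is an assumption for illustration: the rule IDs, regular expressions, weights, and the normalize and assess functions are hypothetical and do not reflect idpishield's real pattern set or scoring.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// rule is a hypothetical pattern entry. IDs, expressions, and weights
// are illustrative only.
type rule struct {
	id       string
	category string
	re       *regexp.Regexp
	weight   int
}

var rules = []rule{
	{"io-001", "instruction-override", regexp.MustCompile(`(?i)ignore (all )?previous instructions`), 60},
	{"ex-001", "exfiltration", regexp.MustCompile(`(?i)send .* to https?://`), 40},
}

// normalize decodes HTML entities and collapses runs of whitespace so
// that lightly obfuscated text matches the same patterns as plain text.
func normalize(s string) string {
	return strings.Join(strings.Fields(html.UnescapeString(s)), " ")
}

// assess sums the weights of matching rules, caps the total at 100,
// and returns the score with the IDs of the rules that matched.
// When normalizeFirst is false the rules run on the raw input.
func assess(text string, normalizeFirst bool) (int, []string) {
	if normalizeFirst {
		text = normalize(text)
	}
	score := 0
	var matched []string
	for _, r := range rules {
		if r.re.MatchString(text) {
			score += r.weight
			matched = append(matched, r.id)
		}
	}
	if score > 100 {
		score = 100
	}
	return score, matched
}

func main() {
	// The entity &#32; hides the space after "Ignore" from raw matching.
	input := "Ignore&#32;all previous   instructions"

	rawScore, rawIDs := assess(input, false)
	normScore, normIDs := assess(input, true)
	fmt.Println("raw:", rawScore, rawIDs)
	fmt.Println("normalized:", normScore, normIDs)
}
```

In this sketch the raw pass misses the entity-encoded phrase while the normalized pass catches it, which is the kind of tradeoff the mode choice controls.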
Output
Structured risk results
Every assessment returns an actionable, auditable result.
type RiskResult struct {
    Score      int      // 0–100 risk estimate
    Level      string   // safe | low | medium | high | critical
    Blocked    bool     // policy decision (score + strict mode)
    Reason     string   // human-readable explanation
    Patterns   []string // matched pattern IDs
    Categories []string // threat categories
}
Risk Levels
Blocking Semantics
Default mode blocks at score ≥ 60. Strict mode lowers the threshold to ≥ 40.
The blocked field is a policy output, not just a detection flag.
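The threshold rule above can be sketched in a few lines of Go. The constant and function names here are hypothetical and only restate the documented thresholds (60 by default, 40 in strict mode); the library's internal logic may differ.

```go
package main

import "fmt"

// Thresholds taken from the text above; the names are illustrative.
const (
	defaultBlockThreshold = 60
	strictBlockThreshold  = 40
)

// isBlocked applies the blocking policy: a score at or above the
// active threshold is blocked, and strict mode lowers that threshold.
func isBlocked(score int, strict bool) bool {
	threshold := defaultBlockThreshold
	if strict {
		threshold = strictBlockThreshold
	}
	return score >= threshold
}

func main() {
	for _, score := range []int{39, 40, 59, 60} {
		fmt.Printf("score=%d default=%t strict=%t\n",
			score, isBlocked(score, false), isBlocked(score, true))
	}
}
```

The same score can therefore yield different Blocked values depending on configuration, which is why the field is best read as a policy decision.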
idpishield is part of the Pinchtab ecosystem — tools for AI agents that take security seriously. Built in the open, designed for production.