Secure proxy for AI/LLM APIs with prompt injection detection, PII filtering, rate limiting, and cost management.
Use Case
- Protect AI APIs from prompt injection attacks
- Filter PII from prompts before sending to LLM
- Rate limit and track token usage
- Validate requests against API schemas
- Monitor AI API costs
Architecture
┌─────────────────┐
│ Sentinel │
│ AI Gateway │
└────────┬────────┘
│
┌──────────┴──────────┐
│ AI Gateway Agent │
│ (prompt security) │
└──────────┬──────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ OpenAI │ │ Anthropic │ │ Azure │
│ API │ │ API │ │ OpenAI │
└───────────┘ └───────────┘ └───────────┘
Configuration
Create sentinel.kdl:
// AI Gateway Configuration
// Secure proxy for OpenAI, Anthropic, and Azure OpenAI
system {
worker-threads 0
graceful-shutdown-timeout-secs 30
}
listeners {
listener "https" {
address "0.0.0.0:8443"
protocol "https"
tls {
cert-file "/etc/sentinel/certs/ai.crt"
key-file "/etc/sentinel/certs/ai.key"
}
}
}
routes {
// Health check
route "health" {
priority 1000
matches { path "/health" }
service-type "builtin"
builtin-handler "health"
}
// OpenAI API proxy
route "openai" {
priority 200
matches {
path-prefix "/v1/"
host "api.openai.local"
}
upstream "openai"
agents "ai-gateway" "auth" "ratelimit"
policies {
timeout-secs 120
max-body-size "10MB"
}
}
// Anthropic API proxy
route "anthropic" {
priority 200
matches {
path-prefix "/v1/"
host "api.anthropic.local"
}
upstream "anthropic"
agents "ai-gateway" "auth" "ratelimit"
policies {
timeout-secs 120
max-body-size "10MB"
}
}
// Azure OpenAI proxy
route "azure-openai" {
priority 200
matches {
path-prefix "/openai/"
host "azure.openai.local"
}
upstream "azure-openai"
agents "ai-gateway" "auth" "ratelimit"
policies {
timeout-secs 120
max-body-size "10MB"
}
}
}
upstreams {
upstream "openai" {
targets {
target { address "api.openai.com:443" }
}
tls {
sni "api.openai.com"
}
}
upstream "anthropic" {
targets {
target { address "api.anthropic.com:443" }
}
tls {
sni "api.anthropic.com"
}
}
upstream "azure-openai" {
targets {
target { address "your-resource.openai.azure.com:443" }
}
tls {
sni "your-resource.openai.azure.com"
}
}
}
agents {
agent "ai-gateway" type="custom" {
unix-socket "/var/run/sentinel/ai-gateway.sock"
events "request_headers" "request_body"
timeout-ms 100
failure-mode "closed"
}
agent "auth" type="auth" {
unix-socket "/var/run/sentinel/auth.sock"
events "request_headers"
timeout-ms 50
failure-mode "closed"
}
agent "ratelimit" type="rate_limit" {
unix-socket "/var/run/sentinel/ratelimit.sock"
events "request_headers"
timeout-ms 20
failure-mode "open"
}
}
observability {
metrics {
enabled #true
address "0.0.0.0:9090"
}
logging {
level "info"
format "json"
}
}
Agent Setup
Install AI Gateway Agent
Start AI Gateway Agent
&
Configuration Options
| Option | Default | Description |
|---|---|---|
--prompt-injection | true | Detect prompt injection attacks |
--pii-detection | true | Detect PII in prompts |
--pii-action | log | Action on PII: block, redact, log |
--jailbreak-detection | true | Detect jailbreak attempts |
--schema-validation | false | Validate against API schemas |
--allowed-models | (all) | Comma-separated model allowlist |
--max-tokens | 0 | Max tokens per request (0=unlimited) |
--rate-limit-requests | 0 | Requests per minute per client |
--rate-limit-tokens | 0 | Tokens per minute per client |
--add-cost-headers | true | Add cost estimation headers |
Testing
Test Prompt Injection Detection
Expected response:
Test PII Detection
With --pii-action block:
HTTP/1.1 403 Forbidden
X-AI-Gateway-PII-Detected: ssn,email
With --pii-action redact, the message is modified:
"My SSN is [SSN REDACTED] and email is [EMAIL REDACTED]"
Test Jailbreak Detection
Expected: Blocked with jailbreak detection.
Test Model Allowlist
If gpt-4-turbo-preview is not in --allowed-models:
Response Headers
The AI Gateway agent adds informational headers:
| Header | Description |
|---|---|
X-AI-Gateway-Provider | Detected provider (openai, anthropic, azure) |
X-AI-Gateway-Model | Model used |
X-AI-Gateway-Tokens-Estimated | Estimated token count |
X-AI-Gateway-Cost-Estimated | Estimated cost in USD |
X-AI-Gateway-PII-Detected | Comma-separated PII types found |
Rate Limiting
Token-Based Rate Limiting
& # 100K tokens/min per client
When exceeded:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 0
X-RateLimit-Reset: 45
Retry-After: 45
Per-Tier Rate Limits
Configure different limits for different API key tiers:
# Free tier
&
# Pro tier
&
Cost Tracking
Monitor AI API costs with Prometheus:
# Cost per hour
sum(rate(sentinel_ai_gateway_cost_usd_total[1h])) * 3600
# Cost by model
sum by (model) (rate(sentinel_ai_gateway_cost_usd_total[1h])) * 3600
# Token usage by client
sum by (client_id) (rate(sentinel_ai_gateway_tokens_total[1h])) * 3600
Client Configuration
Python (OpenAI SDK)
=
=
Python (Anthropic SDK)
=
=
Node.js
;
;
;
Next Steps
- Security - Additional WAF protection
- Observability - Monitor AI API usage
- API Gateway - Full API management