Protocol v2 Features
As of v0.2.0, the Chaos Engineering agent supports protocol v2 with:
- Capability negotiation: Reports supported features during handshake
- Health reporting: Exposes health status with draining awareness
- Metrics export: Counter metrics for faults injected per experiment
- gRPC transport: Optional high-performance gRPC transport via --grpc-address
- Lifecycle hooks: Graceful shutdown and drain handling
Overview
The Chaos Engineering agent provides controlled fault injection for resilience testing. It allows you to inject latency, errors, and failures into HTTP traffic based on configurable rules and targeting criteria.
Key Capabilities
| Feature | Description |
|---|---|
| Latency Injection | Add fixed or random delays to requests |
| Error Injection | Return specific HTTP status codes |
| Timeout Simulation | Simulate upstream timeouts (504) |
| Response Corruption | Inject garbage into responses |
| Connection Reset | Simulate connection failures (502) |
| Safety Controls | Schedule windows, excluded paths, kill switch |
Features
Latency Injection
Add delay before proxying requests. Supports fixed delays or random ranges:
experiments:
  - id: "api-latency"
    targeting:
      paths:
        - prefix: "/api/"
      percentage: 10
    fault:
      type: latency
      fixed_ms: 500  # Fixed 500ms delay
  - id: "random-latency"
    targeting:
      percentage: 5
    fault:
      type: latency
      min_ms: 100    # Random 100-1000ms
      max_ms: 1000
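
The two latency modes can be sketched in a few lines of Python. This is an illustration of the configuration semantics, not the agent's own code: the function name `pick_delay_ms` is hypothetical, and the uniform distribution over an inclusive range is an assumption, since the configuration only states "Random 100-1000ms".

```python
import random
from typing import Optional


def pick_delay_ms(fault: dict, rng: Optional[random.Random] = None) -> int:
    """Return the delay in milliseconds for a `latency` fault mapping."""
    rng = rng or random.Random()
    if "fixed_ms" in fault:
        # In this sketch a fixed delay takes precedence over a range.
        return int(fault["fixed_ms"])
    lo, hi = int(fault["min_ms"]), int(fault["max_ms"])
    if lo > hi:
        raise ValueError("min_ms must not exceed max_ms")
    # Assumption: uniform over the inclusive range [min_ms, max_ms].
    return rng.randint(lo, hi)
```

Passing a seeded `random.Random` makes the chosen delays reproducible, which helps when reasoning about the latency a given range adds on average.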
Error Injection
Return HTTP errors immediately without proxying:
experiments:
  - id: "payment-errors"
    targeting:
      paths:
        - exact: "/api/payments"
      percentage: 5
    fault:
      type: error
      status: 500
      message: "Chaos: Internal Server Error"
      headers:
        x-chaos-injected: "true"
Timeout Simulation
Simulate upstream timeouts by sleeping then returning 504:
experiments:
  - id: "upstream-timeout"
    targeting:
      paths:
        - regex: "^/api/external/.*"
      percentage: 2
    fault:
      type: timeout
      duration_ms: 30000  # 30 second timeout
Response Corruption
Inject garbage into responses (probabilistic):
experiments:
  - id: "corrupt-response"
    targeting:
      percentage: 1
    fault:
      type: corrupt
      probability: 0.5  # 50% of targeted requests are corrupted
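
Because corruption is applied in two stages (percentage selection, then the per-request probability), the share of matching requests that are actually corrupted is smaller than either number alone. A minimal sketch of that arithmetic follows. It assumes the two stages are independent, and the helper name `effective_corruption_rate` is hypothetical.

```python
def effective_corruption_rate(percentage: float, probability: float) -> float:
    """Expected fraction (0.0 to 1.0) of matching requests that get corrupted."""
    # Stage 1: `percentage` percent of matching requests are targeted.
    # Stage 2: each targeted request is corrupted with `probability`.
    return (percentage / 100.0) * probability
```

For the experiment above, `effective_corruption_rate(1, 0.5)` gives 0.005, meaning roughly 1 in 200 matching requests would be corrupted under this assumption.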
Connection Reset
Simulate connection failures (returns 502):
experiments:
  - id: "connection-reset"
    targeting:
      paths:
        - prefix: "/api/unstable/"
      percentage: 3
    fault:
      type: reset
Targeting
Path Matching
Multiple matching strategies:
targeting:
  paths:
    - exact: "/api/users"       # Exact match
    - prefix: "/api/"           # Prefix match
    - regex: "^/api/v\\d+/.*"   # Regex pattern
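
To make the three strategies concrete, here is a small Python sketch of how a request path could be checked against a list of rules. It is an illustration only. It assumes a path matches when any one rule matches, and it uses Python's `re` module, whose regex syntax may differ in places from the engine the agent uses. The names `path_matches` and `RULES` are hypothetical.

```python
import re


def path_matches(path: str, rules: list) -> bool:
    """Return True if `path` satisfies at least one rule (assumed OR semantics)."""
    for rule in rules:
        if "exact" in rule and path == rule["exact"]:
            return True
        if "prefix" in rule and path.startswith(rule["prefix"]):
            return True
        if "regex" in rule and re.search(rule["regex"], path):
            return True
    return False


RULES = [
    {"exact": "/api/users"},
    {"prefix": "/api/"},
    # The double-quoted YAML string "^/api/v\\d+/.*" decodes to this pattern.
    {"regex": r"^/api/v\d+/.*"},
]
```

Note that an `exact` rule does not match sub-paths: `/api/users/1` fails `exact: "/api/users"` but still matches `prefix: "/api/"`.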
Header-Based Activation
Trigger chaos only when specific headers are present:
targeting:
  headers:
    x-chaos-enabled: "true"
This is useful for testing: developers can add the header to a request to trigger faults on demand.
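
A header check of this kind can be sketched as follows. The sketch assumes that every configured header must be present with exactly the configured value, and that header names compare case-insensitively (as HTTP header names do). The function name `headers_match` is hypothetical and the agent's actual matching rules may differ.

```python
def headers_match(required: dict, request_headers: dict) -> bool:
    """Return True if all required headers appear with the expected values."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    present = {name.lower(): value for name, value in request_headers.items()}
    return all(
        present.get(name.lower()) == value for name, value in required.items()
    )
```

With the targeting block above, a request carrying `X-Chaos-Enabled: true` would be eligible, while a request without the header, or with a different value, would pass through untouched.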
Percentage Selection
Affect only a percentage of matching requests:
targeting:
  percentage: 10  # Affect 10% of matching requests
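
Percentage selection is typically an independent random draw per request. The Python sketch below shows one way to do it. It is an assumption about the sampling method, not a description of the agent's implementation, and `is_selected` is a hypothetical name.

```python
import random


def is_selected(percentage: float, rng: random.Random) -> bool:
    """Decide whether one matching request is affected."""
    # rng.random() is uniform on [0.0, 1.0), so 0 never selects
    # and 100 always selects.
    return rng.random() < percentage / 100.0
```

Since each request is sampled independently, the observed share of affected requests only approaches the configured percentage over a large number of requests. Short test runs can deviate noticeably.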
Safety Controls
Schedule Windows
Only run chaos during specific times:
safety:
  schedule:
    - days: [mon, tue, wed, thu, fri]
      start: "09:00"
      end: "17:00"
      timezone: "America/New_York"
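
The window check can be sketched in Python as below. The sketch converts the current time into the window's timezone before comparing, and it assumes an inclusive start, an exclusive end, and no windows that cross midnight. None of these details are confirmed by the configuration format, and `in_window` and `DAY_NAMES` are hypothetical names.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Index matches datetime.weekday(), where Monday is 0.
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def in_window(now: datetime, window: dict, tz: tzinfo) -> bool:
    """Return True if the timezone-aware `now` falls inside `window`.

    `tz` is the tzinfo for the window's zone, for example
    zoneinfo.ZoneInfo("America/New_York") where the tz database is installed.
    """
    local = now.astimezone(tz)
    if DAY_NAMES[local.weekday()] not in window["days"]:
        return False
    start = time.fromisoformat(window["start"])
    end = time.fromisoformat(window["end"])
    # Assumption: start inclusive, end exclusive, start earlier than end.
    return start <= local.time() < end
```

Converting to the window's zone first matters: 23:00 UTC on a weekday is already 18:00 in New York during winter, so it falls outside a 09:00 to 17:00 window even though a naive UTC comparison would get the hour wrong.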
Excluded Paths
Protect critical endpoints:
safety:
  excluded_paths:
    - "/health"
    - "/ready"
    - "/metrics"
Kill Switch
Disable all chaos instantly:
settings:
  enabled: false
Dry Run Mode
Log what would happen without affecting traffic:
settings:
  dry_run: true
Or via command line:
sentinel-agent-chaos --dry-run
Installation
Using Bundle (Recommended)
The easiest way to install this agent is via the Sentinel bundle command:
# Install just this agent
sentinel bundle install chaos
# Or install all available agents
sentinel bundle install --all
The bundle command automatically downloads the correct binary for your platform and places it in ~/.sentinel/agents/.
From Cargo
cargo install sentinel-agent-chaos
From Source
git clone https://github.com/raskell-io/sentinel-agent-chaos.git
cd sentinel-agent-chaos
cargo build --release
Configuration
CLI Options
sentinel-agent-chaos [OPTIONS]

Options:
  -c, --config <FILE>        Path to configuration file [default: chaos.yaml]
  -s, --socket <PATH>        Unix socket path [default: /tmp/sentinel-chaos.sock]
      --grpc-address <ADDR>  gRPC listen address (e.g., 0.0.0.0:50051)
  -L, --log-level <LEVEL>    Log level [default: info]
      --print-config         Print example configuration and exit
      --validate             Validate configuration and exit
      --dry-run              Run in dry-run mode
  -h, --help                 Print help
  -V, --version              Print version
Sentinel Integration
Add the agent to your Sentinel proxy configuration:
agents:
  - name: chaos
    socket: /tmp/sentinel-chaos.sock
    on_request: true
    on_response: false
Full Configuration Example
settings:
  enabled: true
  dry_run: false
  log_injections: true

safety:
  max_affected_percent: 50
  schedule:
    - days: [mon, tue, wed, thu, fri]
      start: "09:00"
      end: "17:00"
      timezone: "UTC"
  excluded_paths:
    - "/health"
    - "/ready"
    - "/metrics"

experiments:
  - id: "api-latency"
    enabled: true
    description: "Add latency to API calls"
    targeting:
      paths:
        - prefix: "/api/"
      percentage: 10
    fault:
      type: latency
      min_ms: 100
      max_ms: 500
  - id: "header-triggered"
    enabled: true
    description: "Latency when X-Chaos header present"
    targeting:
      headers:
        x-chaos-latency: "true"
      percentage: 100
    fault:
      type: latency
      fixed_ms: 2000
Response Headers
When faults are injected, the following headers are added:
| Header | Description |
|---|---|
| x-chaos-injected | Always "true" when a fault was injected |
| x-chaos-experiment | ID of the experiment that was applied |
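
Clients and test harnesses can use these headers to tell injected faults apart from real failures. The Python sketch below shows one way to read them from a response's header mapping. The function name `injected_experiment` is hypothetical, and returning an empty string when the experiment header is missing is a choice made for this sketch.

```python
from typing import Mapping, Optional


def injected_experiment(headers: Mapping[str, str]) -> Optional[str]:
    """Return the experiment ID if the response was chaos-injected, else None."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-chaos-injected") != "true":
        return None
    # Fall back to an empty string if the experiment header is absent.
    return lowered.get("x-chaos-experiment", "")
```

A test suite could, for example, skip or relabel failures for which this function returns an experiment ID, so that chaos-induced errors are not reported as regressions.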
Best Practices
- Start with dry run mode - Verify targeting before enabling
- Use low percentages - Start with 1-5% and increase gradually
- Always exclude health checks - Ensure /health and /ready are protected
- Set schedule windows - Only run during business hours
- Use header triggers for testing - Controlled testing without affecting production
- Monitor logs - Track how many faults are being injected