Guide to diagnosing and resolving common Sentinel issues.
Quick Diagnostics
Check Service Status
# Is Sentinel running?
# Check listening ports
# View recent logs
Test Configuration
# Validate configuration
# Test with verbose output
Check Connectivity
# Test listener
# Test upstream directly
# Check DNS resolution
Common Issues
Startup Failures
“Address already in use”
Error: Address already in use (os error 98)
Cause: Another process is using the port.
Solution:
# Find what's using the port
# or
# Kill the process or change Sentinel's port
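One way to narrow down the conflicting process is to filter `ss -tlnp` output by port. The helper below is a minimal sketch: the function name `listeners_on_port` and the port 8080 are illustrative assumptions, and it expects the usual `ss -tlnp` column layout, where the local address is the fourth field.

```shell
# Print the `ss -tlnp` rows whose local address ends in the given port.
# Reads ss output on stdin, so it can also be run against captured output.
listeners_on_port() {
  port="$1"
  # Field 4 is "Local Address:Port"; anchor the match so 8080 does not match 18080.
  awk -v p=":$port" 'NR > 1 && $4 ~ (p "$") { print }'
}

# Usage (8080 is an assumed port; substitute your listener's port):
#   sudo ss -tlnp | listeners_on_port 8080
#   sudo lsof -i :8080        # alternative if ss is not installed
```

Running `ss` with `sudo` matters here: without it, the process column is often empty for sockets owned by other users.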
“Permission denied” on privileged ports
Error: Permission denied (os error 13)
Cause: Binding to ports below 1024 requires root or the CAP_NET_BIND_SERVICE capability.
Solution:
# Option 1: Grant capability
# Option 2: Use port >= 1024 and redirect
# Option 3: Use systemd socket activation
“Configuration file not found”
Error: Configuration error: Failed to load configuration file
Solution:
# Check file exists and permissions
# Verify path
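The existence and permission checks can be combined into one small helper. This is a sketch only: `check_config_file` is a hypothetical name, and the path in the usage line is a placeholder rather than Sentinel's documented default.

```shell
# Report why a configuration file cannot be loaded, if a reason is visible
# from the filesystem. Returns 0 when the file exists and is readable.
check_config_file() {
  path="$1"
  if [ ! -e "$path" ]; then
    echo "missing: $path"; return 1
  elif [ ! -f "$path" ]; then
    echo "not a regular file: $path"; return 1
  elif [ ! -r "$path" ]; then
    echo "not readable by $(id -un): $path"; return 1
  fi
  echo "ok: $path"
}

# Usage (placeholder path; run as the same user the service runs as):
#   check_config_file /path/to/sentinel/config.kdl
```

Run it as the service user, since a file readable by your login account may still be unreadable to the account Sentinel runs under.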
Connection Issues
502 Bad Gateway
Symptoms: All requests return 502.
Diagnosis:
# Check upstream health
# Test upstream directly
# Check logs for upstream errors
Common causes:
- Upstream server not running
- Firewall blocking connection
- DNS resolution failure
- Wrong upstream address/port
Solutions:
# Verify upstream is accessible
# Check firewall
# Verify DNS
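When testing the upstream directly with curl, the exit code already separates several of the causes listed above. The sketch below maps the documented curl exit codes 6, 7, and 28 to those causes. The function names and the 5-second connect timeout are illustrative choices, not part of Sentinel.

```shell
# Translate a curl exit code into the most likely 502 cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "upstream reachable" ;;
    6)  echo "DNS resolution failure" ;;
    7)  echo "connection failed (upstream down or wrong address/port)" ;;
    28) echo "timed out (often a firewall silently dropping packets)" ;;
    *)  echo "curl exit code $1 (see EXIT CODES in 'man curl')" ;;
  esac
}

# Probe an upstream URL directly, bypassing Sentinel, and explain the result.
probe_upstream() {
  rc=0
  curl -s -o /dev/null --connect-timeout 5 "$1" || rc=$?
  explain_curl_exit "$rc"
}

# Usage (hypothetical upstream address):
#   probe_upstream http://10.0.0.5:3000/health
```

An exit code of 0 only means the HTTP exchange completed. The upstream may still be returning error statuses, so check the logs as well.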
503 Service Unavailable
Symptoms: Intermittent 503 errors.
Diagnosis:
# Check circuit breaker status
# Check connection limits
Common causes:
- Circuit breaker open
- All upstreams unhealthy
- Connection limit reached
- Rate limit exceeded
Solutions:
// Increase connection limits
limits {
    max-total-connections 20000
    max-connections-per-client 200
}

// Adjust circuit breaker
routes {
    route "api" {
        circuit-breaker {
            failure-threshold 10   // More tolerant
            timeout-seconds 60     // Longer recovery
        }
    }
}
504 Gateway Timeout
Symptoms: Requests time out after a delay.
Diagnosis:
# Check upstream response time
# Check timeout settings
Solutions:
// Increase timeouts for slow endpoints
routes {
    route "slow-api" {
        policies {
            timeout-secs 120
        }
    }
}

upstreams {
    upstream "backend" {
        timeouts {
            request-secs 120
            read-secs 60
        }
    }
}
TLS/Certificate Issues
“Invalid certificate chain”
# Verify certificate
# Check certificate chain
# Test TLS connection
“Certificate expired”
# Check expiration
# Check days until expiration
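Both expiration checks can be done with standard `openssl x509` options. In this sketch the function names and the `cert.pem` path are placeholders. `-enddate` prints the `notAfter` timestamp, and `-checkend` exits non-zero if the certificate expires within the given number of seconds.

```shell
# Print the certificate's notAfter date.
cert_expiry() {
  openssl x509 -noout -enddate -in "$1"
}

# Succeed only if the certificate is still valid N days from now.
cert_valid_for_days() {
  cert="$1"; days="$2"
  openssl x509 -noout -checkend "$((days * 86400))" -in "$cert" >/dev/null
}

# Usage (cert.pem is a placeholder path):
#   cert_expiry cert.pem
#   cert_valid_for_days cert.pem 30 || echo "renew soon"
```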
Key/cert mismatch
# Compare modulus
# These should match
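The modulus comparison can be wrapped in a helper like the one below. It is a sketch that assumes an RSA key, since the modulus check does not apply to EC keys. `cert_matches_key` and the file names are illustrative.

```shell
# Succeed if the certificate and the RSA private key share the same modulus.
# Returns 2 if either file cannot be parsed, 1 on a mismatch.
cert_matches_key() {
  cert="$1"; key="$2"
  cert_mod=$(openssl x509 -noout -modulus -in "$cert" 2>/dev/null) || return 2
  key_mod=$(openssl rsa -noout -modulus -in "$key" 2>/dev/null) || return 2
  [ "$cert_mod" = "$key_mod" ]
}

# Usage (placeholder file names):
#   cert_matches_key cert.pem key.pem && echo "match" || echo "mismatch or unreadable"
```

Checking each command's exit status before comparing avoids a false match when both files fail to parse and both outputs are empty.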
Performance Issues
High Latency
Diagnosis:
# Check P99 latency
# Profile request
curl-format.txt:
time_namelookup: %{time_namelookup}s\n
time_connect: %{time_connect}s\n
time_appconnect: %{time_appconnect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_redirect: %{time_redirect}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total: %{time_total}s\n
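A format file like this is passed to curl with `-w "@file"`. The wrapper below is a sketch: `profile_request` is a hypothetical name, and the URL in the usage line is an assumed local listener.

```shell
# Print curl's timing breakdown for one request, discarding the response body.
# The second argument defaults to curl-format.txt in the current directory.
profile_request() {
  url="$1"; fmt="${2:-curl-format.txt}"
  curl -s -o /dev/null -w "@$fmt" "$url"
}

# Usage (assumed listener address):
#   profile_request http://localhost:8080/api/users
```

A large gap between `time_connect` and `time_appconnect` points at the TLS handshake. A large gap between `time_pretransfer` and `time_starttransfer` points at time spent waiting on the proxy or upstream.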
Common causes and solutions:
| Cause | Solution |
|---|---|
| DNS resolution slow | Use IP addresses or local DNS cache |
| TLS handshake slow | Enable session resumption |
| Connection establishment slow | Increase connection pool size |
| Upstream slow | Add caching, optimize backend |
| Body too large | Stream instead of buffer |
High Memory Usage
Diagnosis:
# Check memory metrics
# Check process memory
Solutions:
// Reduce buffer sizes
limits {
    max-body-buffer-bytes 524288       // 512KB
    max-body-inspection-bytes 524288
}

// Reduce connection pool
upstreams {
    upstream "backend" {
        connection-pool {
            max-connections 50
            max-idle 10
        }
    }
}

// Set memory limit
limits {
    max-memory-percent 70.0
}
High CPU Usage
Diagnosis:
# Check CPU metrics
# Profile with perf (Linux)
Solutions:
// Adjust worker threads
server {
    worker-threads 4   // Match CPU cores
}

// Reduce logging
// Set RUST_LOG=warn in environment

// Disable unnecessary features
routes {
    route "api" {
        policies {
            buffer-requests false
            buffer-responses false
        }
    }
}
Debug Mode
Enable Debug Logging
# Via environment
RUST_LOG=debug
# Module-specific debug
RUST_LOG=sentinel::proxy=debug,sentinel::agents=trace
# Pretty format for development
SENTINEL_LOG_FORMAT=pretty RUST_LOG=debug
Log Analysis
# Find errors
# Find specific correlation ID
# Count errors by type
# Find slow requests (>1s)
Request Tracing
Every request carries a correlation ID in the X-Correlation-Id header:
# Make request and get correlation ID
# X-Correlation-Id: 2kF8xQw4BnM
# Search logs by ID
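The two steps can be chained: pull the ID out of the response headers, then grep the logs for it. This is a sketch under assumptions: `correlation_id` is a hypothetical helper, the listener address is made up, and the log command assumes the systemd unit is named `sentinel`.

```shell
# Extract the X-Correlation-Id value from HTTP response headers on stdin.
# Header names are matched case-insensitively (HTTP/2 sends them lowercased).
correlation_id() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-correlation-id" { print $2; exit }'
}

# Usage (assumed listener address and systemd unit name):
#   id=$(curl -si http://localhost:8080/api/users | correlation_id)
#   journalctl -u sentinel | grep -F "$id"
```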
Metrics Analysis
# Dump all metrics
# Check error rates
# Check upstream health
Health Check Failures
Sentinel Health Check
# Basic health
# Detailed status
Upstream Health Check Failures
Diagnosis:
# Check upstream status
# Test health endpoint directly
Common causes:
- Health endpoint returns non-200
- Health check timeout too short
- Health endpoint path wrong
- Upstream overloaded
Solutions:
upstreams {
    upstream "backend" {
        health-check {
            type "http" {
                path "/health"         // Verify path
                expected-status 200
            }
            timeout-secs 10            // Increase timeout
            unhealthy-threshold 5      // More tolerant
        }
    }
}
Agent Issues
Agent Connection Failed
Agent error: auth - connection refused
Diagnosis:
# Check agent is running
# Check socket exists
# Test socket connection
Solutions:
# Start agent
# Check socket permissions
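The socket checks can be scripted as below. This is a sketch assuming the agent listens on a Unix domain socket. `check_agent_socket` is a hypothetical name and the path in the usage line is a placeholder.

```shell
# Report whether a Unix socket exists and is usable by the current user.
# Connecting to a Unix socket requires write permission on the socket file.
check_agent_socket() {
  sock="$1"
  if [ ! -e "$sock" ]; then
    echo "missing: $sock (is the agent running?)"; return 1
  elif [ ! -S "$sock" ]; then
    echo "not a socket: $sock"; return 1
  elif [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
    echo "no read/write access for $(id -un): $sock"; return 1
  fi
  echo "ok: $sock"
}

# Usage (placeholder path; run as the user Sentinel runs as):
#   check_agent_socket /path/to/agent.sock
```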
Agent Timeouts
Diagnosis:
# Check agent latency metrics
# Check timeout count
Solutions:
agents {
    agent "auth" {
        timeout-ms 200               // Increase timeout
        circuit-breaker {
            failure-threshold 10     // More tolerant
        }
    }
}
Configuration Reload Issues
Reload Failed
# Check reload status
# Validate new config before reload
# Manual reload
Config Validation Errors
# Get detailed validation errors
# Common issues:
# - Route references undefined upstream
# - Duplicate route/upstream IDs
# - Invalid regex in path-regex
# - Missing required fields
Getting Help
Collect Diagnostic Information
# System info
# Sentinel version
# Configuration (sanitized)
# Recent logs
# Metrics snapshot
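Before sharing a configuration, secrets should be redacted. The filter below is a rough sketch. The key-name patterns (`password`, `secret`, `token`, `api-key`) are assumptions about what a config might contain, not a list of Sentinel's actual fields, so review the output by hand before posting it.

```shell
# Replace the value of any config line whose key looks secret-bearing.
# Matches "<key> <value>" lines where the key contains one of the patterns.
sanitize_config() {
  sed -E 's/^([[:space:]]*[A-Za-z_-]*(password|secret|token|api-key)[A-Za-z_-]*[[:space:]]+).*/\1"[REDACTED]"/'
}

# Usage (placeholder path):
#   sanitize_config < /path/to/sentinel/config.kdl > config-sanitized.kdl
```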
Log Locations
| Platform | Location |
|---|---|
| systemd | journalctl -u sentinel |
| Docker | docker logs sentinel |
| Kubernetes | kubectl logs -l app=sentinel |
| Custom | Check working-directory in config |
See Also
- Health Monitoring - Health checks and monitoring
- Metrics Reference - Available metrics
- Error Codes - Error types and codes