A load balancer configuration distributing traffic across multiple backend servers with health checks, session affinity, and weighted routing.
## Use Case
- Distribute traffic across multiple backend instances
- Handle backend failures gracefully
- Support blue-green and canary deployments
- Sticky sessions for stateful applications
## Architecture

```
                      ┌─────────────────┐
                      │    Sentinel     │
                      │  Load Balancer  │
                      └────────┬────────┘
                               │
     ┌────────────┬────────────┼────────────┬────────────┐
     │            │            │            │            │
     ▼            ▼            ▼            ▼            ▼
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│  App 1  │  │  App 2  │  │  App 3  │  │  App 4  │  │  App 5  │
│  :3000  │  │  :3001  │  │  :3002  │  │  :3003  │  │  :3004  │
└─────────┘  └─────────┘  └─────────┘  └─────────┘  └─────────┘
```
## Configuration

Create `sentinel.kdl`:
```kdl
// Load Balancer Configuration
// Distributes traffic across multiple backends

system {
    worker-threads 0
    graceful-shutdown-timeout-secs 60
}

listeners {
    listener "http" {
        address "0.0.0.0:8080"
        protocol "http"
    }

    listener "https" {
        address "0.0.0.0:8443"
        protocol "https"
        tls {
            cert-file "/etc/sentinel/certs/lb.crt"
            key-file "/etc/sentinel/certs/lb.key"
        }
    }
}

routes {
    // Health check endpoint
    route "health" {
        priority 1000
        matches {
            path "/health"
        }
        service-type "builtin"
        builtin-handler "health"
    }

    // Upstream health status (admin)
    route "upstream-status" {
        priority 999
        matches {
            path "/admin/upstreams"
            header name="X-Admin-Key" value="${ADMIN_KEY}"
        }
        service-type "builtin"
        builtin-handler "upstreams"
    }

    // Main application - round robin
    route "app" {
        matches {
            path-prefix "/"
        }
        upstream "app-cluster"
        circuit-breaker {
            failure-threshold 5
            success-threshold 2
            timeout-seconds 30
        }
        retry-policy {
            max-attempts 3
            retryable-status-codes 502 503 504
        }
    }
}

upstreams {
    upstream "app-cluster" {
        target "10.0.1.10:3000" weight=100
        target "10.0.1.11:3000" weight=100
        target "10.0.1.12:3000" weight=100
        target "10.0.1.13:3000" weight=100
        target "10.0.1.14:3000" weight=100
        load-balancing "round-robin"

        health-check {
            type "http" {
                path "/health"
                expected-status 200
            }
            interval-secs 5
            timeout-secs 3
            unhealthy-threshold 3
            healthy-threshold 2
        }

        connection-pool {
            max-idle-connections 100
            idle-timeout-secs 60
        }
    }
}

observability {
    metrics {
        enabled #true
        address "0.0.0.0:9090"
    }
    logging {
        level "info"
        format "json"
    }
}
```
## Load Balancing Algorithms

### Round Robin (Default)

```kdl
upstreams {
    upstream "app" {
        load-balancing "round-robin"
    }
}
```
Distributes requests evenly across all healthy backends.
### Weighted Round Robin

```kdl
system {
    worker-threads 0
}

listeners {
    listener "http" {
        address "0.0.0.0:8080"
        protocol "http"
    }
}

upstreams {
    upstream "app" {
        target "10.0.1.10:3000" weight=100
        target "10.0.1.11:3000" weight=50
        target "10.0.1.12:3000" weight=50
        load-balancing "weighted-round-robin"
    }
}

routes {
    route "default" {
        matches { path-prefix "/" }
        upstream "app"
    }
}
```

Backends receive traffic in proportion to their weights: here the first target serves 50% of requests (100 of 200 total weight) and the other two serve 25% each.
### Least Connections

```kdl
upstreams {
    upstream "app" {
        load-balancing "least-connections"
    }
}
```
Routes to the backend with fewest active connections.
### IP Hash (Sticky Sessions)

```kdl
upstreams {
    upstream "app" {
        load-balancing "ip-hash"
    }
}
```
Same client IP always routes to the same backend (when available).
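The mechanism is a plain hash of the client address; a minimal sketch in Python (illustrative only, not Sentinel's implementation):

```python
import hashlib

def pick_backend(client_ip: str, healthy: list[str]) -> str:
    """Hash the client IP to a stable index into the healthy set.

    Note: if the healthy set changes size, most IPs remap; use a
    consistent-hashing algorithm such as maglev to limit that.
    """
    digest = hashlib.sha256(client_ip.encode()).digest()
    return healthy[int.from_bytes(digest[:8], "big") % len(healthy)]

backends = ["10.0.1.10:3000", "10.0.1.11:3000", "10.0.1.12:3000"]
# The same client always lands on the same backend:
assert pick_backend("203.0.113.7", backends) == pick_backend("203.0.113.7", backends)
```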
### Random

```kdl
upstreams {
    upstream "app" {
        load-balancing "random"
    }
}
```

Selects a healthy backend uniformly at random for each request.
### Maglev (Consistent Hashing)

```kdl
upstreams {
    upstream "cache-cluster" {
        target "cache-1:6379"
        target "cache-2:6379"
        target "cache-3:6379"
        load-balancing "maglev"
    }
}
```
Google’s Maglev algorithm provides O(1) lookup with minimal key redistribution when backends change. Ideal for cache clusters.
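To make the table-based O(1) lookup concrete, here is a minimal sketch of Maglev table construction (illustrative; the hash functions and table size are assumptions, not Sentinel's internals):

```python
import hashlib

def _hash(key: str, seed: str) -> int:
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def build_table(backends: list[str], m: int = 65537) -> list[str]:
    """Build a Maglev lookup table of prime size m."""
    offsets = [_hash(b, "offset") % m for b in backends]
    skips = [_hash(b, "skip") % (m - 1) + 1 for b in backends]
    nexts = [0] * len(backends)
    table: list[str | None] = [None] * m
    filled = 0
    while filled < m:
        for i, backend in enumerate(backends):
            # Walk backend i's permutation until we find an unclaimed slot.
            while True:
                slot = (offsets[i] + nexts[i] * skips[i]) % m
                nexts[i] += 1
                if table[slot] is None:
                    table[slot] = backend
                    filled += 1
                    break
            if filled == m:
                break
    return table

def lookup(table: list[str], key: str) -> str:
    """O(1) lookup: hash the key straight to a slot."""
    return table[_hash(key, "select") % len(table)]

table = build_table(["cache-1:6379", "cache-2:6379", "cache-3:6379"])
print(lookup(table, "user:42"))
```

Rebuilding the table after adding or removing a backend remaps only about 1/N of slots, which is what makes it attractive for cache clusters.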
### Peak EWMA (Latency-Aware)

```kdl
upstreams {
    upstream "api" {
        target "api-1:8080"
        target "api-2:8080"
        target "api-3:8080"
        load-balancing "peak_ewma"
    }
}
```
Tracks per-backend latency with an exponentially weighted moving average and automatically routes away from slow backends.
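A sketch of the scoring idea (illustrative; the decay constant is an assumption, and Sentinel's exact formula may differ):

```python
import math
import time

class PeakEwma:
    """Latency score that jumps to peaks immediately and decays
    exponentially afterwards (illustrative sketch only)."""

    def __init__(self, decay_secs: float = 10.0):
        self.decay_secs = decay_secs
        self.cost = 0.0
        self.last = time.monotonic()

    def observe(self, rtt_secs: float) -> None:
        now = time.monotonic()
        weight = math.exp(-(now - self.last) / self.decay_secs)
        self.last = now
        if rtt_secs > self.cost:
            self.cost = rtt_secs  # latency spikes register instantly
        else:
            self.cost = self.cost * weight + rtt_secs * (1.0 - weight)

def pick(scores: dict[str, PeakEwma]) -> str:
    """Route to the backend with the lowest current latency score."""
    return min(scores, key=lambda name: scores[name].cost)
```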
### Locality-Aware (Multi-Region)

```kdl
upstreams {
    upstream "global-api" {
        target "10.0.1.1:8080" {
            metadata { "zone" "us-east-1a" }
        }
        target "10.0.1.2:8080" {
            metadata { "zone" "us-east-1b" }
        }
        target "10.0.2.1:8080" {
            metadata { "zone" "eu-west-1a" }
        }
        load-balancing "locality_aware"
    }
}
```
Prefers backends in the same zone as the proxy, reducing cross-region latency.
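The core selection logic amounts to a zone filter with a fallback; a sketch (illustrative; real implementations typically also spill a fraction of traffic across zones to avoid hot spots):

```python
def candidates(targets: list[dict], local_zone: str) -> list[dict]:
    """Prefer healthy same-zone targets; fall back to all healthy targets."""
    healthy = [t for t in targets if t["healthy"]]
    local = [t for t in healthy if t["zone"] == local_zone]
    return local or healthy

targets = [
    {"addr": "10.0.1.1:8080", "zone": "us-east-1a", "healthy": True},
    {"addr": "10.0.1.2:8080", "zone": "us-east-1b", "healthy": True},
    {"addr": "10.0.2.1:8080", "zone": "eu-west-1a", "healthy": True},
]
# A proxy in us-east-1a only considers the us-east-1a target:
print([t["addr"] for t in candidates(targets, "us-east-1a")])
```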
### Weighted Least Connections

```kdl
upstreams {
    upstream "mixed-capacity" {
        target "large-server:8080" weight=200
        target "medium-server:8080" weight=100
        target "small-server:8080" weight=50
        load-balancing "weighted_least_conn"
    }
}
```
Selects backends with the lowest connections-to-weight ratio. Use when backends have different capacities.
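The selection rule reduces to a ratio comparison; a minimal sketch (not Sentinel's code):

```python
def pick(backends: list[dict]) -> dict:
    """Choose the backend with the lowest active-connections / weight ratio."""
    return min(backends, key=lambda b: b["active"] / b["weight"])

backends = [
    {"addr": "large-server:8080",  "weight": 200, "active": 30},  # ratio 0.15
    {"addr": "medium-server:8080", "weight": 100, "active": 10},  # ratio 0.10
    {"addr": "small-server:8080",  "weight": 50,  "active": 10},  # ratio 0.20
]
print(pick(backends)["addr"])  # medium-server:8080
```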
### Deterministic Subsetting (Large Clusters)

```kdl
upstreams {
    upstream "mega-cluster" {
        // 1000+ targets
        target "backend-001:8080"
        target "backend-002:8080"
        // ... many more ...
        target "backend-999:8080"
        load-balancing "deterministic_subset"
    }
}
```
Each proxy instance connects to a subset of backends. Reduces connection overhead for very large clusters.
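A standard way to compute such subsets is the deterministic subsetting scheme from the Google SRE book; the sketch below shows that scheme (whether Sentinel uses exactly this construction is an assumption):

```python
import random

def deterministic_subset(backends: list[str], client_id: int,
                         subset_size: int) -> list[str]:
    """Give each proxy ("client") a stable slice of the backend list.

    Proxies in the same round shuffle with the same seed and take
    disjoint slices, so connections spread evenly across backends.
    """
    subsets_per_round = len(backends) // subset_size
    round_idx, slot = divmod(client_id, subsets_per_round)
    shuffled = backends[:]
    random.Random(round_idx).shuffle(shuffled)  # deterministic per round
    return shuffled[slot * subset_size:(slot + 1) * subset_size]

backends = [f"backend-{i:03d}:8080" for i in range(1, 1000)]
print(deterministic_subset(backends, client_id=7, subset_size=20))
```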
### Adaptive (Self-Tuning)

```kdl
upstreams {
    upstream "api" {
        target "api-1:8080" weight=100
        target "api-2:8080" weight=100
        target "api-3:8080" weight=100
        load-balancing "adaptive"
        health-check {
            type "http" {
                path "/health"
                expected-status 200
            }
            interval-secs 5
        }
    }
}
```
Dynamically adjusts weights based on response times and error rates.
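One common shape for this kind of feedback loop is multiplicative decrease with slow recovery; the sketch below is illustrative (the thresholds and factors are made up, and this is not necessarily Sentinel's algorithm):

```python
class AdaptiveWeight:
    """AIMD-style weight tuning: back off fast on errors or slow
    responses, recover slowly toward the configured weight."""

    def __init__(self, base_weight: float = 100.0):
        self.base = base_weight
        self.effective = base_weight

    def record(self, latency_secs: float, ok: bool,
               latency_target: float = 0.1) -> None:
        if not ok or latency_secs > latency_target:
            self.effective = max(1.0, self.effective * 0.9)          # fast decrease
        else:
            self.effective = min(self.base, self.effective * 1.02)   # slow recovery
```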
### LLM Inference (Token-Based)

```kdl
upstreams {
    upstream "llm-cluster" {
        target "gpu-node-1:8080"
        target "gpu-node-2:8080"
        target "gpu-node-3:8080"
        load-balancing "least_tokens_queued"
    }
}
```
Specialized for LLM workloads. Routes to the backend with fewest tokens queued.
## Deployment Patterns

### Blue-Green Deployment

```kdl
upstreams {
    // Blue (current production)
    upstream "app-blue" {
        target "10.0.1.10:3000"
        target "10.0.1.11:3000"
    }

    // Green (new version)
    upstream "app-green" {
        target "10.0.2.10:3000"
        target "10.0.2.11:3000"
    }
}

routes {
    route "app" {
        matches {
            path-prefix "/"
        }
        // Switch between blue and green by changing this
        upstream "app-blue"
    }
}
```
Switch traffic by changing `upstream "app-blue"` to `upstream "app-green"` in the route and reloading the configuration.
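For example, with the configuration at `/etc/sentinel/sentinel.kdl` (the directory used earlier for the certificates; the reload command is an assumption and depends on how Sentinel is deployed):

```bash
# Point the "app" route at the green upstream
sed -i 's/upstream "app-blue"/upstream "app-green"/' /etc/sentinel/sentinel.kdl

# Reload the proxy; the unit name here is hypothetical -- use whatever
# mechanism (signal, systemd unit, orchestrator) runs Sentinel for you
sudo systemctl reload sentinel
```

Roll back by reversing the edit and reloading again.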
### Canary Deployment

```kdl
upstreams {
    upstream "app-canary" {
        // Stable (90% of traffic)
        target "10.0.1.10:3000" weight=45
        target "10.0.1.11:3000" weight=45
        // Canary (10% of traffic)
        target "10.0.2.10:3000" weight=10
        load-balancing "weighted-round-robin"
    }
}
```

Weights are relative: the two stable targets receive 45/100 of traffic each and the canary receives 10/100.
### Header-Based Routing (A/B Testing)

```kdl
routes {
    // Beta users route to the new version
    route "app-beta" {
        priority 100
        matches {
            path-prefix "/"
            header name="X-Beta-User" value="true"
        }
        upstream "app-v2"
    }

    // Everyone else gets the stable version
    route "app-stable" {
        priority 50
        matches {
            path-prefix "/"
        }
        upstream "app-v1"
    }
}
```
## Testing
### Check Upstream Health

The `/admin/upstreams` route defined above exposes per-target health through the built-in `upstreams` handler.
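Assuming the HTTP listener from the configuration above (port 8080) and the admin key exported as `ADMIN_KEY`:

```bash
curl -s -H "X-Admin-Key: $ADMIN_KEY" http://localhost:8080/admin/upstreams
```

The exact response format depends on the handler, but it should report the current health of each target in `app-cluster`.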
### Verify Load Distribution
This loop assumes each app instance identifies itself in a response header (the `X-Backend` name here is hypothetical; adapt it to however your backends report their identity):

```bash
# Send 100 requests and count how many each backend served
for i in $(seq 1 100); do
    curl -s -D - -o /dev/null http://localhost:8080/ | grep -i '^x-backend:'
done | sort | uniq -c
```

With round-robin and five healthy targets, each backend should serve roughly 20 requests.
### Simulate Backend Failure
```bash
# Stop one backend (the command depends on how the app is managed;
# a systemd service on the target host is assumed here)
ssh 10.0.1.10 'sudo systemctl stop app'

# Verify traffic continues to be served by the remaining targets
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/

# Check health status: the stopped target should be marked unhealthy
# after `unhealthy-threshold` failed checks (3 checks x 5 s here)
curl -s -H "X-Admin-Key: $ADMIN_KEY" http://localhost:8080/admin/upstreams
```
## Metrics

Key load balancer metrics:

| Metric | Description |
|---|---|
| `sentinel_upstream_health` | Health status per target (1 = healthy, 0 = unhealthy) |
| `sentinel_upstream_connections_active` | Active connections per target |
| `sentinel_upstream_requests_total` | Total requests per target |
| `sentinel_upstream_latency_seconds` | Request latency per target |
## Customizations

### Connection Draining

```kdl
upstreams {
    upstream "app" {
        connection-draining {
            enabled #true
            timeout-secs 30
        }
    }
}
```
Allows in-flight requests to complete before removing unhealthy targets.
### Slow Start

```kdl
upstreams {
    upstream "app" {
        slow-start {
            enabled #true
            duration-secs 60
        }
    }
}
```
Gradually increases traffic to newly healthy targets.
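A linear ramp over the configured window is the usual policy; this sketch is illustrative (linear ramping is an assumption, not documented Sentinel behavior):

```python
def effective_weight(base_weight: float, secs_since_healthy: float,
                     slow_start_secs: float = 60.0) -> float:
    """Ramp a newly healthy target's weight linearly over the window."""
    return base_weight * min(1.0, secs_since_healthy / slow_start_secs)

print(effective_weight(100, 15))  # 25.0 -> a quarter of its normal share at 15s
```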
## Next Steps
- Observability - Monitor load distribution
- API Gateway - Add authentication layer
- Security - Protect against attacks