Capacity Planning

Guide for sizing and scaling Sentinel deployments.

Resource Requirements

Minimum Requirements

Resource   Minimum    Recommended   Notes
--------   -------    -----------   -----
CPU        2 cores    4+ cores      Scales linearly with request rate
Memory     512 MB     2 GB+         Depends on connection count
Disk       1 GB       10 GB         Logs, certificates, GeoIP DB
Network    100 Mbps   1 Gbps+       Based on traffic volume

Resource Consumption Model

CPU Usage:

  • TLS handshakes: ~2ms CPU per handshake
  • Request processing: ~0.1ms CPU per request (proxy-only)
  • WAF inspection: ~1-5ms CPU per request (when enabled)
  • Compression: ~0.5-2ms CPU per MB compressed

Memory Usage:

  • Base process: ~50 MB
  • Per connection: ~2-8 KB (idle) / ~16-64 KB (active)
  • Per worker thread: ~8 MB
  • Request buffering: configurable via max-body-size
  • Connection pool: ~1 KB per pooled connection
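
The per-request and per-connection figures above can be combined into a rough first-pass estimate. The sketch below is illustrative, not authoritative: the constants are midpoints of the ranges above, the workload numbers in the example are made up, and both should be replaced with measurements from your own environment.

# capacity_estimate.py -- rough sizing sketch based on the consumption model above.
# Constants are midpoints of the documented ranges; treat them as assumptions.

CPU_MS_PER_REQUEST = 0.1      # proxy-only request processing
CPU_MS_PER_HANDSHAKE = 2.0    # full TLS handshake
CPU_MS_WAF = 3.0              # WAF inspection, when enabled (documented range: 1-5 ms)
BASE_MEMORY_MB = 50           # base process
MB_PER_WORKER = 8
KB_PER_ACTIVE_CONN = 40       # midpoint of the 16-64 KB active-connection range

def estimate(rps: float, new_tls_per_sec: float, active_conns: int,
             workers: int, waf_enabled: bool = False) -> tuple[float, float]:
    """Return (cores_needed, memory_mb) for a single instance."""
    cpu_ms_per_sec = rps * CPU_MS_PER_REQUEST + new_tls_per_sec * CPU_MS_PER_HANDSHAKE
    if waf_enabled:
        cpu_ms_per_sec += rps * CPU_MS_WAF
    cores = cpu_ms_per_sec / 1000.0   # one core provides 1000 ms of CPU time per second
    memory_mb = (BASE_MEMORY_MB
                 + workers * MB_PER_WORKER
                 + active_conns * KB_PER_ACTIVE_CONN / 1024.0)
    return cores, memory_mb

if __name__ == "__main__":
    cores, mem = estimate(rps=5_000, new_tls_per_sec=200, active_conns=20_000, workers=4)
    print(f"~{cores:.1f} cores, ~{mem:.0f} MB RAM")   # ~0.9 cores, ~863 MB RAM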

Sizing Guidelines

Small Deployment

Traffic: < 1,000 requests/second

server {
    worker-threads 2
    max-connections 5000
}

connection-pool {
    max-connections 100
    max-idle 20
}

Resources: 2 cores, 1 GB RAM

Medium Deployment

Traffic: 1,000 - 10,000 requests/second

server {
    worker-threads 4
    max-connections 20000
}

connection-pool {
    max-connections 200
    max-idle 50
}

Resources: 4 cores, 4 GB RAM per instance, 3 instances for HA

Large Deployment

Traffic: 10,000 - 100,000 requests/second

server {
    worker-threads 0  // Use all available cores
    max-connections 50000
}

listeners {
    listener "http" {
        address "0.0.0.0:8080"
        protocol "http"
    }
}

routes {
    route "default" {
        matches { path-prefix "/" }
        upstream "backend"
    }
}

upstreams {
    upstream "backend" {
        targets {
            target { address "127.0.0.1:3000" }
        }
        connection-pool {
            max-connections 500
            max-idle 100
            idle-timeout-secs 120
        }
    }
}

Resources: 8+ cores, 16 GB RAM per instance, 5+ instances across regions

Performance Characteristics

Request Processing Latency

Component                 Latency (p50)   Latency (p99)
---------                 -------------   -------------
TCP accept                < 0.1 ms        < 0.5 ms
TLS handshake (new)       2-5 ms          10-20 ms
TLS handshake (resumed)   0.5-1 ms        2-5 ms
Header parsing            < 0.1 ms        < 0.5 ms
Route matching            < 0.05 ms       < 0.2 ms
Upstream selection        < 0.01 ms       < 0.05 ms
Agent call (if enabled)   1-5 ms          10-50 ms
Proxy overhead (total)    0.5-2 ms        5-15 ms

Throughput Limits

Scenario              Approximate Limit        Bottleneck
--------              -----------------        ----------
Simple proxy (HTTP)   50,000 RPS/core          CPU
TLS termination       10,000 new conn/s/core   CPU (crypto)
Large body (1 MB)     1-8 Gbps                 Network/Memory
WAF enabled           5,000-10,000 RPS/core    Agent latency

Connection Limits Formula

Max Connections = (Available Memory (MB) × 1024) / Memory per Connection (KB)

Example:
(4096 MB × 1024) / 16 KB = 262,144 connections (theoretical max)
Practical max: ~50% of theoretical, to leave headroom
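
As a sanity check, the same arithmetic in code (a sketch that assumes the 16 KB active-connection figure from the consumption model above):

# connection_budget.py -- theoretical vs. practical connection limits.

def max_connections(available_memory_mb: int, kb_per_connection: int = 16,
                    headroom: float = 0.5) -> tuple[int, int]:
    """Return (theoretical_max, practical_max) connection counts."""
    theoretical = available_memory_mb * 1024 // kb_per_connection
    practical = int(theoretical * headroom)   # keep ~50% headroom in practice
    return theoretical, practical

print(max_connections(4096))   # (262144, 131072)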

Capacity Metrics

Key Metrics to Monitor

# Current request rate
curl -s localhost:9090/metrics | grep 'requests_total'

# Active connections
curl -s localhost:9090/metrics | grep 'open_connections'

# Connection pool utilization
curl -s localhost:9090/metrics | grep 'connection_pool'

# Memory usage
curl -s localhost:9090/metrics | grep 'process_resident_memory_bytes'

# Request latency percentiles
curl -s localhost:9090/metrics | grep 'request_duration.*quantile'

Capacity Thresholds

Metric               Warning     Critical    Action
------               -------     --------    ------
CPU utilization      > 70%       > 85%       Scale horizontally
Memory utilization   > 75%       > 90%       Increase memory or scale
Connection count     > 70% max   > 85% max   Increase limits or scale
p99 latency          > 100 ms    > 500 ms    Investigate or scale
Error rate           > 0.1%      > 1%        Investigate upstream/config
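
The connection and memory thresholds above can be checked directly against the metrics endpoint. The sketch below is a minimal watchdog that assumes the endpoint and metric names used elsewhere in this guide (sentinel_open_connections, sentinel_max_connections, process_resident_memory_bytes); adjust the names if your build differs. CPU and latency thresholds are better evaluated with the Prometheus rules in the next section.

# capacity_check.py -- compare live metrics against the warning/critical thresholds above.
import urllib.request

METRICS_URL = "http://localhost:9090/metrics"

def scrape() -> dict[str, float]:
    """Very simplified Prometheus text-format parser: the last space-separated field
    is taken as the value, labels are ignored, duplicate series overwrite each other."""
    metrics = {}
    with urllib.request.urlopen(METRICS_URL) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith("#") or not line.strip():
                continue
            name, _, value = line.rpartition(" ")
            metrics[name.split("{")[0]] = float(value)
    return metrics

def check(metrics: dict[str, float]) -> None:
    conns = metrics.get("sentinel_open_connections", 0.0)
    limit = metrics.get("sentinel_max_connections", 0.0)
    if limit:
        util = conns / limit
        level = "CRITICAL" if util > 0.85 else "WARNING" if util > 0.70 else "ok"
        print(f"connections: {conns:.0f}/{limit:.0f} ({util:.0%}) {level}")
    rss = metrics.get("process_resident_memory_bytes")
    if rss is not None:
        print(f"resident memory: {rss / 1024**2:.0f} MB")

if __name__ == "__main__":
    check(scrape())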

Prometheus Alerting Rules

groups:
  - name: sentinel-capacity
    rules:
      - alert: SentinelHighCPU
        expr: rate(process_cpu_seconds_total{job="sentinel"}[5m]) > 0.7
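        # Note: this rate() is measured in CPU cores consumed, so 0.7 corresponds to
        # 70% of a single core; divide by the instance's core count for a true ratio.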
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sentinel CPU usage > 70%"

      - alert: SentinelConnectionsHigh
        expr: sentinel_open_connections / sentinel_max_connections > 0.7
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Sentinel approaching connection limit"

      - alert: SentinelLatencyHigh
        expr: histogram_quantile(0.99, rate(sentinel_request_duration_seconds_bucket[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Sentinel p99 latency > 100ms"

Scaling Strategies

Vertical Scaling

When to use: Quick fix, single-instance deployments

// Increase worker threads (if CPU-bound)
server {
    worker-threads 8  // Increase from 4
}

// Increase connection limits (if connection-bound)
server {
    max-connections 50000  // Increase from 20000
}

Limits:

  • Single machine limits (typically 64 cores, 256 GB RAM)
  • Single point of failure
  • Diminishing returns above 8-16 cores for proxy workloads

Horizontal Scaling

When to use: Production deployments, high availability

                 Load Balancer
      ┌───────────────┼───────────────┐
      │               │               │
 ┌────▼────┐    ┌────▼────┐    ┌────▼────┐
 │Sentinel │    │Sentinel │    │Sentinel │
 │   #1    │    │   #2    │    │   #3    │
 └─────────┘    └─────────┘    └─────────┘

Scaling Formula:

Instances = (Peak RPS × Safety Factor) / RPS per Instance

Example:
Peak RPS: 50,000
Safety Factor: 1.5
RPS per Instance: 15,000 (with WAF)

Instances = (50,000 × 1.5) / 15,000 = 5 instances
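
The same calculation in code; the result should be rounded up and never drop below the HA minimum of three instances (the per-instance figure here is the WAF-enabled example above):

# instance_count.py -- horizontal scaling estimate from the formula above.
import math

def instances_needed(peak_rps: float, rps_per_instance: float,
                     safety_factor: float = 1.5, minimum: int = 3) -> int:
    """Round up and enforce the minimum instance count for HA."""
    return max(minimum, math.ceil(peak_rps * safety_factor / rps_per_instance))

print(instances_needed(50_000, 15_000))   # 5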

Auto-Scaling (Kubernetes)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sentinel-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sentinel
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
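    # The pods metric below requires the custom metrics API (for example via
    # prometheus-adapter); the CPU target works with the standard metrics-server alone.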
    - type: Pods
      pods:
        metric:
          name: sentinel_requests_per_second
        target:
          type: AverageValue
          averageValue: "10000"

Load Testing

Baseline Test

# Simple throughput test with wrk
wrk -t12 -c400 -d30s http://sentinel:8080/health

# Latency-focused test
wrk -t4 -c50 -d60s --latency http://sentinel:8080/api/endpoint

Capacity Test Script

#!/bin/bash
# Find maximum sustainable throughput

for CONNECTIONS in 100 500 1000 2000 5000 10000; do
    echo "Testing with $CONNECTIONS connections..."

    wrk -t12 -c$CONNECTIONS -d60s --latency http://sentinel:8080/api \
        > results-${CONNECTIONS}c.txt 2>&1

    # Check for errors or degradation
    RPS=$(grep "Requests/sec" results-${CONNECTIONS}c.txt | awk '{print $2}')
    P99=$(grep "99%" results-${CONNECTIONS}c.txt | awk '{print $2}')

    echo "$CONNECTIONS connections: $RPS RPS, p99=$P99"
    sleep 30  # Cool down
done
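
To turn the per-connection result files into a capacity number, the runs can be summarized and flagged against a latency target. The parser below is a sketch: it assumes the results-<N>c.txt files written by the script above and an illustrative 100 ms p99 target; adjust both to your environment and SLA.

# summarize_wrk.py -- summarize the results-<N>c.txt files from the capacity test above.
import glob
import re

P99_SLA_MS = 100.0   # illustrative target; align with your actual SLA

def to_ms(value: str) -> float:
    """Convert a wrk latency string such as '850.00us', '3.21ms', or '1.20s' to ms."""
    number, unit = re.fullmatch(r"([\d.]+)(us|ms|s)", value).groups()
    return float(number) * {"us": 0.001, "ms": 1.0, "s": 1000.0}[unit]

for path in sorted(glob.glob("results-*c.txt"),
                   key=lambda p: int(re.search(r"results-(\d+)c", p).group(1))):
    with open(path) as fh:
        text = fh.read()
    rps = float(re.search(r"Requests/sec:\s+([\d.]+)", text).group(1))
    p99 = to_ms(re.search(r"99%\s+(\S+)", text).group(1))
    status = "OVER SLA" if p99 > P99_SLA_MS else "ok"
    print(f"{path}: {rps:,.0f} RPS, p99={p99:.1f} ms  {status}")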

Capacity Planning Process

1. Gather Requirements

  • Peak requests per second
  • Average request/response size
  • TLS termination required?
  • WAF/Agent processing?
  • Growth projections
  • SLA requirements (availability, latency)

2. Calculate Base Capacity

Rules of Thumb:

  • 1 core ≈ 10,000-50,000 simple proxy RPS
  • TLS halves throughput
  • WAF reduces throughput by 50-70%
  • Minimum 3 instances for HA
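
Applied in code, these rules give a quick first-pass footprint. The per-core baseline and reduction factors below are assumptions drawn from the ranges in this guide; confirm them with load tests before committing to a deployment size.

# base_capacity.py -- first-pass estimate from the rules of thumb above.
import math

def per_instance_rps(cores: int, base_rps_per_core: float = 20_000,
                     tls: bool = True, waf: bool = False) -> float:
    rps = cores * base_rps_per_core   # 1 core ~ 10,000-50,000 simple proxy RPS
    if tls:
        rps *= 0.5                    # TLS roughly halves throughput
    if waf:
        rps *= 0.4                    # WAF reduces throughput by 50-70%; assume 60%
    return rps

def instances(peak_rps: float, cores_per_instance: int,
              safety_factor: float = 1.5, **kwargs) -> int:
    capacity = per_instance_rps(cores_per_instance, **kwargs)
    return max(3, math.ceil(peak_rps * safety_factor / capacity))   # minimum 3 for HA

# Example: 50,000 peak RPS on 4-core instances with TLS and WAF enabled.
print(instances(50_000, 4, tls=True, waf=True))   # 5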

3. Size and Validate

  • Load test with expected peak traffic
  • Verify p99 latency within SLA
  • Test failover scenarios (N-1 capacity)
  • Validate auto-scaling triggers

4. Document and Review

  • Capacity limits and headroom
  • Scaling thresholds
  • Review schedule (quarterly, or after a 25% traffic increase)

Quick Reference

Common Bottlenecks

Symptom                     Likely Bottleneck     Solution
-------                     -----------------     --------
High CPU, low connections   Processing capacity   Add cores/instances
High connections, low CPU   Connection limits     Increase limits, optimize keepalive
High p99, moderate CPU      Upstream latency      Optimize upstreams
Errors under load           Resource exhaustion   Scale up/out

Capacity Rules of Thumb

  1. CPU: 1 core ≈ 10,000-50,000 simple proxy RPS
  2. Memory: 16 KB per active connection (more with WAF)
  3. TLS: Halves throughput, 10K new connections/sec/core
  4. WAF: Reduces throughput by 50-70%
  5. Instances: Minimum 3 for HA, N+1 for maintenance

See Also