Metrics Reference

Prometheus metrics exposed by Sentinel for monitoring and alerting.

Metrics Endpoint

Metrics are available at the /metrics endpoint on the admin listener:

curl http://localhost:9090/metrics

Configure the admin listener and a route that serves the built-in metrics handler:

listeners {
    listener "admin" {
        address "127.0.0.1:9090"
        protocol "http"
    }
}

routes {
    route "metrics" {
        matches {
            path "/metrics"
        }
        service-type "builtin"
        builtin-handler "metrics"
    }
}
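For scripted checks, the scrape output can be read without a Prometheus server. The sketch below assumes the endpoint returns the standard Prometheus text exposition format. The sample payload, its HELP text, and its values are made up for illustration, and the parser ignores timestamps and escaped quotes in label values.

```python
import re

# `name{labels} value` or `name value`; timestamps are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload, shaped like a response from the /metrics endpoint.
SAMPLE = """\
# HELP sentinel_active_requests Number of in-flight requests
# TYPE sentinel_active_requests gauge
sentinel_active_requests 12
sentinel_requests_total{route="api",method="GET",status="200"} 1027
sentinel_requests_total{route="api",method="GET",status="503"} 3
"""

samples = parse_metrics(SAMPLE)
```

In practice the text would come from the admin listener (for example via urllib.request.urlopen) rather than from an inline string.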

Request Metrics

sentinel_request_duration_seconds

Request latency histogram.

Type       Labels         Description
Histogram  route, method  Request duration in seconds

Buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Example queries:

# Average latency by route
sum(rate(sentinel_request_duration_seconds_sum[5m])) by (route)
  / sum(rate(sentinel_request_duration_seconds_count[5m])) by (route)

# P99 latency across all routes and methods
histogram_quantile(0.99,
  sum(rate(sentinel_request_duration_seconds_bucket[5m])) by (le))

# P95 latency by route
histogram_quantile(0.95,
  sum(rate(sentinel_request_duration_seconds_bucket[5m])) by (le, route))
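The quantile queries above return estimates, not exact values, because a histogram only records how many requests fell at or below each bucket bound. The sketch below shows the usual linear-interpolation estimate for cumulative buckets. It is an illustration of the idea, not Prometheus's implementation, and the counts are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) sorted by upper
    bound, with at least one finite bucket and a final math.inf bucket that
    holds the total count.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = buckets[idx - 1] if idx > 0 else (0.0, 0)
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Illustrative counts (not real data) on a subset of the bucket bounds.
example = [(0.05, 50), (0.1, 90), (0.25, 100), (math.inf, 100)]
p50 = histogram_quantile(0.5, example)
p95 = histogram_quantile(0.95, example)
```

Because the estimate interpolates inside one bucket, its resolution is limited by the bucket bounds: a P95 that lands between 100ms and 250ms cannot be located more precisely than that range.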

sentinel_requests_total

Total request counter.

Type     Labels                 Description
Counter  route, method, status  Total requests

Example queries:

# Requests per second
rate(sentinel_requests_total[5m])

# Error rate (5xx)
sum(rate(sentinel_requests_total{status=~"5.."}[5m]))
  / sum(rate(sentinel_requests_total[5m]))

# Success rate (2xx) by route
sum(rate(sentinel_requests_total{status=~"2.."}[5m])) by (route)
  / sum(rate(sentinel_requests_total[5m])) by (route)
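The ratio queries above divide one counter's growth by another's over the same window. A minimal sketch of that arithmetic, assuming two scrapes of sentinel_requests_total already summed per status code; the function name and the numbers are hypothetical.

```python
def error_ratio(before, after):
    """Share of 5xx requests between two scrapes.

    before, after: dicts mapping a status code string to the cumulative
    counter value at each scrape.
    """
    total = 0.0
    errors = 0.0
    for status, end in after.items():
        start = before.get(status, 0.0)
        # A counter that went down was reset (process restart); count it
        # as having restarted from zero.
        delta = end - start if end >= start else end
        total += delta
        if status.startswith("5"):
            errors += delta
    # PromQL would yield NaN for 0/0; this sketch reports 0.0 instead.
    return errors / total if total else 0.0


ratio = error_ratio(
    {"200": 1000.0, "503": 10.0},
    {"200": 1190.0, "503": 20.0},
)
```

With these made-up counters, 200 requests arrived between the scrapes and 10 of them were 5xx, which is exactly the 0.05 threshold used by a 5% error-rate alert.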

sentinel_active_requests

Currently active requests.

Type   Labels  Description
Gauge  -       Number of in-flight requests

Example queries:

# Current active requests
sentinel_active_requests

# Alert if too high
sentinel_active_requests > 1000

sentinel_request_body_size_bytes

Request body size histogram.

Type       Labels  Description
Histogram  route   Request body size in bytes

Buckets: 100B, 1KB, 10KB, 100KB, 1MB, 10MB, 100MB

sentinel_response_body_size_bytes

Response body size histogram.

Type       Labels  Description
Histogram  route   Response body size in bytes

Scoped Metrics

When you use namespaces and services, additional metrics that carry namespace and service scope labels are available.

sentinel_scoped_request_duration_seconds

Request latency with namespace/service labels.

Type       Labels                             Description
Histogram  namespace, service, route, method  Request duration in seconds

Example queries:

# P99 latency by namespace
histogram_quantile(0.99,
  sum(rate(sentinel_scoped_request_duration_seconds_bucket[5m])) by (le, namespace))

# Compare latency across services
histogram_quantile(0.95,
  sum(rate(sentinel_scoped_request_duration_seconds_bucket[5m])) by (le, namespace, service))

sentinel_scoped_requests_total

Request counter with namespace/service labels.

Type     Labels                                     Description
Counter  namespace, service, route, method, status  Total requests

Example queries:

# Request rate by namespace
sum(rate(sentinel_scoped_requests_total[5m])) by (namespace)

# Error rate by service
sum(rate(sentinel_scoped_requests_total{status=~"5.."}[5m])) by (namespace, service)
  / sum(rate(sentinel_scoped_requests_total[5m])) by (namespace, service)

# Top 10 busiest services
topk(10, sum(rate(sentinel_scoped_requests_total[5m])) by (namespace, service))
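The scoped queries above all follow one pattern: collapse per-series rates onto the scope labels, then optionally keep the largest groups. A small sketch of that aggregation, using invented namespaces, services, and rates.

```python
from collections import defaultdict


def sum_by(samples, keys):
    """Sum sample values per combination of the given label keys."""
    totals = defaultdict(float)
    for labels, value in samples:
        group = tuple(labels[key] for key in keys)
        totals[group] += value
    return dict(totals)


def topk(k, totals):
    """Return the k (group, value) pairs with the largest values."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical per-series request rates (requests per second).
rates = [
    ({"namespace": "shop", "service": "cart", "route": "add"}, 40.0),
    ({"namespace": "shop", "service": "cart", "route": "view"}, 25.0),
    ({"namespace": "shop", "service": "search", "route": "query"}, 90.0),
    ({"namespace": "blog", "service": "posts", "route": "list"}, 10.0),
]

by_service = sum_by(rates, ("namespace", "service"))
busiest = topk(2, by_service)
```

Grouping on both namespace and service keeps services with the same name in different namespaces apart, which is why the queries above use by (namespace, service) rather than by (service).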

sentinel_scoped_active_requests

Active requests gauge with namespace/service labels.

Type   Labels              Description
Gauge  namespace, service  In-flight requests per scope

sentinel_scoped_upstream_attempts_total

Upstream attempts with scope labels.

Type     Labels                               Description
Counter  namespace, service, upstream, route  Connection attempts

sentinel_scoped_upstream_failures_total

Upstream failures with scope labels.

Type     Labels                                       Description
Counter  namespace, service, upstream, route, reason  Connection failures

Example queries:

# Failure rate by namespace
sum(rate(sentinel_scoped_upstream_failures_total[5m])) by (namespace)
  / sum(rate(sentinel_scoped_upstream_attempts_total[5m])) by (namespace)

sentinel_scoped_rate_limit_hits_total

Rate limit hits with scope labels.

Type     Labels                             Description
Counter  namespace, service, route, policy  Rate limit violations

Example queries:

# Rate limit hits by namespace
sum(rate(sentinel_scoped_rate_limit_hits_total[5m])) by (namespace)

# Services hitting rate limits
sum(rate(sentinel_scoped_rate_limit_hits_total[5m])) by (namespace, service) > 0

sentinel_scoped_circuit_breaker_state

Circuit breaker state with scope labels.

Type   Labels                        Description
Gauge  namespace, service, upstream  State: 0=closed, 1=open

Example queries:

# Open circuit breakers (one series per namespace, service, and upstream)
sentinel_scoped_circuit_breaker_state == 1

# Count of open circuit breakers per namespace
count(sentinel_scoped_circuit_breaker_state == 1) by (namespace)

Upstream Metrics

sentinel_upstream_attempts_total

Upstream connection attempts.

Type     Labels           Description
Counter  upstream, route  Total connection attempts

sentinel_upstream_failures_total

Upstream connection failures.

Type     Labels                   Description
Counter  upstream, route, reason  Total failures

Reason values:

  • connection_refused - TCP connection refused
  • connection_timeout - Connection timed out
  • read_timeout - Read timeout
  • write_timeout - Write timeout
  • tls_error - TLS handshake failed
  • dns_error - DNS resolution failed

Example queries:

# Failure rate by upstream
sum(rate(sentinel_upstream_failures_total[5m])) by (upstream)
  / sum(rate(sentinel_upstream_attempts_total[5m])) by (upstream)

# Connection refused errors
sum(rate(sentinel_upstream_failures_total{reason="connection_refused"}[5m])) by (upstream)

sentinel_circuit_breaker_state

Circuit breaker state.

Type   Labels            Description
Gauge  component, route  State: 0=closed, 1=open

Example queries:

# Open circuit breakers
sentinel_circuit_breaker_state == 1

# Alert on circuit breaker open
sentinel_circuit_breaker_state{component="upstream"} == 1

Agent Metrics

sentinel_agent_latency_seconds

Agent call latency histogram.

Type       Labels        Description
Histogram  agent, event  Agent call duration

Event values:

  • on_request_headers
  • on_request_body
  • on_response_headers
  • on_response_body

Example queries:

# P99 agent latency
histogram_quantile(0.99,
  rate(sentinel_agent_latency_seconds_bucket[5m]))

# Average latency by agent
sum(rate(sentinel_agent_latency_seconds_sum[5m])) by (agent)
  / sum(rate(sentinel_agent_latency_seconds_count[5m])) by (agent)

sentinel_agent_timeouts_total

Agent call timeouts.

Type     Labels        Description
Counter  agent, event  Total timeouts

Example queries:

# Timeout rate by agent
sum(rate(sentinel_agent_timeouts_total[5m])) by (agent)

# Alert on high timeout rate
rate(sentinel_agent_timeouts_total[5m]) > 0.1

sentinel_blocked_requests_total

Requests blocked by agents/WAF.

Type     Labels  Description
Counter  reason  Total blocked requests

Reason values:

  • waf - Blocked by WAF
  • auth - Authentication failed
  • rate_limit - Rate limited
  • policy - Policy violation

Connection Pool Metrics

sentinel_connection_pool_size

Total connections in pool.

Type   Labels    Description
Gauge  upstream  Total connections

sentinel_connection_pool_idle

Idle connections in pool.

Type   Labels    Description
Gauge  upstream  Idle connections

sentinel_connection_pool_acquired_total

Connections acquired from pool.

Type     Labels    Description
Counter  upstream  Total acquisitions

Example queries:

# Pool utilization
(sentinel_connection_pool_size - sentinel_connection_pool_idle)
  / sentinel_connection_pool_size

# Connection acquisition rate
rate(sentinel_connection_pool_acquired_total[5m])
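The pool utilization query is the fraction of connections currently in use. A minimal sketch of the same arithmetic with an explicit guard for an empty pool; the function name and the values are hypothetical.

```python
def pool_utilization(size, idle):
    """Fraction of pooled connections in use, from the two gauge values."""
    if size <= 0:
        # PromQL yields NaN for 0/0; this sketch treats an empty pool as unused.
        return 0.0
    return (size - idle) / size


utilization = pool_utilization(size=20, idle=5)
```

A value that stays close to 1.0 means nearly every pooled connection is busy, so new requests are likely waiting for a connection or forcing the pool to grow.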

TLS Metrics

sentinel_tls_handshake_duration_seconds

TLS handshake duration.

Type       Labels   Description
Histogram  version  Handshake duration

Version values: TLS1.2, TLS1.3

System Metrics

sentinel_memory_usage_bytes

Process memory usage.

Type   Labels  Description
Gauge  -       Memory usage in bytes

sentinel_cpu_usage_percent

CPU usage percentage.

Type   Labels  Description
Gauge  -       CPU usage 0-100

sentinel_open_connections

Open connections count.

Type   Labels  Description
Gauge  -       Number of open connections

Prometheus Configuration

Basic Scrape Config

scrape_configs:
  - job_name: 'sentinel'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
    metrics_path: /metrics

With Service Discovery

scrape_configs:
  - job_name: 'sentinel'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: sentinel
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        regex: metrics
        action: keep

Alerting Rules

Example Alerts

groups:
  - name: sentinel
    rules:
      # High error rate
      - alert: SentinelHighErrorRate
        expr: |
          sum(rate(sentinel_requests_total{status=~"5.."}[5m]))
          / sum(rate(sentinel_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on Sentinel"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Circuit breaker open
      - alert: SentinelCircuitBreakerOpen
        expr: sentinel_circuit_breaker_state == 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Circuit breaker open"
          description: "Circuit breaker open for {{ $labels.component }}"

      # High latency
      - alert: SentinelHighLatency
        expr: |
          histogram_quantile(0.99,
            rate(sentinel_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High P99 latency"
          description: "P99 latency is {{ $value }}s"

      # Agent timeouts
      - alert: SentinelAgentTimeouts
        expr: rate(sentinel_agent_timeouts_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Agent timeouts detected"
          description: "Agent {{ $labels.agent }} timing out"

      # No healthy upstreams
      - alert: SentinelNoHealthyUpstreams
        expr: |
          sum(sentinel_circuit_breaker_state{component="upstream"})
          == count(sentinel_circuit_breaker_state{component="upstream"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No healthy upstreams"
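The SentinelNoHealthyUpstreams expression relies on the gauge being 0 or 1: the sum of the states equals the number of series only when every upstream circuit breaker is open. A sketch of that check, with a hypothetical function name and inputs.

```python
def no_healthy_upstreams(states):
    """states: one 0/1 value per upstream circuit breaker (1 = open)."""
    # With no series at all, the PromQL comparison returns nothing and the
    # alert stays silent; the sketch mirrors that by returning False.
    return len(states) > 0 and sum(states) == len(states)


all_open = no_healthy_upstreams([1, 1, 1])
one_closed = no_healthy_upstreams([1, 0, 1])
```

Because an absent metric never fires this rule, a separate absent() style alert may be worth considering if the gauge can disappear entirely.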

Grafana Dashboard

Key panels for a Sentinel dashboard:

  1. Request Rate - rate(sentinel_requests_total[5m])
  2. Error Rate - 5xx / total
  3. Latency P50/P95/P99 - histogram_quantile
  4. Active Requests - sentinel_active_requests
  5. Upstream Health - circuit breaker states
  6. Agent Latency - sentinel_agent_latency_seconds histogram
  7. Connection Pool - size vs idle
  8. Memory/CPU - system metrics
