Performance Benchmarks

This page presents benchmark results for Agent Protocol v2 optimizations, helping you understand expected performance characteristics and make informed configuration decisions.

Executive Summary

Agent Protocol v2 achieves significant performance improvements through lock-free data structures, optimized serialization, and efficient memory management.

Optimization               Improvement      Impact
Atomic health cache        10x faster       Hot-path health checks
MessagePack serialization  24-32% faster    Large payload processing
SmallVec headers           40% faster       Single-value header allocation
Body chunk streaming       4.7-11x faster   WAF body inspection
Protocol metrics           <3ns overhead    Zero-cost observability

Full request path latency: ~230ns (excluding network I/O)


Lock-Free Operations

Health State Caching

Health checks sit on every request’s hot path, so they must be cheap. Using atomic operations instead of an RwLock provides a 10x improvement on reads:

Operation  Atomic  RwLock  Speedup
Read       0.46ns  4.6ns   10x
Write      0.46ns  1.8ns   4x
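The pattern can be sketched with std atomics alone. This is an illustrative sketch, not the project’s actual API; the type and method names are assumptions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Illustrative sketch: per-agent health cached as a single atomic flag.
/// The request hot path does one atomic load; the background health
/// checker flips the flag with an atomic store. No lock, no contention.
pub struct HealthCache {
    healthy: AtomicBool,
}

impl HealthCache {
    pub fn new() -> Self {
        Self { healthy: AtomicBool::new(true) }
    }

    /// Hot path: a single lock-free load.
    pub fn is_healthy(&self) -> bool {
        self.healthy.load(Ordering::Acquire)
    }

    /// Health checker: a single lock-free store.
    pub fn set_healthy(&self, ok: bool) {
        self.healthy.store(ok, Ordering::Release);
    }
}
```

Because the flag is a single machine word, readers never block writers and vice versa, which is where the 10x over RwLock comes from.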

Timestamp Tracking

Atomic timestamp reads for “last seen” tracking:

Operation  AtomicU64  RwLock  Speedup
Read       0.78ns     4.7ns   6x
Write      18.7ns     16.7ns  ~Equal

Reads dominate in production, making the 6x improvement significant.
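A minimal std-only sketch of the same idea, storing the timestamp as milliseconds-since-epoch in an AtomicU64 (names here are illustrative, not the project’s API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative sketch: "last seen" as millis-since-epoch in an AtomicU64.
pub struct LastSeen(AtomicU64);

impl LastSeen {
    pub fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    fn now_millis() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as u64
    }

    /// Write path: one atomic store (the clock read dominates the cost).
    pub fn touch(&self) {
        self.0.store(Self::now_millis(), Ordering::Relaxed);
    }

    /// Read path: dominates in production; a single Relaxed load.
    pub fn millis(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is sufficient here because the timestamp carries no synchronization obligations with other memory.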

Connection Affinity Lookup

DashMap provides O(1) lookup regardless of concurrent request count:

Entries  Lookup (hit)  Lookup (miss)
10       12.3ns        8.8ns
100      13.5ns        8.3ns
1,000    14.0ns        8.9ns
10,000   12.9ns        9.5ns
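DashMap keeps lookups O(1) under concurrency by sharding: the key’s hash selects one of N internally locked shards, so only that shard is locked per operation. A std-only sketch of the sharding idea (illustrative only, not DashMap’s actual implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Sketch of a sharded map: hash the key to pick a shard, then lock
/// only that shard. Lookup stays O(1) regardless of entry count, and
/// contention is spread across shards instead of one global lock.
pub struct ShardedMap<V> {
    shards: Vec<Mutex<HashMap<u64, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    pub fn new(shard_count: usize) -> Self {
        Self {
            shards: (0..shard_count)
                .map(|_| Mutex::new(HashMap::new()))
                .collect(),
        }
    }

    fn shard(&self, key: u64) -> &Mutex<HashMap<u64, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    pub fn insert(&self, key: u64, value: V) {
        self.shard(key).lock().unwrap().insert(key, value);
    }

    pub fn get(&self, key: u64) -> Option<V> {
        self.shard(key).lock().unwrap().get(&key).cloned()
    }
}
```

DashMap additionally uses RwLock-style shard locks and avoids cloning on read; the sketch trades those details for brevity.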

Serialization Performance

MessagePack vs JSON

MessagePack provides significant wins for larger payloads with many headers:

Serialization:

Payload        JSON   MessagePack  Speedup
Small (204B)   153ns  150ns        2%
Large (1080B)  745ns  562ns        25%

Deserialization:

Payload       JSON    MessagePack  Speedup
Small (204B)  403ns   297ns        26%
Large (894B)  2.46us  1.68us       32%

Wire size reduction:

JSON small:        204 bytes
MessagePack small: 110 bytes (46% smaller)

JSON large:        1080 bytes
MessagePack large: 894 bytes (17% smaller)

When to Use MessagePack

Enable the binary-uds feature for:

  • Processing request/response bodies (8-10x improvement)
  • High header volume (25-32% improvement for large headers)
  • Bandwidth-constrained environments (17-46% smaller payloads)

Use JSON for:

  • Debugging and observability (human-readable)
  • Interop with non-Rust agents lacking MessagePack support
  • Small payloads where simplicity matters more

Body Chunk Streaming

The most dramatic improvement is in body chunk handling, critical for WAF agents inspecting request bodies.

Serialization Throughput

Size  JSON + Base64  MessagePack Binary  Speedup
1KB   1.97 GiB/s     9.25 GiB/s          4.7x
4KB   2.46 GiB/s     27.2 GiB/s          11x
16KB  2.51 GiB/s     31.4 GiB/s          12.5x
64KB  1.75 GiB/s     4.62 GiB/s          2.6x

Deserialization Throughput

Size  JSON + Base64  MessagePack Binary  Speedup
1KB   4.4 GiB/s      20.4 GiB/s          4.6x
4KB   6.6 GiB/s      49.5 GiB/s          7.5x
16KB  7.2 GiB/s      48.7 GiB/s          6.8x
64KB  7.5 GiB/s      62.4 GiB/s          8.3x

MessagePack with serde_bytes achieves 8-10x better throughput by avoiding base64 encoding overhead.
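The size math alone shows why: base64 expands every 3 raw bytes into 4 output characters (plus padding), so JSON touches roughly a third more data before string escaping even begins, while MessagePack’s bin family prefixes the raw bytes with a small length header. A quick sketch of that arithmetic:

```rust
/// Size of the base64 encoding (with padding) of `n` raw bytes:
/// each 3-byte group becomes 4 output characters.
fn base64_len(n: usize) -> usize {
    n.div_ceil(3) * 4
}

/// MessagePack `bin 32` stores the same `n` bytes after a 5-byte
/// header (1 type byte + 4 length bytes); smaller payloads use the
/// even shorter `bin 8` / `bin 16` headers.
fn msgpack_bin32_len(n: usize) -> usize {
    5 + n
}
```

For a 64KB body chunk, base64 alone adds ~21KB of encoding work per chunk; the binary framing adds 5 bytes.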


Header Optimization (SmallVec)

Most HTTP headers have a single value. SmallVec stores these inline, avoiding heap allocation.

Single-Value Headers (Common Case)

Container              Time    Notes
Vec                    18.9ns  Heap allocation
SmallVec<[String; 1]>  11.5ns  Inline storage

Speedup: 40% for the most common case.

Header Map Creation (20 headers)

Container           Time    Speedup
Vec-based map       1.29us  -
SmallVec-based map  1.07us  17%

Multi-value headers (3+ values) spill to heap with ~5% overhead, but this case is rare in practice.
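The inline-storage trick can be sketched with a hand-rolled enum; SmallVec generalizes this to any inline capacity, and all names below are illustrative:

```rust
/// Hand-rolled sketch of what SmallVec<[String; 1]> buys: the first
/// value lives inline in the enum, so the container itself needs no
/// heap allocation; a second value spills to a heap-allocated Vec.
/// (The String contents are still heap-allocated either way; the
/// saving is the Vec's backing allocation.)
pub enum HeaderValues {
    Empty,
    One(String),       // common case: stored inline, no Vec allocation
    Many(Vec<String>), // rare multi-value case: spilled to the heap
}

impl HeaderValues {
    pub fn push(&mut self, v: String) {
        *self = match std::mem::replace(self, HeaderValues::Empty) {
            HeaderValues::Empty => HeaderValues::One(v),
            HeaderValues::One(first) => HeaderValues::Many(vec![first, v]),
            HeaderValues::Many(mut all) => {
                all.push(v);
                HeaderValues::Many(all)
            }
        };
    }

    pub fn len(&self) -> usize {
        match self {
            HeaderValues::Empty => 0,
            HeaderValues::One(_) => 1,
            HeaderValues::Many(v) => v.len(),
        }
    }
}
```

Since most HTTP headers carry exactly one value, the inline arm handles the hot path and the spill arm stays cold.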


Protocol Metrics Overhead

Built-in metrics add negligible overhead:

Operation          Time
Counter increment  1.65ns
Counter read       0.31ns
Histogram record   2.61ns

A typical request with 5 metric updates adds ~15ns total.
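Counters this cheap are just atomic words. A minimal sketch of the approach, assuming hypothetical metric names (this is not the project’s actual metrics API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative sketch: each counter is a plain AtomicU64, so an
/// increment compiles to a single atomic add and a read to a single
/// atomic load. No locks, no allocation, no formatting on the hot path.
#[derive(Default)]
pub struct ProtocolMetrics {
    requests_total: AtomicU64,
    errors_total: AtomicU64,
}

impl ProtocolMetrics {
    pub fn record_request(&self, is_error: bool) {
        // Relaxed is enough: counters impose no ordering on other memory.
        self.requests_total.fetch_add(1, Ordering::Relaxed);
        if is_error {
            self.errors_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn requests(&self) -> u64 {
        self.requests_total.load(Ordering::Relaxed)
    }

    pub fn errors(&self) -> u64 {
        self.errors_total.load(Ordering::Relaxed)
    }
}
```

At ~1.65ns per increment, the quoted five updates per request land right around the ~15ns figure above.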


Full Request Path

The complete hot path (excluding network I/O):

  1. Agent lookup (DashMap)
  2. Affinity check (DashMap)
  3. Health check (AtomicBool)
  4. Counter increments (2x AtomicU64)
  5. Serialization
  6. Affinity store/clear (DashMap insert/remove)

Path              Time
JSON path         ~226ns
MessagePack path  ~226ns

Total: ~230ns for the complete hot path.
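The steps above compose into a few atomic operations plus O(1) map lookups. A std-only sketch of that composition (a real implementation would use DashMap rather than Mutex<HashMap>, and every name here is illustrative):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;

/// Illustrative per-agent state: health flag plus request counters.
struct AgentState {
    healthy: AtomicBool,
    in_flight: AtomicU64,
    total: AtomicU64,
}

/// Hot-path sketch: affinity check, agent lookup, health check, and
/// counter increments. Serialization and affinity store/clear would
/// follow in the full path.
fn select_agent<'a>(
    agents: &'a HashMap<String, AgentState>,
    affinity: &Mutex<HashMap<u64, String>>,
    conn_id: u64,
) -> Option<&'a AgentState> {
    // 1-2. Affinity check, then agent lookup (both O(1)).
    let name = affinity.lock().unwrap().get(&conn_id).cloned()?;
    let agent = agents.get(&name)?;
    // 3. Health check: one atomic load.
    if !agent.healthy.load(Ordering::Acquire) {
        return None;
    }
    // 4. Counter increments: two atomic adds.
    agent.in_flight.fetch_add(1, Ordering::Relaxed);
    agent.total.fetch_add(1, Ordering::Relaxed);
    Some(agent)
}
```

Every step is constant-time, which is why the whole path stays in the low hundreds of nanoseconds regardless of load.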


Comparison with Targets

Metric                Target      Achieved  Result
Connection selection  <1us        ~15ns     67x better
Health check          O(1)        0.46ns    Achieved
Body throughput       >1 GiB/s    62 GiB/s  62x better
Metrics overhead      Negligible  2.6ns     Achieved
Affinity lookup       O(1)        ~13ns     Achieved

Configuration Recommendations

Based on benchmarks, these configurations work well for most deployments:

High Throughput

let config = AgentPoolConfig {
    connections_per_agent: 8,    // more parallel connections per agent
    load_balance_strategy: LoadBalanceStrategy::LeastConnections,
    channel_buffer_size: 128,    // larger buffer absorbs request bursts
    ..Default::default()
};

Low Latency

let config = AgentPoolConfig {
    connections_per_agent: 4,
    load_balance_strategy: LoadBalanceStrategy::LeastConnections,
    request_timeout: Duration::from_millis(100), // fail fast, don't queue
    channel_buffer_size: 32,                     // small buffer keeps queues short
    ..Default::default()
};

Body Inspection (WAF)

Enable binary transport for body-heavy workloads:

agents {
    agent "waf" type="waf" {
        binary-uds "/var/run/sentinel/waf.sock"
        events "request_headers" "request_body"
        buffer-size 65536
    }
}

Test Environment

These benchmarks were collected on:

  • Platform: macOS Darwin 24.6.0
  • Rust: 1.92.0 (release build, LTO enabled)
  • Tool: Criterion with 100 samples per benchmark

For detailed methodology and raw data, see the benchmark source code.