Performance Benchmarks

This page presents benchmark results for Agent Protocol v2 optimizations, helping you understand expected performance characteristics and make informed configuration decisions.

Executive Summary

Agent Protocol v2 achieves significant performance improvements through lock-free data structures, optimized serialization, and efficient memory management.

Optimization               Improvement      Impact
Atomic health cache        10x faster       Hot-path health checks
MessagePack serialization  24-32% faster    Large payload processing
SmallVec headers           40% faster       Single-value header allocation
Body chunk streaming       4.7-11x faster   WAF body inspection
Protocol metrics           <3ns overhead    Zero-cost observability

Full request path latency: ~230ns (excluding network I/O)


Lock-Free Operations

Health State Caching

Health checks sit on every request’s hot path, so they must be cheap. Using atomic operations instead of an RwLock provides a 10x improvement on reads:

Operation  Atomic  RwLock  Speedup
Read       0.46ns  4.6ns   10x
Write      0.46ns  1.8ns   4x
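The pattern can be sketched with std atomics alone. This is an illustrative sketch, not the project’s actual API; the type and method names are assumptions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Illustrative sketch: per-agent health cached as a single atomic flag.
/// The request hot path does one atomic load; the background health
/// checker flips the flag with an atomic store. No lock, no contention.
pub struct HealthCache {
    healthy: AtomicBool,
}

impl HealthCache {
    pub fn new() -> Self {
        Self { healthy: AtomicBool::new(true) }
    }

    /// Hot path: a single lock-free load.
    pub fn is_healthy(&self) -> bool {
        self.healthy.load(Ordering::Acquire)
    }

    /// Health checker: a single lock-free store.
    pub fn set_healthy(&self, ok: bool) {
        self.healthy.store(ok, Ordering::Release);
    }
}
```

Because the flag is a single machine word, readers never block writers and vice versa, which is where the 10x over RwLock comes from.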

Timestamp Tracking

Atomic timestamp reads for “last seen” tracking:

Operation  AtomicU64  RwLock  Speedup
Read       0.78ns     4.7ns   6x
Write      18.7ns     16.7ns  ~Equal

Reads dominate in production, making the 6x improvement significant.
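A minimal std-only sketch of the same idea, storing the timestamp as milliseconds-since-epoch in an AtomicU64 (names here are illustrative, not the project’s API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative sketch: "last seen" as millis-since-epoch in an AtomicU64.
pub struct LastSeen(AtomicU64);

impl LastSeen {
    pub fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    fn now_millis() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as u64
    }

    /// Write path: one atomic store (the clock read dominates the cost).
    pub fn touch(&self) {
        self.0.store(Self::now_millis(), Ordering::Relaxed);
    }

    /// Read path: dominates in production; a single Relaxed load.
    pub fn millis(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is sufficient here because the timestamp carries no synchronization obligations with other memory.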

Connection Affinity Lookup

DashMap provides O(1) lookup regardless of concurrent request count:

Entries  Lookup (hit)  Lookup (miss)
10       12.3ns        8.8ns
100      13.5ns        8.3ns
1,000    14.0ns        8.9ns
10,000   12.9ns        9.5ns
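DashMap keeps lookups O(1) under concurrency by sharding: the key’s hash selects one of N internally locked shards, so only that shard is locked per operation. A std-only sketch of the sharding idea (illustrative only, not DashMap’s actual implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Sketch of a sharded map: hash the key to pick a shard, then lock
/// only that shard. Lookup stays O(1) regardless of entry count, and
/// contention is spread across shards instead of one global lock.
pub struct ShardedMap<V> {
    shards: Vec<Mutex<HashMap<u64, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    pub fn new(shard_count: usize) -> Self {
        Self {
            shards: (0..shard_count)
                .map(|_| Mutex::new(HashMap::new()))
                .collect(),
        }
    }

    fn shard(&self, key: u64) -> &Mutex<HashMap<u64, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    pub fn insert(&self, key: u64, value: V) {
        self.shard(key).lock().unwrap().insert(key, value);
    }

    pub fn get(&self, key: u64) -> Option<V> {
        self.shard(key).lock().unwrap().get(&key).cloned()
    }
}
```

DashMap additionally uses RwLock-style shard locks and avoids cloning on read; the sketch trades those details for brevity.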

Serialization Performance

MessagePack vs JSON

MessagePack provides significant wins for larger payloads with many headers:

Serialization:

Payload        JSON   MessagePack  Speedup
Small (204B)   153ns  150ns        2%
Large (1080B)  745ns  562ns        25%

Deserialization:

Payload       JSON    MessagePack  Speedup
Small (204B)  403ns   297ns        26%
Large (894B)  2.46us  1.68us       32%

Wire size reduction:

JSON small:        204 bytes
MessagePack small: 110 bytes (46% smaller)

JSON large:        1080 bytes
MessagePack large: 894 bytes (17% smaller)

When to Use MessagePack

Enable the binary-uds feature for:

  • Processing request/response bodies (8-10x improvement)
  • High header volume (25-32% improvement for large headers)
  • Bandwidth-constrained environments (17-46% smaller payloads)

Use JSON for:

  • Debugging and observability (human-readable)
  • Interop with non-Rust agents lacking MessagePack support
  • Small payloads where simplicity matters more

Body Chunk Streaming

The most dramatic improvement is in body chunk handling, critical for WAF agents inspecting request bodies.

Serialization Throughput

Size  JSON + Base64  MessagePack Binary  Speedup
1KB   1.97 GiB/s     9.25 GiB/s          4.7x
4KB   2.46 GiB/s     27.2 GiB/s          11x
16KB  2.51 GiB/s     31.4 GiB/s          12.5x
64KB  1.75 GiB/s     4.62 GiB/s          2.6x

Deserialization Throughput

Size  JSON + Base64  MessagePack Binary  Speedup
1KB   4.4 GiB/s      20.4 GiB/s          4.6x
4KB   6.6 GiB/s      49.5 GiB/s          7.5x
16KB  7.2 GiB/s      48.7 GiB/s          6.8x
64KB  7.5 GiB/s      62.4 GiB/s          8.3x

MessagePack with serde_bytes achieves 8-10x better throughput by avoiding base64 encoding overhead.
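The size math alone shows why: base64 expands every 3 raw bytes into 4 output characters (plus padding), so JSON touches roughly a third more data before string escaping even begins, while MessagePack’s bin family prefixes the raw bytes with a small length header. A quick sketch of that arithmetic:

```rust
/// Size of the base64 encoding (with padding) of `n` raw bytes:
/// each 3-byte group becomes 4 output characters.
fn base64_len(n: usize) -> usize {
    n.div_ceil(3) * 4
}

/// MessagePack `bin 32` stores the same `n` bytes after a 5-byte
/// header (1 type byte + 4 length bytes); smaller payloads use the
/// even shorter `bin 8` / `bin 16` headers.
fn msgpack_bin32_len(n: usize) -> usize {
    5 + n
}
```

For a 64KB body chunk, base64 alone adds ~21KB of encoding work per chunk; the binary framing adds 5 bytes.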


Header Optimization (SmallVec)

Most HTTP headers have a single value. SmallVec stores these inline, avoiding heap allocation.

Single-Value Headers (Common Case)

Container              Time    Notes
Vec                    18.9ns  Heap allocation
SmallVec<[String; 1]>  11.5ns  Inline storage

Speedup: 40% for the most common case.

Header Map Creation (20 headers)

Container           Time    Speedup
Vec-based map       1.29us  -
SmallVec-based map  1.07us  17%

Multi-value headers (3+ values) spill to heap with ~5% overhead, but this case is rare in practice.
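The inline-storage trick can be sketched with a hand-rolled enum; SmallVec generalizes this to any inline capacity, and all names below are illustrative:

```rust
/// Hand-rolled sketch of what SmallVec<[String; 1]> buys: the first
/// value lives inline in the enum, so the container itself needs no
/// heap allocation; a second value spills to a heap-allocated Vec.
/// (The String contents are still heap-allocated either way; the
/// saving is the Vec's backing allocation.)
pub enum HeaderValues {
    Empty,
    One(String),       // common case: stored inline, no Vec allocation
    Many(Vec<String>), // rare multi-value case: spilled to the heap
}

impl HeaderValues {
    pub fn push(&mut self, v: String) {
        *self = match std::mem::replace(self, HeaderValues::Empty) {
            HeaderValues::Empty => HeaderValues::One(v),
            HeaderValues::One(first) => HeaderValues::Many(vec![first, v]),
            HeaderValues::Many(mut all) => {
                all.push(v);
                HeaderValues::Many(all)
            }
        };
    }

    pub fn len(&self) -> usize {
        match self {
            HeaderValues::Empty => 0,
            HeaderValues::One(_) => 1,
            HeaderValues::Many(v) => v.len(),
        }
    }
}
```

Since most HTTP headers carry exactly one value, the inline arm handles the hot path and the spill arm stays cold.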


Protocol Metrics Overhead

Built-in metrics add negligible overhead:

Operation          Time
Counter increment  1.65ns
Counter read       0.31ns
Histogram record   2.61ns

A typical request with 5 metric updates adds ~15ns total.
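Counters this cheap are just atomic words. A minimal sketch of the approach, assuming hypothetical metric names (this is not the project’s actual metrics API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative sketch: each counter is a plain AtomicU64, so an
/// increment compiles to a single atomic add and a read to a single
/// atomic load. No locks, no allocation, no formatting on the hot path.
#[derive(Default)]
pub struct ProtocolMetrics {
    requests_total: AtomicU64,
    errors_total: AtomicU64,
}

impl ProtocolMetrics {
    pub fn record_request(&self, is_error: bool) {
        // Relaxed is enough: counters impose no ordering on other memory.
        self.requests_total.fetch_add(1, Ordering::Relaxed);
        if is_error {
            self.errors_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn requests(&self) -> u64 {
        self.requests_total.load(Ordering::Relaxed)
    }

    pub fn errors(&self) -> u64 {
        self.errors_total.load(Ordering::Relaxed)
    }
}
```

At ~1.65ns per increment, the quoted five updates per request land right around the ~15ns figure above.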


Full Request Path

The complete hot path (excluding network I/O):

  1. Agent lookup (DashMap)
  2. Affinity check (DashMap)
  3. Health check (AtomicBool)
  4. Counter increments (2x AtomicU64)
  5. Serialization
  6. Affinity store/clear (DashMap insert/remove)

Path              Time
JSON path         ~226ns
MessagePack path  ~226ns

Total: ~230ns for the complete hot path.
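The steps above compose into a few atomic operations plus O(1) map lookups. A std-only sketch of that composition (a real implementation would use DashMap rather than Mutex<HashMap>, and every name here is illustrative):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;

/// Illustrative per-agent state: health flag plus request counters.
struct AgentState {
    healthy: AtomicBool,
    in_flight: AtomicU64,
    total: AtomicU64,
}

/// Hot-path sketch: affinity check, agent lookup, health check, and
/// counter increments. Serialization and affinity store/clear would
/// follow in the full path.
fn select_agent<'a>(
    agents: &'a HashMap<String, AgentState>,
    affinity: &Mutex<HashMap<u64, String>>,
    conn_id: u64,
) -> Option<&'a AgentState> {
    // 1-2. Affinity check, then agent lookup (both O(1)).
    let name = affinity.lock().unwrap().get(&conn_id).cloned()?;
    let agent = agents.get(&name)?;
    // 3. Health check: one atomic load.
    if !agent.healthy.load(Ordering::Acquire) {
        return None;
    }
    // 4. Counter increments: two atomic adds.
    agent.in_flight.fetch_add(1, Ordering::Relaxed);
    agent.total.fetch_add(1, Ordering::Relaxed);
    Some(agent)
}
```

Every step is constant-time, which is why the whole path stays in the low hundreds of nanoseconds regardless of load.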


Comparison with Targets

Metric                Target      Achieved  Result
Connection selection  <1us        ~15ns     67x better
Health check          O(1)        0.46ns    Achieved
Body throughput       >1 GiB/s    62 GiB/s  62x better
Metrics overhead      Negligible  2.6ns     Achieved
Affinity lookup       O(1)        ~13ns     Achieved

Configuration Recommendations

Based on benchmarks, these configurations work well for most deployments:

High Throughput

let config = AgentPoolConfig {
    connections_per_agent: 8,    // more parallel connections per agent
    load_balance_strategy: LoadBalanceStrategy::LeastConnections,
    channel_buffer_size: 128,    // larger buffer absorbs request bursts
    ..Default::default()
};

Low Latency

let config = AgentPoolConfig {
    connections_per_agent: 4,
    load_balance_strategy: LoadBalanceStrategy::LeastConnections,
    request_timeout: Duration::from_millis(100), // fail fast, don't queue
    channel_buffer_size: 32,                     // small buffer keeps queues short
    ..Default::default()
};

Body Inspection (WAF)

Enable binary transport for body-heavy workloads:

agents {
    agent "waf" type="waf" {
        binary-uds "/var/run/sentinel/waf.sock"
        events "request_headers" "request_body"
        buffer-size 65536
    }
}

Test Environment

These benchmarks were collected on:

  • Platform: macOS Darwin 24.6.0
  • Rust: 1.92.0 (release build, LTO enabled)
  • Tool: Criterion with 100 samples per benchmark

For detailed methodology and raw data, see the benchmark source code.