Overview
A comprehensive bot detection and management agent for Sentinel. Analyzes multiple signals to classify traffic as human, good bot (search engines, monitors), or bad bot (scrapers, attackers), returning a bot score with configurable ALLOW/BLOCK/CHALLENGE decisions.
Protocol v2 Features
As of v0.2.0, the Bot Management agent supports protocol v2 with:
- Capability negotiation: Reports supported features during handshake
- Health reporting: Exposes health status for monitoring
- Metrics export: Counter metrics for bot detections (allowed, blocked, challenged)
- gRPC transport: Optional high-performance gRPC transport via
--grpc-address - Lifecycle hooks: Graceful shutdown and drain handling
Features
- Multi-Signal Detection: Combines header analysis, User-Agent validation, known bot lookup, and behavioral patterns
- Known Bot Database: Identifies legitimate bots (Googlebot, Bingbot, etc.) with reverse DNS verification
- Bad Bot Patterns: Detects security scanners (sqlmap, nikto, nuclei) and scrapers
- Behavioral Analysis: Tracks session patterns, request rates, and timing regularity
- Challenge System: HMAC-signed challenge tokens for suspicious traffic
- Bot Score: 0-100 score with confidence level and category classification
- Configurable Thresholds: Tune ALLOW/CHALLENGE/BLOCK decision boundaries
Installation
Using Bundle (Recommended)
The easiest way to install this agent is via the Sentinel bundle command:
# Install just this agent
sentinel bundle install bot-management
# Or install all available agents
sentinel bundle install --all
The bundle command automatically downloads the correct binary for your platform and places it in ~/.sentinel/agents/.
Using Cargo
cargo install sentinel-agent-bot-management
From Source
git clone https://github.com/raskell-io/sentinel-agent-bot-management
cd sentinel-agent-bot-management
cargo build --release
Configuration
Command Line
sentinel-agent-bot-management \
--socket /var/run/sentinel/bot-management.sock \
--grpc-address 0.0.0.0:50051 \
--config /etc/sentinel/bot-management.yaml
Sentinel Configuration
agent "bot-management" {
socket "/var/run/sentinel/bot-management.sock"
timeout 50ms
events ["request_headers"]
}
route {
match { path-prefix "/" }
agents ["bot-management"]
upstream "backend"
}
Agent Configuration (YAML)
thresholds:
allow_threshold: 30 # Score below which to allow
block_threshold: 80 # Score above which to block
min_confidence: 0.5 # Minimum confidence to act
detection:
header_analysis: true
user_agent_validation: true
known_bot_lookup: true
behavioral_analysis: true
weights:
header: 0.20
user_agent: 0.25
known_bot: 0.35
behavioral: 0.20
allow_list:
search_engines: true # Allow Googlebot, Bingbot, etc.
social_media: true # Allow Facebook, Twitter crawlers
monitoring: true # Allow UptimeRobot, Pingdom, etc.
seo_tools: false # Block SEO crawlers by default
verify_identity: true # Verify bots via reverse DNS
challenge:
default_type: javascript
token_validity_seconds: 300
cookie_name: "_sentinel_bot_check"
behavioral:
max_sessions: 100000
session_timeout_seconds: 3600
rpm_threshold: 60 # Requests per minute threshold
min_requests_for_scoring: 5
Detection Methods
Header Analysis
Detects bot characteristics from HTTP headers:
| Signal | Score Impact | Description |
|---|---|---|
Missing Accept-Language | +15 | Browsers always send this |
Missing Accept-Encoding | +15 | Browsers always send this |
Missing sec-ch-ua (Chrome) | +20 | Chrome 89+ sends Client Hints |
| Automation headers | +30 | X-Selenium, X-Puppeteer, etc. |
Generic Accept (*/*) | +10 | Browsers send specific types |
User-Agent Analysis
Parses and validates User-Agent strings:
| Signal | Score Impact | Description |
|---|---|---|
| Bot keywords | +40 | bot, crawler, spider in UA |
| Outdated browser | +25 | Chrome < 90 (suspicious in 2026) |
| Impossible UA | +50 | Conflicting browser identifiers |
| Security scanner | +60 | sqlmap, nikto, nuclei, etc. |
| Missing UA | +30 | No User-Agent header |
Known Bot Database
Identifies and verifies known bots:
Good Bots (Verified):
- Googlebot (reverse DNS:
.googlebot.com) - Bingbot (reverse DNS:
.search.msn.com) - DuckDuckBot (IP range verification)
- Facebookbot, Twitterbot, LinkedInBot
- UptimeRobot, Pingdom, Datadog
Bad Patterns:
- sqlmap, nikto, nessus (security scanners)
- masscan, zgrab (port/service scanners)
- gobuster, dirbuster (directory scanners)
- nuclei, wfuzz (vulnerability scanners)
- hydra (brute forcer)
- scrapy, httrack (scrapers/copiers)
Behavioral Analysis
Tracks session patterns over time:
| Signal | Score Impact | Description |
|---|---|---|
| High request rate | +30 | >60 requests per minute |
| Regular timing | +20 | Suspiciously consistent intervals |
| Low path diversity | +15 | Hitting same paths repeatedly |
| No resource requests | +10 | Missing CSS/JS/image requests |
Decision Flow
Score ≤ 30 → ALLOW (add bot headers)
Score > 80 → BLOCK (403 Forbidden)
30 < Score ≤ 80 → CHALLENGE (JS/CAPTCHA)
For verified good bots (Googlebot, etc.), the request is immediately allowed regardless of other signals.
Response Headers
| Header | Description |
|---|---|
X-Bot-Score | Bot likelihood score (0-100) |
X-Bot-Category | Classification: human, search_engine, social_media, monitoring, malicious, unknown |
X-Bot-Confidence | Detection confidence (0.00-1.00) |
X-Bot-Verified | Verified bot name (e.g., “Googlebot”) |
X-Bot-Challenge | passed if challenge token validated |
Challenge System
When a request falls in the CHALLENGE range (30-80), the agent returns a challenge decision. Sentinel can be configured to:
- JavaScript Challenge: Require JS execution proof
- CAPTCHA Challenge: Redirect to CAPTCHA page
- Proof of Work: Require computational proof
Once passed, a signed cookie token allows subsequent requests through.
Test Examples
Browser Request (Low Score)
curl -i http://localhost:8080/api/data \
-H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
-H "Accept: text/html,application/xhtml+xml" \
-H "Accept-Language: en-US,en;q=0.9" \
-H "Accept-Encoding: gzip, deflate, br" \
-H "sec-ch-ua: \"Chromium\";v=\"120\""
Expected: X-Bot-Score: 0-20, X-Bot-Category: human
curl Request (Medium Score)
curl -i http://localhost:8080/api/data
Expected: X-Bot-Score: 40-60, likely CHALLENGE decision
Security Scanner (High Score)
curl -i http://localhost:8080/api/data \
-H "User-Agent: sqlmap/1.5"
Expected: X-Bot-Score: 95, BLOCK decision
Verified Googlebot (Allowed)
# From verified Googlebot IP with proper UA
curl -i http://localhost:8080/api/data \
-H "User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)"
Expected: X-Bot-Score: 0, X-Bot-Verified: Googlebot
Performance
- Latency: <5ms typical detection time
- Memory: ~50MB for 100k tracked sessions
- Throughput: >50k requests/second
Related Agents
| Agent | Integration |
|---|---|
| WAF | Combine with attack detection |
| Auth | Bot detection before authentication |
| AI Gateway | Protect AI endpoints from scraping |
Comparison with Other Solutions
| Feature | Bot Management | Cloudflare Bot | AWS WAF Bot |
|---|---|---|---|
| Self-hosted | Yes | No | No |
| Open source | Yes | No | No |
| Custom rules | Yes | Limited | Limited |
| Reverse DNS verification | Yes | Yes | No |
| Behavioral analysis | Yes | Yes | Limited |
| Challenge types | 3 | 1 | 1 |
| Latency | <5ms | 10-50ms | 10-50ms |