Bot Management

Official Stable

Comprehensive bot detection with multi-signal analysis, known bot verification, and behavioral tracking.

Version: 0.2.0 | Author: Sentinel Core Team | License: Apache-2.0 | Protocol: v2

Quick Install

Cargo
cargo install sentinel-agent-bot-management

Overview

A comprehensive bot detection and management agent for Sentinel. It analyzes multiple signals to classify traffic as human, good bot (search engines, monitors), or bad bot (scrapers, attackers), and returns a bot score with a configurable ALLOW, CHALLENGE, or BLOCK decision.

Protocol v2 Features

As of v0.2.0, the Bot Management agent supports protocol v2 with:

  • Capability negotiation: Reports supported features during handshake
  • Health reporting: Exposes health status for monitoring
  • Metrics export: Counter metrics for bot detections (allowed, blocked, challenged)
  • gRPC transport: Optional high-performance gRPC transport via --grpc-address
  • Lifecycle hooks: Graceful shutdown and drain handling

Features

  • Multi-Signal Detection: Combines header analysis, User-Agent validation, known bot lookup, and behavioral patterns
  • Known Bot Database: Identifies legitimate bots (Googlebot, Bingbot, etc.) with reverse DNS verification
  • Bad Bot Patterns: Detects security scanners (sqlmap, nikto, nuclei) and scrapers
  • Behavioral Analysis: Tracks session patterns, request rates, and timing regularity
  • Challenge System: HMAC-signed challenge tokens for suspicious traffic
  • Bot Score: 0-100 score with confidence level and category classification
  • Configurable Thresholds: Tune ALLOW/CHALLENGE/BLOCK decision boundaries

Installation

The easiest way to install this agent is via the Sentinel bundle command:

# Install just this agent
sentinel bundle install bot-management

# Or install all available agents
sentinel bundle install --all

The bundle command automatically downloads the correct binary for your platform and places it in ~/.sentinel/agents/.

Using Cargo

cargo install sentinel-agent-bot-management

From Source

git clone https://github.com/raskell-io/sentinel-agent-bot-management
cd sentinel-agent-bot-management
cargo build --release

Configuration

Command Line

sentinel-agent-bot-management \
    --socket /var/run/sentinel/bot-management.sock \
    --grpc-address 0.0.0.0:50051 \
    --config /etc/sentinel/bot-management.yaml

Sentinel Configuration

agent "bot-management" {
    socket "/var/run/sentinel/bot-management.sock"
    timeout 50ms
    events ["request_headers"]
}

route {
    match { path-prefix "/" }
    agents ["bot-management"]
    upstream "backend"
}

Agent Configuration (YAML)

thresholds:
  allow_threshold: 30      # Scores at or below this are allowed
  block_threshold: 80      # Scores above this are blocked
  min_confidence: 0.5      # Minimum confidence to act

detection:
  header_analysis: true
  user_agent_validation: true
  known_bot_lookup: true
  behavioral_analysis: true
  weights:
    header: 0.20
    user_agent: 0.25
    known_bot: 0.35
    behavioral: 0.20

allow_list:
  search_engines: true     # Allow Googlebot, Bingbot, etc.
  social_media: true       # Allow Facebook, Twitter crawlers
  monitoring: true         # Allow UptimeRobot, Pingdom, etc.
  seo_tools: false         # Block SEO crawlers by default
  verify_identity: true    # Verify bots via reverse DNS

challenge:
  default_type: javascript
  token_validity_seconds: 300
  cookie_name: "_sentinel_bot_check"

behavioral:
  max_sessions: 100000
  session_timeout_seconds: 3600
  rpm_threshold: 60        # Requests per minute threshold
  min_requests_for_scoring: 5

Detection Methods

Header Analysis

Detects bot characteristics from HTTP headers:

| Signal | Score Impact | Description |
|---|---|---|
| Missing Accept-Language | +15 | Browsers always send this |
| Missing Accept-Encoding | +15 | Browsers always send this |
| Missing sec-ch-ua (Chrome) | +20 | Chrome 89+ sends Client Hints |
| Automation headers | +30 | X-Selenium, X-Puppeteer, etc. |
| Generic Accept (*/*) | +10 | Browsers send specific types |
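
The header signals above can be sketched as a simple additive check. This is an illustration only, not the agent's implementation: the function name header_score is hypothetical, the automation header list contains just the two names mentioned in the table, and treating the signals as a plain sum is an assumption. How the agent combines this value with the configured weights is not shown here.

```python
# Only the two automation headers named in the table; the real list is longer ("etc.").
AUTOMATION_HEADERS = ("x-selenium", "x-puppeteer")


def header_score(headers: dict) -> int:
    """Sum the header-based score impacts for one request."""
    h = {name.lower(): value for name, value in headers.items()}
    score = 0

    if "accept-language" not in h:
        score += 15
    if "accept-encoding" not in h:
        score += 15
    # The sec-ch-ua check only applies when the User-Agent claims to be Chrome.
    if "chrome/" in h.get("user-agent", "").lower() and "sec-ch-ua" not in h:
        score += 20
    if any(name in h for name in AUTOMATION_HEADERS):
        score += 30
    if h.get("accept", "").strip() == "*/*":
        score += 10

    return score


# A bare curl request sends only User-Agent and "Accept: */*".
print(header_score({"User-Agent": "curl/8.4.0", "Accept": "*/*"}))
```

Under these assumptions a bare curl request collects the two missing-header penalties plus the generic Accept penalty.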

User-Agent Analysis

Parses and validates User-Agent strings:

| Signal | Score Impact | Description |
|---|---|---|
| Bot keywords | +40 | bot, crawler, spider in UA |
| Outdated browser | +25 | Chrome < 90 (suspicious in 2026) |
| Impossible UA | +50 | Conflicting browser identifiers |
| Security scanner | +60 | sqlmap, nikto, nuclei, etc. |
| Missing UA | +30 | No User-Agent header |
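
A rough sketch of how these User-Agent signals could be evaluated follows. The function user_agent_score is hypothetical. Several details are assumptions made for illustration: signals are summed and capped at 100, the scanner list holds only the three names from the table, and an "impossible" UA is approximated as one that claims both Firefox and Chrome.

```python
import re
from typing import Optional

BOT_KEYWORDS = ("bot", "crawler", "spider")
SCANNERS = ("sqlmap", "nikto", "nuclei")  # subset named in the table
MIN_CHROME_MAJOR = 90


def user_agent_score(ua: Optional[str]) -> int:
    """Sum the User-Agent score impacts, capped at 100."""
    if not ua:
        return 30  # missing User-Agent header

    lowered = ua.lower()
    score = 0

    if any(name in lowered for name in SCANNERS):
        score += 60
    if any(word in lowered for word in BOT_KEYWORDS):
        score += 40

    chrome = re.search(r"chrome/(\d+)", lowered)
    if chrome and int(chrome.group(1)) < MIN_CHROME_MAJOR:
        score += 25

    # Assumption: a UA that claims two different browser engines is "impossible".
    if "firefox/" in lowered and "chrome/" in lowered:
        score += 50

    return min(score, 100)


print(user_agent_score("sqlmap/1.5"))
```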

Known Bot Database

Identifies and verifies known bots:

Good Bots (Verified):

  • Googlebot (reverse DNS: .googlebot.com)
  • Bingbot (reverse DNS: .search.msn.com)
  • DuckDuckBot (IP range verification)
  • Facebookbot, Twitterbot, LinkedInBot
  • UptimeRobot, Pingdom, Datadog

Bad Patterns:

  • sqlmap, nikto, nessus (security scanners)
  • masscan, zgrab (port/service scanners)
  • gobuster, dirbuster (directory scanners)
  • nuclei, wfuzz (vulnerability scanners)
  • hydra (brute forcer)
  • scrapy, httrack (scrapers/copiers)
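
Reverse DNS verification for the good bots listed above can be sketched as follows. This is a minimal illustration, not the agent's code: verify_bot and the VERIFIED_SUFFIXES table are hypothetical, and only the two suffixes given above are included. The forward-confirmation step (resolving the hostname back to the IP) is a common hardening practice and an assumption here; the source only states that reverse DNS is used. The resolvers are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Set

# Hostname suffixes taken from the list above.
VERIFIED_SUFFIXES = {
    "Googlebot": (".googlebot.com",),
    "Bingbot": (".search.msn.com",),
}


def _reverse(ip: str) -> str:
    """PTR lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Set[str]:
    """Forward lookup: hostname to the set of addresses it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def verify_bot(
    name: str,
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Set[str]] = _forward,
) -> bool:
    """Return True only if the IP's PTR record matches the bot's domain
    and that hostname resolves back to the same IP."""
    suffixes = VERIFIED_SUFFIXES.get(name)
    if not suffixes:
        return False
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror: treat lookup failure as unverified.
        return False
```

The forward lookup matters because anyone who controls the reverse zone for their own IP range can publish a PTR record ending in googlebot.com; only the domain owner can make that hostname resolve back to the IP.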

Behavioral Analysis

Tracks session patterns over time:

| Signal | Score Impact | Description |
|---|---|---|
| High request rate | +30 | >60 requests per minute |
| Regular timing | +20 | Suspiciously consistent intervals |
| Low path diversity | +15 | Hitting same paths repeatedly |
| No resource requests | +10 | Missing CSS/JS/image requests |
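
One way to compute these behavioral signals from a tracked session is sketched below. The Session class and behavioral_score function are hypothetical. The source does not say how "regular timing" or "low path diversity" are measured, so the cut-offs used here (coefficient of variation below 0.1, fewer than 20% unique paths) and the static file suffix list are assumptions chosen for illustration. The rpm_threshold and min_requests defaults mirror the behavioral config values.

```python
import statistics
from dataclasses import dataclass, field
from typing import List

STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".ico")  # assumed list


@dataclass
class Session:
    timestamps: List[float] = field(default_factory=list)  # seconds, ascending
    paths: List[str] = field(default_factory=list)

    def record(self, ts: float, path: str) -> None:
        self.timestamps.append(ts)
        self.paths.append(path)


def behavioral_score(s: Session, rpm_threshold: int = 60, min_requests: int = 5) -> int:
    n = len(s.timestamps)
    if n < min_requests:
        return 0  # not enough data to score

    score = 0
    span = s.timestamps[-1] - s.timestamps[0]
    rpm = n / (span / 60) if span > 0 else float("inf")
    if rpm > rpm_threshold:
        score += 30

    # Regular timing: gaps between requests barely vary (assumed cut-off 0.1).
    gaps = [b - a for a, b in zip(s.timestamps, s.timestamps[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap > 0 and statistics.pstdev(gaps) / mean_gap < 0.1:
        score += 20

    # Low path diversity: few distinct paths relative to request count (assumed 20%).
    if len(set(s.paths)) / n < 0.2:
        score += 15

    # No resource requests: nothing that looks like CSS/JS/image fetches.
    if not any(p.lower().endswith(STATIC_SUFFIXES) for p in s.paths):
        score += 10

    return score


bot = Session()
for i in range(10):
    bot.record(i * 0.5, "/api/data")  # one request every 500 ms to the same path
print(behavioral_score(bot))
```

In this example the session trips all four signals: ten requests in 4.5 seconds, identical gaps, a single path, and no static assets.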

Decision Flow

Score ≤ 30  →  ALLOW (add bot headers)
Score > 80  →  BLOCK (403 Forbidden)
30 < Score ≤ 80  →  CHALLENGE (JS/CAPTCHA)

For verified good bots (Googlebot, etc.), the request is immediately allowed regardless of other signals.
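
The decision flow above maps directly onto a small function. This sketch uses the default thresholds from the configuration section; the name decide is hypothetical, and the handling of low-confidence results (allowing the request when confidence is below min_confidence) is an assumption, since the source only describes min_confidence as the minimum confidence to act.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


def decide(
    score: int,
    confidence: float,
    verified_good_bot: bool = False,
    allow_threshold: int = 30,
    block_threshold: int = 80,
    min_confidence: float = 0.5,
) -> Decision:
    # Verified good bots bypass scoring entirely.
    if verified_good_bot:
        return Decision.ALLOW
    # Assumption: below the confidence floor the agent takes no action.
    if confidence < min_confidence:
        return Decision.ALLOW
    if score <= allow_threshold:
        return Decision.ALLOW
    if score > block_threshold:
        return Decision.BLOCK
    return Decision.CHALLENGE


for s in (30, 31, 80, 81):
    print(s, decide(s, confidence=0.9).name)
```

Note the boundary behavior: a score of exactly 30 is allowed and a score of exactly 80 is challenged rather than blocked.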

Response Headers

| Header | Description |
|---|---|
| X-Bot-Score | Bot likelihood score (0-100) |
| X-Bot-Category | Classification: human, search_engine, social_media, monitoring, malicious, unknown |
| X-Bot-Confidence | Detection confidence (0.00-1.00) |
| X-Bot-Verified | Verified bot name (e.g., “Googlebot”) |
| X-Bot-Challenge | "passed" if challenge token validated |
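
A test script or client that inspects these headers might parse them as sketched below. BotVerdict and parse_bot_headers are hypothetical names, and the fallback to "unknown" when X-Bot-Category is absent is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotVerdict:
    score: Optional[int]
    category: str
    confidence: Optional[float]
    verified: Optional[str]
    challenge_passed: bool


def _to_number(value, cast):
    """Convert a header value, returning None if it is absent or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def parse_bot_headers(headers: Mapping[str, str]) -> BotVerdict:
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    return BotVerdict(
        score=_to_number(h.get("x-bot-score"), int),
        category=h.get("x-bot-category", "unknown"),
        confidence=_to_number(h.get("x-bot-confidence"), float),
        verified=h.get("x-bot-verified") or None,
        challenge_passed=h.get("x-bot-challenge", "").lower() == "passed",
    )


verdict = parse_bot_headers(
    {"X-Bot-Score": "12", "X-Bot-Category": "human", "X-Bot-Confidence": "0.85"}
)
print(verdict)
```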

Challenge System

When a score falls in the CHALLENGE range (above 30 and up to 80), the agent returns a challenge decision. Sentinel can be configured to issue one of the following challenge types:

  1. JavaScript Challenge: Require JS execution proof
  2. CAPTCHA Challenge: Redirect to CAPTCHA page
  3. Proof of Work: Require computational proof

Once the client passes the challenge, a signed token stored in a cookie lets subsequent requests through until the token expires.
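
An HMAC-signed token of this kind can be sketched with the Python standard library. The token layout below (payload and signature joined by a dot, with the payload holding a client identifier and an expiry timestamp) is an assumption for illustration and not the agent's actual format; issue_token and verify_token are hypothetical names. The 300-second default mirrors token_validity_seconds in the configuration above.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def issue_token(secret: bytes, client_id: str, now: Optional[float] = None,
                validity_seconds: int = 300) -> str:
    """Create a token binding client_id to an expiry time, signed with HMAC-SHA256."""
    expires = int(now if now is not None else time.time()) + validity_seconds
    payload = f"{client_id}|{expires}".encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(secret: bytes, token: str, client_id: str,
                 now: Optional[float] = None) -> bool:
    """Check signature first, then client binding and expiry."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False

    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return False

    try:
        bound_id, expires = payload.decode().rsplit("|", 1)
        expires_at = int(expires)
    except ValueError:
        return False

    current = now if now is not None else time.time()
    return bound_id == client_id and current < expires_at


secret = b"example-secret"
token = issue_token(secret, "198.51.100.7", now=1000)
print(verify_token(secret, token, "198.51.100.7", now=1100))
```

Binding the token to a client identifier (an IP address in this example) limits replay of a stolen cookie from another client; whether the agent does this is not stated in the source.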

Test Examples

Browser Request (Low Score)

curl -i http://localhost:8080/api/data \
  -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
  -H "Accept: text/html,application/xhtml+xml" \
  -H "Accept-Language: en-US,en;q=0.9" \
  -H "Accept-Encoding: gzip, deflate, br" \
  -H "sec-ch-ua: \"Chromium\";v=\"120\""

Expected: X-Bot-Score: 0-20, X-Bot-Category: human

curl Request (Medium Score)

curl -i http://localhost:8080/api/data

Expected: X-Bot-Score: 40-60, likely CHALLENGE decision

Security Scanner (High Score)

curl -i http://localhost:8080/api/data \
  -H "User-Agent: sqlmap/1.5"

Expected: X-Bot-Score: 95, BLOCK decision

Verified Googlebot (Allowed)

# From verified Googlebot IP with proper UA
curl -i http://localhost:8080/api/data \
  -H "User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)"

Expected: X-Bot-Score: 0, X-Bot-Verified: Googlebot

Performance

  • Latency: <5ms typical detection time
  • Memory: ~50MB for 100k tracked sessions
  • Throughput: >50k requests/second

| Agent | Integration |
|---|---|
| WAF | Combine with attack detection |
| Auth | Bot detection before authentication |
| AI Gateway | Protect AI endpoints from scraping |

Comparison with Other Solutions

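| Feature | Bot Management | Cloudflare Bot | AWS WAF Bot |
|---|---|---|---|
| Self-hosted | Yes | No | No |
| Open source | Yes | No | No |
| Custom rules | Yes | Limited | Limited |
| Reverse DNS verification | Yes | Yes | No |
| Behavioral analysis | Yes | Yes | Limited |
| Challenge types | 3 | 1 | 1 |
| Latency | <5ms | 10-50ms | 10-50ms |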