Sentinel -- Change Detection Layer
Source: byoc_agent/sentinel.py
The Sentinel is a pure Python change-detection layer that runs after every risk/score refresh (every 15 minutes). It compares current state against previous state and creates agent_tasks when triggers fire.
No LLM calls, so a run adds no model-inference cost.
How It Works
refresh_unified.py / refresh_scores.py / refresh_risk.py
↓
sentinel.check_triggers()
↓
Run 4 trigger checks independently (one failure doesn't block others)
↓
Deduplicate tasks by cluster_id (keep highest priority)
↓
INSERT into alerts.agent_tasks
Trigger Checks
1. New Critical Cluster
Detects clusters whose risk_level is Critical in cluster_risk_snapshots and that have had no investigation task created in the last 4 hours.
| Field | Value |
|---|---|
| Priority | 1 (highest) |
| Context | risk_level, risk_reasons, account_name |
| Dedup window | 4 hours |
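
The check above can be sketched in plain Python over in-memory rows. This is only an illustration, not the code in sentinel.py: the function name, the `now` parameter, and the row shapes (dicts with `cluster_id`, `risk_level`, `created_at`) are assumptions, and the real check presumably runs as a query against cluster_risk_snapshots and agent_tasks.

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(hours=4)  # dedup window from the table above


def new_critical_clusters(snapshots, recent_tasks, now):
    """Return Critical snapshot rows with no task created inside the dedup window.

    snapshots:    list of dicts with "cluster_id" and "risk_level" (assumed shape)
    recent_tasks: list of dicts with "cluster_id" and "created_at" (assumed shape)
    """
    cutoff = now - DEDUP_WINDOW
    # Clusters that already have a task newer than the cutoff are skipped.
    recently_tasked = {
        t["cluster_id"] for t in recent_tasks if t["created_at"] >= cutoff
    }
    return [
        s for s in snapshots
        if s["risk_level"] == "Critical" and s["cluster_id"] not in recently_tasked
    ]
```

A task older than 4 hours does not suppress a new one, so a cluster that stays Critical is re-queued once per window.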
2. Score Cliff (>15pt drop)
Detects clusters whose health score dropped more than SCORE_DROP_THRESHOLD (15 points) between the two most recent scoring snapshots.
| Field | Value |
|---|---|
| Priority | 2 |
| Context | previous_score, current_score, drop amount |
| Source | cluster_health_scores (two most recent computed_at timestamps) |
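
A minimal sketch of the comparison, assuming the two most recent snapshots have already been loaded into dicts keyed by cluster_id. The function name and input shape are hypothetical; only SCORE_DROP_THRESHOLD and the context fields come from this document.

```python
SCORE_DROP_THRESHOLD = 15  # value from the Configuration Constants table


def score_cliffs(previous_scores, current_scores, threshold=SCORE_DROP_THRESHOLD):
    """Return context dicts for clusters whose score fell by more than `threshold`.

    Both arguments map cluster_id -> health score (assumed shape).
    """
    cliffs = []
    for cluster_id, current in current_scores.items():
        previous = previous_scores.get(cluster_id)
        if previous is None:
            continue  # no earlier snapshot to compare against
        drop = previous - current
        if drop > threshold:  # strictly greater: a drop of exactly 15 does not fire
            cliffs.append({
                "cluster_id": cluster_id,
                "previous_score": previous,
                "current_score": current,
                "drop": drop,
            })
    return cliffs
```

The strict comparison follows the "more than 15 points" wording above.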
3. Alert Storm (>5 alerts/hour)
Detects clusters with more than ALERT_STORM_COUNT (5) firing alerts in the last ALERT_STORM_WINDOW_HOURS (1 hour).
| Field | Value |
|---|---|
| Priority | 1 |
| Context | alert_count, alert_names (GROUP_CONCAT) |
| Source | lark_alerts where alert_status = 'Firing' |
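
The counting logic can be illustrated in Python as follows. This is a sketch under assumptions: the `fired_at` field, the function name, and the `now` parameter are hypothetical, and joining names with commas stands in for the GROUP_CONCAT aggregation mentioned in the table.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ALERT_STORM_COUNT = 5          # values from the Configuration Constants table
ALERT_STORM_WINDOW_HOURS = 1


def alert_storms(alerts, now):
    """Return {cluster_id: context} for clusters with more than
    ALERT_STORM_COUNT firing alerts inside the window.

    alerts: list of dicts with "cluster_id", "alert_name", "alert_status",
            and "fired_at" (assumed shape; "fired_at" is a hypothetical column).
    """
    cutoff = now - timedelta(hours=ALERT_STORM_WINDOW_HOURS)
    names_by_cluster = defaultdict(list)
    for alert in alerts:
        if alert["alert_status"] == "Firing" and alert["fired_at"] >= cutoff:
            names_by_cluster[alert["cluster_id"]].append(alert["alert_name"])
    return {
        cluster_id: {"alert_count": len(names), "alert_names": ",".join(names)}
        for cluster_id, names in names_by_cluster.items()
        if len(names) > ALERT_STORM_COUNT  # exactly 5 alerts is not a storm
    }
```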
4. Tier A/S Customer with Warning or Critical
Any Tier A or S customer with a Warning or Critical risk level gets an investigation task unless one was already queued in the last 4 hours. This check joins cluster_risk_snapshots with dim_customer_profile.
| Field | Value |
|---|---|
| Priority | 1 (Critical) or 2 (Warning) |
| Context | risk_level, risk_reasons, customer_tier, customer_name |
| Dedup window | 4 hours |
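
The priority rule in the table can be expressed as a small lookup. The names below are hypothetical and the 4-hour dedup check is left out; the sketch only shows how tier and risk level might map to a priority.

```python
PRIORITY_BY_RISK = {"Critical": 1, "Warning": 2}  # mapping from the table above
KEY_TIERS = {"A", "S"}


def key_customer_priority(customer_tier, risk_level):
    """Return the task priority for a Tier A/S customer, or None if no task applies."""
    if customer_tier not in KEY_TIERS:
        return None  # other tiers are not covered by this trigger
    # Risk levels other than Warning/Critical yield None as well.
    return PRIORITY_BY_RISK.get(risk_level)
```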
Configuration Constants
| Constant | Value | Description |
|---|---|---|
| SCORE_DROP_THRESHOLD | 15 | Point drop that must be exceeded to trigger (more than 15 points) |
| ALERT_STORM_COUNT | 5 | Firing-alert count per cluster that must be exceeded within the window |
| ALERT_STORM_WINDOW_HOURS | 1 | Time window, in hours, for alert storm detection |
Task Structure
Each task written to agent_tasks contains:
| Column | Description |
|---|---|
| task_id | UUID (truncated to 32 chars) |
| created_at | Timestamp |
| task_type | Always "investigate" |
| priority | 1 (highest) or 2 |
| cluster_id | Target cluster UUID |
| cluster_name | Display name |
| customer_name | Customer display name |
| customer_tier | A, B, or S |
| trigger_reason | Human-readable reason string |
| context_json | JSON with trigger-specific details |
| status | Always "pending" at creation |
| assigned_agent | Always "investigator" |
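
As an illustration of the columns above, a task row could be assembled like this. The helper name and its parameters are hypothetical. Using `uuid.uuid4().hex` is one way to obtain a 32-character ID and a UTC timestamp is an assumption; the source does not show how sentinel.py actually truncates the UUID or which timezone it uses.

```python
import json
import uuid
from datetime import datetime, timezone


def build_task(cluster_id, cluster_name, customer_name, customer_tier,
               priority, trigger_reason, context):
    """Assemble one agent_tasks row as a dict (hypothetical helper)."""
    return {
        "task_id": uuid.uuid4().hex,               # 32 hex chars; assumed ID scheme
        "created_at": datetime.now(timezone.utc),  # UTC is an assumption
        "task_type": "investigate",
        "priority": priority,
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "customer_name": customer_name,
        "customer_tier": customer_tier,
        "trigger_reason": trigger_reason,
        "context_json": json.dumps(context),       # trigger-specific details
        "status": "pending",
        "assigned_agent": "investigator",
    }
```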
Deduplication
Before inserting, tasks are deduplicated by cluster_id. If the same cluster triggered multiple checks, only the task with the highest priority (lowest number) is kept.
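
A minimal sketch of that deduplication step, assuming each task is a dict with `cluster_id` and `priority` keys. The function name is hypothetical, and keeping the first-seen task on a priority tie is an assumption, since the source does not say how ties are broken.

```python
def dedupe_tasks(tasks):
    """Keep one task per cluster_id: the one with the lowest priority number."""
    best = {}
    for task in tasks:
        cluster_id = task["cluster_id"]
        current = best.get(cluster_id)
        # Strict "<" means the first-seen task wins a tie (assumed behavior).
        if current is None or task["priority"] < current["priority"]:
            best[cluster_id] = task
    return list(best.values())
```

For example, a cluster that fires both Score Cliff (priority 2) and Alert Storm (priority 1) ends up with only the Alert Storm task.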
Fault Tolerance
Each trigger check runs independently in a try/except block. If one check fails (e.g., database timeout), the others still execute. Failures are logged but do not raise exceptions.
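
The isolation pattern described above might look like the following. This is a sketch, not the code in sentinel.py: the runner name, the logger name, and the convention that each check is a zero-argument callable returning a list of tasks are all assumptions.

```python
import logging

logger = logging.getLogger("sentinel_sketch")  # hypothetical logger name


def run_checks(checks):
    """Run each check in its own try/except and collect the tasks that succeed.

    checks: iterable of zero-argument callables, each returning a list of tasks.
    """
    tasks = []
    for check in checks:
        try:
            tasks.extend(check())
        except Exception:
            # Log with traceback and continue; the failure is not re-raised.
            logger.exception(
                "Trigger check %s failed", getattr(check, "__name__", repr(check))
            )
    return tasks
```

Catching `Exception` per check, rather than around the whole loop, is what lets a database timeout in one check leave the results of the others intact.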
Usage
from byoc_agent.sentinel import check_triggers
# Called by refresh scripts after each scoring run
count = check_triggers()
# Returns: number of new agent_tasks created