Sentinel -- Change Detection Layer

Source: byoc_agent/sentinel.py

The Sentinel is a pure-Python change-detection layer that runs after every risk/score refresh (every 15 minutes). It compares the current state against the previous state and inserts rows into alerts.agent_tasks when triggers fire.

No LLM calls. Zero cost.

How It Works

refresh_unified.py / refresh_scores.py / refresh_risk.py
       ↓
sentinel.check_triggers()
       ↓
Run 4 trigger checks independently (one failure doesn't block others)
       ↓
Deduplicate tasks by cluster_id (keep highest priority)
       ↓
INSERT into alerts.agent_tasks

Trigger Checks

1. New Critical Cluster

Detects clusters whose risk_level is Critical in cluster_risk_snapshots and for which no investigation task has been created in the last 4 hours.

Field	Value
Priority	1 (highest)
Context	risk_level, risk_reasons, account_name
Dedup window	4 hours
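
The selection logic for this check can be sketched in plain Python. This is a minimal illustration, assuming the snapshot rows and recent tasks have already been loaded as dictionaries. The function name `find_new_critical`, the `DEDUP_WINDOW` constant, and the row shapes are hypothetical and not taken from sentinel.py.

```python
from datetime import datetime, timedelta

# Assumed name for the 4-hour dedup window described above.
DEDUP_WINDOW = timedelta(hours=4)


def find_new_critical(snapshots, recent_tasks, now):
    """Return Critical snapshots that have no task inside the dedup window."""
    cutoff = now - DEDUP_WINDOW
    # Clusters that already have a task created inside the window are skipped.
    already_queued = {
        t["cluster_id"] for t in recent_tasks if t["created_at"] >= cutoff
    }
    return [
        s for s in snapshots
        if s["risk_level"] == "Critical" and s["cluster_id"] not in already_queued
    ]


now = datetime(2024, 1, 1, 12, 0)
snapshots = [
    {"cluster_id": "c1", "risk_level": "Critical"},
    {"cluster_id": "c2", "risk_level": "Critical"},
    {"cluster_id": "c3", "risk_level": "Warning"},
]
recent_tasks = [
    {"cluster_id": "c1", "created_at": now - timedelta(hours=1)},  # inside window
    {"cluster_id": "c2", "created_at": now - timedelta(hours=5)},  # outside window
]
new_critical = find_new_critical(snapshots, recent_tasks, now)
print([s["cluster_id"] for s in new_critical])
```

Here only c2 qualifies: c1 was queued an hour ago, and c3 is not Critical.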

2. Score Cliff (>15pt drop)

Detects clusters whose health score dropped more than SCORE_DROP_THRESHOLD (15 points) between the two most recent scoring snapshots.

Field	Value
Priority	2
Context	previous_score, current_score, drop amount
Source	`cluster_health_scores` (two most recent `computed_at` timestamps)
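
As a rough sketch of the comparison, assume the two most recent snapshots have been loaded into dictionaries mapping cluster_id to score. The function name `find_score_cliffs` and the input shape are hypothetical. The sketch reads "more than" as a strict comparison, so a drop of exactly 15 points does not fire.

```python
SCORE_DROP_THRESHOLD = 15  # value from the configuration table


def find_score_cliffs(previous, current, threshold=SCORE_DROP_THRESHOLD):
    """Compare two {cluster_id: score} snapshots and return drops over threshold."""
    cliffs = []
    for cluster_id, current_score in current.items():
        previous_score = previous.get(cluster_id)
        if previous_score is None:
            continue  # no earlier snapshot to compare against
        drop = previous_score - current_score
        if drop > threshold:  # strictly more than the threshold
            cliffs.append({
                "cluster_id": cluster_id,
                "previous_score": previous_score,
                "current_score": current_score,
                "drop": drop,
            })
    return cliffs


previous = {"c1": 90, "c2": 80, "c3": 70}
current = {"c1": 70, "c2": 65, "c3": 72, "c4": 10}
cliffs = find_score_cliffs(previous, current)
print(cliffs)
```

Only c1 (a 20-point drop) is reported. c2 dropped exactly 15 points, c3 improved, and c4 has no previous score.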

3. Alert Storm (>5 alerts/hour)

Detects clusters with more than ALERT_STORM_COUNT (5) firing alerts in the last ALERT_STORM_WINDOW_HOURS (1 hour).

Field	Value
Priority	1
Context	alert_count, alert_names (GROUP_CONCAT)
Source	`lark_alerts` where `alert_status = 'Firing'`
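
The counting step might look like the sketch below, which works on alert rows held in memory rather than on a SQL query. The `fired_at` field, the `"Resolved"` status value, and the function name `find_alert_storms` are assumptions made for illustration. The comma-joined string stands in for the GROUP_CONCAT result.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ALERT_STORM_COUNT = 5          # value from the configuration table
ALERT_STORM_WINDOW_HOURS = 1   # value from the configuration table


def find_alert_storms(alerts, now):
    """Return {cluster_id: context} for clusters over the firing-alert limit."""
    cutoff = now - timedelta(hours=ALERT_STORM_WINDOW_HOURS)
    names_by_cluster = defaultdict(list)
    for alert in alerts:
        # Only alerts that are still firing and fall inside the window count.
        if alert["alert_status"] == "Firing" and alert["fired_at"] >= cutoff:
            names_by_cluster[alert["cluster_id"]].append(alert["alert_name"])
    return {
        cluster_id: {"alert_count": len(names), "alert_names": ",".join(names)}
        for cluster_id, names in names_by_cluster.items()
        if len(names) > ALERT_STORM_COUNT  # strictly more than the limit
    }


def make_alert(cluster_id, name, minutes_ago, now, status="Firing"):
    """Helper that builds one hypothetical alert row."""
    return {
        "cluster_id": cluster_id,
        "alert_name": name,
        "alert_status": status,
        "fired_at": now - timedelta(minutes=minutes_ago),
    }


now = datetime(2024, 1, 1, 12, 0)
alerts = [make_alert("c1", f"alert{i}", 5 * i, now) for i in range(6)]
alerts += [make_alert("c2", f"alert{i}", 5 * i, now) for i in range(5)]
alerts.append(make_alert("c2", "old_alert", 0, now, status="Resolved"))
storms = find_alert_storms(alerts, now)
print(storms)
```

c1 has six firing alerts in the window and is reported. c2 has exactly five firing alerts plus one resolved alert, so it stays under the limit.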

4. Tier A/S Customer with Warning or Critical

Any Tier A or S customer whose cluster is at a Warning or Critical risk level gets an investigation task, unless one was already queued in the last 4 hours. The check joins cluster_risk_snapshots with dim_customer_profile.

Field	Value
Priority	1 (Critical) or 2 (Warning)
Context	risk_level, risk_reasons, customer_tier, customer_name
Dedup window	4 hours
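
The priority rule in the table can be written as a small lookup. The names `PRIORITY_BY_RISK`, `PRIORITY_TIERS`, and `tier_priority` are hypothetical, and the sketch returns None when the check does not apply.

```python
# Priority mapping from the table above: Critical -> 1, Warning -> 2.
PRIORITY_BY_RISK = {"Critical": 1, "Warning": 2}
# Tiers covered by this check.
PRIORITY_TIERS = {"A", "S"}


def tier_priority(customer_tier, risk_level):
    """Return the task priority, or None if this check does not apply."""
    if customer_tier not in PRIORITY_TIERS:
        return None
    # Any risk level other than Critical or Warning yields None.
    return PRIORITY_BY_RISK.get(risk_level)


print(tier_priority("S", "Critical"))
print(tier_priority("A", "Warning"))
print(tier_priority("B", "Critical"))
```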

Configuration Constants

Constant	Value	Description
`SCORE_DROP_THRESHOLD`	15	Point drop that must be exceeded to trigger
`ALERT_STORM_COUNT`	5	Firing alerts per cluster that must be exceeded within the window
`ALERT_STORM_WINDOW_HOURS`	1	Time window for alert storm detection

Task Structure

Each task written to agent_tasks contains:

Column	Description
`task_id`	UUID (truncated to 32 chars)
`created_at`	Timestamp
`task_type`	Always `"investigate"`
`priority`	1 (highest) or 2
`cluster_id`	Target cluster UUID
`cluster_name`	Display name
`customer_name`	Customer display name
`customer_tier`	A, B, or S
`trigger_reason`	Human-readable reason string
`context_json`	JSON with trigger-specific details
`status`	Always `"pending"` at creation
`assigned_agent`	Always `"investigator"`
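
A task row with these columns could be assembled as in the sketch below. The helper name `build_task` and its parameters are hypothetical. Using `uuid.uuid4().hex` to obtain a 32-character identifier is an assumption about how the truncated UUID is produced.

```python
import json
import uuid
from datetime import datetime, timezone


def build_task(cluster_id, cluster_name, customer_name, customer_tier,
               priority, trigger_reason, context):
    """Build one agent_tasks row as a dictionary keyed by column name."""
    return {
        # Assumption: the 32-char id is the hex form of a UUID4.
        "task_id": uuid.uuid4().hex[:32],
        "created_at": datetime.now(timezone.utc),
        "task_type": "investigate",
        "priority": priority,
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "customer_name": customer_name,
        "customer_tier": customer_tier,
        "trigger_reason": trigger_reason,
        # Trigger-specific details are serialized to a JSON string.
        "context_json": json.dumps(context),
        "status": "pending",
        "assigned_agent": "investigator",
    }


context = {"previous_score": 90, "current_score": 70, "drop": 20}
task = build_task(
    cluster_id="c1",
    cluster_name="example-cluster",
    customer_name="Example Corp",
    customer_tier="A",
    priority=2,
    trigger_reason="Score dropped 20 points",
    context=context,
)
print(task["task_id"], task["status"], task["context_json"])
```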

Deduplication

Before inserting, tasks are deduplicated by cluster_id. If the same cluster triggered multiple checks, only the task with the highest priority (lowest number) is kept.
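
The deduplication rule can be sketched as a single pass over the candidate tasks. The function name `dedupe_tasks` is hypothetical. Keeping the first task seen when two have equal priority is an assumption, since the tie-breaking rule is not stated here.

```python
def dedupe_tasks(tasks):
    """Keep one task per cluster_id, preferring the lowest priority number."""
    best = {}
    for task in tasks:
        kept = best.get(task["cluster_id"])
        # Replace only when the new task is strictly more urgent.
        if kept is None or task["priority"] < kept["priority"]:
            best[task["cluster_id"]] = task
    return list(best.values())


candidates = [
    {"cluster_id": "c1", "priority": 2, "trigger_reason": "Score cliff"},
    {"cluster_id": "c1", "priority": 1, "trigger_reason": "Alert storm"},
    {"cluster_id": "c2", "priority": 2, "trigger_reason": "Score cliff"},
]
deduped = dedupe_tasks(candidates)
print(deduped)
```

For c1, the priority 1 alert-storm task replaces the priority 2 score-cliff task. c2 keeps its only task.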

Fault Tolerance

Each trigger check runs independently in a try/except block. If one check fails (e.g., database timeout), the others still execute. Failures are logged but do not raise exceptions.
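
The isolation pattern described here can be illustrated as follows. This is a sketch, not the code in sentinel.py: the `run_checks` function, the logger name, and the two sample checks are hypothetical.

```python
import logging

logger = logging.getLogger("sentinel_sketch")


def run_checks(checks):
    """Run each check in its own try/except and collect the tasks it returns."""
    tasks = []
    for check in checks:
        try:
            tasks.extend(check())
        except Exception:
            # Log the failure with its traceback and move on to the next check.
            logger.exception("Trigger check %s failed", check.__name__)
    return tasks


def failing_check():
    raise TimeoutError("database timeout")


def ok_check():
    return [{"cluster_id": "c1", "priority": 1}]


collected = run_checks([failing_check, ok_check])
print(collected)
```

The failing check is logged and skipped, and the task from the second check is still collected.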

Usage

from byoc_agent.sentinel import check_triggers

# Called by refresh scripts after each scoring run
count = check_triggers()
# Returns: number of new agent_tasks created
