Sentinel -- Change Detection Layer

Source: byoc_agent/sentinel.py

The Sentinel is a pure Python change-detection layer that runs after every risk/score refresh (every 15 minutes). It compares current state against previous state and creates agent_tasks when triggers fire.

No LLM calls. Zero cost.

How It Works

refresh_unified.py / refresh_scores.py / refresh_risk.py
  -> sentinel.check_triggers()
       1. Run 4 trigger checks independently (one failure doesn't block others)
       2. Deduplicate tasks by cluster_id (keep highest priority)
       3. INSERT into alerts.agent_tasks

Trigger Checks

1. New Critical Cluster

Detects clusters whose risk_level is Critical in cluster_risk_snapshots and that have had no investigation task created in the last 4 hours.

| Field | Value |
| --- | --- |
| Priority | 1 (highest) |
| Context | risk_level, risk_reasons, account_name |
| Dedup window | 4 hours |
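
As a rough illustration of this check, the sketch below filters in-memory rows instead of querying the database. The function name `find_new_critical` and the row shapes (dicts with `cluster_id`, `risk_level`, and `created_at`) are assumptions made for the example, not the actual code in sentinel.py.

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(hours=4)  # dedup window from the table above


def find_new_critical(snapshots, recent_tasks, now):
    """Return Critical clusters that have no task inside the dedup window.

    snapshots:    list of dicts with "cluster_id" and "risk_level" (assumed shape)
    recent_tasks: list of dicts with "cluster_id" and "created_at" (assumed shape)
    """
    cutoff = now - DEDUP_WINDOW
    # Clusters that already had an investigation task created within the window.
    already_queued = {
        t["cluster_id"] for t in recent_tasks if t["created_at"] >= cutoff
    }
    return [
        s for s in snapshots
        if s["risk_level"] == "Critical" and s["cluster_id"] not in already_queued
    ]
```

A cluster whose last task is older than 4 hours is reported again, which is what keeps a persistently Critical cluster from being silently ignored.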

2. Score Cliff (>15pt drop)

Detects clusters whose health score dropped more than SCORE_DROP_THRESHOLD (15 points) between the two most recent scoring snapshots.

| Field | Value |
| --- | --- |
| Priority | 2 |
| Context | previous_score, current_score, drop amount |
| Source | cluster_health_scores (two most recent computed_at timestamps) |
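
The comparison itself is simple arithmetic. Here is a minimal sketch, assuming the two snapshots have already been loaded into dicts keyed by cluster_id; the function name `find_score_cliffs` and the dict inputs are hypothetical.

```python
SCORE_DROP_THRESHOLD = 15  # value from the Configuration Constants section


def find_score_cliffs(previous_scores, current_scores):
    """Return clusters whose score dropped by more than the threshold.

    Both arguments map cluster_id -> health score (assumed shape).
    """
    cliffs = []
    for cluster_id, current in current_scores.items():
        previous = previous_scores.get(cluster_id)
        if previous is None:
            continue  # no earlier snapshot to compare against
        drop = previous - current
        if drop > SCORE_DROP_THRESHOLD:  # strictly greater than 15 points
            cliffs.append({
                "cluster_id": cluster_id,
                "previous_score": previous,
                "current_score": current,
                "drop": drop,
            })
    return cliffs
```

Note that a drop of exactly 15 points does not fire in this sketch, matching the ">15pt" wording of the heading.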

3. Alert Storm (>5 alerts/hour)

Detects clusters with more than ALERT_STORM_COUNT (5) firing alerts in the last ALERT_STORM_WINDOW_HOURS (1 hour).

| Field | Value |
| --- | --- |
| Priority | 1 |
| Context | alert_count, alert_names (GROUP_CONCAT) |
| Source | lark_alerts where alert_status = 'Firing' |
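
The grouping logic can be sketched in plain Python as follows. The real check presumably runs as SQL against lark_alerts; the `fired_at` and `alert_name` field names, the comma separator used to mimic GROUP_CONCAT, and the function name `find_alert_storms` are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ALERT_STORM_COUNT = 5
ALERT_STORM_WINDOW_HOURS = 1


def find_alert_storms(alerts, now):
    """Return {cluster_id: context} for clusters exceeding the storm threshold.

    alerts: list of dicts with "cluster_id", "alert_status", "alert_name",
            and "fired_at" (field names are assumed, not taken from the schema).
    """
    cutoff = now - timedelta(hours=ALERT_STORM_WINDOW_HOURS)
    names_by_cluster = defaultdict(list)
    for alert in alerts:
        # Only alerts that are still firing and fall inside the window count.
        if alert["alert_status"] == "Firing" and alert["fired_at"] >= cutoff:
            names_by_cluster[alert["cluster_id"]].append(alert["alert_name"])
    return {
        cluster_id: {"alert_count": len(names), "alert_names": ",".join(names)}
        for cluster_id, names in names_by_cluster.items()
        if len(names) > ALERT_STORM_COUNT  # strictly more than 5
    }
```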

4. Tier A/S Customer with Warning or Critical

Any Tier A or S customer with a Warning or Critical risk level gets an investigation task unless one was already queued in the last 4 hours. The check joins cluster_risk_snapshots with dim_customer_profile.

| Field | Value |
| --- | --- |
| Priority | 1 (Critical) or 2 (Warning) |
| Context | risk_level, risk_reasons, customer_tier, customer_name |
| Dedup window | 4 hours |
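
The priority mapping for this check could look something like the sketch below. The helper name `tier_task_priority` is hypothetical, and the 4-hour dedup lookup is left out to keep the example focused on the tier and risk-level rule.

```python
PRIORITY_BY_RISK = {"Critical": 1, "Warning": 2}
WATCHED_TIERS = {"A", "S"}


def tier_task_priority(customer_tier, risk_level):
    """Return the task priority, or None when this check should not fire."""
    if customer_tier not in WATCHED_TIERS:
        return None
    # Any risk level other than Critical or Warning yields None.
    return PRIORITY_BY_RISK.get(risk_level)
```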

Configuration Constants

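| Constant | Value | Description |
| --- | --- | --- |
| SCORE_DROP_THRESHOLD | 15 | Point drop that must be exceeded to trigger |
| ALERT_STORM_COUNT | 5 | Number of firing alerts per cluster that must be exceeded within the window |
| ALERT_STORM_WINDOW_HOURS | 1 | Time window (hours) for alert storm detection |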

Task Structure

Each task written to agent_tasks contains:

| Column | Description |
| --- | --- |
| task_id | UUID (truncated to 32 chars) |
| created_at | Timestamp |
| task_type | Always "investigate" |
| priority | 1 (highest) or 2 |
| cluster_id | Target cluster UUID |
| cluster_name | Display name |
| customer_name | Customer display name |
| customer_tier | A, B, or S |
| trigger_reason | Human-readable reason string |
| context_json | JSON with trigger-specific details |
| status | Always "pending" at creation |
| assigned_agent | Always "investigator" |
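
To make the row shape concrete, here is a sketch of how such a task could be assembled. The `build_task` helper is hypothetical; slicing the string form of a UUID to 32 characters is one literal reading of "truncated to 32 chars", and the UTC timestamp is likewise an assumption.

```python
import json
import uuid
from datetime import datetime, timezone


def build_task(cluster_id, cluster_name, customer_name, customer_tier,
               priority, trigger_reason, context):
    """Assemble one agent_tasks row as a dict (illustrative only)."""
    return {
        "task_id": str(uuid.uuid4())[:32],          # assumed truncation scheme
        "created_at": datetime.now(timezone.utc),   # assumed to be UTC
        "task_type": "investigate",
        "priority": priority,
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "customer_name": customer_name,
        "customer_tier": customer_tier,
        "trigger_reason": trigger_reason,
        "context_json": json.dumps(context),        # trigger-specific details
        "status": "pending",
        "assigned_agent": "investigator",
    }
```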

Deduplication

Before inserting, tasks are deduplicated by cluster_id. If the same cluster triggered multiple checks, only the task with the highest priority (lowest number) is kept.
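
A minimal sketch of this rule, assuming tasks are dicts with `cluster_id` and `priority` keys. The function name `dedupe_by_cluster` is hypothetical, and keeping the first task seen when priorities tie is an assumption the source does not spell out.

```python
def dedupe_by_cluster(tasks):
    """Keep one task per cluster_id, preferring the lowest priority number."""
    best = {}
    for task in tasks:
        current = best.get(task["cluster_id"])
        # Lower number means higher priority; ties keep the first task seen.
        if current is None or task["priority"] < current["priority"]:
            best[task["cluster_id"]] = task
    return list(best.values())
```

For example, if a cluster triggers both Score Cliff (priority 2) and Alert Storm (priority 1), only the Alert Storm task survives.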

Fault Tolerance

Each trigger check runs independently in its own try/except block. If one check fails (e.g., database timeout), the others still execute. The failure is logged and not re-raised, so the caller never sees an exception from an individual check.
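
The isolation pattern described here can be sketched as below. The `run_checks` helper and the logger name are assumptions for the example; each check is modelled as a zero-argument callable that returns a list of tasks.

```python
import logging

logger = logging.getLogger("sentinel_sketch")  # hypothetical logger name


def run_checks(checks):
    """Run every check, collecting tasks and logging any failure."""
    tasks = []
    for check in checks:
        try:
            tasks.extend(check())
        except Exception:
            # Log the traceback and move on so later checks still run.
            name = getattr(check, "__name__", repr(check))
            logger.exception("Trigger check %s failed", name)
    return tasks
```

Catching `Exception` rather than using a bare `except` lets `KeyboardInterrupt` and `SystemExit` still propagate.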

Usage

from byoc_agent.sentinel import check_triggers

# Called by refresh scripts after each scoring run
count = check_triggers()
# Returns: number of new agent_tasks created