UnifiedScorer -- Merged Scoring Pipeline
Source: byoc_agent/unified_scorer.py
Rules: byoc_agent/unified_rules.yaml (v1)
The UnifiedScorer merges health scoring and risk analysis into a single weighted pipeline. It replaces running cluster_risk_analyzer.py and health_scorer.py separately, providing one authoritative score per cluster.
Formula
Overall Score = metrics_score * 0.60 + alerts_score * 0.25 + tier_adjusted_score * 0.15
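The weighted combination above can be sketched as follows (the helper name is illustrative, not taken from the source):

```python
def overall_score(metrics_score: float, alerts_score: float,
                  tier_adjusted_score: float) -> float:
    """Combine the three components using the documented weights."""
    return (metrics_score * 0.60
            + alerts_score * 0.25
            + tier_adjusted_score * 0.15)

# Example: perfect metrics, heavy alerting, no tier penalty
print(overall_score(100.0, 20.0, 100.0))  # 80.0
```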
Component Breakdown
| Component | Weight | Description |
|---|---|---|
| metrics_score | 60% | Weighted average of 15 metric dimensions (each scored 0-100) |
| alerts_score | 25% | Score based on firing alert count in 7 days |
| tier_adjusted_score | 15% | Amplifies issues for important customers |
Alert Scoring Bands
| Max Alerts (7d) | Score |
|---|---|
| 0 | 100 |
| 3 | 80 |
| 10 | 50 |
| 20 | 20 |
| 20+ | 0 |
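One plausible reading of the bands is a step function keyed on the 7-day firing alert count; a minimal sketch (the function name and exact band semantics are assumptions, not confirmed by the source):

```python
def alerts_score(firing_alerts_7d: int) -> float:
    """Map a 7-day firing alert count to a 0-100 score using the bands above.

    Assumption: each band applies to counts up to and including its max.
    """
    bands = [(0, 100.0), (3, 80.0), (10, 50.0), (20, 20.0)]
    for max_alerts, score in bands:
        if firing_alerts_7d <= max_alerts:
            return score
    return 0.0  # more than 20 alerts in 7 days

print(alerts_score(0))   # 100.0
print(alerts_score(7))   # 50.0
print(alerts_score(25))  # 0.0
```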
Tier Multipliers
Tier multipliers amplify the penalty for important customers. A multiplier of 1.3 means a Tier A customer with a base gap of 20 points gets a 26-point gap instead.
| Tier | Multiplier |
|---|---|
| S | 1.5 |
| A | 1.3 |
| B | 1.0 (default) |
The tier-adjusted score formula:
base = (metrics * 0.60 + alerts * 0.25) / 0.85
tier_adjusted = max(0, 100 - (100 - base) * multiplier)
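Putting the two formula lines together, the Tier A example from above (a base score of 80, i.e. a 20-point gap, amplified to a 26-point gap) works out like this (the helper name is illustrative):

```python
def tier_adjusted_score(metrics: float, alerts: float, tier: str) -> float:
    """Normalize the metrics+alerts base, then amplify the gap by tier."""
    multipliers = {"S": 1.5, "A": 1.3, "B": 1.0}
    base = (metrics * 0.60 + alerts * 0.25) / 0.85
    return max(0.0, 100.0 - (100.0 - base) * multipliers.get(tier, 1.0))

# Tier A: base 80 has a 20-point gap; 20 * 1.3 = 26, so the score drops to 74
print(tier_adjusted_score(80.0, 80.0, "A"))  # ~74.0
```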
Classification
| Classification | Min Score |
|---|---|
| Healthy | 80 |
| Warning | 50 |
| Critical | 0 (below 50) |
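The classification bands map directly to threshold checks; a minimal sketch (function name assumed):

```python
def classify(score: float) -> str:
    """Map an overall score to its classification band."""
    if score >= 80:
        return "Healthy"
    if score >= 50:
        return "Warning"
    return "Critical"

print(classify(85.0))   # Healthy
print(classify(50.0))   # Warning
print(classify(49.9))   # Critical
```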
Pipeline Steps
- Fetch metrics -- Gauge metrics, counter metrics (MAX-MIN delta), data size 7 days ago, cluster enrichment data.
- Fetch alerts -- Firing alert counts per cluster from lark_alerts (7-day window).
- Fetch customer tiers -- Tier mapping from dim_customer_profile.
- Score each cluster -- 15 metric dimensions scored via linear interpolation, then weighted.
- Classify -- Apply classification bands.
- Identify AI investigation candidates -- Bottom 20 clusters or score < 30, whichever yields fewer.
Metric Dimensions (15 total)
Storage & Compaction
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| compaction_score | 0.10 | 500 | 1,000 | 5,000 |
| disk_used_pct | 0.08 | 70% | 80% | 90% |
| data_growth_pct | 0.05 | 50% | 100% | 300% |
Query Performance
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| query_errors_7d | 0.10 | 5,000 | 10,000 | 50,000 |
| query_timeouts_7d | 0.06 | 200 | 500 | 5,000 |
| slow_queries_7d | 0.06 | 20,000 | 50,000 | 500,000 |
| internal_errors_7d | 0.08 | 500 | 1,000 | 10,000 |
Ingestion Health
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| txn_failures_7d | 0.07 | 5,000 | 10,000 | 50,000 |
| txn_rejects_7d | 0.07 | 50 | 100 | 5,000 |
Resource Pressure
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| memory_available_pct | 0.08 | 40% | 20% | 10% |
| fe_heap_used_pct | 0.06 | 75% | 85% | 95% |
| be_process_mem_gb | 0.06 | 80 GB | 120 GB | 180 GB |
FE & Metadata Health
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| fe_journal_write_latency_ms | 0.05 | 500 ms | 1,000 ms | 10,000 ms |
| fe_connection_total | 0.04 | 300 | 500 | 900 |
Process Health
| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| be_fd_usage | 0.04 | 20,000 | 30,000 | 50,000 |
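A per-dimension score could be derived from the green/yellow/red thresholds via linear interpolation, as the pipeline steps describe. The sketch below assumes the anchors green=100, yellow=50, red=0 for a "higher is worse" metric; those anchor values, and the handling of "lower is worse" dimensions such as memory_available_pct, are assumptions not confirmed by the source:

```python
def interp_score(value: float, green: float, yellow: float, red: float) -> float:
    """Score a 'higher is worse' metric on 0-100 via linear interpolation.

    Assumed anchors (not from the source): value <= green scores 100,
    value == yellow scores 50, value >= red scores 0.
    """
    if value <= green:
        return 100.0
    if value <= yellow:
        return 100.0 - 50.0 * (value - green) / (yellow - green)
    if value <= red:
        return 50.0 - 50.0 * (value - yellow) / (red - yellow)
    return 0.0

# compaction_score thresholds: green 500, yellow 1,000, red 5,000
print(interp_score(750, 500, 1_000, 5_000))  # 75.0 (halfway between green and yellow)
```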
UnifiedClusterScore Dataclass
Key fields:
| Field | Type | Description |
|---|---|---|
| overall_score | float | Final composite score (0-100) |
| metrics_score | float | Metrics component (0-100) |
| alerts_score | float | Alerts component (0-100) |
| tier_adjusted_score | float | Tier-adjusted component (0-100) |
| classification | str | Healthy / Warning / Critical |
| customer_tier | str | S, A, or B |
| dimension_scores | dict | Per-dimension details (score, value, weight, status) |
| risk_reasons | list[str] | Human-readable risk explanations |
| suggested_actions | list[str] | Remediation steps |
| firing_alert_count_7d | int | Number of firing alerts in 7 days |
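The key fields above suggest a shape along these lines; field order and defaults here are assumptions, and the real dataclass may carry additional fields:

```python
from dataclasses import dataclass, field

@dataclass
class UnifiedClusterScore:
    overall_score: float
    metrics_score: float
    alerts_score: float
    tier_adjusted_score: float
    classification: str
    customer_tier: str
    firing_alert_count_7d: int
    # Collection fields default to empty to keep construction simple
    dimension_scores: dict = field(default_factory=dict)
    risk_reasons: list = field(default_factory=list)
    suggested_actions: list = field(default_factory=list)
```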
AI Investigation Candidates
get_investigation_candidates() returns clusters that should be investigated by the AI agent, using whichever yields fewer results:
- Clusters with score below the threshold (default: 30)
- Bottom N clusters (default: 20)
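The "whichever yields fewer" selection could look like the following sketch; the real get_investigation_candidates signature and return shape may differ:

```python
from types import SimpleNamespace

def get_investigation_candidates(scores, threshold: float = 30.0, bottom_n: int = 20):
    """Return the smaller of: clusters below threshold, or the bottom-N clusters."""
    ranked = sorted(scores, key=lambda s: s.overall_score)
    below = [s for s in ranked if s.overall_score < threshold]
    bottom = ranked[:bottom_n]
    return below if len(below) <= len(bottom) else bottom

# Demo with stand-in score objects (SimpleNamespace mimics the dataclass)
demo = [SimpleNamespace(overall_score=s) for s in (95, 25, 60, 10)]
picked = get_investigation_candidates(demo, threshold=30.0, bottom_n=3)
print([s.overall_score for s in picked])  # [10, 25]
```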
Persistence
Scores are stored as snapshots in alerts.cluster_unified_scores (DUPLICATE KEY table on cluster_id, computed_at). Each run inserts a new snapshot.
Usage
```python
from byoc_agent.unified_scorer import compute_unified_scores, persist_unified_scores

scores = compute_unified_scores()
persist_unified_scores(scores)

# Get investigation candidates
from byoc_agent.unified_scorer import get_investigation_candidates
candidates = get_investigation_candidates(scores)

# Load latest from DB
from byoc_agent.unified_scorer import load_latest_unified_scores
latest = load_latest_unified_scores()
```