UnifiedScorer -- Merged Scoring Pipeline

Source: byoc_agent/unified_scorer.py
Rules: byoc_agent/unified_rules.yaml (v1)

The UnifiedScorer merges health scoring and risk analysis into a single weighted pipeline. It replaces running cluster_risk_analyzer.py and health_scorer.py separately, providing one authoritative score per cluster.

Formula

Overall Score = metrics_score * 0.60 + alerts_score * 0.25 + tier_adjusted_score * 0.15

Component Breakdown

| Component | Weight | Description |
|---|---|---|
| metrics_score | 60% | Weighted average of 15 metric dimensions (each scored 0-100) |
| alerts_score | 25% | Score based on firing alert count in 7 days |
| tier_adjusted_score | 15% | Amplifies issues for important customers |

Alert Scoring Bands

| Max Alerts (7d) | Score |
|---|---|
| 0 | 100 |
| 3 | 80 |
| 10 | 50 |
| 20 | 20 |
| 20+ | 0 |
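
As a minimal sketch, the bands above can be read as a step lookup. This assumes each "Max Alerts" value is an inclusive upper bound and that scores are not interpolated between bands; the source does not say either way. The names `ALERT_BANDS` and `alerts_score_for` are hypothetical, not taken from unified_scorer.py.

```python
# Hypothetical band table: (inclusive max firing alerts in 7 days, score).
ALERT_BANDS = [(0, 100), (3, 80), (10, 50), (20, 20)]


def alerts_score_for(firing_alerts_7d: int) -> int:
    """Return the score of the first band whose upper bound covers the count."""
    for max_alerts, score in ALERT_BANDS:
        if firing_alerts_7d <= max_alerts:
            return score
    # More than 20 firing alerts falls into the "20+" band.
    return 0
```

Under these assumptions, a cluster with 4 firing alerts lands in the 10-alert band and scores 50.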

Tier Multipliers

Tier multipliers amplify the penalty for important customers by scaling the gap between the base score and 100. With a multiplier of 1.3, a Tier A customer whose base score is 20 points short of 100 is scored as if the gap were 26 points (20 * 1.3).

| Tier | Multiplier |
|---|---|
| S | 1.5 |
| A | 1.3 |
| B | 1.0 (default) |

The tier-adjusted score formula:

base = (metrics * 0.60 + alerts * 0.25) / 0.85
tier_adjusted = max(0, 100 - (100 - base) * multiplier)
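
The two formulas can be combined into a short worked sketch. The weights and multipliers come from this page. The function names and the fallback to 1.0 for an unknown tier are assumptions made for illustration, not the module's actual API.

```python
# Multipliers from the tier table; unknown tiers fall back to 1.0 (assumption).
TIER_MULTIPLIERS = {"S": 1.5, "A": 1.3, "B": 1.0}


def tier_adjusted_score(metrics: float, alerts: float, tier: str = "B") -> float:
    """Rescale the metrics/alerts blend to 0-100, then widen its gap to 100."""
    multiplier = TIER_MULTIPLIERS.get(tier, 1.0)
    base = (metrics * 0.60 + alerts * 0.25) / 0.85
    return max(0.0, 100.0 - (100.0 - base) * multiplier)


def overall_score(metrics: float, alerts: float, tier: str = "B") -> float:
    """Overall = 60% metrics + 25% alerts + 15% tier-adjusted."""
    adjusted = tier_adjusted_score(metrics, alerts, tier)
    return metrics * 0.60 + alerts * 0.25 + adjusted * 0.15


# Same raw scores, different tiers: the Tier A cluster ends up lower.
tier_b = overall_score(80, 80, "B")  # base gap of 20 is left unchanged
tier_a = overall_score(80, 80, "A")  # base gap of 20 is widened to 26
```

Because the tier-adjusted component carries only 15% of the weight, a 6-point wider gap lowers the overall score by about 0.9 points in this example.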

Classification

| Classification | Min Score |
|---|---|
| Healthy | 80 |
| Warning | 50 |
| Critical | 0 (below 50) |
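
A minimal sketch of the classification bands, assuming the minimum scores are inclusive (a score of exactly 80 is Healthy). The function name `classify` is hypothetical.

```python
def classify(score: float) -> str:
    """Map an overall score (0-100) to a classification label."""
    if score >= 80:
        return "Healthy"
    if score >= 50:
        return "Warning"
    return "Critical"
```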

Pipeline Steps

  1. Fetch metrics -- Gauge metrics, counter metrics (MAX-MIN delta), data size 7d ago, cluster enrichment data.
  2. Fetch alerts -- Firing alert counts per cluster from lark_alerts (7-day window).
  3. Fetch customer tiers -- Tier mapping from dim_customer_profile.
  4. Score each cluster -- 15 metric dimensions scored via linear interpolation, then weighted.
  5. Classify -- Apply classification bands.
  6. Identify AI investigation candidates -- The bottom 20 clusters or the clusters scoring below 30, whichever set is smaller.

Metric Dimensions (15 total)

Storage & Compaction

| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| compaction_score | 0.10 | 500 | 1,000 | 5,000 |
| disk_used_pct | 0.08 | 70% | 80% | 90% |
| data_growth_pct | 0.05 | 50% | 100% | 300% |

Query Performance

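| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| query_errors_7d | 0.10 | 5,000 | 10,000 | 50,000 |
| query_timeouts_7d | 0.06 | 200 | 500 | 5,000 |
| slow_queries_7d | 0.06 | 20,000 | 50,000 | 500,000 |
| internal_errors_7d | 0.08 | 500 | 1,000 | 10,000 |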

Ingestion Health

| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| txn_failures_7d | 0.07 | 5,000 | 10,000 | 50,000 |
| txn_rejects_7d | 0.07 | 50 | 100 | 5,000 |

Resource Pressure

| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| memory_available_pct | 0.08 | 40% | 20% | 10% |
| fe_heap_used_pct | 0.06 | 75% | 85% | 95% |
| be_process_mem_gb | 0.06 | 80 GB | 120 GB | 180 GB |

FE & Metadata Health

| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| fe_journal_write_latency_ms | 0.05 | 500 ms | 1,000 ms | 10,000 ms |
| fe_connection_total | 0.04 | 300 | 500 | 900 |

Process Health

| Dimension | Weight | Green | Yellow | Red |
|---|---|---|---|---|
| be_fd_usage | 0.04 | 20,000 | 30,000 | 50,000 |
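
The pipeline scores each dimension by linear interpolation across its Green/Yellow/Red thresholds, but this page does not state which score each threshold maps to. The sketch below assumes Green = 100, Yellow = 50, and Red = 0, with values clamped outside that range. It also assumes that a dimension whose Green threshold is higher than its Red one (such as memory_available_pct) is scored in the opposite direction. The function names are hypothetical.

```python
def score_dimension(value: float, green: float, yellow: float, red: float) -> float:
    """Piecewise-linear score; anchors green=100, yellow=50, red=0 are assumed."""
    # Metrics where lower is worse (green > red) are negated so that
    # the same "higher is worse" logic applies to both directions.
    if green > red:
        value, green, yellow, red = -value, -green, -yellow, -red
    if value <= green:
        return 100.0
    if value >= red:
        return 0.0
    if value <= yellow:
        return 100.0 - 50.0 * (value - green) / (yellow - green)
    return 50.0 - 50.0 * (value - yellow) / (red - yellow)


def weighted_metrics_score(dimensions: dict[str, tuple[float, float]]) -> float:
    """Weighted average of (weight, score) pairs, normalised by total weight."""
    total_weight = sum(weight for weight, _ in dimensions.values())
    return sum(weight * score for weight, score in dimensions.values()) / total_weight


# disk_used_pct at 85% sits halfway between Yellow (80%) and Red (90%).
disk = score_dimension(85, green=70, yellow=80, red=90)
# memory_available_pct at 30% sits halfway between Green (40%) and Yellow (20%).
memory = score_dimension(30, green=40, yellow=20, red=10)
metrics = weighted_metrics_score({
    "disk_used_pct": (0.08, disk),
    "memory_available_pct": (0.08, memory),
})
```

Dividing by the total weight keeps the result on the 0-100 scale even when only a subset of the 15 dimensions is available. Whether the real scorer renormalises in this way is not stated in the source.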

UnifiedClusterScore Dataclass

Key fields:

| Field | Type | Description |
|---|---|---|
| overall_score | float | Final composite score (0-100) |
| metrics_score | float | Metrics component (0-100) |
| alerts_score | float | Alerts component (0-100) |
| tier_adjusted_score | float | Tier-adjusted component (0-100) |
| classification | str | Healthy / Warning / Critical |
| customer_tier | str | S, A, or B |
| dimension_scores | dict | Per-dimension details (score, value, weight, status) |
| risk_reasons | list[str] | Human-readable risk explanations |
| suggested_actions | list[str] | Remediation steps |
| firing_alert_count_7d | int | Number of firing alerts in 7 days |
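
For orientation, the key fields could be laid out roughly as follows. This is an illustrative sketch only: the class name `UnifiedClusterScoreSketch` and all default values are assumptions, and the real dataclass in unified_scorer.py likely carries further fields (for example, a cluster identifier) that are not listed here.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedClusterScoreSketch:
    """Hypothetical stand-in mirroring the key fields documented above."""
    overall_score: float
    metrics_score: float
    alerts_score: float
    tier_adjusted_score: float
    classification: str                      # Healthy / Warning / Critical
    customer_tier: str = "B"                 # S, A, or B (default assumed)
    dimension_scores: dict = field(default_factory=dict)
    risk_reasons: list[str] = field(default_factory=list)
    suggested_actions: list[str] = field(default_factory=list)
    firing_alert_count_7d: int = 0
```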

AI Investigation Candidates

get_investigation_candidates() returns clusters that should be investigated by the AI agent, using whichever yields fewer results:

  • Clusters with score below the threshold (default: 30)
  • Bottom N clusters (default: 20)
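
The "whichever yields fewer" rule can be sketched as below. This is not the actual get_investigation_candidates() implementation: the function name `pick_candidates`, its parameter names, and the plain dict of cluster ID to overall score are all assumptions made for illustration.

```python
def pick_candidates(
    scores: dict[str, float], threshold: float = 30.0, bottom_n: int = 20
) -> list[str]:
    """Return cluster IDs, worst first, using the smaller of the two selections."""
    ranked = sorted(scores, key=scores.get)  # lowest score first
    below_threshold = [cid for cid in ranked if scores[cid] < threshold]
    bottom = ranked[:bottom_n]
    # Pick whichever selection contains fewer clusters.
    return below_threshold if len(below_threshold) <= len(bottom) else bottom


fleet = {"c1": 10.0, "c2": 25.0, "c3": 60.0, "c4": 90.0}
candidates = pick_candidates(fleet)               # two clusters score below 30
capped = pick_candidates(fleet, bottom_n=1)       # bottom-N is the smaller set
```

One consequence of this rule is that a fleet with no cluster below the threshold produces no candidates at all, regardless of N.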

Persistence

Scores are stored as snapshots in alerts.cluster_unified_scores (DUPLICATE KEY table on cluster_id, computed_at). Each run inserts a new snapshot.

Usage

from byoc_agent.unified_scorer import compute_unified_scores, persist_unified_scores

scores = compute_unified_scores()
persist_unified_scores(scores)

# Get investigation candidates
from byoc_agent.unified_scorer import get_investigation_candidates
candidates = get_investigation_candidates(scores)

# Load latest from DB
from byoc_agent.unified_scorer import load_latest_unified_scores
latest = load_latest_unified_scores()