Critical Clusters API

Prefix: /api/critical-clusters | Tag: critical-clusters

Returns the top 15 problematic clusters, ranked by a composite severity score that combines risk snapshots, health scores, alert data, and cluster infrastructure metadata.

Endpoints

Method  Path                    Description
GET     /api/critical-clusters  Top 15 problematic clusters

GET /api/critical-clusters

Builds a composite ranking from:

  1. Risk snapshots (alerts.cluster_risk_snapshots) -- Warning/Critical only
  2. Health scores (alerts.cluster_health_scores) -- latest snapshot
  3. Alert counts (alerts.lark_alerts) -- last 14 days, Firing only
  4. Cluster infra (byoc.clusters + related tables)

Severity score = (number of risk_reasons * 10) + alert_count + health_penalty
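The formula above can be sketched in Python as follows. This is an illustration, not the service's actual code: the function names `severity_score` and `rank_clusters` are hypothetical, `risk_reasons` is assumed to contribute by its count, the ranking is assumed to be descending by score, and `health_penalty` is taken as an input because this page does not say how it is derived from the health score.

```python
from typing import Any


def severity_score(risk_reasons: list[str], alert_count: int,
                   health_penalty: float = 0.0) -> float:
    """Composite severity: 10 points per risk reason, plus alerts, plus penalty.

    health_penalty is passed in as-is; its derivation is not documented here.
    """
    return len(risk_reasons) * 10 + alert_count + health_penalty


def rank_clusters(clusters: list[dict[str, Any]], limit: int = 15) -> list[dict[str, Any]]:
    """Sort clusters by severity_score (highest first, assumed) and keep the top `limit`."""
    ranked = sorted(clusters, key=lambda c: c["severity_score"], reverse=True)
    return ranked[:limit]


# Example: two risk reasons and 25 firing alerts, with no health penalty applied.
score = severity_score(["High compaction score", "Disk > 85%"], alert_count=25)
```

With these inputs the sketch yields 45.0 (2 * 10 + 25 + 0.0).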

Response:

{
  "clusters": [
    {
      "cluster_id": "uuid",
      "cluster_name": "prod-analytics",
      "account_name": "Acme Corp",
      "account_id": "abc123",
      "region": "US East (N. Virginia) us-east-1",
      "sr_version": "3.4.7-ee",
      "email": "admin@acme.com",
      "risk_level": "Critical",
      "risk_reasons": ["High compaction score", "Disk > 85%"],
      "suggested_actions": ["Add capacity", "Investigate compaction"],
      "severity_score": 45.0,
      "metrics": {
        "compaction_score": 8500.0,
        "disk_used_pct": 88.5,
        "query_errors_7d": 350,
        "query_error_pct": 1.2,
        "qps": 45.2,
        "node_count": 5
      },
      "health": {
        "overall_score": 42.5,
        "classification": "Critical",
        "dimensions": {}
      },
      "alerts": [
        {"alert_name": "HighCompaction", "count": 12, "first": "...", "last": "..."}
      ],
      "alert_total": 25
    }
  ],
  "generated_at": "2026-03-25 10:00:00",
  "total_critical": 5,
  "total_warning": 10
}
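
A client might consume this response roughly as shown below. This is a minimal sketch using only the Python standard library. The host in `BASE_URL` is a placeholder, the helper names `fetch_critical_clusters` and `summarize` are hypothetical, and authentication is omitted because this page does not describe it.

```python
import json
from typing import Any
from urllib.request import urlopen

BASE_URL = "https://example.invalid"  # placeholder host, replace with your deployment


def fetch_critical_clusters(base_url: str = BASE_URL, timeout: float = 10.0) -> dict[str, Any]:
    """GET /api/critical-clusters and decode the JSON body."""
    with urlopen(f"{base_url}/api/critical-clusters", timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict[str, Any]) -> list[str]:
    """Build one line per cluster from the fields shown in the sample response."""
    lines = []
    for cluster in payload.get("clusters", []):
        lines.append(
            f'{cluster["cluster_name"]} ({cluster["risk_level"]}): '
            f'score={cluster["severity_score"]}, alerts={cluster["alert_total"]}'
        )
    return lines


# Offline usage with a trimmed copy of the sample response:
sample = {
    "clusters": [
        {
            "cluster_name": "prod-analytics",
            "risk_level": "Critical",
            "severity_score": 45.0,
            "alert_total": 25,
        }
    ],
    "total_critical": 5,
    "total_warning": 10,
}
summary = summarize(sample)
```

Keeping the fetch and the formatting in separate functions lets the formatting logic be exercised against a saved payload without a network call.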