Critical Clusters API
Prefix: /api/critical-clusters | Tag: critical-clusters
Returns the top 15 problematic clusters, ranked by a composite severity score that combines risk snapshots, health scores, alert data, and cluster infrastructure metadata.
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/critical-clusters | Top 15 problematic clusters |
GET /api/critical-clusters
Builds a composite ranking from:
- Risk snapshots (alerts.cluster_risk_snapshots) -- Warning/Critical only
- Health scores (alerts.cluster_health_scores) -- latest snapshot
- Alert counts (alerts.lark_alerts) -- last 14 days, Firing only
- Cluster infra (byoc.clusters + related tables)
Severity score = (risk_reasons * 10) + alert_count + health_penalty, where risk_reasons is the number of risk reasons reported for the cluster.
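The formula and ranking can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: the `ClusterInputs` type and both function names are hypothetical, the `risk_reasons` term is taken to be the number of risk reasons, and `health_penalty` is passed in as a precomputed number because this section does not say how it is derived.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterInputs:
    """Hypothetical container for the values the formula needs."""
    cluster_name: str
    risk_reasons: list = field(default_factory=list)  # reasons from the risk snapshot
    alert_count: int = 0          # Firing alerts in the last 14 days
    health_penalty: float = 0.0   # assumed precomputed; derivation not documented here


def severity_score(c: ClusterInputs) -> float:
    # (number of risk reasons * 10) + alert_count + health_penalty
    return float(len(c.risk_reasons) * 10 + c.alert_count + c.health_penalty)


def top_clusters(clusters, limit=15):
    # Highest severity first, truncated to the endpoint's top-15 limit.
    return sorted(clusters, key=severity_score, reverse=True)[:limit]
```

With two risk reasons, 25 alerts, and an assumed health penalty of 0, `severity_score` returns 45.0.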
Response:
{
  "clusters": [
    {
      "cluster_id": "uuid",
      "cluster_name": "prod-analytics",
      "account_name": "Acme Corp",
      "account_id": "abc123",
      "region": "US East (N. Virginia) us-east-1",
      "sr_version": "3.4.7-ee",
      "email": "admin@acme.com",
      "risk_level": "Critical",
      "risk_reasons": ["High compaction score", "Disk > 85%"],
      "suggested_actions": ["Add capacity", "Investigate compaction"],
      "severity_score": 45.0,
      "metrics": {
        "compaction_score": 8500.0,
        "disk_used_pct": 88.5,
        "query_errors_7d": 350,
        "query_error_pct": 1.2,
        "qps": 45.2,
        "node_count": 5
      },
      "health": {
        "overall_score": 42.5,
        "classification": "Critical",
        "dimensions": {}
      },
      "alerts": [
        {"alert_name": "HighCompaction", "count": 12, "first": "...", "last": "..."}
      ],
      "alert_total": 25
    }
  ],
  "generated_at": "2026-03-25 10:00:00",
  "total_critical": 5,
  "total_warning": 10
}
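A client might consume this response along the following lines. This is a minimal sketch using only the Python standard library: `BASE_URL` is a placeholder host, and `fetch_critical_clusters` and `summarize` are hypothetical helper names rather than part of the API. It also assumes the endpoint needs no authentication, which this section does not state.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # placeholder; replace with the real host


def fetch_critical_clusters(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /api/critical-clusters and decode the JSON body."""
    with urlopen(f"{base_url}/api/critical-clusters", timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> list:
    """Return one human-readable line per cluster, in the order returned."""
    lines = []
    for c in payload.get("clusters", []):
        lines.append(
            f'{c["cluster_name"]} [{c["risk_level"]}] '
            f'score={c["severity_score"]} alerts={c["alert_total"]}'
        )
    return lines
```

Calling `summarize(fetch_critical_clusters())` against a running instance would yield one line per ranked cluster; the top-level `total_critical` and `total_warning` fields can be read directly from the returned dict.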