Health API
Prefix: /api/health | Tag: health
Cluster health scores, KPIs, enrichment metadata, and score refresh. Scores come from byoc_agent.health_scorer and the unified scorer. Responses are cached for 60 seconds; POST /api/health/refresh invalidates the cache.
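The 60-second cache behaves like a time-to-live (TTL) cache. The sketch below is a hypothetical illustration of that behavior, not the service's actual code. The TTLCache class, its get_or_compute and invalidate methods, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical 60-second response cache; not the service's actual implementation."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it once the TTL has elapsed."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: serve from cache
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self) -> None:
        """Drop every entry, as POST /api/health/refresh is documented to do."""
        self._entries.clear()
```

Injecting the clock keeps expiry testable without sleeping. Clearing the whole cache on refresh mirrors the documented "invalidates the cache" behavior without assuming which keys the service actually uses.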
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /api/health/scores | All cluster health scores, sorted worst-first (lowest overall_score first) |
| GET | /api/health/kpis | Summary KPI metrics |
| GET | /api/health/enrichment | Cluster metadata (names, regions, types) |
| POST | /api/health/refresh | Recompute scores from live metrics |
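Below is a minimal client sketch for the three GET endpoints, using only the Python standard library. BASE_URL and the get_health helper are hypothetical. No authentication is shown because this page does not describe any. The urlopen parameter is injectable so the helper can be exercised without a live server.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Hypothetical base URL; point this at wherever the service is deployed.
BASE_URL = "http://localhost:8000"


def get_health(endpoint: str, base_url: str = BASE_URL,
               urlopen: Callable[..., Any] = urllib.request.urlopen,
               timeout: float = 10.0) -> Dict[str, Any]:
    """GET /api/health/<endpoint> and return the decoded JSON body.

    endpoint is one of "scores", "kpis", or "enrichment". urlopen is
    injectable so the helper can be tested without a running server.
    """
    url = f"{base_url.rstrip('/')}/api/health/{endpoint}"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, get_health("kpis")["kpis"]["critical_count"] reads the critical-cluster count.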
GET /api/health/scores
Returns cluster health scores enriched with cluster metadata and risk analysis.
Response:
```json
{
  "scores": [
    {
      "cluster_id": "uuid",
      "cluster_name": "prod-analytics",
      "customer_name": "Acme Corp",
      "account_id": "abc123",
      "overall_score": 42.5,
      "classification": "Critical",
      "top_risk": "Compaction: 35",
      "cluster_type": "ELASTIC",
      "region": "US East (N. Virginia) us-east-1",
      "dimension_scores": [
        {"name": "Compaction", "score": 35.0, "value": 8500.0, "weight": 0.2, "description": "..."}
      ],
      "risk_level": "Critical",
      "risk_reasons": ["High compaction score"],
      "suggested_actions": ["Investigate tablet compaction backlog"],
      "risk_metrics": {}
    }
  ],
  "source": "live",
  "generated_at": "2026-03-25T10:00:00"
}
```
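The sketch below shows one way to consume the /scores payload: it lists clusters classified "Critical" and checks the worst-first ordering. It assumes that a lower overall_score means a less healthy cluster. The sample supports this (42.5 is Critical), but the page does not state it explicitly. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def critical_clusters(payload: Dict[str, Any]) -> List[str]:
    """Return the names of clusters classified "Critical" in a /scores payload."""
    return [s["cluster_name"] for s in payload["scores"]
            if s.get("classification") == "Critical"]


def is_worst_first(payload: Dict[str, Any]) -> bool:
    """Check the documented ordering, assuming lower overall_score means worse."""
    scores = [s["overall_score"] for s in payload["scores"]]
    # Non-decreasing scores mean the least healthy clusters come first.
    return all(a <= b for a, b in zip(scores, scores[1:]))
```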
GET /api/health/kpis
Returns summary KPI metrics across all clusters.
Response:
```json
{
  "kpis": {
    "total_clusters": 172,
    "healthy_count": 150,
    "warning_count": 15,
    "critical_count": 7,
    "avg_score": 78.3,
    "avg_p99_latency": 245.6
  },
  "source": "live",
  "generated_at": "2026-03-25T10:00:00"
}
```
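In the sample, the three bucket counts add up to total_clusters (150 + 15 + 7 = 172). The hypothetical helper below converts the counts into percentages and raises an error if they do not add up. Treating the buckets as mutually exclusive and exhaustive is an assumption based on the sample. avg_p99_latency is left alone because its unit is not documented here.

```python
from typing import Any, Dict


def kpi_breakdown(payload: Dict[str, Any]) -> Dict[str, float]:
    """Turn the /kpis bucket counts into percentages of total_clusters (one decimal)."""
    kpis = payload["kpis"]
    total = kpis["total_clusters"]
    counted = kpis["healthy_count"] + kpis["warning_count"] + kpis["critical_count"]
    # Assumption: every cluster falls into exactly one of the three buckets.
    if counted != total:
        raise ValueError(f"bucket counts sum to {counted}, expected {total}")
    if total == 0:
        return {"healthy": 0.0, "warning": 0.0, "critical": 0.0}
    return {
        bucket: round(100.0 * kpis[f"{bucket}_count"] / total, 1)
        for bucket in ("healthy", "warning", "critical")
    }
```

For the sample response, it returns {"healthy": 87.2, "warning": 8.7, "critical": 4.1}.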
GET /api/health/enrichment
Returns cluster metadata from byoc.clusters joined with byoc.region_details.
Response:
```json
{
  "clusters": [
    {"cluster_id": "uuid", "cluster_name": "prod-analytics", "cluster_type": "ELASTIC", "region_name": "..."}
  ],
  "source": "live",
  "generated_at": "..."
}
```
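Because /enrichment returns one row per cluster, a natural pattern is to index it by cluster_id so that other data keyed by ID can be resolved to names and types. The helpers below are hypothetical. Note that this endpoint names the region field region_name, whereas /scores uses region.

```python
from collections import Counter
from typing import Any, Dict


def index_clusters(payload: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Key /enrichment rows by cluster_id for constant-time lookups."""
    return {c["cluster_id"]: c for c in payload["clusters"]}


def count_by_type(payload: Dict[str, Any]) -> Counter:
    """Count clusters per cluster_type (for example "ELASTIC")."""
    return Counter(c.get("cluster_type") for c in payload["clusters"])
```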
POST /api/health/refresh
Runs compute_cluster_scores() and persist_scores() and invalidates the cache. Returns the newly computed scores in the same shape as GET /api/health/scores, with an added "refreshed": true field.
Returns HTTP 503 if the refresh fails.
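The client-side sketch below calls the refresh endpoint and treats HTTP 503 as the documented failure mode. The refresh_scores helper, BASE_URL, and the 120-second timeout are assumptions; the timeout is an arbitrary, generous choice. Any other HTTP error is re-raised.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8000"  # hypothetical deployment address


def refresh_scores(base_url: str = BASE_URL,
                   urlopen: Callable[..., Any] = urllib.request.urlopen,
                   timeout: float = 120.0) -> Optional[Dict[str, Any]]:
    """POST /api/health/refresh; return the new payload, or None on HTTP 503."""
    url = f"{base_url.rstrip('/')}/api/health/refresh"
    # Empty request body; the POST method is set explicitly.
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urlopen(request, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 503:
            return None  # documented failure mode: the refresh itself failed
        raise
    # Per this page, a successful refresh carries "refreshed": true.
    if not payload.get("refreshed"):
        raise RuntimeError("refresh response is missing 'refreshed': true")
    return payload
```

Returning None on 503 lets a caller retry later or keep using previously fetched scores instead of crashing.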