Health API

Prefix: /api/health | Tag: health

Cluster health scores, KPIs, enrichment metadata, and score refresh. Data comes from byoc_agent.health_scorer and the unified scorer. Results are cached for 60 seconds.
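The 60-second caching behaviour can be pictured as a small time-to-live cache. The sketch below is only an illustration: `TTLCache` and its methods are hypothetical names, and the service's actual caching code is not shown in this page.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical single-value cache that expires after `ttl` seconds."""

    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._value: Optional[Any] = None
        self._stored_at: Optional[float] = None

    def get_or_compute(self, compute: Callable[[], Any]) -> Any:
        """Return the cached value, recomputing it if missing or expired."""
        now = self._clock()
        if self._stored_at is None or now - self._stored_at >= self._ttl:
            self._value = compute()
            self._stored_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next read recomputes it."""
        self._value = None
        self._stored_at = None
```

Under this model, two requests made within 60 seconds of each other see the same result, and an explicit invalidation forces the next request to recompute.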

Endpoints

Method  Path                    Description
GET     /api/health/scores      All cluster health scores (sorted worst-first)
GET     /api/health/kpis        Summary KPI metrics
GET     /api/health/enrichment  Cluster metadata (names, regions, types)
POST    /api/health/refresh     Recompute scores from live metrics

GET /api/health/scores

Returns cluster health scores enriched with cluster metadata and risk analysis.

Response:

{
  "scores": [
    {
      "cluster_id": "uuid",
      "cluster_name": "prod-analytics",
      "customer_name": "Acme Corp",
      "account_id": "abc123",
      "overall_score": 42.5,
      "classification": "Critical",
      "top_risk": "Compaction: 35",
      "cluster_type": "ELASTIC",
      "region": "US East (N. Virginia) us-east-1",
      "dimension_scores": [
        {"name": "Compaction", "score": 35.0, "value": 8500.0, "weight": 0.2, "description": "..."}
      ],
      "risk_level": "Critical",
      "risk_reasons": ["High compaction score"],
      "suggested_actions": ["Investigate tablet compaction backlog"],
      "risk_metrics": {}
    }
  ],
  "source": "live",
  "generated_at": "2026-03-25T10:00:00"
}
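A client might consume this response as follows. This is a minimal sketch: `BASE_URL`, `fetch_json`, and `worst_clusters` are hypothetical names, and the re-sort assumes that a lower `overall_score` means a less healthy cluster (consistent with the sample above, where 42.5 is classified as Critical).

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's host


def fetch_json(path: str, base_url: str = BASE_URL) -> dict:
    """GET a JSON document from the API (hypothetical helper)."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def worst_clusters(payload: dict, n: int = 5) -> list:
    """Return (cluster_name, overall_score, top_risk) for the n lowest scores.

    The endpoint already sorts worst-first; sorting again here is defensive
    and assumes lower scores are worse.
    """
    scores = sorted(payload["scores"], key=lambda s: s["overall_score"])
    return [(s["cluster_name"], s["overall_score"], s["top_risk"])
            for s in scores[:n]]
```

For example, `worst_clusters(fetch_json("/api/health/scores"), n=3)` would list the three clusters most in need of attention.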

GET /api/health/kpis

Returns summary KPI metrics aggregated across all clusters.

Response:

{
  "kpis": {
    "total_clusters": 172,
    "healthy_count": 150,
    "warning_count": 15,
    "critical_count": 7,
    "avg_score": 78.3,
    "avg_p99_latency": 245.6
  },
  "source": "live",
  "generated_at": "2026-03-25T10:00:00"
}
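The counts in the sample add up to `total_clusters` (150 + 15 + 7 = 172). Assuming that always holds, which this page does not state, a dashboard could turn the counts into percentage shares. `kpi_shares` is a hypothetical helper name.

```python
def kpi_shares(kpis: dict) -> dict:
    """Percentage of clusters in each health class, rounded to one decimal.

    Assumes healthy/warning/critical counts partition total_clusters.
    """
    total = kpis["total_clusters"]
    if total == 0:
        # Avoid division by zero when no clusters are reported.
        return {"healthy": 0.0, "warning": 0.0, "critical": 0.0}
    return {
        name: round(100.0 * kpis[f"{name}_count"] / total, 1)
        for name in ("healthy", "warning", "critical")
    }
```

With the sample values above this gives 87.2% healthy, 8.7% warning, and 4.1% critical.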

GET /api/health/enrichment

Returns cluster metadata from byoc.clusters joined with byoc.region_details.

Response:

{
  "clusters": [
    {"cluster_id": "uuid", "cluster_name": "prod-analytics", "cluster_type": "ELASTIC", "region_name": "..."}
  ],
  "source": "live",
  "generated_at": "..."
}
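Because both this response and the scores response carry `cluster_id`, the two can be compared on that key. The sketch below lists clusters that appear in the enrichment data but have no score; `unscored_clusters` is a hypothetical helper, and treating `cluster_id` as a shared key is an assumption based on the sample payloads.

```python
def unscored_clusters(scores_payload: dict, enrichment_payload: dict) -> list:
    """Return names of clusters present in enrichment but absent from scores."""
    scored_ids = {s["cluster_id"] for s in scores_payload["scores"]}
    return [
        c["cluster_name"]
        for c in enrichment_payload["clusters"]
        if c["cluster_id"] not in scored_ids
    ]
```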

POST /api/health/refresh

Triggers compute_cluster_scores() and persist_scores(), then invalidates the cache. Returns the newly computed scores in the same shape as GET /api/health/scores, with an added "refreshed": true field.

Returns HTTP 503 if the refresh fails.
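A caller may want to treat the 503 case as "try again later" rather than a hard error. The following is a minimal sketch using only the Python standard library; `refresh_scores`, the `opener` parameter, and the choice to return `None` on 503 are illustrative assumptions, not part of the API.

```python
import json
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def refresh_scores(base_url: str, opener: Callable = urlopen) -> Optional[dict]:
    """POST to the refresh endpoint; return the payload, or None on HTTP 503.

    `opener` is injectable so the function can be exercised without a server.
    """
    req = Request(base_url + "/api/health/refresh", data=b"", method="POST")
    try:
        with opener(req, timeout=30) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 503:
            # Refresh failed on the server side; let the caller decide to retry.
            return None
        raise  # any other HTTP error is unexpected here
```

On success, the returned dictionary should contain `"refreshed": true` alongside the usual `scores` list.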