LocalAnalyst -- LLM-Free Analysis Engine
Source: byoc_agent/local_analyst.py
The LocalAnalyst runs pre-built analyses without needing an LLM API. Each quick-action button in the UI maps to a function that queries live StarRocks data and returns a formatted markdown report.
Available Analyses
Cluster Health Overview
Function: cluster_health_overview()
Produces a health overview of all clusters over the last 7 days. It queries compaction scores (gauge, from the materialized view), query latency (histogram, from the v6 view), error counts, and total queries.
Output includes:
- Summary counts: X clusters monitored -- Y Healthy, Z Warning, W Critical
- Per-cluster table with: health status, avg/max compaction, avg latency, errors (with rate), total queries
- Recommendations for Warning and Critical clusters
Health classification:
- Critical: max compaction > 5,000 or avg latency > 1,000ms
- Warning: max compaction > 2,000 or avg latency > 200ms
- Healthy: neither Warning condition is met (max compaction <= 2,000 and avg latency <= 200ms)
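The thresholds above can be read as a first-match rule in which Critical is checked before Warning. The following is a minimal sketch of that reading, not the code in local_analyst.py: the helper name `classify_health` and its parameter names are hypothetical, and latency is assumed to be in milliseconds.

```python
def classify_health(max_compaction: float, avg_latency_ms: float) -> str:
    """Hypothetical helper that mirrors the documented thresholds."""
    # Critical is checked first so a cluster that exceeds both tiers
    # is never reported as merely Warning.
    if max_compaction > 5000 or avg_latency_ms > 1000:
        return "Critical"
    if max_compaction > 2000 or avg_latency_ms > 200:
        return "Warning"
    return "Healthy"


print(classify_health(6200, 150))  # Critical (compaction above 5,000)
print(classify_health(1800, 450))  # Warning (latency above 200ms)
print(classify_health(900, 120))   # Healthy
```

Because the comparisons are strict, a cluster sitting exactly on a threshold falls into the lower tier.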
Query Latency Trends
Function: query_latency_trends()
Analyzes query latency trends over 7 days using window functions (FIRST_VALUE / LAST_VALUE) to compute percent change from start to end of the period.
Output includes:
- Table with cluster_id, first_val, last_val, pct_change
- Interpretation guidelines (positive = worsening, >20% = flag for review)
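To make the interpretation concrete, here is a small sketch of the percent-change arithmetic applied to the `first_val` and `last_val` columns. The formula `(last - first) / first * 100` is the conventional definition and is assumed here rather than taken from the source query; the helper names `pct_change` and `needs_review` are hypothetical.

```python
from typing import Optional


def pct_change(first_val: float, last_val: float) -> Optional[float]:
    """Percent change from the first to the last value of the period."""
    if first_val == 0:
        # Percent change is undefined when the starting value is zero.
        return None
    return (last_val - first_val) * 100.0 / first_val


def needs_review(change: Optional[float], threshold_pct: float = 20.0) -> bool:
    """Flag only increases (worsening latency) above the threshold."""
    return change is not None and change > threshold_pct


change = pct_change(100.0, 130.0)
print(change, needs_review(change))  # 30.0 True
```

A negative result means latency improved over the period, so it is never flagged.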
Clusters Near Breaking Point
Function: clusters_near_breaking_point()
Identifies clusters approaching their limits across two dimensions:
- Resource thresholds -- Clusters whose max compaction score exceeds 1,000, or whose JVM heap or BE process memory exceeds its limit (gauge metrics from MV).
- High latency hours -- Clusters with query latency > 200ms (histogram from v6 view), reported as the number of hours above that threshold.
Output includes:
- Resources exceeding thresholds table
- Hours with high latency table
- Action items for compaction backlog, JVM heap, and latency patterns
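The high-latency dimension described above amounts to counting, per cluster, the hourly samples that exceed 200ms. The sketch below illustrates that count in plain Python; it assumes each input row is one hourly sample shaped as `(cluster_id, latency_ms)`, and both the row shape and the function name `high_latency_hours` are hypothetical. The actual function may do this aggregation in SQL.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LATENCY_THRESHOLD_MS = 200  # threshold stated in this section


def high_latency_hours(rows: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Count hourly samples above the latency threshold, per cluster."""
    counts: Counter = Counter()
    for cluster_id, latency_ms in rows:
        if latency_ms > LATENCY_THRESHOLD_MS:
            counts[cluster_id] += 1
    return dict(counts)


# Illustrative data only; cluster names and values are made up.
sample = [("c1", 250.0), ("c1", 180.0), ("c1", 310.0), ("c2", 90.0)]
print(high_latency_hours(sample))  # {'c1': 2}
```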
Alert Summary
Function: alert_summary()
Produces a notification-style summary built from actual Lark alerts.
Output includes:
- Recent Lark alerts (7d) grouped by alert_name and status
- Top firing alerts by cluster
- Error spikes from metrics (days with >100 errors)
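The grouping by alert_name and status can be pictured as a simple tally over alert records. This is a sketch under assumptions: alerts are represented as dicts with `alert_name` and `status` keys, the helper name `group_alerts` is hypothetical, and the sample alert names and status values are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def group_alerts(alerts: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Tally alerts by (alert_name, status)."""
    return dict(Counter((a["alert_name"], a["status"]) for a in alerts))


# Made-up records; real alert names and statuses may differ.
alerts = [
    {"alert_name": "HighCompaction", "status": "firing"},
    {"alert_name": "HighCompaction", "status": "firing"},
    {"alert_name": "HighCompaction", "status": "resolved"},
    {"alert_name": "SlowQuery", "status": "firing"},
]
for (name, status), count in sorted(group_alerts(alerts).items()):
    print(f"{name} [{status}]: {count}")
```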
Usage Patterns
Function: usage_patterns()
Analyzes usage patterns across the fleet.
Output includes:
- Cluster activity ranking (total queries, hourly average QPS)
- Peak usage hours (UTC)
- Low-activity clusters (avg concurrent queries < 5) -- candidates for downsizing
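As an illustration of the ranking and the low-activity filter, the sketch below works on per-cluster summary dicts. The field names `cluster_id`, `total_queries`, and `avg_concurrent_queries`, and the two helper names, are hypothetical; only the threshold of 5 comes from the list above.

```python
from typing import Dict, List

LOW_ACTIVITY_THRESHOLD = 5  # avg concurrent queries, from the list above


def rank_by_activity(clusters: List[Dict]) -> List[Dict]:
    """Order clusters from most to least total queries."""
    return sorted(clusters, key=lambda c: c["total_queries"], reverse=True)


def low_activity(clusters: List[Dict]) -> List[str]:
    """Return ids of clusters below the low-activity threshold."""
    return [
        c["cluster_id"]
        for c in clusters
        if c["avg_concurrent_queries"] < LOW_ACTIVITY_THRESHOLD
    ]


# Made-up summaries for illustration.
fleet = [
    {"cluster_id": "c1", "total_queries": 1200, "avg_concurrent_queries": 2.5},
    {"cluster_id": "c2", "total_queries": 98000, "avg_concurrent_queries": 41.0},
]
print([c["cluster_id"] for c in rank_by_activity(fleet)])  # ['c2', 'c1']
print(low_activity(fleet))  # ['c1']
```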
Data Source Strategy
The LocalAnalyst uses two different data sources based on metric type:
| Metric Type | Source | Reason |
|---|---|---|
| Gauge (point-in-time) | metrics.amv_hourly_snapshots_v1 | MAX/AVG are meaningful for snapshot values |
| Counter/Histogram (deltas) | metrics.metrics_hourly_view_v6 | Provides pre-computed hourly deltas |
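The table can be expressed as a small lookup from metric type to source. This is a sketch of the routing rule only: the two source names are taken from the table, while the function `source_for` and the lowercase type strings are assumptions made for illustration.

```python
GAUGE_SOURCE = "metrics.amv_hourly_snapshots_v1"
DELTA_SOURCE = "metrics.metrics_hourly_view_v6"


def source_for(metric_type: str) -> str:
    """Pick the data source for a metric type (hypothetical helper)."""
    kind = metric_type.strip().lower()
    if kind == "gauge":
        # Snapshot values: MAX/AVG over hourly snapshots are meaningful.
        return GAUGE_SOURCE
    if kind in ("counter", "histogram"):
        # Cumulative metrics: use the pre-computed hourly deltas.
        return DELTA_SOURCE
    raise ValueError(f"Unknown metric type: {metric_type!r}")


print(source_for("gauge"))      # metrics.amv_hourly_snapshots_v1
print(source_for("histogram"))  # metrics.metrics_hourly_view_v6
```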
Function Registry
The ANALYSIS_FUNCTIONS dict maps UI button labels to functions:
ANALYSIS_FUNCTIONS = {
    "Cluster Health Overview": cluster_health_overview,
    "Query Latency Trends": query_latency_trends,
    "Clusters Near Breaking Point": clusters_near_breaking_point,
    "Alert Summary": alert_summary,
    "Usage Patterns": usage_patterns,
}
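A caller that receives a button label from the UI may want a clearer error than a bare `KeyError` when the label is not registered. The sketch below shows one way to wrap the lookup. The helper `run_analysis` is hypothetical and not part of the module as documented, and a stand-in registry is used so the example runs without a StarRocks connection.

```python
from typing import Callable, Dict


def run_analysis(label: str, registry: Dict[str, Callable[[], str]]) -> str:
    """Look up an analysis by UI label and return its markdown report."""
    try:
        func = registry[label]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown analysis {label!r}; expected one of: {known}"
        ) from None
    return func()


# Stand-in registry with a stub report; the real ANALYSIS_FUNCTIONS
# entries query live data instead.
demo_registry = {
    "Cluster Health Overview": lambda: "# Cluster Health Overview\n(stub report)",
}
print(run_analysis("Cluster Health Overview", demo_registry))
```

Passing the registry as a parameter keeps the helper testable; in practice it would be called with `ANALYSIS_FUNCTIONS`.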
Usage
from byoc_agent.local_analyst import ANALYSIS_FUNCTIONS
# Run a specific analysis
report_md = ANALYSIS_FUNCTIONS["Cluster Health Overview"]()
print(report_md)
# Or call directly
from byoc_agent.local_analyst import cluster_health_overview
report = cluster_health_overview()