LocalAnalyst -- LLM-Free Analysis Engine

Source: byoc_agent/local_analyst.py

The LocalAnalyst runs pre-built analyses without needing an LLM API. Each quick-action button in the UI maps to a function that queries live StarRocks data and returns a formatted markdown report.

Available Analyses

Cluster Health Overview

Function: cluster_health_overview()

Produces a health overview of all clusters over the last 7 days. It queries compaction scores (gauge, from the MV), query latency (histogram, from the v6 view), error counts, and total queries.

Output includes:

  • Summary counts: X clusters monitored -- Y Healthy, Z Warning, W Critical
  • Per-cluster table with: health status, avg/max compaction, avg latency, errors (with rate), total queries
  • Recommendations for Warning and Critical clusters

Health classification:

  • Critical: max compaction > 5,000 or avg latency > 1,000ms
  • Warning: max compaction > 2,000 or avg latency > 200ms
  • Healthy: neither Warning threshold is exceeded
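
The thresholds above can be sketched as a small classifier. This is a minimal illustration rather than the code in local_analyst.py: the function name classify_health and its parameters are hypothetical, and it assumes the Critical check runs before the Warning check.

```python
def classify_health(max_compaction: float, avg_latency_ms: float) -> str:
    """Return 'Critical', 'Warning', or 'Healthy' using the documented thresholds.

    Hypothetical helper for illustration; not the actual implementation.
    """
    # Check the most severe level first so a Critical cluster
    # is never reported as merely Warning.
    if max_compaction > 5000 or avg_latency_ms > 1000:
        return "Critical"
    if max_compaction > 2000 or avg_latency_ms > 200:
        return "Warning"
    return "Healthy"
```

Because the comparisons are strict, a cluster sitting exactly on a threshold (for example, max compaction of 2,000) stays in the lower category.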

Query Latency Trends

Function: query_latency_trends()

Analyzes query latency trends over 7 days using window functions (FIRST_VALUE / LAST_VALUE) to compute percent change from start to end of the period.

Output includes:

  • Table with cluster_id, first_val, last_val, pct_change
  • Interpretation guidelines (positive = worsening, >20% = flag for review)
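
The percent-change arithmetic can be illustrated in plain Python. This sketch mirrors what FIRST_VALUE / LAST_VALUE give you over a time-ordered window, but it is not the SQL the function runs. The helper names pct_change and needs_review are hypothetical, and it assumes only increases above 20% are flagged.

```python
from typing import Optional, Sequence


def pct_change(values: Sequence[float]) -> Optional[float]:
    """Percent change from the first to the last sample.

    `values` must already be ordered by time. Returns None when the
    change is undefined (no samples, or a first value of zero).
    """
    if not values or values[0] == 0:
        return None
    first_val, last_val = values[0], values[-1]
    return (last_val - first_val) / first_val * 100.0


def needs_review(change: Optional[float], threshold: float = 20.0) -> bool:
    """Flag a cluster whose latency rose by more than `threshold` percent."""
    return change is not None and change > threshold
```

For example, a cluster whose latency goes from 100ms at the start of the week to 150ms at the end has a pct_change of 50.0 and would be flagged for review.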

Clusters Near Breaking Point

Function: clusters_near_breaking_point()

Identifies clusters approaching their limits across two dimensions:

  1. Resource thresholds -- Clusters whose max compaction score exceeds 1,000, or whose JVM heap or BE process memory exceeds its limit (gauge metrics from MV).
  2. High latency hours -- Hours in which a cluster's query latency exceeded 200ms (histogram from v6 view), counted per cluster.
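
The second dimension amounts to counting cluster-hours above the latency threshold. The sketch below assumes the input is an iterable of (cluster_id, hourly latency in ms) pairs. That row shape and the name high_latency_hours are hypothetical and are not taken from local_analyst.py.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LATENCY_THRESHOLD_MS = 200  # threshold stated in the text above


def high_latency_hours(rows: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Count, per cluster, the hours whose latency exceeded the threshold.

    Each row is assumed to be (cluster_id, latency_ms) for one cluster-hour.
    Clusters with no high-latency hours are omitted from the result.
    """
    counts: Counter = Counter()
    for cluster_id, latency_ms in rows:
        if latency_ms > LATENCY_THRESHOLD_MS:
            counts[cluster_id] += 1
    return dict(counts)
```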

Output includes:

  • Resources exceeding thresholds table
  • Hours with high latency table
  • Action items for compaction backlog, JVM heap, and latency patterns

Alert Summary

Function: alert_summary()

Notification-style summary from real Lark alerts.

Output includes:

  • Recent Lark alerts (7d) grouped by alert_name and status
  • Top firing alerts by cluster
  • Error spikes from metrics (days with >100 errors)
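
The grouping and the spike filter can be sketched as follows. The alert records are assumed to be dicts with alert_name and status keys (the field names come from the list above), and the helper names group_alerts and error_spike_days are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple


def group_alerts(alerts: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count alerts per (alert_name, status) pair."""
    return dict(Counter((a["alert_name"], a["status"]) for a in alerts))


def error_spike_days(daily_errors: Mapping[str, int], threshold: int = 100) -> List[str]:
    """Return the days (sorted) whose error count is strictly above the threshold."""
    return sorted(day for day, count in daily_errors.items() if count > threshold)
```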

Usage Patterns

Function: usage_patterns()

Analyzes usage patterns across the fleet.

Output includes:

  • Cluster activity ranking (total queries, avg QPS hourly)
  • Peak usage hours (UTC)
  • Low-activity clusters (avg concurrent queries < 5) -- candidates for downsizing
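
As a rough illustration of the last two outputs, the sketch below picks the busiest hours and the downsizing candidates from pre-aggregated inputs. The input shapes (hour of day mapped to total queries, cluster ID mapped to average concurrent queries) and both function names are assumptions made for this example.

```python
from typing import List, Mapping


def peak_hours(hourly_totals: Mapping[int, int], top_n: int = 3) -> List[int]:
    """Return the `top_n` UTC hours with the most queries, busiest first.

    Ties are broken by the earlier hour.
    """
    ranked = sorted(hourly_totals, key=lambda hour: (-hourly_totals[hour], hour))
    return ranked[:top_n]


def low_activity_clusters(avg_concurrent: Mapping[str, float], threshold: float = 5.0) -> List[str]:
    """Return clusters whose average concurrent queries fall below the threshold."""
    return sorted(c for c, value in avg_concurrent.items() if value < threshold)
```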

Data Source Strategy

The LocalAnalyst uses two different data sources based on metric type:

Metric Type | Source | Reason
----------- | ------ | ------
Gauge (point-in-time) | metrics.amv_hourly_snapshots_v1 | MAX/AVG are meaningful for snapshot values
Counter/Histogram (deltas) | metrics.metrics_hourly_view_v6 | Provides pre-computed hourly deltas
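
The routing rule can be expressed as a small lookup. The two source names are taken from the table. The function source_for and the metric-type strings it accepts are hypothetical and only illustrate the rule.

```python
GAUGE_SOURCE = "metrics.amv_hourly_snapshots_v1"
DELTA_SOURCE = "metrics.metrics_hourly_view_v6"


def source_for(metric_type: str) -> str:
    """Pick the data source for a metric type ('gauge', 'counter', 'histogram')."""
    kind = metric_type.strip().lower()
    if kind == "gauge":
        # Point-in-time values: MAX/AVG over snapshots are meaningful.
        return GAUGE_SOURCE
    if kind in ("counter", "histogram"):
        # Cumulative values: read the pre-computed hourly deltas instead.
        return DELTA_SOURCE
    raise ValueError(f"Unknown metric type: {metric_type!r}")
```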

Function Registry

The ANALYSIS_FUNCTIONS dict maps UI button labels to functions:

ANALYSIS_FUNCTIONS = {
    "Cluster Health Overview": cluster_health_overview,
    "Query Latency Trends": query_latency_trends,
    "Clusters Near Breaking Point": clusters_near_breaking_point,
    "Alert Summary": alert_summary,
    "Usage Patterns": usage_patterns,
}

Usage

from byoc_agent.local_analyst import ANALYSIS_FUNCTIONS

# Run a specific analysis
report_md = ANALYSIS_FUNCTIONS["Cluster Health Overview"]()
print(report_md)

# Or call directly
from byoc_agent.local_analyst import cluster_health_overview
report = cluster_health_overview()
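
When the label comes from user input, a thin wrapper can turn an unknown label into a readable error. The sketch below is one possible approach and is not part of byoc_agent. The helper run_analysis is hypothetical, and it takes the registry as a parameter so it can be tried without a StarRocks connection.

```python
from typing import Callable, Mapping


def run_analysis(label: str, registry: Mapping[str, Callable[[], str]]) -> str:
    """Look up `label` in a label-to-function registry and return its report."""
    try:
        func = registry[label]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown analysis {label!r}; expected one of: {known}") from None
    return func()


# Stand-in registry for demonstration; in practice pass ANALYSIS_FUNCTIONS.
demo_registry = {"Demo Report": lambda: "# Demo\n\nAll good."}
print(run_analysis("Demo Report", demo_registry))
```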