Skip to main content

BYOCAgent -- Interactive Chat Agent

Source: byoc_agent/agent.py

The BYOCAgent is a Claude-powered interactive chat agent that operators use to ask ad-hoc questions about BYOC cluster health, query performance, usage patterns, and alerting.

Architecture

BYOCAgent wraps the Anthropic Messages API in an agentic tool-use loop:

  1. User sends a message.
  2. Agent calls Claude with the system prompt, conversation history, and MCP tool definitions.
  3. If Claude requests tool calls, the agent executes them via the MCP client and feeds results back.
  4. Loop continues until Claude returns end_turn (no more tool calls) or the round limit is reached.
User → BYOCAgent.chat() → Claude API → tool_use? → MCP Client → StarRocks
↑ |
└───── tool results ────────┘

Configuration

ParameterDefaultDescription
modelclaude-sonnet-4-5-20250929Anthropic model ID
MAX_TOOL_ROUNDS15Maximum agentic loop iterations to prevent runaway tool calls

System Prompt

The system prompt provides the agent with full schema awareness across three databases:

  • metrics -- Hourly aggregated metrics (amv_hourly_snapshots_v1), customer profiles, daily summaries, metric dictionary.
  • byoc -- 16 infrastructure tables covering organizations, accounts, clusters, resources, billing, VM catalog, and scaling policies.
  • alerts -- Parsed Lark alerts and auto-generated recommendations.

Cross-database join paths are documented in the prompt so the agent can correlate data across all three databases.

Key Capabilities

Schema Discovery

On the first interaction, the agent auto-discovers the warehouse schema via mcp_client.discover_schema(). The result is injected into conversation history as context.

Query Execution

The agent uses read_query MCP tool to execute SELECT statements against StarRocks. It uses cross-database syntax (e.g., metrics.amv_hourly_snapshots_v1).

Analysis

The agent interprets query results, computes trends, detects anomalies, and provides actionable insights. Key metrics it knows about:

CategoryMetrics
CPUstarrocks_be_cpu_util_percent
Memorystarrocks_be_jvm_heap_used_percent, starrocks_be_process_mem_bytes
Diskstarrocks_be_disks_data_used_pct
Query Latencystarrocks_fe_query_latency_ms_p99, p95
Query Volumestarrocks_fe_query_total, starrocks_be_pip_query_ctx_cnt
Errorsstarrocks_fe_query_err
Compactionstarrocks_be_compaction_score_average

Health Scoring Guidelines (in prompt)

LevelCriteria
HealthyCPU < 55% avg, memory < 55%, latency stable, error rate < 0.1%
WarningCPU 55-70% avg or 70-85% max, memory 55-75%, latency increasing
CriticalCPU > 70% avg or > 85% max, memory > 75%, error rate > 2%

Error Handling

The agent handles several Anthropic API error classes gracefully:

  • BadRequestError -- Detects credit/billing issues and provides a helpful message.
  • AuthenticationError -- Prompts the user to check their API key.
  • RateLimitError -- Asks the user to wait and retry.
  • APIError -- Generic API error with details.

In all error cases, the unanswered user message is popped from conversation history to keep the conversation state valid.

Usage

from byoc_agent.agent import BYOCAgent

agent = BYOCAgent(mcp_client=my_mcp_client)
response = await agent.chat("Which clusters have the highest error rates this week?")
print(response)

# Reset conversation for a fresh session
agent.reset_conversation()

Conversation Management

  • conversation_history -- List of message dicts maintained across turns.
  • _schema_cached -- Boolean flag preventing redundant schema discovery.
  • reset_conversation() -- Clears history and schema cache for a fresh session.