BYOCAgent -- Interactive Chat Agent
Source: byoc_agent/agent.py
The BYOCAgent is a Claude-powered interactive chat agent that operators use to ask ad-hoc questions about BYOC cluster health, query performance, usage patterns, and alerting.
Architecture
BYOCAgent wraps the Anthropic Messages API in an agentic tool-use loop:
- User sends a message.
- Agent calls Claude with the system prompt, conversation history, and MCP tool definitions.
- If Claude requests tool calls, the agent executes them via the MCP client and feeds results back.
- Loop continues until Claude returns
end_turn(no more tool calls) or the round limit is reached.
User → BYOCAgent.chat() → Claude API → tool_use? → MCP Client → StarRocks
↑ |
└───── tool results ────────┘
Configuration
| Parameter | Default | Description |
|---|---|---|
model | claude-sonnet-4-5-20250929 | Anthropic model ID |
MAX_TOOL_ROUNDS | 15 | Maximum agentic loop iterations to prevent runaway tool calls |
System Prompt
The system prompt provides the agent with full schema awareness across three databases:
metrics-- Hourly aggregated metrics (amv_hourly_snapshots_v1), customer profiles, daily summaries, metric dictionary.byoc-- 16 infrastructure tables covering organizations, accounts, clusters, resources, billing, VM catalog, and scaling policies.alerts-- Parsed Lark alerts and auto-generated recommendations.
Cross-database join paths are documented in the prompt so the agent can correlate data across all three databases.
Key Capabilities
Schema Discovery
On the first interaction, the agent auto-discovers the warehouse schema via mcp_client.discover_schema(). The result is injected into conversation history as context.
Query Execution
The agent uses read_query MCP tool to execute SELECT statements against StarRocks. It uses cross-database syntax (e.g., metrics.amv_hourly_snapshots_v1).
Analysis
The agent interprets query results, computes trends, detects anomalies, and provides actionable insights. Key metrics it knows about:
| Category | Metrics |
|---|---|
| CPU | starrocks_be_cpu_util_percent |
| Memory | starrocks_be_jvm_heap_used_percent, starrocks_be_process_mem_bytes |
| Disk | starrocks_be_disks_data_used_pct |
| Query Latency | starrocks_fe_query_latency_ms_p99, p95 |
| Query Volume | starrocks_fe_query_total, starrocks_be_pip_query_ctx_cnt |
| Errors | starrocks_fe_query_err |
| Compaction | starrocks_be_compaction_score_average |
Health Scoring Guidelines (in prompt)
| Level | Criteria |
|---|---|
| Healthy | CPU < 55% avg, memory < 55%, latency stable, error rate < 0.1% |
| Warning | CPU 55-70% avg or 70-85% max, memory 55-75%, latency increasing |
| Critical | CPU > 70% avg or > 85% max, memory > 75%, error rate > 2% |
Error Handling
The agent handles several Anthropic API error classes gracefully:
BadRequestError-- Detects credit/billing issues and provides a helpful message.AuthenticationError-- Prompts the user to check their API key.RateLimitError-- Asks the user to wait and retry.APIError-- Generic API error with details.
In all error cases, the unanswered user message is popped from conversation history to keep the conversation state valid.
Usage
from byoc_agent.agent import BYOCAgent
agent = BYOCAgent(mcp_client=my_mcp_client)
response = await agent.chat("Which clusters have the highest error rates this week?")
print(response)
# Reset conversation for a fresh session
agent.reset_conversation()
Conversation Management
conversation_history-- List of message dicts maintained across turns._schema_cached-- Boolean flag preventing redundant schema discovery.reset_conversation()-- Clears history and schema cache for a fresh session.