BYOCAgent -- Interactive Chat Agent

Source: byoc_agent/agent.py

The BYOCAgent is a Claude-powered interactive chat agent that operators use to ask ad-hoc questions about BYOC cluster health, query performance, usage patterns, and alerting.

Architecture

BYOCAgent wraps the Anthropic Messages API in an agentic tool-use loop:

User sends a message.
Agent calls Claude with the system prompt, conversation history, and MCP tool definitions.
If Claude requests tool calls, the agent executes them via the MCP client and feeds results back.
Loop continues until Claude returns end_turn (no more tool calls) or the round limit is reached.

User → BYOCAgent.chat() → Claude API → tool_use? → MCP Client → StarRocks
                              ↑                           |
                              └───── tool results ────────┘

Configuration

Parameter	Default	Description
`model`	`claude-sonnet-4-5-20250929`	Anthropic model ID
`MAX_TOOL_ROUNDS`	15	Maximum agentic loop iterations to prevent runaway tool calls

System Prompt

The system prompt provides the agent with full schema awareness across three databases:

metrics -- Hourly aggregated metrics (amv_hourly_snapshots_v1), customer profiles, daily summaries, metric dictionary.
byoc -- 16 infrastructure tables covering organizations, accounts, clusters, resources, billing, VM catalog, and scaling policies.
alerts -- Parsed Lark alerts and auto-generated recommendations.

Cross-database join paths are documented in the prompt so the agent can correlate data across all three databases.

Key Capabilities

Schema Discovery

On the first interaction, the agent auto-discovers the warehouse schema via mcp_client.discover_schema(). The result is injected into conversation history as context.

Query Execution

The agent uses read_query MCP tool to execute SELECT statements against StarRocks. It uses cross-database syntax (e.g., metrics.amv_hourly_snapshots_v1).

Analysis

The agent interprets query results, computes trends, detects anomalies, and provides actionable insights. Key metrics it knows about:

Category	Metrics
CPU	`starrocks_be_cpu_util_percent`
Memory	`starrocks_be_jvm_heap_used_percent`, `starrocks_be_process_mem_bytes`
Disk	`starrocks_be_disks_data_used_pct`
Query Latency	`starrocks_fe_query_latency_ms_p99`, `p95`
Query Volume	`starrocks_fe_query_total`, `starrocks_be_pip_query_ctx_cnt`
Errors	`starrocks_fe_query_err`
Compaction	`starrocks_be_compaction_score_average`

Health Scoring Guidelines (in prompt)

Level	Criteria
Healthy	CPU < 55% avg, memory < 55%, latency stable, error rate < 0.1%
Warning	CPU 55-70% avg or 70-85% max, memory 55-75%, latency increasing
Critical	CPU > 70% avg or > 85% max, memory > 75%, error rate > 2%

Error Handling

The agent handles several Anthropic API error classes gracefully:

BadRequestError -- Detects credit/billing issues and provides a helpful message.
AuthenticationError -- Prompts the user to check their API key.
RateLimitError -- Asks the user to wait and retry.
APIError -- Generic API error with details.

In all error cases, the unanswered user message is popped from conversation history to keep the conversation state valid.

Usage

from byoc_agent.agent import BYOCAgent

agent = BYOCAgent(mcp_client=my_mcp_client)
response = await agent.chat("Which clusters have the highest error rates this week?")
print(response)

# Reset conversation for a fresh session
agent.reset_conversation()

Conversation Management

conversation_history -- List of message dicts maintained across turns.
_schema_cached -- Boolean flag preventing redundant schema discovery.
reset_conversation() -- Clears history and schema cache for a fresh session.

Architecture​

Configuration​

System Prompt​

Key Capabilities​

Schema Discovery​

Query Execution​

Analysis​

Health Scoring Guidelines (in prompt)​

Error Handling​

Usage​

Conversation Management​