Tech Stack

Reference for all technologies, frameworks, and infrastructure used in the platform.

Backend

| Technology | Version | Purpose |
|---|---|---|
| Python | 3.11 | Primary backend language |
| FastAPI | >=0.109 | API framework (14 routers, async support) |
| Uvicorn | >=0.27 | ASGI server |
| mysql-connector-python | >=8.2 | StarRocks connectivity (MySQL-compatible protocol) |
| Pydantic | >=2.5 | Request/response validation, settings |
| PyJWT | >=2.8 | JWT authentication tokens |
| psycopg2-binary | >=2.9 | Supabase (PostgreSQL) connectivity for auth/users |
| python-dotenv | >=1.0 | Environment variable management |
| PyYAML | >=6.0 | Health scoring rules, issue grouping rules |
| Flask | (alert_webhook) | Lightweight webhook receiver for Grafana alerts |
| Anthropic SDK | - | Claude API for Investigator, Patrol, and Chat agents |
| Pandas | - | Data manipulation in agent tools and scoring pipelines |
| Requests | - | HTTP client for Lark API and Knowledge Lake MCP |

Agent Architecture

| Component | Module | Description |
|---|---|---|
| AgentBase | byoc_agent/agent_base.py | Reusable tool-use loop; supports Anthropic and OpenAI-compatible LLM providers |
| AgentTools | byoc_agent/agent_tools.py | SQL-backed tools for autonomous agents (no MCP subprocess needed) |
| KnowledgeLakeClient | byoc_agent/knowledge_lake_client.py | HTTP+SSE client for MCP-based knowledge search |
| UnifiedScorer | byoc_agent/unified_scorer.py | 14-dimension health scoring: 60% metrics + 25% alerts + 15% tier |
| IssueTracker | byoc_agent/issue_tracker.py | Alert grouping with anomaly/failure/escalation strategy |
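
The UnifiedScorer weighting (60% metrics + 25% alerts + 15% tier) can be illustrated with a minimal sketch. This is not the code in byoc_agent/unified_scorer.py: the function name, the 0-100 scale for each component, and the range check are assumptions made for illustration, and how the 14 dimensions are folded into the metrics component is not shown here.

```python
# Weights taken from the UnifiedScorer description; everything else is hypothetical.
WEIGHTS = {"metrics": 0.60, "alerts": 0.25, "tier": 0.15}


def composite_score(metrics: float, alerts: float, tier: float) -> float:
    """Blend three component scores into one health score.

    Each component is assumed to be on a 0-100 scale already.
    """
    components = {"metrics": metrics, "alerts": alerts, "tier": tier}
    for name, value in components.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score {value} is outside 0-100")
    # Weighted sum: 0.60 * metrics + 0.25 * alerts + 0.15 * tier
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

Under these assumptions, a cluster with a metrics score of 80, an alerts score of 40, and a tier score of 100 lands at about 73 (48 + 10 + 15). Because the weights sum to 1, the composite stays on the same scale as its inputs.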

Frontend

| Technology | Version | Purpose |
|---|---|---|
| React | 19.x | UI framework |
| TypeScript | ~5.9 | Type-safe JavaScript |
| Vite | 6.x | Build tool and dev server |
| Tailwind CSS | v4.2 | Utility-first CSS framework |
| React Router | 7.x | Client-side routing (12 pages) |
| TanStack React Query | 5.x | Server state management and caching |
| Lucide React | 0.577 | Icon library |
| React Markdown | 10.x | Rendering agent/patrol report markdown |
| Looker Embed SDK | 2.x | Embedded Looker dashboards with SSO |
| Axios | 1.x | HTTP client for API calls |
| class-variance-authority | 0.7 | Component variant management (shadcn/ui pattern) |

Frontend Pages

12 pages under frontend/src/pages/:

Overview.tsx -- Fleet health dashboard
Issues.tsx -- Issue tracker with triage workflow
RawAlerts.tsx -- Raw lark_alerts table view
BreakingPoint.tsx -- Capacity and risk forecasting
UsagePatterns.tsx -- Cluster utilization trends
Investigations.tsx -- Agent investigation reports
AIIssues.tsx -- AI-grouped issue summaries
Patrol.tsx -- Fleet patrol reports
Chat.tsx -- Interactive BYOCAgent chat
LLMUsage.tsx -- Token and cost tracking
Settings.tsx -- User preferences
Help.tsx -- Platform documentation

Database

| System | Purpose | Access Method |
|---|---|---|
| StarRocks | Primary OLAP store (3 databases: metrics, byoc, alerts) | mysql-connector-python over MySQL protocol on port 9030 |
| Supabase | User authentication and management | psycopg2-binary (PostgreSQL) + Supabase Auth API |

StarRocks Connection

  • Production (EC2): Direct connection to 1cogri9tn-internal.cloud-app.celerdata.com:9030 (same VPC)
  • Local development: SSH tunnel through bastion (100.100.118.18 via Tailscale) to 127.0.0.1:9030
  • Important: The mysql CLI does not work (protocol mismatch). Always use Python mysql.connector.
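
A minimal sketch of connecting through Python's mysql.connector, as the notes above recommend. The environment variable names (STARROCKS_HOST, STARROCKS_PORT, STARROCKS_USER, STARROCKS_PASSWORD, STARROCKS_DATABASE), the helper names, and the default database are assumptions for illustration; the actual keys in .env may differ. The defaults point at the local SSH tunnel endpoint (127.0.0.1:9030), so production would set STARROCKS_HOST to the internal hostname.

```python
import os


def starrocks_connect_kwargs(env=None):
    """Build keyword arguments for mysql.connector.connect().

    Variable names are hypothetical; defaults target the local SSH tunnel.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("STARROCKS_HOST", "127.0.0.1"),
        "port": int(env.get("STARROCKS_PORT", "9030")),
        "user": env["STARROCKS_USER"],
        "password": env["STARROCKS_PASSWORD"],
        "database": env.get("STARROCKS_DATABASE", "metrics"),
        "connection_timeout": 10,  # seconds; fail fast if the tunnel is down
    }


def query_one(sql, env=None):
    """Run a single statement and return the first row."""
    # Imported here so the config helper works without the driver installed.
    import mysql.connector

    conn = mysql.connector.connect(**starrocks_connect_kwargs(env))
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchone()
    finally:
        conn.close()
```

With the tunnel up and credentials exported, `query_one("SELECT 1")` is a quick way to confirm connectivity before running anything heavier.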

Infrastructure

| Component | Service | Purpose |
|---|---|---|
| Compute | AWS EC2 | Bastion host, cron jobs (daily pipeline, Sentinel/Investigator every 15 min, Patrol 2x/day) |
| Serverless | AWS Lambda | Bastion wake-up function |
| IaC | AWS CloudFormation | Infrastructure provisioning |
| Monitoring | AWS CloudWatch | Infrastructure-level logging and alarms |
| Containers | Docker | Application packaging (python:3.11-slim base image) |
| Networking | Tailscale VPN | Secure access to bastion host for local development |
| CI/CD | GitHub Actions | Auto-deploy on push to main (paths: byoc_agent/**, Dockerfile, pyproject.toml) |

Deployment Pipeline

The GitHub Actions workflow (.github/workflows/deploy.yml):

  1. Triggers on push to main for relevant paths
  2. Calls Lambda to wake the EC2 bastion (waits 75s for boot)
  3. SSHs into bastion and deploys updated code
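
To check locally whether a change set would trigger step 1, the path filter (byoc_agent/**, Dockerfile, pyproject.toml) can be approximated in Python. This is only a sketch: the function name is hypothetical, and fnmatch is a stand-in that agrees with GitHub Actions glob matching for these three patterns but not for path filters in general.

```python
from fnmatch import fnmatchcase

# Path filters listed for the deploy workflow.
DEPLOY_PATHS = ("byoc_agent/**", "Dockerfile", "pyproject.toml")


def triggers_deploy(changed_files):
    """Return True if any changed file matches a deploy path filter.

    fnmatchcase lets '*' cross directory separators, so 'byoc_agent/**'
    matches files at any depth under byoc_agent/. Patterns without a
    wildcard match only the exact repository-relative path.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in DEPLOY_PATHS
    )
```

Under this approximation, a commit touching only frontend/src/pages/Chat.tsx would not deploy, while one touching byoc_agent/agent_base.py would.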

External Integrations

| System | Integration | Purpose |
|---|---|---|
| Lark | Open API (im_v1_message_list) | Alert channel message fetching |
| Grafana | Webhook contact point + dashboards | Alert delivery and cluster dashboard links |
| Looker | Embed SDK + LookML | Embedded analytics dashboards |
| MCP (Model Context Protocol) | Streamable HTTP | Knowledge Lake search (vector + fulltext), Lark API tools |
| Claude API | Anthropic SDK | LLM backbone for Investigator, Patrol, and Chat agents |

Configuration

Key configuration files:

| File | Purpose |
|---|---|
| .env | StarRocks credentials, Lark API keys, LLM provider config |
| byoc_agent/health_rules.yaml | Scoring thresholds per metric dimension |
| byoc_agent/issue_rules.yaml | Alert grouping rules and severity mappings |
| backend/config.py | FastAPI settings (Pydantic Settings) |
| Dockerfile | Container image definition |
| entrypoint.sh | Container startup script |
| .github/workflows/deploy.yml | CI/CD pipeline definition |