Tech Stack
Reference for all technologies, frameworks, and infrastructure used in the platform.
Backend
| Technology | Version | Purpose |
|---|---|---|
| Python | 3.11 | Primary backend language |
| FastAPI | >=0.109 | API framework (14 routers, async support) |
| Uvicorn | >=0.27 | ASGI server |
| mysql-connector-python | >=8.2 | StarRocks connectivity (MySQL-compatible protocol) |
| Pydantic | >=2.5 | Request/response validation, settings |
| PyJWT | >=2.8 | JWT authentication tokens |
| psycopg2-binary | >=2.9 | Supabase (PostgreSQL) connectivity for auth/users |
| python-dotenv | >=1.0 | Environment variable management |
| PyYAML | >=6.0 | Health scoring rules, issue grouping rules |
| Flask | - | Lightweight webhook receiver for Grafana alerts (alert_webhook) |
| Anthropic SDK | - | Claude API for Investigator, Patrol, and Chat agents |
| Pandas | - | Data manipulation in agent tools and scoring pipelines |
| Requests | - | HTTP client for Lark API and Knowledge Lake MCP |
Agent Architecture
| Component | Module | Description |
|---|---|---|
| AgentBase | byoc_agent/agent_base.py | Reusable tool-use loop; supports Anthropic and OpenAI-compatible LLM providers |
| AgentTools | byoc_agent/agent_tools.py | SQL-backed tools for autonomous agents (no MCP subprocess needed) |
| KnowledgeLakeClient | byoc_agent/knowledge_lake_client.py | HTTP+SSE client for MCP-based knowledge search |
| UnifiedScorer | byoc_agent/unified_scorer.py | 14-dimension health scoring: 60% metrics + 25% alerts + 15% tier |
| IssueTracker | byoc_agent/issue_tracker.py | Alert grouping with anomaly/failure/escalation strategy |
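
The UnifiedScorer weighting (60% metrics, 25% alerts, 15% tier) can be illustrated with a short sketch. It assumes each component has already been normalised to a 0-100 score; the function name `blend_health_score` and that scale are illustrative assumptions, not the actual interface of byoc_agent/unified_scorer.py.

```python
# Hypothetical sketch: the names and the 0-100 scale are assumptions,
# not the real UnifiedScorer API.
WEIGHTS = {"metrics": 0.60, "alerts": 0.25, "tier": 0.15}  # weights from the table


def blend_health_score(metrics: float, alerts: float, tier: float) -> float:
    """Combine three component scores (each assumed 0-100) into one score."""
    components = {"metrics": metrics, "alerts": alerts, "tier": tier}
    for name, value in components.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score must be within 0-100, got {value}")
    # Weighted sum; the weights add up to 1.0, so the result stays within 0-100.
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

Under these assumptions, a cluster with a strong tier score but a weak alert score is still dominated by its metrics component, since that carries the largest weight.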
Frontend
| Technology | Version | Purpose |
|---|---|---|
| React | 19.x | UI framework |
| TypeScript | ~5.9 | Type-safe JavaScript |
| Vite | 6.x | Build tool and dev server |
| Tailwind CSS | 4.2 | Utility-first CSS framework |
| React Router | 7.x | Client-side routing (12 pages) |
| TanStack React Query | 5.x | Server state management and caching |
| Lucide React | 0.577 | Icon library |
| React Markdown | 10.x | Rendering agent/patrol report markdown |
| Looker Embed SDK | 2.x | Embedded Looker dashboards with SSO |
| Axios | 1.x | HTTP client for API calls |
| class-variance-authority | 0.7 | Component variant management (shadcn/ui pattern) |
Frontend Pages
12 pages under frontend/src/pages/:
- Overview.tsx -- Fleet health dashboard
- Issues.tsx -- Issue tracker with triage workflow
- RawAlerts.tsx -- Raw lark_alerts table view
- BreakingPoint.tsx -- Capacity and risk forecasting
- UsagePatterns.tsx -- Cluster utilization trends
- Investigations.tsx -- Agent investigation reports
- AIIssues.tsx -- AI-grouped issue summaries
- Patrol.tsx -- Fleet patrol reports
- Chat.tsx -- Interactive BYOCAgent chat
- LLMUsage.tsx -- Token and cost tracking
- Settings.tsx -- User preferences
- Help.tsx -- Platform documentation
Database
| System | Purpose | Access Method |
|---|---|---|
| StarRocks | Primary OLAP store (3 databases: metrics, byoc, alerts) | mysql-connector-python over MySQL protocol on port 9030 |
| Supabase | User authentication and management | psycopg2-binary (PostgreSQL) + Supabase Auth API |
StarRocks Connection
- Production (EC2): Direct connection to 1cogri9tn-internal.cloud-app.celerdata.com:9030 (same VPC)
- Local development: SSH tunnel through bastion (100.100.118.18 via Tailscale) to 127.0.0.1:9030
- Important: The mysql CLI does not work (protocol mismatch). Always use Python mysql.connector.
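
Since the connection notes above recommend Python's mysql.connector, here is a minimal sketch of what a connection helper might look like. The environment variable names (`STARROCKS_HOST`, `STARROCKS_PORT`, `STARROCKS_USER`, `STARROCKS_PASSWORD`), the default user, and both function names are assumptions for illustration; the project's real .env keys may differ.

```python
import os


def starrocks_conn_params(env=None) -> dict:
    """Build mysql.connector keyword arguments from environment-style settings.

    The variable names and defaults here are hypothetical, not the project's keys.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("STARROCKS_HOST", "127.0.0.1"),  # default suits the SSH tunnel
        "port": int(env.get("STARROCKS_PORT", "9030")),  # StarRocks MySQL-protocol port
        "user": env.get("STARROCKS_USER", "root"),       # assumed default user
        "password": env.get("STARROCKS_PASSWORD", ""),
    }


def fetch_one(sql: str, env=None):
    """Run a query against StarRocks and return its first row."""
    import mysql.connector  # provided by mysql-connector-python; imported lazily

    conn = mysql.connector.connect(**starrocks_conn_params(env))
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchone()
    finally:
        conn.close()  # always release the connection, even if the query fails
```

With the SSH tunnel open, a call such as `fetch_one("SELECT 1")` would serve as a quick connectivity check. In production the host would be set to the internal endpoint instead of the tunnel default.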
Infrastructure
| Component | Service | Purpose |
|---|---|---|
| Compute | AWS EC2 | Bastion host, cron jobs (daily pipeline, Sentinel/Investigator every 15 min, Patrol 2x/day) |
| Serverless | AWS Lambda | Bastion wake-up function |
| IaC | AWS CloudFormation | Infrastructure provisioning |
| Monitoring | AWS CloudWatch | Infrastructure-level logging and alarms |
| Containers | Docker | Application packaging (python:3.11-slim base image) |
| Networking | Tailscale VPN | Secure access to bastion host for local development |
| CI/CD | GitHub Actions | Auto-deploy on push to main (paths: byoc_agent/**, Dockerfile, pyproject.toml) |
Deployment Pipeline
The GitHub Actions workflow (.github/workflows/deploy.yml):
- Triggers on push to main for relevant paths
- Calls Lambda to wake the EC2 bastion (waits 75s for boot)
- SSHs into bastion and deploys updated code
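
To reason about which commits trigger a deploy, the path filter listed in the Infrastructure table (byoc_agent/**, Dockerfile, pyproject.toml) can be approximated in Python. This is only a rough sketch: `fnmatch` is a stand-in for GitHub Actions' own glob matching, whose semantics differ in edge cases, and the function name `triggers_deploy` is hypothetical.

```python
from fnmatch import fnmatchcase

# Path filters as listed in the Infrastructure table.
DEPLOY_PATHS = ("byoc_agent/**", "Dockerfile", "pyproject.toml")


def triggers_deploy(changed_files) -> bool:
    """Return True if any changed file matches a deploy path filter.

    Approximation only: fnmatch's '*' also matches '/', which happens to
    behave like '**' for these patterns but is not GitHub's exact matcher.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in DEPLOY_PATHS
    )
```

Under this approximation, a push that only touches frontend files would not start the workflow, while any change under byoc_agent/ would.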
External Integrations
| System | Integration | Purpose |
|---|---|---|
| Lark | Open API (im_v1_message_list) | Alert channel message fetching |
| Grafana | Webhook contact point + dashboards | Alert delivery and cluster dashboard links |
| Looker | Embed SDK + LookML | Embedded analytics dashboards |
| MCP (Model Context Protocol) | Streamable HTTP | Knowledge Lake search (vector + fulltext), Lark API tools |
| Claude API | Anthropic SDK | LLM backbone for Investigator, Patrol, and Chat agents |
Configuration
Key configuration files:
| File | Purpose |
|---|---|
| .env | StarRocks credentials, Lark API keys, LLM provider config |
| byoc_agent/health_rules.yaml | Scoring thresholds per metric dimension |
| byoc_agent/issue_rules.yaml | Alert grouping rules and severity mappings |
| backend/config.py | FastAPI settings (Pydantic Settings) |
| Dockerfile | Container image definition |
| entrypoint.sh | Container startup script |
| .github/workflows/deploy.yml | CI/CD pipeline definition |