Daily Alert Pipeline
A standalone cron script that fetches alerts from the Lark API, parses them, inserts into StarRocks, and generates daily recommendations.
Key file: daily_alert_pipeline.py
Usage
python3 daily_alert_pipeline.py # Process yesterday
python3 daily_alert_pipeline.py 2026-03-10 # Process specific date
python3 daily_alert_pipeline.py 2026-03-10 2026-03-11 # Process multiple dates
python3 daily_alert_pipeline.py --backfill 7 # Process last 7 days
python3 daily_alert_pipeline.py --catch-up # Yesterday + today (up to now)
python3 daily_alert_pipeline.py --today # Today only (up to now)
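The invocation modes above could be resolved to a list of dates roughly as follows. This is a minimal sketch, not the script's actual argument handling: the function name resolve_dates is hypothetical, "yesterday" is assumed to mean the previous day in Pacific time, and --backfill N is assumed to mean the N days ending yesterday.

```python
import argparse
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo


def resolve_dates(argv, today=None):
    """Return the dates to process for an argument list (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("dates", nargs="*", type=date.fromisoformat)
    parser.add_argument("--backfill", type=int, metavar="N")
    parser.add_argument("--catch-up", action="store_true")
    parser.add_argument("--today", action="store_true")
    args = parser.parse_args(argv)

    if today is None:
        # Assumption: the pipeline's notion of "today" is Pacific time.
        today = datetime.now(ZoneInfo("America/Los_Angeles")).date()
    yesterday = today - timedelta(days=1)

    if args.dates:
        return args.dates
    if args.backfill:
        # Assumption: the N days ending yesterday, oldest first.
        return [today - timedelta(days=n) for n in range(args.backfill, 0, -1)]
    if args.catch_up:
        return [yesterday, today]
    if args.today:
        return [today]
    return [yesterday]
```

Passing today explicitly keeps the helper deterministic in tests; the script itself would call it with sys.argv[1:].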
Environment Variables
| Variable | Default | Notes |
|---|---|---|
| LARK_APP_ID | (required) | Lark app credentials |
| LARK_APP_SECRET | (required) | Lark app credentials |
| LARK_CHAT_ID | oc_392593bb5ace3f00ee10ab53bfe7681f | BYOC Online Alarm channel |
| STARROCKS_HOST | 1cogri9tn-internal.cloud-app.celerdata.com | Direct internal endpoint (no SSH tunnel on EC2) |
| STARROCKS_PORT | 9030 | |
| STARROCKS_USER | kk | |
| STARROCKS_PASSWORD | (required) | |
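One way to read these variables is to fail fast when a required one is missing and to fall back to the documented defaults otherwise. The sketch below assumes a hypothetical load_config helper; the real script may structure this differently.

```python
import os

REQUIRED = ("LARK_APP_ID", "LARK_APP_SECRET", "STARROCKS_PASSWORD")
DEFAULTS = {
    "LARK_CHAT_ID": "oc_392593bb5ace3f00ee10ab53bfe7681f",
    "STARROCKS_HOST": "1cogri9tn-internal.cloud-app.celerdata.com",
    "STARROCKS_PORT": "9030",
    "STARROCKS_USER": "kk",
}


def load_config(env=None):
    """Build a config dict from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    # The port arrives as a string; the database driver expects an integer.
    config["STARROCKS_PORT"] = int(config["STARROCKS_PORT"])
    return config
```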
Pipeline Steps
1. Fetch from Lark API
Uses a built-in LarkClient class that authenticates via tenant access token (/auth/v3/tenant_access_token/internal). Fetches messages from the chat container with time-range filtering and pagination (page_size: 50, sorted by ByCreateTimeAsc).
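The pagination loop can be sketched independently of the HTTP layer. In the example below, fetch_all_messages and the injected get_page callable are hypothetical; the query parameter names and the response fields has_more, page_token, and items are assumptions about the Lark list-messages response shape, not taken from the script.

```python
def fetch_all_messages(get_page, chat_id, start_ts, end_ts, page_size=50):
    """Collect every message in a time range, following page tokens.

    get_page(params) is a hypothetical callable that performs one
    authenticated request and returns the response's "data" object.
    """
    params = {
        "container_id_type": "chat",
        "container_id": chat_id,
        "start_time": str(start_ts),  # epoch seconds (assumed)
        "end_time": str(end_ts),
        "sort_type": "ByCreateTimeAsc",
        "page_size": page_size,
    }
    messages = []
    while True:
        data = get_page(params)
        messages.extend(data.get("items", []))
        if not data.get("has_more"):
            return messages
        # Request the next page with the token from the previous response.
        params = {**params, "page_token": data["page_token"]}
```

Injecting get_page keeps token refresh and error handling in the client class and lets the loop be tested with canned pages.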
2. Parse Alerts
The script carries its own copy of parse_alert_card() (originally from load_lark_alerts.py), so it has no dependency on that module. Each interactive card message is parsed into one or more alert rows. Because a single Lark message can contain both Firing and Resolved alerts, the parser appends a -f (Firing) or -r (Resolved) suffix to the message ID so that the two rows do not share an ID.
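The suffixing rule can be illustrated in a few lines. This is a sketch only: the helper names and the row fields (status, alert_name) are hypothetical and do not reproduce parse_alert_card().

```python
STATUS_SUFFIX = {"Firing": "-f", "Resolved": "-r"}


def suffixed_message_id(message_id, status):
    """Append -f or -r so rows from one card get distinct IDs per status."""
    return message_id + STATUS_SUFFIX[status]


def rows_from_card(message_id, alerts):
    """Attach a suffixed message ID to each parsed alert (hypothetical shape)."""
    return [
        {**alert, "message_id": suffixed_message_id(message_id, alert["status"])}
        for alert in alerts
    ]
```

For a card om_abc holding one Firing and one Resolved alert, this yields rows with IDs om_abc-f and om_abc-r.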
3. Insert into StarRocks
Batch inserts parsed rows into alerts.lark_alerts via mysql.connector.
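A batch insert with a DB-API cursor might look like the sketch below. The column list is an assumption made for illustration (the actual schema of alerts.lark_alerts is not documented here), as are the insert_alerts name and the batch size.

```python
# Assumed columns; the real alerts.lark_alerts schema may differ.
COLUMNS = ("message_id", "alert_name", "status", "cluster", "region", "created_at")
INSERT_SQL = "INSERT INTO alerts.lark_alerts ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def insert_alerts(conn, rows, batch_size=500):
    """Insert rows in batches via executemany and return the row count."""
    cursor = conn.cursor()
    inserted = 0
    try:
        for start in range(0, len(rows), batch_size):
            chunk = rows[start:start + batch_size]
            batch = [tuple(row[col] for col in COLUMNS) for row in chunk]
            cursor.executemany(INSERT_SQL, batch)
            inserted += len(batch)
        conn.commit()
    finally:
        cursor.close()
    return inserted
```

Here conn would be the object returned by mysql.connector.connect(host=..., port=..., user=..., password=...); any DB-API connection that uses %s placeholders works the same way.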
4. Generate Recommendations
The populate_recommendations() function runs after ingestion:
- Queries all alerts for the target date (by created_at in Pacific time).
- Aggregates by alert_name: fired count, resolved count, affected clusters, regions.
- Looks up severity and recommendation text from built-in dictionaries.
- Flags noisy alerts (fired 10+ times in a day, or affecting 3+ clusters).
- Deletes and re-inserts recommendations for that date (idempotent).
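The aggregation and noisy-alert rule can be sketched in memory as follows. This is an illustration under stated assumptions: summarize_alerts and the row fields (status, cluster, region) are hypothetical, and populate_recommendations() may well do the grouping in SQL instead. Only the thresholds (10 firings, 3 clusters) come from the description above.

```python
from collections import defaultdict

NOISY_FIRED_THRESHOLD = 10    # fired 10+ times in a day
NOISY_CLUSTER_THRESHOLD = 3   # or affecting 3+ clusters


def summarize_alerts(rows):
    """Group one day's alert rows by alert_name and flag noisy alerts."""
    groups = defaultdict(
        lambda: {"fired": 0, "resolved": 0, "clusters": set(), "regions": set()}
    )
    for row in rows:
        group = groups[row["alert_name"]]
        if row["status"] == "Firing":
            group["fired"] += 1
        elif row["status"] == "Resolved":
            group["resolved"] += 1
        group["clusters"].add(row["cluster"])
        group["regions"].add(row["region"])

    summary = {}
    for name, group in groups.items():
        summary[name] = {
            "fired": group["fired"],
            "resolved": group["resolved"],
            "clusters": sorted(group["clusters"]),
            "regions": sorted(group["regions"]),
            "noisy": (
                group["fired"] >= NOISY_FIRED_THRESHOLD
                or len(group["clusters"]) >= NOISY_CLUSTER_THRESHOLD
            ),
        }
    return summary
```

Because the recommendations for a date are deleted before being re-inserted, running this summary twice for the same date produces the same stored result.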
Severity Levels
| Severity | Example Alerts |
|---|---|
| Critical | ProcNotRunning, BeAliveAbnormal, ClusterStateAbnormal, FEQueryErrRateMoreThan60% |
| Warning | FEHeapUsageTooHigh, FEMaxTabletCompaction, RootFreeDiskLessThan10%, FEGCCount |
| Info | OperationDurationGT10m |
The full mapping covers 21 known alert types with specific recommendations for each.
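A dictionary lookup of this kind could be written as shown below. The sketch includes only the example alerts from the table, not the full set of 21; the severity_for helper and its "Unknown" fallback for unmapped alert names are assumptions.

```python
# Only the example alerts listed in the table; the full mapping has 21 entries.
SEVERITY = {
    "ProcNotRunning": "Critical",
    "BeAliveAbnormal": "Critical",
    "ClusterStateAbnormal": "Critical",
    "FEQueryErrRateMoreThan60%": "Critical",
    "FEHeapUsageTooHigh": "Warning",
    "FEMaxTabletCompaction": "Warning",
    "RootFreeDiskLessThan10%": "Warning",
    "FEGCCount": "Warning",
    "OperationDurationGT10m": "Info",
}


def severity_for(alert_name, default="Unknown"):
    """Return the severity for an alert name, or a fallback if unmapped."""
    return SEVERITY.get(alert_name, default)
```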
Deployment
Designed to run on an EC2 instance in the same VPC as StarRocks (connects directly to the internal endpoint, no SSH tunnel needed). Schedule via cron to run daily after midnight Pacific.