Daily Alert Pipeline

A standalone cron script that fetches alerts from the Lark API, parses them, inserts them into StarRocks, and generates daily recommendations.

Key file: daily_alert_pipeline.py

Usage

python3 daily_alert_pipeline.py                        # Process yesterday
python3 daily_alert_pipeline.py 2026-03-10             # Process specific date
python3 daily_alert_pipeline.py 2026-03-10 2026-03-11  # Process multiple dates
python3 daily_alert_pipeline.py --backfill 7           # Process last 7 days
python3 daily_alert_pipeline.py --catch-up             # Yesterday + today (up to now)
python3 daily_alert_pipeline.py --today                # Today only (up to now)
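
The invocations above could map to target dates roughly as follows. This is a minimal sketch, not the script's actual argument handling: the function name resolve_dates, the use of argparse, and the America/Los_Angeles zone are assumptions, and the "up to now" cutoff for today is not modeled.

```python
import argparse
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo


def resolve_dates(argv, today=None):
    """Return the list of dates to process for the given CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("dates", nargs="*", type=date.fromisoformat)
    parser.add_argument("--backfill", type=int, metavar="N")
    parser.add_argument("--catch-up", action="store_true")
    parser.add_argument("--today", action="store_true")
    args = parser.parse_args(argv)

    # Assumption: "today" is evaluated in Pacific time, like the recommendations step.
    today = today or datetime.now(ZoneInfo("America/Los_Angeles")).date()
    yesterday = today - timedelta(days=1)

    if args.dates:
        return args.dates
    if args.backfill:
        # Assumption: "last N days" ends at yesterday and excludes today.
        return [today - timedelta(days=n) for n in range(args.backfill, 0, -1)]
    if args.catch_up:
        return [yesterday, today]
    if args.today:
        return [today]
    return [yesterday]
```

Passing today explicitly keeps the function deterministic, which makes the date logic easy to unit-test.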

Environment Variables

| Variable | Default | Notes |
| LARK_APP_ID | (required) | Lark app credentials |
| LARK_APP_SECRET | (required) | Lark app credentials |
| LARK_CHAT_ID | oc_392593bb5ace3f00ee10ab53bfe7681f | BYOC Online Alarm channel |
| STARROCKS_HOST | 1cogri9tn-internal.cloud-app.celerdata.com | Direct internal endpoint (no SSH tunnel on EC2) |
| STARROCKS_PORT | 9030 | |
| STARROCKS_USER | kk | |
| STARROCKS_PASSWORD | (required) | |
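
One way to read these variables is to fail fast on the required ones and fall back to the documented defaults for the rest. The sketch below is illustrative only: PipelineConfig and load_config are hypothetical names, not taken from daily_alert_pipeline.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    lark_app_id: str
    lark_app_secret: str
    lark_chat_id: str
    starrocks_host: str
    starrocks_port: int
    starrocks_user: str
    starrocks_password: str


def load_config(env=os.environ):
    """Build a config from environment variables, raising if a required one is missing."""
    required = ["LARK_APP_ID", "LARK_APP_SECRET", "STARROCKS_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return PipelineConfig(
        lark_app_id=env["LARK_APP_ID"],
        lark_app_secret=env["LARK_APP_SECRET"],
        # Defaults below are the ones listed in the table above.
        lark_chat_id=env.get("LARK_CHAT_ID", "oc_392593bb5ace3f00ee10ab53bfe7681f"),
        starrocks_host=env.get("STARROCKS_HOST", "1cogri9tn-internal.cloud-app.celerdata.com"),
        starrocks_port=int(env.get("STARROCKS_PORT", "9030")),
        starrocks_user=env.get("STARROCKS_USER", "kk"),
        starrocks_password=env["STARROCKS_PASSWORD"],
    )
```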

Pipeline Steps

1. Fetch from Lark API

Uses a built-in LarkClient class that authenticates with a tenant access token (/auth/v3/tenant_access_token/internal). It fetches messages from the chat container with time-range filtering and pagination (page_size: 50, sorted ByCreateTimeAsc, i.e. oldest first).
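
The pagination loop can be sketched independently of the HTTP layer. Here get_page is a hypothetical callable standing in for the authenticated request that LarkClient makes. The query parameter and response field names (container_id_type, start_time, has_more, page_token, and so on) reflect my understanding of Lark's message-list endpoint and should be checked against the Lark documentation before use.

```python
from typing import Callable, Iterator


def iter_messages(get_page: Callable[[dict], dict], chat_id: str,
                  start_ts: int, end_ts: int, page_size: int = 50) -> Iterator[dict]:
    """Yield every message in the time range, following page tokens until exhausted."""
    params = {
        "container_id_type": "chat",
        "container_id": chat_id,
        "start_time": str(start_ts),  # assumed to be epoch seconds
        "end_time": str(end_ts),
        "sort_type": "ByCreateTimeAsc",
        "page_size": page_size,
    }
    while True:
        data = get_page(params).get("data", {})
        yield from data.get("items", [])
        if not data.get("has_more"):
            break
        # Build a fresh dict so earlier requests are not mutated.
        params = {**params, "page_token": data["page_token"]}
```

Injecting get_page keeps the loop testable without network access; the real client would pass a function that issues the request with the tenant access token attached.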

2. Parse Alerts

Includes a self-contained copy of parse_alert_card() (from load_lark_alerts.py), which parses each interactive card message into alert rows. It appends a -f (Firing) or -r (Resolved) suffix to each message ID so that multi-alert cards, where a single Lark message contains both statuses, yield distinct row IDs.
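
The suffixing rule can be illustrated with a small helper. This is a sketch only: suffixed_id is a hypothetical name, and the status strings "Firing" and "Resolved" are assumed to match what parse_alert_card() emits.

```python
def suffixed_id(message_id: str, status: str) -> str:
    """Return a per-status row ID, e.g. 'om_123-f' for Firing or 'om_123-r' for Resolved."""
    suffixes = {"firing": "-f", "resolved": "-r"}
    return message_id + suffixes[status.lower()]


# One Lark message carrying both statuses produces two distinct row IDs.
row_ids = [suffixed_id("om_123", s) for s in ("Firing", "Resolved")]
```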

3. Insert into StarRocks

Batch-inserts the parsed rows into alerts.lark_alerts via mysql.connector.
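
A batched insert with mysql.connector typically goes through cursor.executemany(). The sketch below assumes a hypothetical column list, since the schema of alerts.lark_alerts is not shown here, and insert_alerts and batch_size are illustrative names.

```python
def insert_alerts(conn, rows, batch_size=500):
    """Insert rows into alerts.lark_alerts in batches; return the number of rows sent."""
    # Hypothetical column list: the real table schema is not documented here.
    sql = (
        "INSERT INTO alerts.lark_alerts "
        "(message_id, alert_name, status, cluster, region, created_at) "
        "VALUES (%s, %s, %s, %s, %s, %s)"
    )
    cursor = conn.cursor()
    try:
        # Send the rows in fixed-size chunks to keep each statement bounded.
        for start in range(0, len(rows), batch_size):
            cursor.executemany(sql, rows[start:start + batch_size])
        conn.commit()
    finally:
        cursor.close()
    return len(rows)
```

The conn argument would be a connection opened with mysql.connector.connect(host=..., port=..., user=..., password=...) using the StarRocks settings from the environment.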

4. Generate Recommendations

The populate_recommendations() function runs after ingestion:

  1. Queries all alerts for the target date (by created_at in Pacific time).
  2. Aggregates by alert_name: fired count, resolved count, affected clusters, regions.
  3. Looks up severity and recommendation text from built-in dictionaries.
  4. Flags noisy alerts (fired 10+ times in a day, or affecting 3+ clusters).
  5. Deletes and re-inserts recommendations for that date (idempotent).
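
The aggregation and noisy-alert rule in steps 2 and 4 can be sketched in plain Python. The row keys (alert_name, status, cluster, region) and the function name summarize_alerts are assumptions for illustration; the real function may well do this work in SQL.

```python
from collections import defaultdict


def summarize_alerts(rows):
    """Aggregate alert rows by alert_name and flag noisy alerts."""
    stats = defaultdict(lambda: {"fired": 0, "resolved": 0, "clusters": set(), "regions": set()})
    for row in rows:
        entry = stats[row["alert_name"]]
        if row["status"] == "Firing":
            entry["fired"] += 1
        elif row["status"] == "Resolved":
            entry["resolved"] += 1
        entry["clusters"].add(row["cluster"])
        entry["regions"].add(row["region"])

    summary = {}
    for name, entry in stats.items():
        summary[name] = {
            "fired": entry["fired"],
            "resolved": entry["resolved"],
            "clusters": sorted(entry["clusters"]),
            "regions": sorted(entry["regions"]),
            # Noisy: fired 10+ times in the day, or affecting 3+ clusters.
            "noisy": entry["fired"] >= 10 or len(entry["clusters"]) >= 3,
        }
    return summary
```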

Severity Levels

| Severity | Example Alerts |
| Critical | ProcNotRunning, BeAliveAbnormal, ClusterStateAbnormal, FEQueryErrRateMoreThan60% |
| Warning | FEHeapUsageTooHigh, FEMaxTabletCompaction, RootFreeDiskLessThan10%, FEGCCount |
| Info | OperationDurationGT10m |

The full mapping covers 21 known alert types with specific recommendations for each.
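
A dictionary lookup of this kind might look like the sketch below. It contains only the example alerts listed above, not the full 21-entry mapping, and the "Unknown" fallback for unrecognized alert names is an assumption: the document does not say how the script treats them.

```python
# Partial mapping built from the documented examples only.
SEVERITY = {
    "ProcNotRunning": "Critical",
    "BeAliveAbnormal": "Critical",
    "ClusterStateAbnormal": "Critical",
    "FEQueryErrRateMoreThan60%": "Critical",
    "FEHeapUsageTooHigh": "Warning",
    "FEMaxTabletCompaction": "Warning",
    "RootFreeDiskLessThan10%": "Warning",
    "FEGCCount": "Warning",
    "OperationDurationGT10m": "Info",
}


def severity_for(alert_name, default="Unknown"):
    """Look up an alert's severity; the default for unknown names is an assumption."""
    return SEVERITY.get(alert_name, default)
```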

Deployment

Designed to run on an EC2 instance in the same VPC as StarRocks (connects directly to the internal endpoint, no SSH tunnel needed). Schedule via cron to run daily after midnight Pacific.