Incident Report
At 14:32 UTC, your on-call alert fires: HTTP 500 error rate jumped from 0.1% to 18% on the payments service, 8 minutes after a production deployment. Latency p99 is at 4.2s (up from 340ms). The deployment included a refactor of the checkout flow and a new third-party analytics library. You have 15 minutes to respond before SLA breach.
Current Metrics
Error rate
18.3% (↑ from 0.1%)
p99 latency
4,200ms (↑ from 340ms)
CPU usage
42% (normal)
DB query time
38ms (normal)
Active deploys
v2.4.1 pushed 8 min ago
Pod restarts
0