Bilal Ahmad

~Engineers solve problems, I solve engineers' problems 🤘

Mastering the Art of Alerting: From Chaos to Clarity

Alerting is a journey, not a destination, and many organizations follow a familiar path:

  1. No alerts – the silence is unsettling.
  2. Too many alerts – the noise becomes overwhelming.
  3. Prioritized alerts – only critical ones wake you up at night.
  4. Ignored non-critical alerts – a new problem emerges.

The natural progression leads to a two-tiered alerting setup: critical alerts, which require immediate action, and non-critical alerts, which are handled asynchronously via email or dashboards. This structure reduces noise, but it introduces a new challenge: non-critical alerts are often ignored, and unresolved issues pile up.
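As a rough illustration, the two tiers can be thought of as a routing rule keyed on severity. The sketch below is not tied to any particular alerting tool; the `Severity` enum, the `Alert` class, the `route` function, and the channel names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non_critical"


@dataclass(frozen=True)
class Alert:
    name: str
    severity: Severity


def route(alert: Alert) -> str:
    """Return the (hypothetical) channel an alert is delivered to."""
    if alert.severity is Severity.CRITICAL:
        # Critical alerts interrupt the on-call engineer right away.
        return "pager"
    # Everything else is reviewed asynchronously.
    return "email"


if __name__ == "__main__":
    print(route(Alert("database_unreachable", Severity.CRITICAL)))
    print(route(Alert("disk_usage_above_70_percent", Severity.NON_CRITICAL)))
```

The rule itself is trivial; the hard part is deciding which alerts deserve the critical label.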

To tackle this, we introduced regular alerting review meetings. These biweekly sessions are our opportunity to refine the alerting system. For critical alerts, we evaluate whether they are truly urgent enough to justify waking someone up. For non-critical ones, we brainstorm ways to reduce noise: tweaking thresholds, automating the response, or retiring alerts that are no longer relevant.
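One way to prepare for such a review is to list the alerts that fired often but rarely led to any action. The sketch below assumes a record of each firing with a flag for whether anyone acted on it; the `Firing` record, the `review_candidates` function, and both cutoffs are hypothetical and would need tuning for a real team.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Firing(NamedTuple):
    alert_name: str
    acted_on: bool  # hypothetical field: did anyone respond to this firing?


def review_candidates(
    firings: Iterable[Firing],
    min_firings: int = 5,          # assumed cutoff: ignore alerts that rarely fire
    max_action_rate: float = 0.2,  # assumed cutoff: "mostly ignored" threshold
) -> list[str]:
    """Return alert names that fired often but were rarely acted on, noisiest first."""
    total: dict[str, int] = defaultdict(int)
    acted: dict[str, int] = defaultdict(int)
    for firing in firings:
        total[firing.alert_name] += 1
        if firing.acted_on:
            acted[firing.alert_name] += 1

    candidates = [
        name
        for name, count in total.items()
        if count >= min_firings and acted[name] / count <= max_action_rate
    ]
    return sorted(candidates, key=lambda name: total[name], reverse=True)


if __name__ == "__main__":
    history = (
        [Firing("disk_usage_above_70_percent", False)] * 10
        + [Firing("certificate_expiring", True)] * 6
        + [Firing("rare_warning", False)] * 2
    )
    print(review_candidates(history))
```

Alerts on the resulting list are natural agenda items: each one needs a new threshold, some automation, or removal.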

This iterative process keeps alerting actionable and aligned with our evolving priorities, balancing responsiveness against sanity.
