I don't know about that, frequently the "cause" that you're alerting on is also ...

falcolas · on Oct 14, 2014

> Misconfiguration, hardware failure, power outage? Heck, maybe the DB is not actually down but there was a network failure that made it unreachable from the monitoring server.

Sorry, I should have been more specific. DB daemon being down is a cause, and should be monitored. Hardware down is a cause, and should be monitored. Network availability is a cause, and should be monitored. Power outage... you get my drift.

I think the root cause of my disagreement with this document is the lack of a proper dependency tree in the alerting tool. Their tool appears to want to alert on any and every monitored problem, which necessitates limiting what you monitor for fear of a pager flood. A proper tool can address this problem correctly.
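A minimal sketch of the dependency-tree idea, assuming a hypothetical map from each check to the checks it depends on (the check names `power`, `hardware`, `network`, `db_daemon` and `db_reachable` are illustrative, not from any particular tool). A failing check pages only if none of its transitive dependencies is also failing, so one power outage produces one page instead of a flood:

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each check lists the checks it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "power": [],
    "hardware": ["power"],
    "network": ["hardware"],
    "db_daemon": ["hardware"],
    "db_reachable": ["network", "db_daemon"],
}


def root_causes(failing: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failing checks whose transitive dependencies are all healthy."""

    def has_failing_ancestor(check: str, seen: Set[str]) -> bool:
        # Walk the dependency graph upward; stop as soon as any ancestor is failing.
        for dep in depends_on.get(check, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in failing or has_failing_ancestor(dep, seen):
                return True
        return False

    # Only checks with no failing ancestor are treated as root causes worth paging on.
    return {c for c in failing if not has_failing_ancestor(c, set())}


if __name__ == "__main__":
    # Everything is down because the power is out: page once, for "power".
    print(root_causes({"power", "hardware", "network", "db_daemon", "db_reachable"}, DEPENDS_ON))
    # The daemon died but the box is up: page for "db_daemon", suppress "db_reachable".
    print(root_causes({"db_daemon", "db_reachable"}, DEPENDS_ON))
```

The first call reports only `{'power'}` and the second only `{'db_daemon'}`. The walk is transitive rather than looking only at direct parents, so an intermediate check that was never run (or flapped back to healthy) does not let its downstream symptoms page.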

rhizome · on Oct 14, 2014

911: PING TO GOOGLE >100ms