
I don't know about that. Frequently the "cause" that you're alerting on is also a symptom. "DB down" is causing your webapp to fail, but why is the DB down? Misconfiguration, hardware failure, power outage? Heck, maybe the DB is not actually down but there was a network failure that made it unreachable from the monitoring server.

My point is that alerting on a "cause" may not actually get you to the root cause, and maybe not even all that much closer than a symptom would.



> Misconfiguration, hardware failure, power outage? Heck, maybe the DB is not actually down but there was a network failure that made it unreachable from the monitoring server.

Sorry, I should have been more specific. DB daemon being down is a cause, and should be monitored. Hardware down is a cause, and should be monitored. Network availability is a cause and should be monitored. Power outage... you get my drift.

I think the root cause of my disagreement with this document is the lack of a proper dependency tree in the alerting tool. Their tool appears to want to alert on any and every monitored problem, which necessitates limiting what you monitor for fear of a pager flood. A proper tool can address this problem correctly.
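
To make the dependency-tree idea concrete, here is a rough sketch of what I mean, not how any particular tool implements it. The check names, the `DEPENDS_ON` map, and the `alerts_to_page` function are all made up for illustration. The idea is that you monitor everything, but only page for failing checks that have no failing dependency underneath them:

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each check lists the checks it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "webapp": ["db_daemon"],
    "db_daemon": ["db_host"],
    "db_host": ["network", "power"],
    "network": [],
    "power": [],
}


def alerts_to_page(failing: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failing checks that have no failing (transitive) dependency."""

    def has_failing_dependency(check: str, seen: Set[str]) -> bool:
        for dep in depends_on.get(check, []):
            if dep in seen:
                continue  # already visited; also guards against cycles
            seen.add(dep)
            if dep in failing or has_failing_dependency(dep, seen):
                return True
        return False

    return {c for c in failing if not has_failing_dependency(c, set())}


# Everything from the network up is failing, but only the network pages.
print(alerts_to_page({"webapp", "db_daemon", "db_host", "network"}, DEPENDS_ON))
```

With something like this, "DB daemon down", "hardware down" and "network down" can all be monitored without all of them hitting the pager at once: the downstream failures are still recorded, just suppressed while something they depend on is known to be broken.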


911: PING TO GOOGLE >100ms



