
My Nagios-awoken 3 AM brain finds no fault in your logic.


Even funnier are people doing server monitoring of (things in EC2) from within EC2. When the EC2 outage happens, there's obviously no problem because no alerts get sent...

Doh!


For some people, that might be fine. If you don't have plans for how to rapidly move out of EC2, you might as well just sleep through an all-of-EC2-goes-down outage for all you can do about it.


You should at least know there is an outage to have something to tell your downstream customers. It is really embarrassing to have a customer (or your boss) call to report an outage you don't yet know about, even if there is fuck all you can do to resolve it. Basic principle of ops.
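
For what it's worth, a minimal sketch of the "at least know about it" check, meant to run on a box outside the provider you are monitoring. The function name `is_reachable`, the URL, and the `opener` parameter are assumptions for illustration and are not from anyone's actual setup; only the standard library is used.

```python
import http.client
import urllib.request


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx/3xx status within `timeout`.

    `opener` is injectable (hypothetical parameter) so the check can be
    exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), timeouts and connection resets
        # are all OSError subclasses; treat any of them as "down".
        return False


if __name__ == "__main__":
    # Placeholder URL: replace with the endpoint you actually care about.
    print(is_reachable("http://example.com/"))
```

The point is only where it runs: if this lives in the same cloud as the thing it checks, an all-of-provider outage takes the alert down with it.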


I wasn't being entirely serious. :-)


> Basic principle of ops

For my benefit, what are some others?


This would actually be an interesting blog post.


This is why my sleepy 3AM brain was awoken by Pingdom. Hooray for having just enough redundancy to tell you that it's not quite enough.

Good night.


Me, I use dedicated load balancing to redirect my traffic when a cloud outage is detected. And I sleep perfectly ;-)


Could you give a little more detail on your setup? I'm curious how others are designing around these issues.


If my case can help you: my company uses a third-party service to load-balance traffic across multiple CDNs/clouds. We are no longer impacted when some of those providers fail. You can read this http://tinyurl.com/7pwfza7 (I'm a user, not a vendor)


I can't figure out why you people are using URL shorteners on HN, but I believe it is not looked upon well. So, for others, these links are as follows:

http://www.theregister.co.uk/2012/02/17/cedexis_and_the_open...

http://translate.google.fr/translate?hl=fr&sl=fr&tl=...


Very interesting, flojibi. Another one about multi-cloud: http://bit.ly/zg37FQ


Even funnier than that is watching the latency hit at Rackspace Cloud and Terremark as some non-trivial number of customers fail over.


Do you work for a DNS provider or CDN or something (so as to see this in near realtime)? Envy.

I haven't seen a lot of people using both EC2 and Terremark for the same app -- kind of different markets. Not technically unreasonable, but Terremark seems to be more enterprise IT outsourcing, while EC2 (followed at a very far remove by the other clouds, including Rackspace) is more for Internet-delivered consumer apps and the like, or at least larger-scale public services.


Here's an idea I've thought about but don't have time to do anything with: a peer-to-peer monitoring network, so each new server on each new network makes it more robust. No idea how the details would work out.


That already gets done for network/application performance monitoring: alternatives to Keynote, Gomez, etc. do it, and it is how some of their own products work. It's kind of overkill for basic application-level monitoring -- there's a tradeoff between the number of endpoints checking and the frequency of checks. I guess you could round-robin checks across a larger number of end nodes, too, to get both.
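
A rough sketch of the round-robin idea, assuming a fixed list of probe nodes and targets (all names here are hypothetical): every target still gets checked on every tick, but the node doing the check rotates, so each node carries only a share of the load while the targets are seen from many networks over time.

```python
def round_robin_schedule(nodes, targets, ticks):
    """Return a list of (tick, node, target) check assignments.

    Each target is checked once per tick; the checking node rotates
    with the tick so the work is spread evenly across all nodes.
    """
    if not nodes:
        raise ValueError("need at least one probe node")
    schedule = []
    for tick in range(ticks):
        for offset, target in enumerate(targets):
            # Shift the starting node by one each tick.
            node = nodes[(tick + offset) % len(nodes)]
            schedule.append((tick, node, target))
    return schedule


if __name__ == "__main__":
    for entry in round_robin_schedule(["a", "b", "c"], ["x", "y"], 3):
        print(entry)
```

This ignores the hard parts (node churn, disagreeing results, who sends the alert), which is where the details of a real peer-to-peer setup would have to be worked out.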



