It's operations. You fuck up, you suck it up, you fix it, then (and this is the important part) you prevent it from ever happening again. Feeling like shit for bringing something down is a good way to give yourself depression, given how often you will screw the pooch with root. In the same vein, anybody who says they'd fire the operator without any qualification on that remark should be given a wide berth.
People tend to forget that "fixing it" isn't just technical, it involves process, too. Every new hire that whines about change control and downtime windows would be the first to suggest them, were they troubleshooting the outage that demonstrated the need.
Nonsense. Someone has to be operating at the sharp end of the enable prompt, and sooner or later it'll be 0330 and that person will type Ethernet0 when they meant Ethernet1, no matter what change management you have in place.
When that happens, you do just what Joyent did here: you send out an embarrassed email to customers, everyone else in the ops team gets a few cheap laughs at the miscreant's expense, you have a meeting about it, discuss lessons learned, and you move on.
Everyone screws up. Everything goes down once in a while. This is why you build in redundancy at every level.
I've seen generally brilliant people get bitten by bad process. The worst example was an important hard drive being wiped thanks to a lack of labeling, which of course took a production server down with it.
Other things that have caused outages: lack of power capacity planning, unplugging an unrelated test server from the network (go go gadget BGP), cascading backup power failure, building maintenance taking down AC units, expensive equipment caching ARP replies indefinitely… the list goes on.
I had my own fun fuckup too. I learned SQL on PostgreSQL, and had to fix a problem with logged data in a MySQL database. Not trusting myself, I typed "BEGIN;" to enter a transaction, ran my update, and queried the table to check my results. I noticed my update did more than I expected, so I entered "ROLLBACK;" only to learn that MyISAM tables don't actually implement transactions.
Thankfully, in this case it turned out to be possible to undo the damage, but talk about a heart-stopping moment!
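The trap is easy to sketch. This is a hypothetical session (the database and table names are made up); the real lesson is to check the storage engine before you trust a transaction:

```sql
-- Check which engine backs your tables before relying on transactions:
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';

-- On a MyISAM table, this "transaction" offers no protection:
BEGIN;
UPDATE log_entries SET level = 'WARN' WHERE level = 'WARNING';
ROLLBACK;  -- the UPDATE is already permanent; MySQL only emits a
           -- warning that non-transactional tables couldn't be rolled back
```

On InnoDB the same ROLLBACK would have undone the update, which is exactly why the behavior above is so easy to miss if you learned on a database where transactions always work.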
Shit happens. You deal with it, then do what you can to keep it from happening again. I've learned to respect early morning change windows as a way to limit damage caused by mistakes.