Juniper bug takes down core Internet routers around the world (silicon.com)
39 points by webnzi on Nov 8, 2011 | 16 comments


You can't really blame Juniper for these crashes. They released a fix for the issue months ago. These customers must not have upgraded. See the Juniper software alert here: http://pastebin.com/HBWiH92j


Can't really blame a manufacturer for a bug that crashes routers via easily passed BGP parameters? It's not possible that it's their fault this wasn't caught in dev-test?

I've seen many environments where it's a multi-year process to roll out new code, particularly in service providers. I wouldn't classify asking a service provider to change all their code in a matter of months as a reasonable request. The testing they do before rolling out code is very time-consuming. It's not uncommon for service providers to run on 3-to-5-year-old code.


Running 3-to-5-year-old code shows that the provider/company probably doesn't prioritize or fund testing as well as they should IMO. I'm sure Juniper would work with any company that asked for a patch or custom workaround if they cannot upgrade now.

I see your point: Juniper missed a serious bug in their testing. But you can't hold them completely accountable when they've already announced and released a fix that corrects the issue.


The problem is, by releasing the patch they alert attackers to the existence of the bug. I don't know this to be the case here, but for things like Windows vulnerabilities the time between releasing the patch and it being exploited in the wild is only a few hours.

For systems (like core routers) that are simultaneously too critical and too available to permit timely maintenance cycles, the only solution is to not have any bugs ever.


good luck with that.

if something is too critical to be taken offline, it should have a hot standby, right?


The bulletin is 3 months old. I would find it surprising if they updated their core router firmware that often, and even if they did, it's probably with a version that is already several months old and has been through lab testing and small scale test deployment. A bulletin like that would have likely reset the cycle.


You could also say that this is a situation Cisco and Juniper have created. In the past, they have not been great at delivering good quality software. Some still say that Cisco's version 1 is really a beta. This has created a culture of caution at ISPs where they have had to create their own long acceptance testing procedures, causing long periods between updates.


It was infected with Bob Muglitis!!!


The frailties of monoculture are well known.

This is one of the reasons I'm hesitant to embrace the various NoSQL options at this time - often there's only one implementation of an API, and it's tied to that code.

Compare this to the various message queueing options that all support STOMP or AMQP, or programming languages that have multiple implementations.

Networking needs to define a format spec for routing and switching, and then have vendors meet the spec. Fortunately we should be getting something like this with software defined networking projects like OpenFlow.


"Networking needs to define a format spec for routing and switching, and then have vendors meet the spec."

Please check out the IETF (www.ietf.org) -- This is exactly how it works.

But BGP has no security, is complicated from an implementation standpoint, and you are right, there is a bit of a software duoculture. Juniper and Cisco. That's it.

This has happened before... too bad the routers didn't crash BEFORE propagating the bad BGP updates. :-)


I'm just wondering, is Alcatel-Lucent still a player or are they no longer relevant?


They still make some great gear, as do Redback (now Ericsson) and a bunch of others. But for direct, Internet-facing devices that manage the full global routing table, the preferred option is still Cisco or Juniper.


Even if there were many mainstream options, Level 3 would not likely deploy much more than two in each level of its network because of the complexity of managing and maintaining them. It might even be worse with OpenFlow if the majority of hardware vendors deployed the same open-source-derived software, all with the same bug.


I agree with the advantages of a format spec for routing and switching and the promise of OpenFlow, but I think it's orthogonal. I could be misunderstanding your comment, but this failure was due to a software bug, which could exist whether or not there's a spec.


Are OSPF and BGP not open standards for routing?

If a router is reset, couldn't that cause some large changes in the BGP tables and subsequent route flapping as routers go on and off line? Perhaps large changes like that trigger the Juniper bug, if there is one.




