
Google's servers are incredibly complex. Probably too complex - the more complicated they make their infrastructure (datacenter failover, region failover, bla bla) the more unstable it seems to get.

Google uses a very bureaucratic code commit system that requires sign-offs from different people. This process takes a long time, and devs can't move on to the next step until the previous step has been accepted [1]. While this system is awesome for catching localized bugs (no buffer overflow is going to get past that kind of code review), there is a major tradeoff. A dev can only keep so much state in mind when building architecture. If he is only working on the problem once a week, with large time gaps in between, is he not going to lose track of important pieces of the puzzle?

This is probably the age-old problem - if you make something that is too clever for even its creator to fully understand, how are you possibly going to make sure it is bug-free? The fact that the problem was some delay between Google servers hints at an inter-region datacenter issue. I wonder if anybody at Google even understands the entire failover and interlinked datacenter system completely?

[1] http://www.splinter.com.au/2012/12/26/behind-enemy-lines-goo...



That link seems to be dead.

I wish I could share with you the pictures of "the big picture" in which every piece of proprietary tech was given its own little circle on a whiteboard and then was connected to everything else which it uses or which uses it.

To say it was huge would be an understatement.


Er, except Google's services aren't unstable.

That doesn't mean they're completely problem-free, but nothing is.

[and given the importance of services like Gmail to vast swathes of the population, I certainly hope they require many sign-offs for code commits!]



