Last week we started off with a tale of two outages by Amazon’s cloud and Microsoft’s Skype, showing us what it’s like when you can’t make Skype calls, watch your favorite show on Netflix, or command your smart-home personal assistant.
This week, however, we’ve got a taste of what it’s like when we’re cut off the social network, with both Facebook and Instagram suffering outages of around an hour. No formal clarifications from either as of yet. Interesting to note that Tinder got hit by both last week’s and this week’s outages despite the very different sources (more on that below).
This was Facebook’s 2nd outage in less than a week, and it’s the 3rd this month. Not the best record, even compared to last year. For Instagram it’s not the first outage either. In fact, both Facebook and Instagram suffered outage together beginning of this year, which shows how tightly Instagram’s system was coupled with Facebook’s services (and vulnerabilities) following the acquisition. The coupling stirred up the user community around the globe:
Facebook’s last outage from last week took down the service for 2.5 hours, in what Facebook described as “the worst outage we’ve had in over four years”. Facebook later wrote a detailed technical post explaining the root cause of the failure was a configuration issue:
An automated system for verifying configuration values ended up causing much more damage than it fixed.
Although this automated system is designed to prevent configuration problems, this time it caused them. This just shows us that even the most rigorous safeguards have limitations and no system is immuned, not even the major cloud vendors. We saw configuration problems taking down Amazon’s cloud last week and Microsoft’s cloud late last year, just to recall a few.
Applications which utilize this infrastructure are continually affected by the outages. One good example is Tinder, which got affected last week by Amazon’s outage as it uses its Amazon Web Services, and this week again, this time probably due to its use of Facebook services. But the good news are that though outages are bound to happen, there are things you can do to reduce the impact on your system. If you find that interesting, I highly recommend you have a look at last week’s post.
Follow Dotan on Twitter!