
A Tale of Two (More) Outages, Featuring Facebook, Instagram and Poor Tinder

Last week we started off with a tale of two outages by Amazon’s cloud and Microsoft’s Skype, showing us what it’s like when you can’t make Skype calls, watch your favorite show on Netflix, or command your smart-home personal assistant.

This week, however, we got a taste of what it’s like to be cut off from our social networks, with both Facebook and Instagram suffering outages of around an hour. Neither has issued a formal explanation yet. It is interesting to note that Tinder was hit by both last week’s and this week’s outages, despite their very different sources (more on that below).

[Image: Facebook down, 28-09-2015]

This was Facebook’s 2nd outage in less than a week, and its 3rd this month. Not the best record, even compared to last year. For Instagram it’s not the first outage either. In fact, Facebook and Instagram suffered an outage together at the beginning of this year, which shows how tightly Instagram’s system has been coupled with Facebook’s services (and vulnerabilities) since the acquisition. The outage stirred up the user community around the globe:

[Image: Instagram down, 28-09-2015]

Facebook’s outage last week took the service down for 2.5 hours, in what Facebook described as “the worst outage we’ve had in over four years”. Facebook later published a detailed technical post explaining that the root cause of the failure was a configuration issue:

An automated system for verifying configuration values ended up causing much more damage than it fixed.

Although this automated system is designed to prevent configuration problems, this time it caused them. This shows that even the most rigorous safeguards have limitations and that no system is immune, not even those of the major cloud vendors. We saw configuration problems take down Amazon’s cloud last week and Microsoft’s cloud late last year, to name just a few.

Applications that rely on this infrastructure are affected by every such outage. One good example is Tinder, which was hit last week by Amazon’s outage because it runs on Amazon Web Services, and again this week, this time probably because of its use of Facebook services. The good news is that although outages are bound to happen, there are things you can do to reduce their impact on your system. If you find that interesting, I highly recommend you have a look at last week’s post.

Follow Dotan on Twitter!



Filed under Cloud

A Tale of Two Outages Featuring Amazon, Microsoft And An Un-Smart Home

Update: Following the subsequent official announcements from Amazon and Microsoft, I updated the post with more information on the outages and relevant links.

Here it is again. A major outage in Amazon’s AWS data center in Northern Virginia took down the cloud service in Amazon’s biggest region, and with it a multitude of cloud-based services such as Netflix, Tinder, Airbnb and Wink. This is not the first time it has happened, and not even the worst. At least this time it didn’t last for days. This time it was DynamoDB that went down and took a host of other services with it, as Amazon describes in a lengthy blog post.

And Amazon is not alone in that. Microsoft also suffered a major outage today in its Skype service, which rendered the popular VoIP service unusable. In its update, Skype reported that the root cause was a bad configuration change:

We released a larger-than-usual configuration change, which some versions of Skype were unable to process correctly therefore disconnecting users from the network. When these users tried to reconnect, heavy traffic was created and some of you were unable to use Skype’s free services …

This time it was Microsoft’s Skype service, but we have already seen how Microsoft’s Azure cloud can also suffer a major outage, all on account of a configuration update.
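The reconnect surge in Skype’s description is a familiar pattern: many disconnected clients retry at the same moment and overload the service that is trying to recover. One common client-side mitigation, which is not something Skype’s update mentions, is exponential backoff with random jitter. The sketch below is only an illustration; `backoff_delays` and its parameters are hypothetical names, not part of any Skype or cloud SDK.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return one randomized wait (in seconds) per reconnect attempt.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The upper bound doubles on every attempt, but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a random time in [0, ceiling], so that clients
        # disconnected at the same instant do not all retry at the same instant.
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(5), start=1):
        print(f"attempt {attempt}: wait {delay:.2f}s before reconnecting")
```

A real client would sleep for each delay before its next reconnect attempt; the point is only that the waits grow and are spread out at random.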

This outage exposed one effect that is worth noting. Until now the impact was limited to online cloud services such as our movie or dating services. But now, with the Internet of Things (IoT) entering our homes, the effects of a cloud outage reach much further, into our daily utilities. David Gewirtz narrates this nicely in his piece on ZDNet: during the outage he tried voice-commanding his Amazon Echo (nicknamed “Alexa”) to turn on the lights and perform other home tasks, and got no answer. The loss of faith in the Alexas (they have two of them) that David described goes beyond the technology realm and into psychological effects, which are outside my field of expertise.

One conclusion could be that cloud computing is bad and should not be used. That would of course be the wrong conclusion, certainly when compared with outages in data centers. As I highlighted in the past, following simple guidelines can significantly reduce the impact of such infrastructure outages on your cloud service. If you are running a mission-critical system, you may find that relying on a single cloud provider is not enough, and you may wish to adopt a multi-cloud strategy to spread the risk, with disaster recovery policies between the providers. This will become increasingly important as the Internet of Things becomes ubiquitous in our homes and businesses, as heavily promoted by Amazon, Google, Samsung and the like, which combine IoT with their own cloud services.
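As a rough illustration of spreading the risk across providers, the sketch below tries a list of providers in order and falls back to the next one when a call fails. It is a minimal sketch under stated assumptions: `call_with_failover`, `AllProvidersFailed` and the two stand-in provider functions are hypothetical names rather than any vendor’s API, and a real disaster recovery setup also has to deal with data replication, health checks and DNS.

```python
class AllProvidersFailed(Exception):
    """Raised when no provider could serve the request."""


def call_with_failover(providers, request):
    """Try each (name, handler) pair in order and return (name, result)
    from the first handler that succeeds. Hypothetical helper."""
    errors = {}
    for name, handler in providers:
        try:
            return name, handler(request)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors[name] = exc
    raise AllProvidersFailed(errors)


# Stand-ins for two independent cloud deployments (assumptions for the demo).
def primary(request):
    raise ConnectionError("primary region unavailable")


def secondary(request):
    return f"handled {request}"


served_by, result = call_with_failover(
    [("primary-cloud", primary), ("secondary-cloud", secondary)],
    "GET /profile",
)
print(served_by, result)
```

Here the primary stand-in always fails, so the request is served by the secondary one. Ordering the list by preference keeps the normal path on the primary provider and uses the second only during an outage.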

One thing is for sure: if you connect your door locks to a cloud-based service – make sure you keep a copy of the good-old hard-copy key.



Filed under Cloud, IoT