Tonight, these sites and perhaps hundreds more died. At least temporarily.
The Internet was ablaze (almost ironically) tonight when a massive storm on the East Coast caused a power outage that affected Amazon’s “Amazon Web Services” cloud server system. AWS is what powers so many sites and services these days. It’s the reliability, scalability, cost and speed that have so many startups jumping to their system.
However, tonight highlights what happens when you rely on a single point of failure. What’s the adage? “You’re only as strong as your weakest link.” Well, tonight it was a power outage at a single Amazon server location that took many of the most popular sites offline. Amazingly, Twitter is not one of the sites that shut down.
Once the power is back on, or once Amazon starts shifting bandwidth and syncing servers, all of these sites will be back up and running. What’s disappointing is that even though Amazon has tons of servers all over the country, it only took one power outage to cripple their system.
Until then, it’s not only annoying, it’s losing these companies tons of money. Oh, and did you know this also happened one week ago and two weeks ago? That enterprise cloud service Google’s releasing is looking sexier.
Having a backup in place is elementary. Even I, a simple blogger running WordPress, know better than to put all my eggs in one basket. I offload a ton of my bandwidth to Amazon’s cloud servers because they’re fast and reliable. However, when they go down, shouldn’t my site go down? Not in my case. I had a backup in place.
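A minimal sketch of that kind of fallback, purely for illustration: try the cloud host first and quietly fall back to your own server when it doesn't answer. The hostnames and function names here are hypothetical, and this is one simple way to do it, not a description of any particular plugin or of AWS itself.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a HEAD request to base_url succeeds within the timeout."""
    request = urllib.request.Request(base_url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP error or bad URL.
        return False


def pick_asset_host(
    hosts: Sequence[str],
    check: Callable[[str], bool] = is_reachable,
) -> str:
    """Return the first host that passes the check.

    The last entry is treated as the always-available origin server,
    so it is returned without being checked.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    for host in hosts[:-1]:
        if check(host):
            return host
    return hosts[-1]


# Hypothetical hosts: a cloud-backed CDN first, the site's own server last.
ASSET_HOSTS = ["https://cdn.example.com", "https://www.example.com"]
```

Calling `pick_asset_host(ASSET_HOSTS)` on every page view would be slow, so in practice you would likely cache the answer for a minute or so. The point is only that the site keeps serving its files when the cloud copy is unreachable.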
In fact, I use Amazon to literally back up all of my sites and databases as well. Those backups are mirrored and chronicled in two other locations. I wrote about my process over there. I wonder how many people cache/run their site off of AWS and back up their site on AWS. I’d wager more than we’d like to believe.
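To make "mirrored and chronicled" concrete, here is a rough sketch, assuming the other locations are simply folders (for example, mounted drives or synced directories). The function name, the `keep` parameter and the naming scheme are all made up for this example; a real setup would more likely use a backup plugin or a sync tool.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, List


def mirror_backup(archive: Path, destinations: Iterable[Path], keep: int = 5) -> List[Path]:
    """Copy archive into every destination under a timestamped name.

    Only the newest `keep` versions are retained in each destination, so one
    bad backup cannot overwrite every good copy.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    written = []
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / f"{stamp}-{archive.name}"
        shutil.copy2(archive, target)
        written.append(target)
        # Timestamped names sort chronologically, so the oldest come first.
        versions = sorted(dest.glob(f"*-{archive.name}"))
        for old in versions[:-keep]:
            old.unlink()
    return written
```

Run on a schedule, something like `mirror_backup(Path("site.sql"), [Path("/mnt/backup-a"), Path("/mnt/backup-b")])` leaves the same dated copies in both places (the paths are placeholders). The important part is that at least one destination lives somewhere other than the cloud the site itself runs on.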
At the very least, tonight is a wake-up call to web services that rely on a single cloud. A hurricane, earthquake, tornado, wildfire, flood… Any of these alone can cause these sorts of outages. There’s got to be a smarter way to run these clouds. If not, maybe we’re not ready for a cloud-based life.
Luckily Hulu, Twitter and Facebook are up, so none of us actually have to leave the house to find entertainment and socialization. What a relief.
@justex07 It’s the worst day ever.
@orchidhunter tomorrow will be better (or worse) I swear.
@justex07 @dws179 Or what Justin said, more eloquently than I. 🙂
@orchidhunter @justex07 Ahha! Well it all makes sense now…
@orchidhunter @dws179 haha, yeah power outage took out one of their 5 servers, but system failed, other nodes should have picked up slack.
There is always a point of failure. Even if you use multiple data centers and multiple providers, a DNS issue can still take out everything. I think the best way to stack is to use a group of rack servers that you physically control and spread the load out to the cloud as you need to scale.
@TheDigitalNinja Of course there is always a point of failure. The key is to have a backup plan. First of all, Amazon isn’t doing what they promise right now. These sorts of outages (the fourth I can think of in the last couple of years) shouldn’t happen to begin with. Their system is supposed to heal instantaneously, or as close to that as possible.
A power outage that takes out one of their five server clusters is huge, but there are still four running just fine.
Companies like Instagram should have a backup plan as well.
It’s like backing up your computer. I have a local backup that does hourly incremental backups of my computer. I also have an online backup that does the same. Just as important, there are multiple versions of these backups to prevent a failure that overwrites an otherwise good backup.
Scaling this is the issue, but if a company like Instagram is worth $1,000,000,000, they should have the funds to keep at least some function of their site always up and running. Maybe we can’t upload photos, but we can still browse while we wait. I came across a few sites that didn’t even have their “We’re down” error pages loading haha.
@justex07 @reginabeth thank you for clarifying!
@KristieKenney @reginabeth thanks for sharing and happy to help! 🙂
@nate_pawley thanks again for sharing!
@justex07 no problem!
@nate_pawley Thanks for the info! I found it odd that all those sites were down. And how stupid of them to all be linked to one server.
@StacieCordray you don’t realize it, but many sites are linked to one server, the reason being that the server is so strong.
@justex07 This. Is. Why. I. Love. You. Haha always educating me!
@justex07 Overload, overload. Self-destruct in 10..9..8..7..6..5..4..3..2..1..
@JackPrinya haha, never have I ever been told I use limited “nerdspeak”. Perhaps this is a new avenue for me 😀
@drewmaniac thanks for sharing the link 😀
@thejustins86 aw shucks 😀
@justex07 Oh there were sprinkles of nerd jargon, but it was still easy to follow and made interesting points about life in the clouds.
@justex07 you make it simple. So thanks.
@thejustins86 That’s what a few other people are telling me. The post has been shared almost 100 times in like 12 hours! Makes me happy 😀
@justex07 I usually retweet at the very least. I also appreciate when you link me to answers in your blog. Lol
@thejustins86 My goal is to someday have answers to all of the world’s questions available right here lol. Like a valuable @formspring 😛
@thejustins86 speaking of which, @formspring was down last night too https://t.co/AnDGATMX
@justex07 @formspring haha oh you!
@motoridersd Thanks and thanks for sharing it 😀
@TheNYGalavant @motoridersd well it’s certainly evidence that the systems we have in place now aren’t as rock-solid as we’d like to believe.
@justex07 No problem man! Enjoyed the post.
@drewmaniac thanks! I’ve been so busy with work that my blog has lagged. Hope to get back into the swing again.
@mikeasaursrex91 @RolandoCFC yay thanks for sharing my post! 🙂
@justex07 yeah. No problem. I tried to tag you. But then it was too long.
@justex07 Shared this on my Facebook page. Sorry about the link not showing up the first time. It was a Facebook error. Link added and error corrected.
@reddy2go Thanks so much, glad it was fixed. Facebook can be tricky.
What’s disappointing is that even though Amazon has tons of servers all over the country, it only took one power outage to cripple their system. Thanks for sharing.
@tracey14 Exactly. Part of the reason that happened was that on that day, totally coincidentally, we added 1 second to the day. Literally, all clocks set by atomic clock added a second to the day. So when the outage happened and the servers reset, they were off by one second, which kept them from actually syncing.
Talk about a perfect storm!