If you’re complaining about the outage yesterday, then you’re an id10t. I just read the byline on TechCrunch, and the author and editors clearly do not understand the options or why the occasional cloud infrastructure outage is simply not that big of a deal.
Simply put, you get what you pay for, and diversity is king.
First of all, there is nothing capable of 100% availability. Just look at all of the loopholes in the six sigma specification. Second, the more reliable you want to be, or the more clicks you want to capture closest to an “event”, the more it will cost to reconcile those last events, just as it does for your credit card provider. And more of that cost will have to be passed on to the customer.
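To put rough numbers on the availability point, here is a minimal Python sketch. It converts an availability figure into expected downtime per year, and shows what diversity buys you when two providers are run side by side. The function names are made up for illustration, and the combined figure assumes the providers fail independently, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(*availabilities: float) -> float:
    """Availability of redundant providers where any one being up is enough.

    Assumption: failures are independent. Correlated failures (shared DNS,
    shared network paths, shared bugs) make the real number worse.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)  # probability that every provider is down at once
    return 1.0 - all_down


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%} available -> {downtime_minutes_per_year(a):,.1f} min/year down")

    both = combined_availability(0.99, 0.99)
    print(f"two independent 99% providers -> {both:.2%} available")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is where the cost climbs. Stacking two cheaper, diverse providers can get close to the same place on paper.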
Sure, it sucks when a system goes down, and it’s even worse when it’s a system you or a great number of people rely on. But your transaction is not any more or less important than anyone else’s… And this is not a call to move your services in-house. If these systems were in-house, [a] would you be able to prevent this from happening? [b] could you detect it any faster? [c] would you be able to resolve it any faster? [d] what do you tell your customers?
[a] would you be able to prevent this from happening? – It just depends on how much you want to spend waiting for the 1000 year event, or whether the 500 year event is good enough. And the answer may be the weaker solution: letting it ride.
[b] could you detect it any faster? – unlikely. Cloud providers continue to instrument their systems and they are looking for the 1000 year events. Big outages have big costs in terms of reputation, the ability to raise prices, and restoring customer faith.
[c] would you be able to resolve it any faster? – not unless you started hiring world-class SREs, and they do not come cheap.
[d] what do you tell your customers? – if it’s your hardware you have to dance, and if it’s the cloud provider (provided it’s reputable) then you get to blame someone else and you can justify it based on cost to the customer.
Interestingly… in the credit card business there is an agreement between Mastercard and Visa that if one or the other has a systemwide outage, they can rely on the other to carry their transactions.
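The same fallback idea can be sketched for your own services. This is only an illustration of trying a primary provider and falling back to a second one; it is not how the card networks actually implement their arrangement, and every name in it (`NetworkDown`, `route_transaction`, the two demo providers) is hypothetical.

```python
from typing import Callable, Sequence, Tuple


class NetworkDown(Exception):
    """Raised by a provider that cannot take the transaction right now."""


def route_transaction(
    amount_cents: int,
    networks: Sequence[Tuple[str, Callable[[int], str]]],
) -> Tuple[str, str]:
    """Try each (name, send) pair in order and return the first success."""
    for name, send in networks:
        try:
            return name, send(amount_cents)
        except NetworkDown:
            continue  # this provider is out, try the next one
    raise NetworkDown("all networks unavailable")


def flaky_primary(amount_cents: int) -> str:
    # Demo provider that simulates a systemwide outage.
    raise NetworkDown("primary is having a bad day")


def healthy_secondary(amount_cents: int) -> str:
    # Demo provider that always accepts.
    return f"approved {amount_cents}"


if __name__ == "__main__":
    carrier, result = route_transaction(
        1999, [("primary", flaky_primary), ("secondary", healthy_secondary)]
    )
    print(carrier, result)
```

The order of the list is the whole policy here: whoever is first gets the traffic until they fall over.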
Also, consider that after years of the Microsoft Blue Screen Of Death and so many viruses, people still buy, install and upgrade Microsoft products.