Richard Bucker

Zero Downtime? WTF!

Posted at — May 1, 2013

When someone says that they want zero downtime, what is it that they really want? An absolute 100% zero downtime is absolutely impossible unless the downstate is the upstate or you live in Bizarro World.

Zero downtime should cover:

- facilities (the building)
- network
- power
- servers
- services (database, DNS, NTP, web servers, etc.)
- applications (userspace applications or REST services)
- monitoring tools
- support staff
- versions, versions, versions…

What most sales people actually mean is completely different:

- no perceived change in service by any of the end-users

What a systems manager actually means is completely different:

- database schema
- database stored procedures
- application binding to the database, in particular strong binding through shared secrets
- rolling service availability

What most managers forget:

- zero downtime is hard, as evidenced by modern H.A. solutions
- it comes with a monster price tag
- the recovery model is based on tight coupling of components
- systems are typically master->slave with one-way artifact promotion
- effective strategizing requires very detailed, specific domain knowledge

Clearly perception is king. Previously I wrote about hot-plugging code in Erlang. It's the worst thing you can do in a credit card, financial or any system with a high transaction consistency requirement.
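To put a rough number on why 100% across every layer is out of reach, here is a minimal Python sketch of the usual serial-availability arithmetic: if every layer must be up at the same time and failures are independent, the availabilities multiply. The 99.9% figures and the function names are hypothetical, picked only for illustration; real layers fail in correlated ways, so treat this as a back-of-the-envelope estimate rather than a model of any actual system.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def serial_availability(layers):
    """Combined availability when every layer must be up at once.

    Assumes independent failures, which is optimistic in practice.
    """
    return prod(layers.values())


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical figures: each layer is assumed to be 99.9% available.
layers = {
    "facilities": 0.999,
    "network": 0.999,
    "power": 0.999,
    "servers": 0.999,
    "services": 0.999,
    "applications": 0.999,
}

if __name__ == "__main__":
    combined = serial_availability(layers)
    minutes = downtime_minutes_per_year(combined)
    print(f"combined availability: {combined:.4%}")
    print(f"expected downtime: {minutes / 60:.1f} hours per year")
```

With these assumed numbers, six "three nines" layers combine to about 99.4%, or roughly 52 hours of downtime a year. Adding monitoring tools, support staff and versions to the dictionary only pushes the product lower.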