Richard Bucker

The perils of shutting down a complex system

Posted at — May 1, 2015

I suppose I could be talking about any complex system; in this instance, however, I am referring to computers, both server and desktop. And there are so many adjacent questions:

- When do you begin the shutdown process?
  - triggered by the user
  - triggered by a clock event
  - triggered by an external API
  - triggered by a lack of power, or of sufficient power to “save” the environment
  - triggered by meltdown avoidance
- How long do you wait for all of the child processes and their dependencies to terminate before signalling the hardware to power off?

In some of these cases it’s obvious that the process should receive a signal, save some state, and terminate. But what if saving state is not practical? In particular, many modern CAP-type databases are not ACID and may or may not be responsive or reliable after a shutdown.

This is a complex topic, and it gets expensive when you consider the costs of rebooting a mainframe or a cluster of computers. Companies like Stratus solve part of the problem by implementing redundant and highly available hardware, but they are still dependent on a reliable and stable operating system and userspace. It’s up to the programmers to maintain the reliability of the platform in their own software.

So,

- persist your data ASAP
- make recovery possible; think snapshots and WAL (write-ahead logging)
- honor system signals
- when restarting from a crash, give the user an option to recover, or recover based on a default behavior

Good luck
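
For what it’s worth, the advice above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not a recipe: every name here (`request_shutdown`, `append_record`, `recover`, `stop_child`, `run`, the `work.log` file, and the five-second grace period) is hypothetical, and the append-only JSON log is a simplified stand-in for a real write-ahead log.

```python
import json
import os
import signal
import subprocess
import sys
import threading

# All names below are hypothetical; they only illustrate the checklist.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Honor system signals: just set a flag, the main loop does the rest."""
    shutdown_requested.set()


def append_record(path, record):
    """Persist data ASAP: append one line, flush it, then fsync to disk."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(path):
    """Make recovery possible: replay the log, stopping at a torn line."""
    records = []
    if not os.path.exists(path):
        return records
    with open(path, encoding="utf-8") as log:
        for line in log:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Partial write left by a crash. A real WAL would also
                # truncate this tail before appending again.
                break
    return records


def stop_child(child, grace_seconds=5.0):
    """Ask a child to exit, wait a bounded time, then force it."""
    child.terminate()  # SIGTERM on POSIX
    try:
        return child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        child.kill()  # SIGKILL on POSIX
        return child.wait()


def run(path, max_steps=None):
    """Recover, install handlers, then work until asked to stop."""
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)
    step = len(recover(path))  # default behavior: resume where the log ends
    while not shutdown_requested.is_set():
        if max_steps is not None and step >= max_steps:
            break
        append_record(path, {"step": step})
        step += 1
    return step
```

Calling `run("work.log")` from the main thread of your entry point resumes from whatever the log already holds and stops cleanly on SIGTERM or Ctrl-C. The bounded wait in `stop_child` is one possible answer to the “how long do you wait” question: pick a grace period, then stop asking politely.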