I suppose I could be talking about any complex system, but in this instance I mean computers, both servers and desktops. And there are many adjacent questions:
- When do you begin the shutdown process?
  - triggered by the user
  - triggered by a clock event
  - triggered by an external API
  - triggered by a loss of power, or by too little remaining power to “save” the environment
  - triggered to avoid a meltdown
- How long do you wait for all of the child processes and their dependencies to terminate before signalling the hardware to power off?
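The "how long do you wait" question usually ends in a two-step escalation, the same one init systems such as systemd use when stopping a service: send SIGTERM, wait a bounded grace period, then send SIGKILL. Below is a minimal Python sketch of that supervisor logic, assuming a POSIX system. The function name `shutdown_children` and the grace periods are hypothetical choices, not recommended values.

```python
import signal
import subprocess
import sys
import time


def shutdown_children(procs, grace_seconds=5.0):
    """Ask every child to stop with SIGTERM, wait up to grace_seconds in total,
    then SIGKILL whatever is still running. Returns the children's exit codes."""
    # Phase 1: ask politely so each child can save state and exit on its own.
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGTERM)

    # Phase 2: wait against one shared deadline, so N slow children cannot
    # stretch the total shutdown time to N * grace_seconds.
    deadline = time.monotonic() + grace_seconds
    for p in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            p.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            pass  # still running; handled in phase 3

    # Phase 3: the grace period is over, so force the stragglers.
    for p in procs:
        if p.poll() is None:
            p.kill()  # SIGKILL on POSIX; it cannot be caught or ignored
            p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # A child that exits on SIGTERM (the default action for that signal).
    cooperative = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    # A child that ignores SIGTERM; it prints once the handler is installed.
    stubborn = subprocess.Popen(
        [sys.executable, "-c",
         "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); "
         "print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE,
    )
    stubborn.stdout.readline()  # wait until SIGTERM is really being ignored
    print(shutdown_children([cooperative, stubborn], grace_seconds=1.0))
```

On Linux the demo should print `[-15, -9]`. The cooperative child died from SIGTERM. The one that ignored SIGTERM was killed once the one-second grace period ran out. A single shared deadline, rather than a timeout per child, keeps the total wait bounded. That matters when the battery or UPS, not the user, decides how much time is left.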
In some of these cases it’s obvious that the process should receive a signal, save some state, and terminate. But what if saving state is not practical? In particular, many modern distributed databases that trade consistency for availability (the CAP trade-off) do not provide full ACID guarantees, and they may or may not be responsive or consistent after a shutdown.
This is a complex topic, and it gets expensive when you consider the cost of rebooting a mainframe or a cluster of computers. Companies like Stratus solve part of the problem with redundant, highly available hardware, but even that depends on a reliable and stable operating system and userspace. It’s up to programmers to maintain the reliability of the platform in their own software:
- persist your data ASAP
- make recovery possible; think snapshots and a write-ahead log (WAL)
- honor system signals such as SIGTERM
- when restarting after a crash, give the user the option to recover, or recover automatically using a default behavior
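The WAL and crash-recovery advice above can be sketched in a few dozen lines of Python. This is a toy, not a production log. The one-record-per-line format with a CRC32 prefix, the class `WriteAheadLog`, the helper `recover_state` and the `on_corrupt` policy names are all hypothetical. Each append is fsynced before it returns ("persist your data ASAP"). On restart the log is replayed from the start. A damaged tail, typically a write torn by the crash, is either truncated automatically (the default behavior) or reported so the application can ask the user.

```python
import json
import os
import zlib


class WriteAheadLog:
    """Toy append-only log. Hypothetical line format: an 8-hex-digit CRC32 of the
    JSON payload, a space, the JSON payload itself, then a newline."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        payload = json.dumps(record, sort_keys=True)
        line = f"{zlib.crc32(payload.encode('utf-8')):08x} {payload}\n"
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # durable before the caller is told "saved"

    def replay(self, on_corrupt="truncate"):
        """Return every valid record in order. If the log ends in damaged bytes,
        either truncate them (default) or raise so the caller can ask the user."""
        if not os.path.exists(self.path):
            return []
        records, good_bytes = [], 0
        with open(self.path, "rb") as f:
            for raw in f:
                record = self._parse(raw)
                if record is None:
                    break  # stop at the first bad line; nothing after it is trusted
                records.append(record)
                good_bytes += len(raw)
        if good_bytes < os.path.getsize(self.path):
            if on_corrupt == "raise":
                raise ValueError(f"{self.path}: damaged data after byte {good_bytes}")
            os.truncate(self.path, good_bytes)  # default recovery: drop the bad tail
        return records

    @staticmethod
    def _parse(raw):
        if not raw.endswith(b"\n"):
            return None  # torn write: the trailing newline never reached the disk
        try:
            crc_hex, payload = raw[:-1].decode("utf-8").split(" ", 1)
            if int(crc_hex, 16) != zlib.crc32(payload.encode("utf-8")):
                return None  # checksum mismatch
            return json.loads(payload)
        except ValueError:  # also covers UnicodeDecodeError and JSONDecodeError
            return None


def recover_state(wal, on_corrupt="truncate"):
    """Rebuild an in-memory key/value store by replaying the log from the start."""
    state = {}
    for record in wal.replay(on_corrupt):
        state[record["key"]] = record["value"]
    return state
```

An interactive program could call `recover_state(wal, on_corrupt="raise")` first and, if it raises, ask the user whether to discard the damaged tail. A daemon with nobody to ask would just use the truncating default. Stopping at the first bad line is a deliberately conservative policy. Snapshots, omitted here, would periodically save the rebuilt state (for example with an atomic rename) so that the log can be trimmed and replay stays short.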