Wednesday, April 14, 2010

How often should I reboot Linux servers?

This question sometimes stirs up controversy: do I need to reboot my Linux server on a regular basis or not? I do agree that Linux servers never need to be rebooted unless you absolutely need to change the running kernel version. Linux memory handling is very good, and Linux works in a modular fashion. Most updates do not even require a reboot, but kernel updates do (you can't really replace the running kernel without rebooting!).
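A kernel update that has been installed but not yet booted is the one case above where a reboot is unavoidable, and it is easy to check for. The following is a minimal sketch, assuming kernels are installed as /boot/vmlinuz-<release> (common on many distributions, but not universal). The function names are made up for illustration.

```python
import os
import platform
import re


def version_key(release):
    """Turn a release such as '2.6.32-21-generic' into a sortable key.

    Numeric parts compare as numbers, alphabetic parts as text.
    """
    return [(0, int(p), "") if p.isdigit() else (1, 0, p)
            for p in re.findall(r"\d+|[A-Za-z]+", release)]


def installed_kernels(boot_dir="/boot"):
    """List releases found as vmlinuz-<release> files (naming is an assumption)."""
    prefix = "vmlinuz-"
    return [name[len(prefix):] for name in os.listdir(boot_dir)
            if name.startswith(prefix)]


def reboot_needed(running, installed):
    """True if any installed kernel is newer than the running one."""
    return any(version_key(k) > version_key(running) for k in installed)
```

On a live system this would be called as reboot_needed(platform.release(), installed_kernels()).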

As far as my understanding goes, one might want to reboot a Linux server in one of the following scenarios:

1/ Upgrading or changing the current kernel version.
2/ A critical system library upgrade that is NOT behaving as expected.
3/ As part of a DR plan, to make sure the server and all services come back as expected.
4/ A suspected messed-up settings file. This also adds to the "risk of downtime" when rebooting infrequently (like,
5/ Of course! When your system is in a hung state. It may be an OOM issue or some new unstable library causing it to behave oddly.
6/ Physical movement of the box.
7/ Tier-1/2 DC maintenance where there is not enough redundant power supply in case of emergency.
8/ A critical firmware upgrade or hardware maintenance.
9/ Linux may handle its memory OK, but individual applications may not: their heaps can become fragmented if they run for a long time.

Personally, I prefer to reboot on a monthly cycle during a maintenance window to make sure the server and all services come back as expected. This way, if I have to do an out-of-schedule reboot (e.g. a critical kernel update), I can be reasonably certain that the system will come back up properly.
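To keep track of which servers have slipped past the monthly cycle, the uptime can be read from /proc/uptime, whose first field is the number of seconds since boot. This is only a sketch: the 30-day threshold and the function names are my own choices for illustration.

```python
def uptime_days(proc_uptime_text):
    """Parse the contents of /proc/uptime and return the uptime in days."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 86400.0


def overdue_for_reboot(proc_uptime_text, max_days=30):
    """True if the machine has been up longer than max_days (assumed policy)."""
    return uptime_days(proc_uptime_text) > max_days


def read_proc_uptime(path="/proc/uptime"):
    """Read the raw uptime text from the kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a live server, overdue_for_reboot(read_proc_uptime()) tells you whether the box is due for its maintenance window.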

It's not a bad idea to reboot if it has been a long time, so you can run a disk check (fsck) on the root partition. Doing so helps you to be sure of data integrity. Running these checks during regular, planned reboots should also reduce the time it takes to get back up and running next time, because a long-overdue fsck will not be waiting for you at an unplanned reboot. So it's a good idea to anticipate this beforehand and plan for it.
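For ext2/3/4 file systems, one way to anticipate a boot-time fsck is to look at the "Mount count" and "Maximum mount count" fields printed by tune2fs -l. The sketch below only parses that output. It assumes the count-based check is the only one that matters and ignores the time-based "Check interval". The function names are hypothetical.

```python
def parse_tune2fs(output):
    """Turn 'Key:   value' lines from `tune2fs -l` into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_due_on_next_boot(output):
    """True if the mount count has reached the maximum mount count.

    A maximum of 0 or -1 means count-based checking is disabled.
    """
    fields = parse_tune2fs(output)
    count = int(fields["Mount count"])
    maximum = int(fields["Maximum mount count"])
    return maximum > 0 and count >= maximum
```

The text would come from running tune2fs -l against the device holding the root partition, which normally needs root privileges.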

We have also discovered that config changes can sometimes get missed on one server or another (such as a new multipath configuration, iptables rules, etc.), and this does not get noticed until such reboots are performed. This actually adds to the "risk of downtime" when we don't reboot our servers at all. Imagine how hardware and software failures will manifest themselves in such scenarios: we only find out when we reboot. A planned reboot gives us scope for a proactive sanity check, instead of living in fear of an unplanned outage.
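Part of that sanity check can be done before the reboot, by comparing the rules currently loaded with the ones that will be restored at boot. Here is a rough sketch for iptables, assuming both inputs are in iptables-save format. Where the saved copy lives differs between distributions, so the caller has to supply it. The function names are made up for illustration.

```python
import re


def normalize_rules(text):
    """Drop comments and blank lines, and zero the packet/byte counters."""
    lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lines.append(re.sub(r"\[\d+:\d+\]", "[0:0]", line))
    return lines


def rule_drift(running_text, saved_text):
    """Return (rules only in the running set, rules only in the saved set)."""
    running = normalize_rules(running_text)
    saved = normalize_rules(saved_text)
    only_running = [r for r in running if r not in saved]
    only_saved = [r for r in saved if r not in running]
    return only_running, only_saved
```

Anything in the first list is a rule that was added by hand and would be lost at the next reboot. Anything in the second list would appear after the reboot without being active now.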

The proper method of rebooting a Linux system ensures data integrity by terminating processes and synchronizing the file systems. So a better and safer plan would be to get an approved downtime and reboot our servers periodically. If this is not a desirable option, we can set our servers up in clusters so that reboots can be done when necessary without any downtime to our applications.

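Although everyone knows how to reboot a Linux server, still a few words on that:
A forced reboot or halt does NOT safely shut down a system. With sysvinit, reboot is a symlink to halt; called with -f, halt stops the machine immediately without going through shutdown (called without -f on a normally running system, it invokes shutdown for you). shutdown -r now is how I reboot a machine of mine. shutdown -r by itself signals the running processes, runs the shutdown scripts and syncs the disks. The -n flag (shutdown -rn now) does not force a sync: it makes shutdown bypass init and do the work itself, which the sysvinit man page discourages, so I leave it out to be on the safe side.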

Cheers! Happy Uptime!! Enjoy Runtime!!!
