Linux, politics, and other interesting things
For best system security you want to apply kernel security patches ASAP. For an attacker gaining root access to a machine is often a two step process, the first step is to exploit a weakness in a non-root daemon or take over a user account, the second step is to compromise the kernel to gain root access. So even if a machine is not used for providing public shell access or any other task which involves giving user access to potential hostile people, having the kernel be secure is an important part of system security.
One thing that gets little consideration is the overall effect of applying security updates on overall uptime. Over the last year there have been 14 security related updates (I count a silent data loss along with security issues) to the main Debian Etch kernel package. Of those 14, it seems that if you don’t use DCCP, NAT for CIFS or SNMP, IA64, the dialout group, then you will only need to patch for issues 2, 3 (for SMP machines), 4, 5, 7 (sound drivers get loaded on all machines by default), 9, 10, 11, 12, 13, and 14.
This means 11 reboots a year for SMP machines and 10 a year for uni-processor machines. If a reboot takes three minutes (which is an optimistic assumption) then that would be 30 or 33 minutes of downtime a year due to kernel upgrades. In terms of uptime we talk about the number of “nines”, where the ideal is generally regarded as “five nines” or 99.999% uptime. 33 minutes of downtime a year for kernel upgrades means that you get 99.993% uptime (which is “four nines”). If a reboot takes six minutes (which is not uncommon for servers) then it’s 99.987% uptime (“thee nines”).
While it doesn’t seem likely to affect the number of “nines” you get, not using SMP has the potential to avoid future security issues. So it seems that when using a Xen (or other virtualisation technology) assigning only one CPU to the DomUs that don’t need any more could improve uptime for them.
For Xen Dom0’s which don’t have local users or daemons, don’t use DCCP, NAT for CIFS or SNMP, wireless, CIFS, JFFS2, PPPoE, bluetooth, H.323 or SCTP connection tracking, then only issue 11 applies. However for “five nines” you need to have 5 minutes of downtime a year or less. It seems unlikely that a busy Xen server can be rebooted in 5 minutes as all the DomUs need to have their memory saved to disk (writing out the data to disk and reading it back in after a reboot will probably take at least a couple of minutes) or they need to be shutdown and booted again after the Dom0 is rebooted (which is a good procedure if the security fix affects both Dom0 and DomU use), and such shutdowns and reboots of DomU’s will take a lot of time.
Based on the past year, it seems that a system running as a basic server might get “four nines” if configured for a fast boot (it’s surprising that no-one seems to be talking about recent improvements to the speed of booting as high-availability features) and if the boot is slower then you are looking at “three nines”. For a Xen server unless you have some sort of cluster it seems that “five nines” is unattainable due to reboot times if there is one issue a year, but “four nines” should be easy to get.
Now while the 14 issues over the last year for the kernel seems likely to be a pattern that will continue, the one issue which affects Xen may not be representative (small numbers are not statistically significant). I feel confident in predicting a need for between 5 and 20 kernel updates next year due to kernel security issues, but I would not be prepared to bet on whether the number of issues affecting Xen will be 0, 1, or 4 (it seems unlikely that there would be 5 or more).
I will write a future post about some strategies for mitigating these issues.
Here is my summary of the Debian kernel linux-image-2.6.18-6-686 (Etch kernel) security updates according to it’s changelog, they are not in chronological order, it’s the order of the changelog file: