For the best system security you want to apply kernel security patches as soon as possible. For an attacker, gaining root access to a machine is often a two-step process: the first step is to exploit a weakness in a non-root daemon or take over a user account, and the second step is to compromise the kernel to gain root access. So even if a machine is not used for providing public shell access or any other task that involves giving user access to potentially hostile people, a secure kernel is an important part of system security.
One thing that gets little consideration is the effect of applying security updates on overall uptime. Over the last year there have been 14 security-related updates (I count a silent data loss bug along with the security issues) to the main Debian Etch kernel package. Numbering them by their position in the list at the end of this post, it seems that if you don’t use DCCP, NAT for CIFS or SNMP, IA64, or the dialout group, then you will only need to patch for issues 2, 3 (for SMP machines), 4, 5, 7 (sound drivers get loaded on all machines by default), 9, 10, 11, 12, 13, and 14.
This means 11 reboots a year for SMP machines and 10 a year for uniprocessor machines. If a reboot takes three minutes (which is an optimistic assumption) then that would be 33 or 30 minutes of downtime a year respectively due to kernel upgrades. In terms of uptime we talk about the number of “nines”, where the ideal is generally regarded as “five nines” or 99.999% uptime. 33 minutes of downtime a year for kernel upgrades means that you get 99.993% uptime (which is “four nines”). If a reboot takes six minutes (which is not uncommon for servers) then it’s 99.987% uptime (“three nines”).
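The arithmetic above can be checked with a short script. This is only a sketch: the function names are my own, a year is taken as 365 days, and the only downtime counted is planned kernel reboots.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def uptime_percent(reboots_per_year, minutes_per_reboot):
    """Uptime as a percentage, counting only planned kernel reboots."""
    downtime = reboots_per_year * minutes_per_reboot
    return 100.0 * (1 - downtime / MINUTES_PER_YEAR)


def count_nines(reboots_per_year, minutes_per_reboot):
    """Largest n such that the downtime fraction is at most 10**-n."""
    downtime = reboots_per_year * minutes_per_reboot
    if downtime <= 0:
        raise ValueError("no downtime means an unbounded number of nines")
    n = 0
    # Integer-friendly comparison avoids floating point trouble at boundaries.
    while downtime * 10 ** (n + 1) <= MINUTES_PER_YEAR:
        n += 1
    return n


# (reboots per year, minutes per reboot) for the cases discussed in the text
for reboots, minutes in [(10, 3), (11, 3), (11, 6)]:
    print(reboots, minutes,
          f"{uptime_percent(reboots, minutes):.4f}%",
          count_nines(reboots, minutes))
```

With the figures from the text, three-minute reboots land in “four nines” and six-minute reboots in “three nines”, for both the 10 and 11 reboot cases at three minutes.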
While it doesn’t seem likely to affect the number of “nines” you get, not using SMP has the potential to avoid future security issues. So it seems that when using Xen (or another virtualisation technology), assigning only one CPU to the DomUs that don’t need any more could improve uptime for them.
For Xen Dom0s which don’t have local users or daemons and don’t use DCCP, NAT for CIFS or SNMP, wireless, CIFS, JFFS2, PPPoE, bluetooth, H.323 or SCTP connection tracking, only issue 11 applies. However, for “five nines” you need to have 5 minutes of downtime a year or less. It seems unlikely that a busy Xen server can be rebooted in 5 minutes. Either all the DomUs need to have their memory saved to disk (writing out the data to disk and reading it back in after a reboot will probably take at least a couple of minutes), or they need to be shut down and booted again after the Dom0 is rebooted (which is a good procedure if the security fix affects both Dom0 and DomU use), and such shutdowns and reboots of DomUs will take a lot of time.
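To make the downtime budget concrete, here is a rough sketch that turns a target number of “nines” into minutes per year and into the longest reboot that still fits. The function names are hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # leap years ignored


def downtime_budget_minutes(nines):
    """Minutes of downtime per year allowed by a given number of nines."""
    return MINUTES_PER_YEAR / 10 ** nines


def max_reboot_minutes(nines, reboots_per_year):
    """Longest each reboot may take if all reboots share the budget evenly."""
    return downtime_budget_minutes(nines) / reboots_per_year


# One Dom0 reboot a year against a "five nines" target
print(f"{max_reboot_minutes(5, 1):.2f} minutes for the single reboot")

# Eleven reboots a year against a "four nines" target
print(f"{max_reboot_minutes(4, 11):.2f} minutes per reboot")
```

The “five nines” budget comes out at a little over 5 minutes a year, so a single Dom0 reboot has to fit in that window along with any other outage. For 11 reboots a year, “four nines” leaves a little under 5 minutes per reboot.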
Based on the past year, it seems that a system running as a basic server might get “four nines” if configured for a fast boot (it’s surprising that no-one seems to be talking about recent improvements to the speed of booting as high-availability features), and if the boot is slower then you are looking at “three nines”. For a Xen server, unless you have some sort of cluster, it seems that “five nines” is unattainable due to reboot times if there is one issue a year, but “four nines” should be easy to get.
Now while the 14 kernel issues over the last year seem likely to be a pattern that will continue, the one issue which affects Xen may not be representative (small numbers are not statistically significant). I feel confident in predicting a need for between 5 and 20 kernel updates next year due to kernel security issues, but I would not be prepared to bet on whether the number of issues affecting Xen will be 0, 1, or 4 (it seems unlikely that there would be 5 or more).
I will write a future post about some strategies for mitigating these issues.
Here is my summary of the Debian kernel linux-image-2.6.18-6-686 (Etch kernel) security updates according to its changelog. They are not in chronological order; they follow the order of the changelog file:
1. 05 Jun 2008: CVE-2008-2358 for DCCP and CVE-2008-1673 for ASN.1 (NAT for CIFS and SNMP).
2. 23 May 2008: CVE-2008-2136 memory leak in IPv6 over IPv4 tunnels, CVE-2007-6712 timer related bugs, CVE-2008-1615 ptrace on AMD64 architecture, and CVE-2008-2137 “Validate address ranges regardless of MAP_FIXED”.
3. 07 May 2008: CVE-2008-1669 SMP race.
4. 11 Apr 2008: CVE-2007-6694 PPC only, CVE-2008-0007 Add VM_DONTEXPAND to vm_flags in drivers that register a fault handler but do not bounds check the offset argument, CVE-2008-1294 prevent user escape from RLIMIT_CPU, and CVE-2008-1375 fix dnotify race.
5. 10 Feb 2008: CVE-2008-0010 and CVE-2008-0600 Fix missing access check in vmsplice.
6. 25 Jan 2008: Not a security issue, but silent data loss on IA64.
7. 22 Jan 2008: CVE-2007-6151 ISDN memory overrun, CVE-2008-0001 something related to checking the access to a directory, CVE-2007-2878 FAT filesystem related, and CVE-2007-4571 ALSA bug that allows a user to read kernel memory.
8. 17 Sep 2007: Fix minor DoS attack for slightly privileged users (e.g. members of the dialout group).
9. 18 Dec 2007: CVE-2007-6063 overflows in ISDN subsystem, CVE-2007-6206 core dumping over an existing file can get the wrong ownership (it should be possible to use kernel.core_pattern to work around this), CVE-2007-5966 timer issue, CVE-2006-6058 Minix fs DoS attack via corrupted fs, and CVE-2007-6417 tmpfs memory leak.
10. 29 Nov 2007: CVE-2007-3104 local kernel DoS attack (Oops), CVE-2007-4997 malicious frame on wireless interface crashes system, CVE-2007-5500 potential system hang, and CVE-2007-5904 CIFS overflows from server sending corrupt data.
11. 02 Oct 2007: CVE-2007-4573 Xen 64bit with 32bit DomU, CVE-2006-5755 Xen, CVE-2007-4133 memory management DoS, and CVE-2007-5093 DoS when unplugging a webcam that is in use.
12. 25 Sep 2007: CVE-2007-3731 ptrace causing Oops, CVE-2007-3739 memory management Oops, CVE-2007-3740 CIFS not honoring umask, CVE-2007-4573 ptrace of 32bit process on AMD64 bug, and CVE-2007-4849 JFFS2 (flash media) filesystem bug.
13. 27 Aug 2007: CVE-2007-2172 IPv4 memory related issue (local DoS or compromise?), CVE-2007-2875 local user can read kernel memory if cpuset filesystem is mounted, CVE-2007-3105 buffer overflow in random number generator, CVE-2007-3843 CIFS, and CVE-2007-4308 AAC RAID.
14. 11 Aug 2007: CVE-2007-1353 bluetooth, CVE-2007-3513 usblcd, CVE-2007-2525 PPPoE, CVE-2007-3642 H.323 connection tracking, CVE-2007-2172 IPv4 local exploit, CVE-2007-2453 slightly less random numbers, CVE-2007-2876 SCTP connection tracking, CVE-2007-3851 i965 batch buffer usage, and CVE-2007-3848 potential privilege escalation.
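The triage of which updates a basic server needs can be sketched as a small lookup. Issues are numbered by their position in the list above. The feature tags and function name are my own invention for illustration, and the mapping only encodes the classification given earlier in this post for a basic server (issues 1, 6 and 8 avoidable, issue 3 only for SMP); it is not derived from the CVE data itself.

```python
# Features that make an issue relevant. An issue absent from this map is
# treated as applying to every machine. Tag names are hypothetical.
ISSUE_REQUIRES = {
    1: {"dccp", "nat_cifs_snmp"},  # DCCP, ASN.1 (NAT for CIFS and SNMP)
    3: {"smp"},                    # SMP race
    6: {"ia64"},                   # silent data loss on IA64
    8: {"dialout_group"},          # DoS by members of the dialout group
}
TOTAL_ISSUES = 14


def applicable_issues(features_in_use):
    """Return the issue numbers that call for a patch and reboot."""
    features = set(features_in_use)
    result = []
    for issue in range(1, TOTAL_ISSUES + 1):
        required = ISSUE_REQUIRES.get(issue)
        # Patch if the issue is unconditional or any triggering feature is used.
        if required is None or required & features:
            result.append(issue)
    return result


print(len(applicable_issues({"smp"})))  # reboots a year for an SMP machine
print(len(applicable_issues(set())))    # reboots a year for a uniprocessor machine
```

The Dom0 case is not modelled here, because the post only states its overall result (issue 11) and not a per-issue breakdown.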