The Cost of Owning a Car

There has been a lot of talk recently about the cost of petrol; Colin Charles is one of the few people to consider the issue of wages in this discussion [1]. Unfortunately almost no-one seems to consider the overall cost of running a vehicle.

While I can’t get the figures for Malaysia (I expect Colin will do that) I can get them for Australia. First I chose a car that’s cheap to buy, reasonably fuel efficient (small) and common (cheap parts from the wreckers) – the Toyota Corolla seemed like a good option.

ECC RAM is more useful than RAID

A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming Code [0]) RAM is only a server feature.

The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks for one person. Therefore when purchasing a desktop machine you can decide how much you are willing to spend for the safety and continuity of your work. For a server it’s more difficult as everyone has a different idea of how reliable a server should be in terms of uptime and in terms of data security. When running a server for a business there is the additional issue of customer confidence. If a server goes down occasionally customers start wondering what else might be wrong and considering whether they should trust their credit card details to the online ordering system.

So it is apparent that servers need a different degree of reliability – and it’s easy to justify spending the money.

Desktop machines also need reliability, more so than most people expect. In a business when a desktop machine crashes it wastes employee time. If a crash wastes an hour (which is not unlikely given that previously saved work may need to be re-checked) then it can easily cost the business $100 (the value of the other work that the employee might have done). Two such crashes per week could cost the business roughly $8000 per year (2 crashes × $100 × 40 working weeks). The price difference between a typical desktop machine and a low-end workstation (or deskside server) is considerably less than that (when I investigated the prices almost a year ago desktop machines with server features ranged in price from $800 to $2400 [1]).

Some machines in a home environment need significant reliability. For example when students are completing high-school their assignments have a lot of time invested in them. Losing an assignment due to a computer problem shortly before it’s due could impact their ability to get a place in the university course that they most desire! Then there is also data which is irreplaceable. One example I heard of was a woman whose computer had a factory pre-load of Windows; during a storm the machine rebooted and reinstalled itself to the factory defaults – wiping several years of baby photos… In both cases better backups would mostly solve the problem.

For business use the common scenario is to have file servers storing all data and have very little data stored on the PC (ideally have no data on the PC). In this case a disk error would not lose any data (unless the swap space was corrupted and something important was paged out when the disk failed). For home use the backup requirements are quite small. If a student is working on an important assignment then they can back it up to removable media whenever they reach a milestone. Probably the best protection against disk errors destroying assignments would be a bulk purchase of USB flash storage sticks.

Disk errors are usually easy to detect. Most errors are in the form of data which can not be read back; when that happens the OS will give an error message to the user explaining what happened. Then if you have good backups you revert to them and hope that you didn’t lose too much work in the meantime (you also hope that your backups are actually readable – but that’s another issue). The less common errors are lost writes – where the OS writes data to disk but the disk doesn’t store it. This is a little more difficult to discover as the drive will return bad data (maybe an old version of the file data or maybe data from a different file) and claim it to be good.

The general idea nowadays is that a filesystem should check the consistency of the data it returns. Two new filesystems, ZFS from Sun [2] and BTRFS from Oracle [3], implement checksums of data stored on disk. ZFS is apparently production ready while BTRFS is apparently not nearly ready. I expect that from now on whenever anyone designs a filesystem for anything but the smallest machines (EG PDAs and phones) they will include data integrity mechanisms in the design.

I believe that once such features become commonly used the need for RAID on low-end systems will dramatically decrease. A combination of good backups and knowing when your live data is corrupted will often be a good substitute for preserving the integrity of the live data. Not that RAID will necessarily protect your data – with most RAID configurations if a hard disk returns bad data and claims it to be good (the case of lost writes) then the system will not read data from other disks for checksum validation and the bad data will be accepted.

It’s easy to compute checksums of important files and verify them later. One simple way of doing so is to compress the files; every file compression program that I’ve seen has some degree of error detection.
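
As a minimal sketch (the file names are just examples), checksums can be recorded with standard tools and a compressed archive can be tested for corruption:

sha256sum ~/important/* > ~/important.sha256   # record checksums now
sha256sum -c ~/important.sha256                # verify them later
gzip -tv backup.tar.gz                         # gzip checks its CRC when testing an archive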

Now the real problem with RAM which lacks ECC is that it can lose data without the user knowing. There is no possibility of software checks because any software which checks for data integrity could itself be misled by memory errors. I once had a machine which experienced filesystem corruption on occasion; eventually I discovered that it had a memory error (memtest86+ reported a problem). I will never know whether some data was corrupted on disk because of this. Sifting through a large amount of stored data for some files which may have been corrupted due to memory errors is almost impossible, especially when there was a period of weeks of unreliable operation of the machine in question.

Checking the integrity of file data by using the verify option of a file compression utility, fsck on a filesystem that stores checksums on data, or any of the other methods is not difficult.

I have a lot of important data on machines that don’t have ECC. One reason is that machines which have ECC cost more and have other trade-offs (more expensive parts, more noise, more electricity use, and the small supply makes it difficult to get good deals). Another is that there appear to be no laptops which support ECC (I use a laptop for most of my work). On the other hand RAID is very cheap and simple to implement, just buy a second hard disk and install software RAID – I think that all modern OSs support RAID as a standard installation option. So in spite of the fact that RAID does less good than a combination of ECC RAM and good backups (which are necessary even if you have RAID), it’s going to remain more popular in high-end desktop systems for a long time.
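
For what it’s worth, setting up a Linux software RAID-1 mirror is roughly as simple as the following sketch (device names and filesystem are examples only; the distribution installer can usually do the equivalent for you):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # mirror two partitions
mkfs.ext3 /dev/md0                                                       # create a filesystem on the array
mount /dev/md0 /data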

An interesting recent development is the large portion of the PC market which is designed not to have space for more than one hard disk. Such compact machines (known as Small Form Factor or SFF) could easily be designed to support ECC RAM. Hopefully the PC companies will add reliability features in one area (ECC RAM) even as they remove them in another (the space for a second disk).

SE Linux Support in GPG

In May 2002 I had an idea for securing access to GNUPG [1]. What I did was to write SE Linux policy to only permit the gpg program to access the secret key (and other files in ~/.gnupg). This meant that the most trivial ways of stealing the secret key would be prevented. However an attacker could still use gpg to encrypt its secret key and write the data to some place that is accessible, for example the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg”. So what we needed was for gpg to either refuse to encrypt such files, or to spawn a child process for accessing such files (which could be granted different access to the filesystem). I filed the Debian bug report 146345 [2] requesting this feature.

In March upstream added this feature. The Debian package is currently not built with --enable-selinux-support so this feature isn’t enabled yet, but hopefully it will be soon. Incidentally the feature as currently implemented is not really SE Linux specific; it seems to me that there are many potential situations where it could be useful without SE Linux. For example if you were using one of the path-name based MAC systems (which I dislike – see what my friend Joshua Brindle wrote about them for an explanation [3]) then you could gain some benefits from this. A situation with even smaller potential for benefit is an automated system which runs gpg and which could allow an attacker to pass bogus commands to it. When exploiting a shell script it might be easier to specify the wrong file to encrypt than to perform more sophisticated attacks.

When the feature in question is enabled the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg” will abort with the following error:
gpg: can't open `/root/.gnupg/secring.gpg': Operation not permitted
gpg: symmetric encryption of `/root/.gnupg/secring.gpg' failed: file open error

Of course the command “gpg --export-secret-keys” will abort with the following error:
gpg: exporting secret keys not allowed
gpg: WARNING: nothing exported

Now we need to determine the correct way of exporting secret keys and modifying the GPG configuration. It might be best to allow exporting the secret keys when not running SE Linux (or other supported MAC systems), or when running in permissive mode (as in those situations merely copying the files will work), although we could have an option in gpg.conf for this for the case where we want to prevent shell-script quoting hacks.

For editing the gpg.conf file and exporting the secret keys we could have a program similar in concept to crontab(1) which has PAM support to determine when it should perform its actions. Also it seems to me that crontab(1) could do with PAM support (I’ve filed Debian bug report 484743 [4] requesting this).

Finally one thing that should be noted is that the targeted policy for SE Linux does not restrict GPG (which runs in the unconfined_t domain). Thus most people who use SE Linux at the moment aren’t getting any benefits from such things. This will change eventually.

The Date Command and Seconds Since 1970-01-01

The man page for the date command says that the %s option will give “seconds since 1970-01-01 00:00:00 UTC”. I had expected that everything that date did would give output in my time zone unless I requested otherwise. But it seems that in this case the result is in UTC, and the same seems to also be true for most programs that log dates with the number of seconds.

In a quick google search for how to use a shell script to convert from the number of seconds since 1970-01-01 00:00:00 UTC to a more human readable format the only remotely useful result I found was the command date -d "1970-01-01 1212642879 sec", which in my timezone gives an error of ten hours (at the moment – it would give eleven hours in summer). The correct method is date -d "1970-01-01 1212642879 sec utc", you can verify this with the command date -d "1970-01-01 $(date +%s) sec utc" (which should give the same result as date).
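
For example (using the same epoch value as above):

date -d "1970-01-01 1212642879 sec"         # wrong - off by the timezone offset
date -d "1970-01-01 1212642879 sec utc"     # correct
date -d "1970-01-01 $(date +%s) sec utc"    # should match the output of plain "date"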

Moving a Mail Server

Nowadays it seems that most serious mail servers (IE mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox you have a large number of messages in a single file; most users would have a single file with all their mail and the advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that it was necessary to read the entire file to determine how many messages were stored, and as determining the number of messages was the first thing done in a POP connection this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted as the Mbox file needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (IE you are not supposed to mess with it) while Maildir is designed to be used by any tools that you wish (EG my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the difficulty in performing backups. As a test I used an LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks with 343M of data used (according to “df -h”) and 39,358 inodes in use (as there were 5000 accounts with maildir storage that means 25,000 directories for the home directories and Maildir directories – five per account counting the home directory, the Maildir, and its cur, new, and tmp subdirectories). So there were 14,358 files. To create a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds; 105MB of data was transferred for a transfer rate of 456KB/s. It seems that tar stores the data in a more space efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison either of the two disks can deliver 40MB/s for the inner tracks. So it seems that unless the amount of used space is less than 1% of the total disk space it will be faster to transfer a filesystem image.
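
The test was along the lines of the following (the path is an example); piping through dd avoids tar’s special handling of /dev/null:

time tar cf - /mail/home | dd of=/dev/null bs=1M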

If you have disks that are faster than your network (EG old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine; this is usually possible when using IDE disks but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.

The first test I made of compression was on a 1GHz Athlon system which could compress (via gzip -1) 100M of data with four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so it can’t be directly compared), as 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds that obviously isn’t a viable option.

The gzip -1 compression however compressed the data to 57% of its original size, and the fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worthwhile amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.
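
A rough way to compare candidates is to time them over a sample of the real data and check the output size, e.g. (the sample file name is made up, and lzop is just a guess at a faster option – not something I have tested here):

time gzip -1 -c /tmp/mail-sample | wc -c    # CPU time and compressed size with gzip -1
time lzop -1 -c /tmp/mail-sample | wc -c    # the same test with another compressor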

Without compression the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. Not a good amount of down-time for a service that runs 24*7. Particularly given that some time would be spent in getting the new machine to actually use the data.
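
If the transfer does have to go over the network then the whole filesystem image can be piped through a light compressor, along the lines of this sketch (device and host names are examples):

dd if=/dev/VG0/mail bs=1M | gzip -1 | ssh newserver 'gzip -d | dd of=/dev/VG0/mail bs=1M'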

As for how to design a system to not have these problems, I’ll write a future post with some ideas for how to alleviate that.

CPU Capacity for Virtualisation

Today a client asked me to advise him on how to dramatically reduce the number of servers for his business. He needs to go from 18 active servers to 4. Some of the machines in the network are redundant servers. By reducing some of the redundancy I can remove four servers, so now the task is to go from 14 to 4.

To determine the hardware requirements I analyzed the sar output from all machines. The last 10 days of data were available, so I took the highest daily average numbers from each machine for user and system CPU load and added them up; the result was 221%. So for the average daily CPU use three servers would have enough power to run the entire network. Then I looked at the highest 5 minute averages for user and system CPU load from each machine, which add up to 582%. So if all machines were to have their peak usage times simultaneously (which doesn’t happen) then the CPU power of six machines would be needed. I conclude that the CPU power requirements are somewhere between 3 and 6 machines, so 4 machines may do an OK job.
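
The daily averages can be extracted from the stored sar data with something like the following (the file locations match the usual sysstat layout, and the column positions vary between sysstat versions – here %user is field 3 and %system is field 5):

for f in /var/log/sa/sa??; do
  # the "Average:" line at the end of the report gives the daily average
  sar -u -f "$f" | awk -v day="$f" '/^Average:/ {print day, $3 + $5}'
done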

The next issue is IO capacity. The current network has 2G of RAM in each machine and I plan to run it all on 4G Xen servers, so it’s a total of 16G of RAM instead of 36G. While some machines currently have unused memory I expect that the end result of this decrease in total RAM will be more cache misses and more swapping, so the total IO capacity use will increase slightly. Now four of the servers (which will eventually become Xen Dom0’s) have significant IO capacity (large RAIDs – they appear to have 10*72G disks in a RAID-5) and the rest have a smaller IO capacity (they appear to have 4*72G disks in a RAID-10). The other 14 machines have the highest daily averages for iowait adding up to 9% and the highest 5 minute averages adding up to 105%. I hope that spreading that 105% of the IO capacity of a 4 disk RAID-10 across four sets of 10 disk RAID-5’s won’t give overly bad performance.

I am concerned that there may be some flaw in the methodology that I am using to estimate capacity. One issue is that I’m very doubtful about the utility of measuring iowait: iowait is the amount of idle CPU time when there are processes blocked on IO, so if for example you have 100% CPU time being used then iowait will be zero regardless of how much disk IO is in progress! One check that I performed was to add the maximum CPU time used, the maximum iowait, and the minimum idle time. Most machines gave totals that were very close to 100% when those columns were added. So if the maximum iowait for a 5 minute period plus the maximum CPU use plus the minimum idle time add up to 100%, and the minimum idle time was not very low, then it seems unlikely that there was any significant overlap between disk IO and CPU use to hide iowait. One machine had a total of 147% for those fields in the 5 minute average which suggests that the IO load may be higher than the 66% iowait number may indicate. But if I put that in a DomU on the machine with the most unused IO capacity then it should be OK.

I will be interested to read any suggestions for how to proceed with this. But unfortunately it will probably be impossible to consider any suggestion which involves extra hardware or abandoning the plan due to excessive risk…

I will write about the results.

Installing a Red Hat based DomU on a Debian Dom0

The first step is to copy /images/xen/vmlinuz and /images/xen/initrd.img from the Fedora (or RHEL or CentOS) DVD to somewhere convenient; I use /boot/OS/ (where OS is the name of the image) but other locations will do.

Now choose a suitable Ethernet MAC address for the interface (see my previous post on how I choose them [1]).

Create a temporary block device for the install, I use /dev/VG0/OS-install (where OS is replaced by the name of the distribution, “f8” or “cent5“), it’s a logical volume in an LVM volume group named VG0. The device should be at least 2G in size for a basic Fedora install (512M for swap, 1G for files, and 512M free after the install). It is of course possible to use DOS partitions for the Xen block devices, but this would be unreasonably difficult to manage. An option for people who don’t like LVM would be to use files on an XFS filesystem (Ext3 performs poorly when creating and removing large files).
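
Creating the logical volumes is simple, e.g. for a Fedora 8 DomU (the sizes are examples matching the figures above):

lvcreate -n f8-install -L 3G VG0    # temporary device used for the install
lvcreate -n f8 -L 2G VG0            # final unpartitioned root device
lvcreate -n f8-swap -L 512M VG0     # swap device for normal operation (/dev/xvdb)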

When configuring Xen on Debian systems I generally use /dev/hda type device names. The device name seems quite arbitrary and /dev/hda is a familiar name for hard drives that many people have been used to for 15+ years. But the Fedora install process doesn’t like it and I’m forced to use /dev/xvda etc.

I often install Fedora on a machine that only has 256M of RAM spare for the DomU. For recent versions of Fedora 256M of RAM is about the minimum for an install at the best of times, and a HTTP install takes even more because the root filesystem used for the install is copied via HTTP and stored in a RAM disk. It might be possible to use less RAM with a CD or DVD install or even a NFS install, but I couldn’t get CD/DVD installation working and I generally don’t give Xen DomU’s NFS access if I can avoid it. So I had to create a swap space (an attempt to do an install with 256M of RAM and no swap aborted when installing the kernel package). I expect that most serious use of Xen will have 256M of RAM or less for the DomU, part of the problem here is that Xen allocates RAM not virtual memory. VMWare allocates virtual memory so the total memory for virtual machines can be greater than physical RAM and thus this problem will be less common with VMWare.

I believe that the best way of configuring virtual machine images is to have the virtual machine manager (Xen in this case) provide block devices to the virtual machine and have the virtual machine implement no partitioning (no LVM or anything equivalent). The main reason is that DOS partition tables and LVM configuration on a block device used by Xen can not be used easily in the host environment (the Dom0 for Xen). I am not aware of how to access DOS partition tables (although I’m sure it’s possible somehow) and while LVM can be used it’s a bad idea due to the fact that there is no way to deactivate an LVM volume group that is active, and the fact that there is no support for having multiple volume groups of the same name. The lack of support for multiple volume groups of the same name is a reasonable limitation, but an insurmountable problem when using a virtual machine environment. It’s quite reasonable to create several cloned instances of a virtual machine, and renaming an LVM volume group would require more changes inside the virtual machine than you would want. Also using snap-shots of old versions of the virtual machine data is difficult if the same volume group name is used.

So for ease of management I want to have filesystems on block devices (such as /dev/xvda) instead of partitions (such as /dev/xvda1). Unfortunately Anaconda (the Fedora installer) doesn’t support this. So I had to do the initial install with DOS partitions and then fix it afterwards. So use the manual option and create a primary partition for the root filesystem and then create a non-primary partition for swap (when using small amounts of RAM such as 256M) so that swap can be used during the install. The root filesystem needs to be at the start of the disk to make it easier to sort this out later.

After installing Fedora and shutting the virtual machine down the next step is to copy the block device to the desired configuration (filesystem on an unpartitioned device). If the root filesystem is the first partition then the first 63 sectors will be the partition table and reserved space so dd can be used to copy the data with the following commands:

dd if=/dev/VG0/OS-install of=/dev/VG0/OS bs=512 skip=63
e2fsck -f /dev/VG0/OS
resize2fs /dev/VG0/OS

The next step is to mount the device /dev/VG0/OS in the Dom0 to change /etc/fstab; I use /dev/xvda for the root device and /dev/xvdb for swap.
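
The fstab inside the DomU then ends up looking something like this (a sketch, assuming Ext3 for the root filesystem):

/dev/xvda   /      ext3   defaults   1 1
/dev/xvdb   swap   swap   defaults   0 0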

Now to remove the cruft:
Avahi is a network service discovery system, mainly used for laptops, and isn’t needed on a server; it is installed by default on all recent Fedora, RHEL and CentOS releases but it is not useful in a DomU (any unused network service is a security risk if you don’t disable or remove it). Smartmontools is for detecting impending failure of a hard drive and does not do any good when using a virtual block device (you run it on the Dom0). It might be considered a bug that smartd doesn’t exit on startup when it sees a device such as /dev/xvda. The pcsc-lite package for managing smart cards is of no use to me and all the other people who don’t own smart-card readers, and it can therefore be removed. Bluetooth networking support (in the package bluez-utils) is also only usable in a Dom0 (AFAIK), and the only bluetooth device I own is my mobile phone so I can’t use it on my computer. The command “yum remove avahi smartmontools pcsc-lite bluez-utils” removes them.

For almost all of my DomU’s I don’t use NFS or do any printing, so I remove the packages related to them. Also autofs is in most cases only useful for servers when mounting NFS filesystems. I remove them with the command “yum remove nfs-utils portmap cups autofs“.

The GPM daemon (which supports cut/paste operations with a mouse on virtual consoles) is of no use on a Xen DomU, unfortunately the vim-enhanced package depends on it. I could just disable the daemon, but as I like to run small images I remove it with “yum remove gpm“. I may have to reinstall it on some images as some of my clients like the extra VIM functionality.

It’s unfortunate that debootstrap doesn’t work on CentOS (and presumably Fedora), so installing a Debian DomU on a CentOS/Fedora Dom0 requires creating an image on a Debian machine or downloading an image from www.jailtime.org.

Sample Xen Config for the install:

kernel = "/boot/OS/vmlinuz"
ramdisk = "/boot/OS/initrd.img"
memory = 256
name = "OS"
vif = [ 'mac=00:16:3e:66:66:68, bridge=xenbr0' ]
disk = [ 'phy:/dev/VG0/OS-install,xvda,w' ]
extra = "askmethod text"

Sample Xen Config for operation:

kernel = "/boot/cent5/vmlinuz-2.6.18-53.el5xen"
ramdisk = "/boot/cent5/initrd-2.6.18-53.el5xen.img"
memory = 256
name = "cent5"
vif = [ 'mac=00:16:3e:66:66:68, bridge=xenbr0' ]
disk = [ 'phy:/dev/VG0/cent5,xvda,w', 'phy:/dev/VG0/cent5-swap,xvdb,w' ]
root = "/dev/xvda ro"

Security Flaws in Free Software

I just wrote about the system administration issues related to the recent Debian SSL/SSH security flaw [1]. The next thing we need to consider is how we can change things to reduce the incidence of such problems.

The problem we just had was due to the most important part of the entropy supply for the random number generator not being used, due to a mistake in commenting out some code. The only entropy that was used was the PID of the process which uses the SSL library code, which gives us 15 bits of entropy. It seems to me that if we had zero bits of entropy the problem would have been discovered a lot sooner (almost certainly before the code was released in a stable version). Therefore it seems that using a second-rate source of entropy (which was never required) masked the problem that the primary source of entropy was not working. Would it make sense to have a practice of not using such second-rate sources of entropy to reduce the risk of such problems being undetected for any length of time? Is this a general issue or just a corner case?

Joss makes some suggestions for process improvements [2]. He suggests that having a single .diff.gz file (the traditional method for maintaining Debian packages) that directly contains all patches can obscure some patches. The other extreme is when you have a patch management system with several dozen small patches and the reader has to try to discover what each of them does. For an example of this see the 43 patches which are included in the Debian PAM package for Etch. Note that the PAM system comprises many separate shared objects (modules), which means that the patching system lends itself to having one patch per module – so 43 patches for PAM isn’t as difficult to manage as 43 patches would be for a complex package which is not comprised of multiple separate components. That said I think that there is some potential for separating out patches. Having a distinction between different types of patches might help. For example we could have a patch for Makefiles etc (including autoconf etc), a patch for adding features, and a patch for fixing bugs. Then people reviewing the source for potential bugs could pay a lot of attention to bug fixes, a moderate amount of attention to new features, and casually skim the Makefile stuff.

The problem began with this mailing list discussion [3]. Kurt’s first message starts with “When debbuging applications” and ends with “What do you people think about removing those 2 lines of code?”. The reply he received from Ulf (a member of the OpenSSL development team) is “If it helps with debugging, I’m in favor of removing them”. It seems to me that there might have been a miscommunication there; Ulf may have believed that the discussion only concerned a debugging build and not a build that would eventually end up on millions of machines.

It seems possible that the reaction would have been different if Kurt had mentioned that he wanted to have a single source tree for both debugging and for regular use. It also seems likely that his proposed change may have received more inspection if he had clearly stated that he was going to include it in Debian where it would be used by millions of people. When I am doing Debian development I generally don’t mention all the time “this code will be used by millions of people so it’s important that we get it right”, although I do sometimes make such statements if I feel that my questions are not getting the amount of consideration from upstream that a binary package destined for use by millions of people deserves. Maybe it would be a good practice to clarify such things in the case of important packages. For a package that is relevant to the security of the entire distribution (and possibly to other machines around the net – as demonstrated in this case) it doesn’t seem unreasonable to include a postscript mentioning the scope of the code use (it could be done with an alternate SIG if using a MUA that allows selecting from multiple SIGs in a convenient manner).

In the response from the OpenSSL upstream [4] it is claimed that the mailing list used was not the correct one. Branden points out that the openssl-team mailing list address seems not to be documented anywhere [5]. One thing to be learned from this is that distribution developers need to be proactive in making contact with upstream developers. You might think that building packages for a major distribution and asking questions about it on the mailing list would result in someone from the team noticing and mentioning any other things that you might need to do. But maybe it would make sense to send private mail to one of the core developers, introduce yourself, and ask for advice on the best way to manage communication to avoid this type of confusion.

I think that it is ideal for distribution developers to have the phone numbers of some of the upstream developers. If the upstream work is sponsored by the employer of one of the upstream developers then it seems reasonable to ask for their office phone number. Sometimes it’s easier to sort things out by phone than by email.

Gunnar Wolf describes how the way this bug was discovered and handled shows that the Debian processes work [6]. A similar bug in proprietary software would probably not be discovered nearly as quickly and would almost certainly not be fixed in such a responsible manner.

Update: According to the OpenSSL project about page [7], Ulf is actually not a “core” member, just a team member. I had used the term “core” in a slang manner based on the fact that Ulf has an official @openssl.org email address.

Debian SSH Problems

It has recently been announced that Debian had a serious bug in the OpenSSL code [1]; the most visible effect of this is compromising SSH keys – but it can also affect VPN and HTTPS keys. Erich Schubert was one of the first people to point out the true horror of the problem: only 2^15 different keys can be created [2]. It should not be difficult for an attacker to generate 2^15 host keys to try all combinations for decrypting a login session. It should also be possible to make up to 2^15 attempts to log in remotely if an attacker believes that a vulnerable authorized key was being used – 32,768 attempts would take less than an hour at a rate of 10 attempts per second (which is possible with modern net connections) and could be done in a day if the server was connected to the net by a modem.

John Goerzen has some insightful thoughts about the issue [3]. I recommend reading his post. One point he makes is that the person who made the mistake in question should not be lynched. One thing I think we should keep in mind is the fact that people tend to be more careful after they have made mistakes, I expect that anyone who makes a mistake in such a public way which impacts so many people will be very careful for a long time…

Steinar H. Gunderson analyses the maths in relation to DSA keys; it seems that if a DSA key is ever used with a bad RNG then it can be cracked by someone who sniffs the network [4]. It seems that it is safest to just not use DSA to avoid this risk. Another issue is that if a client supports multiple host key types (an OpenSSH server can have three: one for the ssh1 protocol, one for ssh2 with RSA, and one for ssh2 with DSA) then a man in the middle attack can be implemented by forcing a client to use a different key type – see Stealth’s article in Phrack for the details [5]. So it seems that we should remove support for anything other than SSHv2 with RSA keys.

To remove such support from the ssh server edit /etc/ssh/sshd_config and make sure it has a line with “Protocol 2“, and that the only HostKey line references an RSA key. To remove it from the ssh client (the important thing) edit /etc/ssh/ssh_config and make sure that it has something like the following:

Host *
Protocol 2
HostKeyAlgorithms ssh-rsa
ForwardX11 no
ForwardX11Trusted no

You can override this for different machines. So if you have a machine that uses DSA only then it would be easy to add a section:

Host strange-machine
Protocol 2
HostKeyAlgorithms ssh-dss

So making this the default configuration of the ssh client on all machines you manage has the potential to dramatically reduce the incidence of MITM attacks on the less knowledgeable users.

When skilled users who do not have root access need to change things they can always edit the file ~/.ssh/config (which has the same syntax as /etc/ssh/ssh_config) or they can use command-line options to override it. The command ssh -o “HostKeyAlgorithms ssh-dss” user@server will force the use of a DSA host key even if the configuration file requests RSA.

Enrico Zini describes how to use ssh-keygen to get the fingerprint of the host key [6]. One thing I have learned from comments on this post is how to get a fingerprint from a known hosts file. A common situation is that machine A has a known hosts file with an entry for machine B. I want to get the right key in machine C and there is no way of directly communicating between machine A and machine C (EG they are in different locations with no network access). In that situation the command “ssh-keygen -l -f ~/.ssh/known_hosts” can be used to display all the fingerprints of hosts that you have connected to in the past, then it’s a simple matter of grepping the output.
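
For example (the host name is made up), on machine A list the stored fingerprints and grep for machine B, then compare with the key on machine B itself:

ssh-keygen -l -f ~/.ssh/known_hosts | grep machine-b     # fingerprints recorded on machine A
ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub           # fingerprint of machine B's own host key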

Docunext has an interesting post about ways of mitigating such problems [7]. One thing that they suggest is using fail2ban to block IP addresses that appear to be trying to do brute-force attacks. It’s unfortunate that the version of fail2ban in Debian uses /tmp/fail2ban.sock for its Unix domain socket for talking to the server (the version in Unstable uses /var/run/fail2ban/fail2ban.sock). They also mention patching network drivers to add entropy to the kernel random number generator. One thing that seems interesting is the package randomsound (currently in Debian/Unstable) which takes ALSA sound input as a source of entropy; note that you don’t need to have any sound input device connected.

When considering fail2ban and similar things, it’s probably best to start by restricting the number of machines which can connect to your SSH server. Firstly if you put it on a non-default port then it’ll take some brute-force to find it. This will waste some of the attacker’s time and also make the less persistent attackers go elsewhere. One thing that I am considering is having a few unused ports configured such that any IP address which connects to them gets added to my NetFilter configuration – if you connect to such ports then you can’t connect to any other ports for a week (or until the list becomes too full). So if for example I had port N configured in such a manner and port N+100 used for ssh listening then it’s likely that someone who port-scans my server would be blocked before they even discovered the SSH server. Does anyone know of free software to do this?
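
For what it’s worth, the basic mechanism can be sketched with the NetFilter “recent” match (the port number is made up and this is only a rough outline, not a tested solution):

# drop everything from addresses already on the "banned" list for a week
iptables -A INPUT -m recent --name banned --update --seconds 604800 -j DROP
# any connection to the trap port adds the source address to the list
iptables -A INPUT -p tcp --dport 2200 -m recent --name banned --set -j DROP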

The next thing to consider is which IP addresses may connect. If you were to allow all the IP addresses from all the major ISPs in your country to connect to your server then it would still be a small fraction of the IP address space. Sure attackers could use machines that they already cracked in your country to launch their attacks, but they would have to guess that you had such a defense in place, and even so it would be an inconvenience for them. You don’t necessarily need to have a perfect defense, you only need to make the effort to reward ratio be worse for attacking you than for attacking someone else. Note that I am not advocating taking a minimalist approach to security, merely noting that even a small increment in the strength of your defenses can make a significant difference to the risk you face.

Update: based on comments I’m now considering knockd to open ports on demand. The upstream site for knockd is here [8], and some documentation on setting it up in Debian is here [9]. The concept of knockd is that you make connections to a series of ports which act as a password for changing the firewall rules. An attacker who doesn’t know those port numbers won’t be able to connect. Of course anyone who can sniff your network will discover the ports soon enough, but I guess you can always login and change the port numbers once knockd has let you in.

Also thanks to Helmut for advice on ssh-keygen.

Ideas to Copy from Red Hat

I believe that the Red Hat process which has Fedora for home users (with a rapid release cycle and new versions of software but support for only about one year) and Enterprise Linux (with a ~18 month release cycle, seven years of support, and not always having the latest versions) gives significant benefits for the users.

The longer freeze times of Enterprise Linux (AKA RHEL) mean that it often has older versions of software than a Fedora release occurring at about the same time. In practice the only time I ever notice users complaining about this is in terms of OpenOffice (which is always being updated for compatibility with the latest MS changes). As an aside, a version of RHEL or CentOS with a back-port of the latest OpenOffice would probably get a lot of interest.

RHEL also has a significantly smaller package set than Fedora, there is a lot of software out there that you wouldn’t want to support for seven years, a lot of software that you might want to support if you had more resources, and plenty of software that is not really of interest to enterprise customers (EG games).

Now there are some down-sides to the Red Hat plan. The way that they run Fedora is to have new releases of software instead of back-porting fixes. This means that bugs can be fixed with less effort (simply compiling a new version is a lot less effort than back-porting a fix), and that newer versions of the upstream code get tested. With some things this isn’t a problem, but in the past I have had problems with the Fedora kernel. One example was when I upgraded the kernel on a bunch of remote Fedora machines only to find that the new kernel didn’t support the network card, so I had to talk the users through selecting the older kernel at the GRUB menu (this caused pain and down-time). A problem with RHEL (which I see regularly on the CentOS machines I run) is that it doesn’t have the community support that Fedora does, and therefore finding binary packages for RHEL can be difficult – and often the packages are outdated.

I believe that in Debian we could provide benefits for some of our users by copying some ideas from Red Hat. There is currently some work in progress on releasing packages that are half-way between Etch and Lenny (Etch is the current release, Lenny will be the next one). The term Etch and a half refers to the work to make Etch run on newer hardware [1]. It’s a good project, but I don’t think that it goes far enough. It certainly won’t fulfill the requirements of people who want something like Fedora.

I think that if we had half-way releases of Debian (essentially taking a snap-shot of Testing and then fixing the worst of the bugs) then we could accommodate user demand for newer versions (making available a release which is on average half as old). Users who want really solid systems would run the full releases (which have more testing pre-release and more attention paid to bug fixes), but users who need the new features could run a half-way release. Currently there are people working on providing security support for Testing so that people who need the more recent versions of software can use Testing; I believe that making a half-way release would provide better benefits to most users while also possibly taking fewer resources from the developers. This would not preclude the current “Etch and a half” work of back-porting drivers; in the Red Hat model such driver back-ports are done in the first few years of RHEL support. If we were to really follow Red Hat in this regard the “Etch and a half” work would operate in tandem with similar work for Sarge (version 3.1 of Debian which was released in 2005)!

In summary, the Red Hat approach is to have Fedora releases aimed at every 6 months, but in practice coming out every 9 months or so and to have Enterprise Linux releases aimed at every year, but in practice coming out every 18 months. This means among other things that there can be some uncertainty as to the release order of future Fedora and RHEL releases.

I believe that a good option for Debian would be to have alternate “Enterprise” (for want of a better word) and half-way releases (comparable to RHEL and Fedora). The Enterprise releases could be frozen in coordination with Red Hat, Ubuntu, and other distributions (Mark Shuttleworth now refers to this as being a “pulse” in the free software community []), while the half-way releases would come out either when it’s about half-way between releases, or when there is a significant set of updates that would encourage users to switch.

One of the many benefits to having synchronised releases is that if the work in back-porting support for new hardware lagged in Debian then users would have a reasonable chance of taking the code from CentOS. If nothing else I think that making kernels from other distributions available for easy install is a good thing. There is a wide combination of kernel patches that may be selected by distribution maintainers, and sometimes choices have to be made between mutually exclusive options. If the Debian kernel doesn’t work best for a user then it would be good to provide them with a kernel compiled from the RHEL kernel source package and possibly other kernels.

Mark also makes the interesting suggestion of having different waves of code freeze, the first for the kernel, GCC, and glibc, and possibly server programs such as Apache. The second for major applications and desktop environments. The third for distributions. One implication of this is that not all distributions will follow the second wave. If a distribution follows the kernel, GCC, and glibc wave but not the applications wave it will still save some significant amounts of effort for the users. It will mean that the distributions in question will all have the same hardware support and kernel features, and that they will be able to run each others’ applications (except when the applications in question use system libraries from later waves). Also let’s not forget the possibility of running a kernel from distribution A on distribution B, it’s something I’ve done on many occasions, but it does rely on the kernels in question being reasonably similar in terms of features.