EC2 and IP Addresses

One of the challenges of using a cloud computing service is how to talk to the rest of the world. It’s all very well to have a varying number of machines in various locations, but you need stable DNS names at least (and sometimes constant IP addresses) to do most useful things.

I have previously described how to start an EC2 instance and login to it – which includes discovering its IP address [1]. It would not be difficult (in theory at least) to use nsupdate to change DNS records after an instance is started or terminated. One problem is that there is no way of knowing when an instance is unexpectedly terminated (i.e. killed by hardware failure) apart from polling ec2-describe-instances, so it seems impossible to remove a DNS name before some other EC2 customer gets the dynamic IP address. So in most cases you will want a constant IP address (which Amazon calls an Elastic IP address) if you care about this possibility. For the case of an orderly shutdown you could have a script remove the DNS record, wait for the TTL specified by the DNS server (so that all correctly operating DNS caches have purged the record) and then terminate the instance.
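
A minimal sketch of that orderly-shutdown sequence, assuming a zone that accepts dynamic updates; the server and record names and the key file path are illustrative, and a real nsupdate run will normally need a TSIG key:

```shell
ttl=300                 # must match the TTL on the record being removed
make_removal() {
  cat <<EOF
server ns.example.com
update delete mx3.example.com. A
send
EOF
}
make_removal            # in real use: make_removal | nsupdate -k /etc/bind/update.key
# sleep "$ttl"                        # let correctly operating caches purge the record
# ec2-terminate-instances "$instance_id"
```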

One thing that interests me is the possibility of running front-end mail servers on EC2. Mail servers that receive mail from the net can take significant amounts of CPU time and RAM for spam and virus filters. Instead of bearing the expense of running enough MX servers to sustain the highest possible load even while one of them has suffered a hardware failure, you could run an extra EC2 instance at peak times, and a large instance for a peak period when one of the dedicated servers has experienced a problem. The idea of having a mail server die and someone else’s server take the IP address and receive the mail is too horrible to contemplate, so an Elastic IP address is required.

It is quite OK to have a set of mail servers of which not all run all the time (this is why the MX record was introduced to the DNS), so having a server run periodically at times of high load (one of the benefits of the EC2 service) will not require changes to the DNS. I think it’s reasonably important to minimise the number of changes to the DNS due to the possibility of accidentally breaking it (which is a real catastrophe) and the possibility of servers caching DNS data for longer than they should. The alternative is to change the MX record to not point to the hostname of the server when the instance is terminated. I will be interested to read comments on this issue.

The command ec2-allocate-address will allocate a public IP address for your use. Once the address is allocated it will cost $0.01 per hour whenever it is unused. There are also commands ec2-describe-addresses (to list all addresses allocated to you), ec2-release-address (to release an allocated address), ec2-associate-address to associate an IP address with a running instance, and ec2-disassociate-address to remove such an association.

The command “ec2-associate-address -i INSTANCE ADDRESS” will associate an IP address with the specified instance (replace INSTANCE with the instance ID – a code starting with “i-” that is returned from ec2-describe-instances). The command “ec2-describe-instances |grep ^INSTANCE|cut -f2” will give you a list of all instance IDs in use – this is handy if your use of EC2 involves only one active instance at a time (all the EC2 API commands give output in tab-separated lists and can be easily manipulated with grep and cut). Associating an IP address with an instance is documented as taking several minutes. While Amazon provides no guarantees or precise figures as to how long the various operations take, it seems that assigning an IP address is one of the slower operations. I expect that is because it requires reconfiguring a firewall device (which services dozens or maybe hundreds of nodes), while creating or terminating an instance is an operation that is limited in scope to a single Xen host.
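
The grep/cut pipeline can be seen against a captured sample (using the dummy reservation and instance IDs that appear later in this post); in real use you would pipe the output of ec2-describe-instances itself:

```shell
# A captured two-line sample of ec2-describe-instances output; the fields
# are tab-separated, which is cut's default delimiter.
sample=$(printf 'RESERVATION\tr-281fc441\t999999999999\tdefault\nINSTANCE\ti-0c999999\tami-2b5fba42\tpending\n')

# Keep only the INSTANCE records and take the second field (the instance ID):
ids=$(printf '%s\n' "$sample" | grep '^INSTANCE' | cut -f2)
echo "$ids"    # i-0c999999
```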

One result that I didn’t expect is that when you associate an elastic address, the original address that was assigned to the instance is removed. I had an ssh connection open to an instance when I assigned an elastic address and my connection was broken. It makes sense to remove addresses that aren’t needed (IPv4 addresses are a precious commodity), and further reading of the documentation revealed that this is the documented behavior.

One thing I have not yet investigated is whether moving an IP address from one instance to another is atomic. Taking a few minutes to assign an IP address is usually no big deal, but having an IP address be unusable for a few minutes while transitioning between servers would be quite inconvenient. It seems that a reasonably common desire would be to have a small instance running and then transition the IP address to a large (or high-CPU) instance if the load gets high; having this happen without the users noticing would be a good thing.

Basics of EC2

I have previously written about my work packaging the tools to manage Amazon EC2 [1].

First you need to login and create a certificate (you can upload your own certificate – but this is probably only beneficial if you have two EC2 accounts and want to use the same certificate for both). Download the X509 private key file (named pk-X.pem) and the public key (named cert-X.pem). My Debian package of the EC2 API tools will look for the key files in the ~/.ec2 and /etc/ec2 directories and will take the first one it finds by default.

To override the certificate (when using my Debian package), or just to have things work when using the code without my package, set the environment variables EC2_PRIVATE_KEY and EC2_CERT.

This Amazon page describes some of the basics of setting up the client software and RSA keys [2]. I will describe some of the most important things now:

The command “ec2-add-keypair gsg-keypair > id_rsa-gsg-keypair” creates a new keypair for logging in to an EC2 instance. The public key goes to Amazon and the private key can be used by any ssh client to login as root when you create an instance. To create an instance with that key you use the “-k gsg-keypair” option, so it seems a requirement to use the same working directory when creating all instances. Note that gsg-keypair could be replaced by any other string; if you are doing something really serious with EC2 you might use one account to create instances that are run by different people with different keys, but for most people I think a single key is all that is required. Strangely they don’t provide a way of getting access to the public key – you have to create an instance and then copy the /root/.ssh/authorized_keys file from it.
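
To illustrate, here is a hypothetical transcript of the command’s output and one way to keep just the part that ssh needs; the fingerprint and key material below are dummies:

```shell
# ec2-add-keypair prints a KEYPAIR header line (name and fingerprint)
# followed by the PEM-encoded private key:
output=$(printf 'KEYPAIR\tgsg-keypair\t00:11:22:33:44:55\n-----BEGIN RSA PRIVATE KEY-----\nnot-a-real-key\n-----END RSA PRIVATE KEY-----\n')

keyfile=$(mktemp)
# Keep only the PEM block; ssh's key parser usually tolerates the leading
# header line, but a clean file is tidier:
printf '%s\n' "$output" | sed -n '/^-----BEGIN/,/^-----END/p' > "$keyfile"
chmod 600 "$keyfile"    # ssh refuses identity files that others can read
```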

This Amazon page describes how to set up sample images [3].

The first thing it describes is the command ec2-describe-images -o self -o amazon which gives a list of all images owned by yourself and all public images owned by Amazon. It’s fairly clear that Amazon doesn’t expect you to use their images. The i386 OS images that they have available are Fedora Core 4 (four configurations with two versions of each) and Fedora 8 (a single configuration with two versions) as well as three other demo images that don’t indicate the version. The AMD64 OS images that they have available are Fedora Core 6 and Fedora 8. Obviously if they wanted customers to use their images (which seems like a really good idea to me) they would provide images of CentOS (or one of the other recompiles of RHEL) and Debian. I have written about why I think that this is a bad idea for security [4]; please make sure that you don’t use the ancient Amazon images for anything other than testing!

To test, choose an i386 image from Amazon’s list; i386 is best for testing because it allows the cheapest instances (currently $0.10 per hour).

Before launching an instance, allow ssh access to it with the command “ec2-authorize default -p 22“. Note that this command permits access from the entire world; there are options to limit access to certain IP address ranges, but at this stage it’s best to focus on getting something working. Of course you don’t want to actually use your first attempt at creating an instance – I think that setting up an instance to run in a secure and reliable manner would require many attempts and tests. As all the storage of the instance is wiped when it terminates (we aren’t using S3 yet) and you won’t have any secret data online, security doesn’t need to be the highest priority.

A sample command to run an instance is “ec2-run-instances ami-2b5fba42 -k gsg-keypair” where ami-2b5fba42 is a public Fedora 8 image available at this moment. This will give output similar to the following:

RESERVATION r-281fc441 999999999999 default
INSTANCE i-0c999999 ami-2b5fba42 pending gsg-keypair 0 m1.small 2008-11-04T06:03:09+0000 us-east-1c aki-a71cf9ce ari-a51cf9cc

The field after the word INSTANCE is the ID of the instance. The command “ec2-describe-instances i-0c999999” will provide information on the instance. Once it is running (which may be a few minutes after you request it) you will see output such as the following:

RESERVATION r-281fc441 999999999999 default
INSTANCE i-0c999999 ami-2b5fba42 domU-12-34-56-78-9a-bc.compute-1.internal running gsg-keypair 0 m1.small 2008-11-04T06:03:09+0000 us-east-1c aki-a71cf9ce ari-a51cf9cc

The command “ssh -i id_rsa-gsg-keypair root@HOSTNAME” (where HOSTNAME is the public name reported for the instance) will then grant you root access. The part of the name such as 10-11-12-13 is the public IP address. Naturally you won’t see those exact addresses – real instances get public addresses in the Amazon range, and I replaced the addresses here to avoid driving bots to their site.

The name domU-12-34-56-78-9a-bc.compute-1.internal is listed in Amazon’s internal DNS and returns the private IP address which is used for the instance. The instance has no public IP address; all connections (both inbound and outbound) run through some sort of NAT. This shouldn’t be a problem for HTTP, SMTP, and most protocols that are suitable for running on such a service, but for FTP or UDP based services it might be a problem. The part of the name such as 12-34-56-78-9a-bc is the MAC address of the eth0 device.
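
That encoding can be reversed with a one-line sed script, for example:

```shell
# Recover the eth0 MAC address from the internal DNS name:
name=domU-12-34-56-78-9a-bc.compute-1.internal
mac=$(printf '%s\n' "$name" | sed -e 's/^domU-//' -e 's/\..*$//' -e 's/-/:/g')
echo "$mac"    # 12:34:56:78:9a:bc
```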

To halt an instance you can run shutdown or halt as root in the instance, or run the ec2-terminate-instances command and give it the instance ID that you want to terminate. It seems to me that the best way of terminating an instance would be to run a script that produces a summary of whatever the instance did (you might not want to preserve all the log data, but some summary information would be useful) and gives all operations that are in progress time to stop before running halt. A script could run on the management system to launch such an orderly shutdown script on the instance and then use ec2-terminate-instances if the instance does not terminate quickly enough.
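
Such a management-side fallback might look something like this sketch (the function name is mine, and the output matching is deliberately loose because the field layout of ec2-describe-instances isn’t guaranteed across versions of the tools):

```shell
# Wait for an instance to terminate itself, then force the issue.
wait_then_terminate() {
  instance=$1
  tries=${2:-10}    # about 5 minutes at the default 30 second interval
  while [ "$tries" -gt 0 ]; do
    # Match on the word "terminated" rather than a field position:
    if ec2-describe-instances "$instance" | grep '^INSTANCE' | grep -qw terminated
    then
      return 0      # the orderly shutdown worked
    fi
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-30}"
  done
  ec2-terminate-instances "$instance"    # give up and pull the plug
}
```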

In the near future I will document many aspects of using EC2. This will include dynamic configuration of the host, dynamic DNS, and S3 storage among other things.

Integrity and Mailing Lists

One significant dividing factor between mailing lists is the difference between summary lists (where the person who asks a question receives replies off-list and then sends a summary to the list) and the majority of mailing lists which are discussion lists (where every reply goes to the list by default).

I have seen an argument put forward that trusting the answers on a mailing list that operates under the summary list model is inherently risky and that peer review is required.

It could be argued that the process of sending a summary to the list is the peer review. I’m sure that if someone posts a summary which includes some outrageously bad idea then there will be some commentary in response. Of course the down-side to this is that it takes a few days for responses to the question to arrive and as it’s common that computer problems need to be solved in hours not days the problem will be solved (one way or another) before the summary message is written. But the idea of peer review in mailing lists seems to fall down in many other ways.

The first problem with the idea of peer review is that the usual expectation of mailing lists is that most people will first ask google and only ask the list if a reasonable google search fails (probably most mailing lists would fall apart under the load of repeated questions otherwise). Therefore I expect the majority of such problems to be solved by reading a web page (with no easily accessible peer review). Some of those web pages contain bad advice, and part of the skill involved in solving any problem relates to recognising which advice to follow. Also it’s not uncommon for a question on a discussion list to result in a discussion with two or more radically different points of view being strongly supported. I think that as a general rule there is little benefit in asking for advice if you lack any ability to determine whether the advice is any good, and which of the possible pieces of good advice actually apply to your situation. Sometimes you can recognise good advice by the people who offer it; in a small community such as a mailing list it’s easy to recognise the people who have a history of offering reasonable advice. It seems that the main disadvantage of asking google when compared to asking a mailing list is that the google results will in most cases contain links to web sites written by people who you don’t know.

Sometimes the advice is easy to assess, for example if someone recommends a little-known and badly documented command-line option for a utility it’s easy to read the man page and not overly difficult to read the source to discover whether it is a useful solution. Even testing a suggested solution is usually a viable option. Also it’s often the case that doing a google search on a recommended solution will be very informative (sometimes you see web pages saying “here’s something I tried which failed”). Recommendations based on personal experience are less reliable due to statistical issues (consider the regular disagreements about the reliability of hard disks, where some people claim that RAID is not necessary because they have not seen failures while others claim that RAID-5 is inadequate because it has failed them). There are also issues of different requirements; trivial issues such as the amount of money that can be spent will often determine which (if any) of the pieces of good advice can be adopted.

The fact that a large number of people (possibly the majority of Internet users) regularly forward as fact rumors that are debunked by Snopes (the main site for debunking urban legends) seems to indicate that it is always going to be impossible to increase the quality of advice beyond a certain level. A significant portion of the people on the net are either unwilling to spend a small amount of effort in determining the accuracy of information that they send around or are so gullible that they believe such things beyond the possibility of doubt. Consider that the next time you ask for advice on a technical issue: you may receive a response from someone who forwarded a rumor that was debunked by Snopes.

Sometimes technical advice is just inherently dangerous because it is impossible to verify the integrity of some code that is being shared, or because it may be based on different versions of software. In a previous blog post I analyse some issues related to security of the Amazon EC2 service [1]. While the EC2 service is great in many ways (and implements a good, well-documented set of security features on the servers), the unsigned code for managing it and the old versions of the images that they offer to customers raise some serious issues that provide avenues for attack. Getting the EC2 management tools to work correctly on Debian is not trivial; I have released patches but will not release packages for legal reasons. It seems most likely to me that someone will release packages based on my patches (either because they don’t care about the legal issues or they have legal advice suggesting that such things are OK – maybe due to residing in a different jurisdiction). Then people who download such packages will have to determine whether they trust the person who built them. They may also have the issue of Amazon offering a newer version of the software than that which is packaged for Debian (for all I know Amazon released a new version yesterday).

The term integrity when applied to computers refers to either accidental or malicious damage to data [2]. In the context of mailing list discussions this means both poorly considered advice and acts of malice (which when you consider spam and undisclosed conflicts of interest are actually quite common).

If you ask for advice in any forum (and I use the term in its broadest sense to cover web “forums”, IRC, twitter, etc) then getting a useful result will depend on having the majority of members of the forum possessing sufficient integrity and skill, being able to recognise the people whose advice should be followed, or being able to recognise good advice on its own.

I can think of few forums in which I have been involved where the level of skill was sufficient to provide quality answers (and refutations of bad answers) for all areas of discussion that were on topic. People whose advice should generally be followed will often offer advice in areas where their skills are less well developed; someone whose advice can be blindly followed in regard to topic A may not be a reliable source of advice on topic B – which can cause confusion if the topics in question are closely related.

Finally, a fundamental difference between “peer review” as applied to conferences and academic journals and what happens on mailing lists is that review for conferences and journals is conducted before publication. Not only does the work have to be good enough to pass the review, but the people doing it will never be sure what the threshold is (and will generally want to do more than a minimal effort), so the quality will be quite high. Peer review in mailing lists, by contrast, is mostly based around the presence or absence of flames. A message which doesn’t attract flames will either have some minimal quality or be related to a topic that is not well known (so no-one regards it as being obviously wrong).

Update: The “peer review” process of publishing a post on my blog revealed that I had incorrectly used who’s instead of whose.

Upgrading a server to 64bit Xen

I have access to a server in Germany that was running Debian/Etch i386 but needed to be running Xen with the AMD64 version of Debian/Lenny (it didn’t really need to be Lenny, but we might as well get two upgrades done at the same time). Most people would probably do a complete reinstall, but I knew that I could do the upgrade while the machine was in a server room without any manual intervention. I didn’t achieve all my goals (I wanted to do it without having to boot the recovery system – we ended up having to boot it twice) but no dealings with the ISP staff were required.

The first thing to do is to get a 64bit kernel running. Based on past bad experiences I’m not going to use the Debian Xen kernel on a 64bit system (in all my tests it has had kernel panics in the Dom0 when doing any serious disk IO). So I chose the CentOS 5 kernel.

To get the kernel running I copied the kernel files (/boot/vmlinuz-2.6.18-92.1.13.el5xen and /boot/config-2.6.18-92.1.13.el5xen) and the modules (/lib/modules/2.6.18-92.1.13.el5xen) from a CentOS machine. I just copied a .tgz archive as I didn’t want to bother installing alien or doing anything else that took time. Then I ran the Debian mkinitramfs program to create the initrd (the 32bit tools for creating an initrd work well with a 64bit kernel). Then I created the GRUB configuration entry (just copied the one from the CentOS box and changed the root= kernel parameter and the root GRUB parameter), crossed my fingers and rebooted. I tested this on a machine in my own computer room to make sure it worked before deploying it in Germany, but there was still some risk.

After rebooting, the arch command reported x86_64 – so it had a 64bit Xen kernel running correctly.

The next thing was to create a 64bit Lenny image. I got the Lenny Beta 2 image and used debootstrap to create the image (I consulted my blog post about creating Xen images for the syntax [1] – one of the benefits of blogging about how you solve technical problems). Then I used scp to copy a .tgz file of that to the server in Germany. Unfortunately the people who had set up that server had used all the disk space in two partitions, one for root and one for swap. While I can use regular files for Xen images (with performance that will probably suck a bit – Ext3 is not a great filesystem for big files) I can’t use them for a new root filesystem. So I formatted the swap space as ext3.

Then to get it working I merely had to update the /etc/fstab, /etc/network/interfaces, and /etc/resolv.conf files to make it basically functional. Of course ssh access is necessary to do anything with the server once it boots, so I chrooted into the environment and ran “apt-get update ; apt-get install openssh-server udev ; apt-get dist-upgrade“.

I stuffed this up and didn’t allow myself ssh access the first time, so the thing to do is to start sshd in the chroot environment and make sure that you can really login. Without udev running, an ssh login will probably result in the message “stdin: is not a tty“ – that is not a problem. Getting a working terminal via the commands ‘ssh root@server “mkdir /dev/pts”‘ and ‘ssh root@server “mount -t devpts devpts /dev/pts”‘ is not a challenge, but installing udev first is a better idea.

After that I added a new grub entry as the default, which used the CentOS kernel and /dev/sda1 (the device formerly used for swap space) as root. I initially used the CentOS Xen hypervisor (all Red Hat based distributions bundle the Xen hypervisor with the Linux kernel – which makes some sense), but the Debian Xen utilities didn’t like that so I changed to the Debian Xen hypervisor.
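
For reference, a hypothetical menu.lst entry for the combination I ended up with (Debian hypervisor, CentOS Dom0 kernel); the hypervisor path is the one shipped by Lenny’s xen-hypervisor-3.2-1-amd64 package, and the initrd name is whatever you gave to mkinitramfs:

```
title   Xen (Debian hypervisor, CentOS Dom0 kernel)
root    (hd0,0)
kernel  /boot/xen-3.2-1-amd64.gz
module  /boot/vmlinuz-2.6.18-92.1.13.el5xen root=/dev/sda1 ro console=tty0
module  /boot/initrd.img-2.6.18-92.1.13.el5xen
```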

Once I had this basically working I copied the 64bit installation to the original device and put the 32bit files in a subdirectory named “old” (so configuration could be copied). When I changed the configuration and rebooted it worked until I installed SE Linux. It seems that the Debian init scripts will in many situations quietly work when the root device is incorrectly specified in /etc/fstab. This however requires creating a device node somewhere else for fsck, and the SE Linux policy version 2:0.0.20080702-12 was not permitting this. I have since uploaded policy 2:0.0.20080702-13 to fix this bug and requested that the release team allow it in Lenny – I think that a bug which can make a server fail to boot is worthy of inclusion!

Finally to get the CentOS kernel working with Debian you need to load the following modules in the Dom0 (as discussed in my previous post about kernel issues [2]):
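
The modules in question are the backend drivers that Debian links into its own Xen kernel (netbk, blkbk, and blktap); adding them to /etc/modules makes them load at every boot:

```
# additions to /etc/modules for a CentOS Dom0 kernel on Debian
netbk
blkbk
blktap
```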

It seems that the Debian Xen kernel has those modules linked in and the Debian Xen utilities expect that.

Currently I’m using Debian kernels 2.6.18 and 2.6.26 for the DomUs. I have considered using the CentOS kernel, but they decided that /dev/console is not good enough for the console of a DomU and used something else (xvc0). Gratuitous differences are annoying (every other machine, both real and virtual, has /dev/console). If I find problems with the Debian kernels in DomUs I will change to the CentOS kernel. Incidentally, one problem I have had with a CentOS kernel for a DomU (when running on a CentOS Dom0) was that the CentOS initrd seems to have some strange expectations of the root filesystem; when they are not met things go wrong – a common symptom is that the nash process will go into a loop and use 100% CPU time.

One of the problems I had was converting the configuration for the primary network device from eth0 to xenbr0. In my first attempt I had not installed the bridge-utils package and the machine booted up without network access. In future I will set up xenbr1 (a device for private networking that is not connected to an Ethernet device) first and test it; if it works then there’s a good chance that the xenbr0 device (which is connected to the main Ethernet port of the machine) will work.

After getting the machine going I found a number of things that needed to be fixed with the Xen SE Linux policy. Hopefully the release team will let me get another version of the policy into Lenny (the current one doesn’t work).

Kernel issues with Debian Xen and CentOS Kernels

Last time I tried using a Debian 64bit Xen kernel for the Dom0 I was unable to get it to work correctly; it continually gave kernel panics when doing any serious disk IO. I’ve just tried to reproduce that problem on a test machine with a single SATA disk and it seems to be working correctly, so I guess that it might be related to using software RAID and LVM (LVM is really needed for Xen, and RAID is necessary for every serious server IMHO).

To solve this I am now experimenting with using a CentOS kernel on Debian systems.

There are some differences between the kernels that are relevant; the most significant one is the choice of which modules are linked in to the kernel and which ones have to be loaded with modprobe. The Debian choice is to have the drivers blktap, blkbk, and netbk linked in while the Red Hat / CentOS choice was to have them as modules. Therefore the Debian Xen utilities don’t try to load those modules, and when you use the CentOS kernel without them loaded Xen simply doesn’t work.

Error: Device 0 (vif) could not be connected. Hotplug scripts not working.

You will get the above error (after a significant delay) from the command “xm create -c name” if you try to start a DomU that has networking when the driver netbk is not loaded.

XENBUS: Timeout connecting to device: device/vbd/768 (state 3)

You will get the above error (or something similar with a different device number) for every block device from the kernel of the DomU if using one of the Debian 2.6.18 kernels; if using a 2.6.26 kernel then you get “XENBUS: Waiting for devices to initialise“.

Also one issue to note is that when you use a file: block device (i.e. a regular file) Xen will use a loopback device (internally it seems to only like block devices). If you are having this problem and you destroy the DomU (or have it abort after trying for 300 seconds) then the loopback device is left enabled (it seems that the code for freeing resources in the error path is buggy). I have filed Debian bug report #503044 [1] requesting that the Xen packages change the kernel configuration to allow more loopback devices, and Debian bug report #503046 [2] requesting that the resources be freed correctly.

Finally, the following messages appear in /var/log/daemon.log if you don’t have the driver blktap loaded:
BLKTAPCTRL[2150]: couldn’t find device number for ‘blktap0’
BLKTAPCTRL[2150]: Unable to start blktapctrl

It doesn’t seem to cause a problem (in my tests I can’t find something I want to do with Xen that required blktap), but I have loaded the driver – even removing error messages is enough of a benefit.

Another issue is that the CentOS kernel packages include a copy of the Xen hypervisor, so you have a Linux kernel matching the hypervisor. So of course it is tempting to try to run that CentOS hypervisor on a Debian system. Unfortunately the Xen utilities in Debian/Lenny don’t match the Xen hypervisor used for CentOS 5 and you get messages such as the following in /var/log/xen/xend-debug.log:

sysctl operation failed — need to rebuild the user-space tool set?
Exception starting xend: (13, ‘Permission denied’)

Update: Added a reference to another Debian bug report.

Programming and Games for Children

The design of levels for computer games is a form of programming, particularly for games with deterministic NPCs. It seems to me that for a large portion of the modern computer user-base the design of game levels will be their first experience of programming computers; most of those who don’t start programming by creating game levels probably start by writing spreadsheets. A few people start programming by writing “batch files” and shell scripts, but I expect that they form a minute portion of the user-base.

I believe that learning some type of programming is becoming increasingly important, not just for its own sake (most people can get through their life quite well without doing any form of programming) but because of the sense of empowerment it gives. A computer is not a mysterious magic box that sometimes does things you want and sometimes doesn’t! It’s a complex machine that you can control. Knowing that you can control it gives you more options even if you don’t want to program it yourself; little things like knowing that you have the option of using different software or paying someone to write new software open significant possibilities for computer use in business environments.

Games which involve strategic or tactical thought seem to have some educational benefit (which may or may not outweigh the negative aspects of games). To empower children and take full advantage of the educational possibilities I think that there are some features that are needed in games.

Firstly, levels that are created by the user need to be first class objects in the game. Having a game menu provide the option of playing predefined levels or user-defined levels clearly shows the user that their work is somehow less important than that of the game designer. While the game designer’s work will tend to be of a higher quality (by objective measures), in the subjective opinion of the user their own work is usually the most important thing. So when starting a game the user should be given a choice of levels (and/or campaigns) to play, with their levels being listed beside the levels of the game creator. Having the user’s levels displayed at the top of the list (before the levels from the game designer) is also a good thing. Games that support campaigns should allow the user to create their own campaigns.

The KDE game kgoldrunner [1] is the best example I’ve seen of this being implemented correctly (there may be better examples but I don’t recall seeing them).

In kgoldrunner when you start a game the game(s) that you created are at the bottom of the list. While I believe that it would be better to have my own games at the top of the list, having them in the same list is adequate.

When a user is playing the game they should be able to jump immediately from playing a level to editing it. For example in kgoldrunner you can use the Edit Any Level menu option at any time while playing and it will default to allowing you to edit the level you are playing (and give you a hint that you have to save it to your own level). This is a tremendous encouragement for editing levels, any time you play a level and find it too hard, too easy, or not aesthetically pleasing you can change it with a single menu selection!

When editing a level every option should have a description. There should be no guessing as to what an item does – it should not be assumed that the user has played the game enough to fully understand how each primary object works. Kgoldrunner provides hover text to describe the building blocks.

Operations that seem likely to be performed reasonably often should have menu options. While it is possible to move a level by loading it and saving it, having a Move Level menu option (as kgoldrunner does) is a really good feature. Kgoldrunner’s Edit Next Level menu option is also a good feature.

Finally, a game should support sharing levels with friends. While kgoldrunner is great, it falls down badly in this area. It’s OK for a game to use multiple files for a campaign underneath the directory it uses for all its configuration, but it should be able to export a campaign to a single file for sharing. Being able to hook in to a MUA so that sending a campaign as a file attached to an email is a single operation would also be a good feature. I have filed Debian bug #502372 [2] requesting this feature.

Some RAID Issues

I just read an interesting paper titled An Analysis of Data Corruption in the Storage Stack [1]. It contains an analysis of the data from 1,530,000 disks running at NetApp customer sites. The amount of corruption is worrying, as is the amount of effort needed to detect it.

NetApp devices have regular “RAID scrubbing”, which involves reading all data on all disks at some quiet time and making sure that the checksums match. They also store checksums of all written data. For “Enterprise” disks each sector stores 520 bytes, which means that a 4K data block comprises 8 sectors and has 64 bytes of storage for a checksum. For “Nearline” disks, 9 sectors of 512 bytes are used to store a 4K data block and its checksum. The 64 byte checksum includes the identity of the block in question. The NetApp WAFL filesystem writes a block to a different location every time, which allows the storage of snapshots of old versions, and also means that when reading file data, if the location that is read contains data from a different file (or a different version of the same file) then it is known to be corrupt (sometimes writes don't make it to disk). Page 3 of the document describes this.
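The sector arithmetic above is easy to check. Here is a minimal sketch of the numbers (just the layout arithmetic as I understand it from the paper, not NetApp's actual on-disk format):

```python
# Sector arithmetic for the two disk formats described in the paper.

DATA_BLOCK = 4096  # a 4K filesystem data block

# "Enterprise" disks: 8 sectors of 520 bytes hold one 4K block plus checksum.
enterprise_sectors = 8
enterprise_sector_size = 520
enterprise_spare = enterprise_sectors * enterprise_sector_size - DATA_BLOCK
print(enterprise_spare)  # 64 bytes left over for the checksum and block identity

# "Nearline" disks: 9 standard 512-byte sectors per 4K block.
nearline_sectors = 9
nearline_sector_size = 512
nearline_spare = nearline_sectors * nearline_sector_size - DATA_BLOCK
print(nearline_spare)  # 512 spare bytes, of which 64 are used the same way
```

So the Nearline format pays a full extra sector (a 12.5% space overhead) for the same 64 bytes of checksum data that the Enterprise format gets from its oversized sectors.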

Page 13 has an analysis of error location and the fact that some disks are more likely to have errors at certain locations. They suggest configuring RAID stripes to be staggered so that you don’t have an entire stripe covering the bad spots on all disks in the array.

One thing that was not directly stated in the article is the connection between the different layers. On a Unix system with software RAID you have a RAID device with a filesystem layer on top of it, and (in Linux at least) there is no way for a filesystem driver to say “you gave me a bad version of that block, please give me a different one”. Block checksum errors at the filesystem level will often be caused by corruption on a single disk that leaves the rest of the RAID array intact, which means that the RAID stripe will have a mismatching checksum – but the RAID driver won't know which disk has the error. If a filesystem did checksums on metadata (or data) blocks, and the chunk size of the RAID was greater than the filesystem block size, then when the filesystem detected an error a different version of the block could be generated from the parity.
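To illustrate why the filesystem's knowledge would help: once you know *which* block in a stripe is bad, RAID-5 parity is enough to regenerate it. Here is a toy sketch of the XOR arithmetic (an illustration of the idea only, not the interface of the Linux md driver, which as noted above doesn't offer this):

```python
# RAID-5 parity is the XOR of the data blocks in a stripe. If some higher
# layer (a checksumming filesystem) identifies the corrupt block, the other
# members plus parity can reconstruct it.

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# A 3+1 stripe: three data blocks and their parity.
data = [b'AAAA', b'BBBB', b'CCCC']
parity = xor_blocks(data)

# Suppose a filesystem checksum shows data[1] is corrupt: rebuild it from
# the parity and the surviving members.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == b'BBBB'
```

Without that hint from the filesystem, the RAID layer only knows that the stripe's parity doesn't match, not which of the members to regenerate.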

NetApp produced an interesting guest-post on the StorageMojo blog [2]. One point that they make is that Nearline disks try harder to re-read corrupt data from the disk. This means that a bad sector error will result in longer timeouts, but hopefully the data will be returned eventually. This is good if you only have a single disk, but if you have a RAID array it's often better to just return an error and allow the data to be retrieved quickly from another disk. NetApp also claim that “Given the realities of today’s drives (plus all the trends indicating what we can expect from electro-mechanical storage devices in the near future) – protecting online data only via RAID 5 today verges on professional malpractice”. It's a strong claim, but they provide evidence to support it.

Another relevant issue is the size of the RAID device. Here is a post that describes the issue of the Unrecoverable Error Rate (UER) and how it can impact large RAID-5 arrays [3]. The implication is that the larger the array (in GB/TB), the greater the need for RAID-6. It has long been accepted that a larger number of disks in an array drives a greater need for RAID-6, but the idea that larger disks in a RAID array also increase the need for RAID-6 is a new one (to me at least).
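The back-of-envelope version of the UER argument is worth spelling out. The sketch below is my own arithmetic (not taken from the post [3]), using the commonly quoted rating of one unrecoverable error per 10^14 bits read for consumer SATA disks:

```python
# Probability of hitting an unrecoverable read error while rebuilding a
# degraded RAID-5 array, which requires reading every surviving disk in full.

UER = 1e-14      # unrecoverable errors per bit read (typical SATA rating)
disk_tb = 1      # size of each surviving disk in TB

bits_per_disk = disk_tb * 1e12 * 8

def p_rebuild_failure(surviving_disks):
    """Chance of at least one unrecoverable error during a full rebuild read."""
    bits_read = surviving_disks * bits_per_disk
    return 1 - (1 - UER) ** bits_read

# A 6-disk RAID-5 of 1TB disks: rebuilding after one failure reads 5 disks.
print(round(p_rebuild_failure(5), 2))  # about 0.33 with these assumptions
```

A roughly one-in-three chance of the rebuild hitting a second (sector-level) failure is exactly the situation where RAID-6's second parity block saves the array, and the probability scales with disk size, which is the point of the post.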

Now I am strongly advising all my clients to use RAID-6. Currently the only servers that I run which don’t have RAID-6 are legacy servers (some of which can be upgraded to RAID-6 – HP hardware RAID is really good in this regard) and small servers with two disks in a RAID-1 array.

EC2 Security

One thing that concerns me about using any online service is the security. When that service is a virtual server running in another country the risks are greater than average.

I’m currently investigating the Amazon EC2 service for some clients, and naturally I’m concerned about the security. Firstly, they appear to have implemented a good range of Xen based security mechanisms; their documentation is worth reading by anyone who plans to run a Xen server for multiple users [1]. I think it would be a good thing if other providers would follow their example in documenting the ways that they protect their customers.

Next, they seem to have done a good job of securing access to the service. You use public key cryptography for all requests to the service, and they generate the keypair. While later in this article I identify some areas that could be improved, I want to make it clear that overall I think EC2 is a good service, and it seems generally better than average. But it's a high profile service which deserves a good deal of scrutiny, and I've found some things that need to be improved.

The first problem comes when downloading anything of importance (kernel modules for use in a machine image, utility programs for managing AMIs, etc). All downloads are done via http (not https) and the files are not signed in any way. This is an obvious risk: anyone who controls a router could compromise EC2 instances by causing people to download hostile versions of the tools. The solution is to use https for the downloads AND to use GPG to sign the files. Https is the most user-friendly way of authenticating the files (although it could be argued that anyone who lacks the skill needed to use GPG will never run a secure server anyway), while GPG allows end to end verification and would allow me to verify files that a client is using if the signature was downloaded at the same time.
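As a hypothetical sketch of the kind of end-to-end check that signed downloads would allow: compare a downloaded file against a digest published over a trusted channel. A GPG signature is stronger (it also proves who signed), but the principle is the same. The function names and flow here are my own illustration, not anything Amazon provides:

```python
# Verify a downloaded file against a digest obtained from a trusted source
# (an https page, or a GPG-signed checksum file).

import hashlib

def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, published_digest):
    """True if the file matches the digest published out of band."""
    return sha256_of(path) == published_digest
```

With plain http and no signatures, there is nothing to feed into `published_digest`, which is the gap being complained about above.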

More likely problems start when it comes to the machine images that they provide. They have images of Fedora Core 4, Fedora Core 6, and Fedora 8 available. Fedora releases are maintained until one month after the release of two subsequent versions [6], so Fedora 8 will end support one month after the release of Fedora 10 (which will be quite soon) and Fedora Core 6 and Fedora Core 4 have been out of support for a long time. I expect that someone who wanted to 0wn some servers that are well connected could get a list of exploits that work on FC4 or FC6 and try them out on machines running on EC2. While it is theoretically possible for Amazon staff to patch the FC4 images for all security holes that are discovered, it would be a lot of work, and it wouldn’t apply to all the repositories of FC4 software. So making FC4 usable as a secure base for an online service really isn’t a viable option.

Amazon’s page on “Tips for Securing Your EC2 Instance” [3] mostly covers setting up ssh; I wonder whether anyone who needs advice on setting up ssh can ever hope to run a secure server on the net. It does have some useful information on managing the EC2 firewall that will be of general interest.

One of the services that Amazon offers is to have “shared images” where any Amazon customer can share an image with the world. Amazon has a document about AMI security issues [4], but it seems to only be useful against clueless security mistakes by the person who creates an image, not malice. If a hostile party creates a machine image, you can expect that you won't discover the problem by looking for open ports and checking for strange processes. The Amazon web page says “you should treat shared AMIs as you would any foreign code that you might consider deploying in your own data center and perform the appropriate due diligence”; the difference of course is that most foreign code you might consider deploying comes from companies and is shipped in shrink-wrap packaging. I don't count the high quality free software available in a typical Linux distribution in the same category as this “foreign code”.

While some companies have accidentally shipped viruses on installation media in the past, it has been quite rare. But I expect hostile AMIs on EC2 to be considerably more common. Amazon recommends that people know the source of the AMIs that they use. Of course there is a simple way of encouraging this: Amazon could refrain from providing a global directory of AMIs without descriptions (the output of “ec2dim -x all”) and instead force customers to subscribe to channels containing AMIs that have not been approved by Amazon staff (the images that purport to be from Oracle and Red Hat could easily have their sources verified and be listed as trusted images if they are what they appear to be).

There seems to be no way of properly tracking the identity of the person who created a machine image within the Amazon service. The ec2dim command only gives an ID number for the creator (and there seems to be no API tool to get information on a user based on their ID). The web interface gives a name and an Amazon account name.

The next issue is that of the kernel. Amazon notes that they include the “vmsplice root exploit patch” in the 2.6.18 kernel image that they supply [2]. However, a number of other Linux kernel security problems have been found since then, and plenty of security issues in 2.6.18 were patched before the vmsplice bug was discovered – were they patched as well? The file date stamp of 20th Feb 2008 on the kernel image and modules files indicates that there are a few kernel security issues which are not patched in the Amazon kernel.

To fix this, the obvious solution is to use a modern distribution image. Of course, without knowing what other patches they include (they mention a patch for better network performance), this is going to be difficult. It seems that we need some distribution packages of kernels designed for EC2; they would incorporate the Amazon patches and the Amazon configuration as well as all the latest security updates. I've started looking at the Amazon EC2 kernel image to see what I should incorporate from it to make a Debian kernel image. It would be good if we could get such a package included in an update to Debian/Lenny. Also Red Hat is partnering with Amazon to offer RHEL on EC2 [5], and I'm sure that they provide good kernels as part of that service – but as the costs for RHEL on EC2 more than double the cost of the cheapest EC2 instance, I expect that only customers who need the larger instances all the time will use it. The source for the RHEL kernels will of course be good for CentOS (binaries produced from such sources may be in CentOS already, I haven't checked).

This is not an exhaustive list of security issues related to EC2; I may end up writing a series of posts about this.

Update: Jef Spaleta has written an interesting post that references this one [7]. He is a bit harsher than I am, but his points are all well supported by the evidence.

RPC and SE Linux

One ongoing problem with TCP networking is the combination of RPC services and port-based services on the same host. If you have an RPC service that uses a port lower than 1024, it will typically start at 1023 and try lower ports until it finds one that works. A problem I have had in the past is that an RPC service took port 631, and I then couldn't start CUPS (which uses that port). A similar problem can arise in a more insidious manner if you have strange networking devices such as a BMC [1] which uses the same IP address as the host and just snarfs connections for itself (as documented in [2]); this means that according to the OS the port in question is not in use, but connections to that port will go to the hardware BMC and the OS won't see them.
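The descending port search is simple to simulate. This sketch shows the logic only (not a real bind(2) loop, so it can run unprivileged), and the port numbers are the ones from my CUPS example:

```python
# RPC services that need a privileged port start at 1023 and count down
# until a bind succeeds; here that search is simulated against a set of
# ports already in use.

def find_rpc_port(ports_in_use, start=1023, lowest=512):
    """Return the first free privileged port at or below `start`."""
    for port in range(start, lowest - 1, -1):
        if port not in ports_in_use:
            return port
    raise OSError("no free privileged port")

# If everything from 632 up to 1023 is already taken, the next RPC service
# lands on 631 - and CUPS can no longer bind to it.
in_use = set(range(632, 1024))
print(find_rpc_port(in_use))  # 631
```

This is also why, as noted below, reserving a handful of ports does no harm: the search simply continues past them.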

One solution is to give a SE Linux security context to the port, which prevents the RPC service from binding to it. RPC applications seem to be happy to make as many bind attempts as necessary to get an available port (thousands of attempts if necessary), so reserving a few ports is not going to cause any problems. As far as I recall, my problem with CUPS and RPC services was a motivating factor in some of my early work on writing SE Linux policy to restrict port access.
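On a current SE Linux system the port labeling can be inspected and managed with semanage (a tool that postdates the early policy work described here); this is a rough sketch, and the exact types available depend on your policy:

```shell
# Show the type assigned to the CUPS port; in the reference policy port
# 631 carries ipp_port_t, which confined RPC domains cannot bind.
semanage port -l | grep -w 631

# Assign a dedicated type to another port so that generic services
# cannot take it (port number here is an example only).
semanage port -a -t ipp_port_t -p tcp 8631
```

A confined RPC service that lacks permission to bind the labeled type will simply get EACCES on that port and continue its search downward.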

Of course the best thing to do is to assign IP addresses for IPMI that are different from the OS IP addresses. This is easy to do and merely requires an extra IP address for each port. As a typical server will have two Ethernet ports on the baseboard (one for the front-end network and one for the private network) that means an extra two IP addresses (you want to use both interfaces for redundancy in case the problem which cripples a server is related to one of the Ethernet ports). But for people who don’t have spare IP addresses, SE Linux port labeling could really help.

Getting Started with Amazon EC2

The first thing you need to do to get started with the Amazon Elastic Compute Cloud (EC2) [1] is to install the tools to manage the service. The service is run in a client-server manner: you install the client software on your PC to manage the EC2 services that you use.

There are the AMI tools to manage the machine images [2] and the API tools to launch and manage instances [3].

The AMI tools come as both a ZIP file and an RPM package and contain Ruby code, while the API tools are written in Java and only come as a ZIP file.

There are no clear license documents that I have seen for any of the software in question; I recall seeing one mention on one of the many confusing web pages of the code being “proprietary”, but nothing else. While it seems most likely (but far from certain) that Amazon owns the copyright to the code in question, there is no information on how the software may be used – apart from an implied license that if you are a paying EC2 customer then you can use the tools (as there is no other way to use EC2). If anyone can find a proper license agreement for this software then please let me know.

To get software working in the most desirable manner it needs to be packaged for the distribution on which it will be used; as I prefer to use Debian, that means packaging it for Debian. Also, when packaging the software you can fix some of the silly things that get included in software designed for non-packaged release (such as demanding that environment variables be set to specify where the software is installed). So I have built packages for Debian/Lenny for the benefit of myself and some friends and colleagues who use Debian and EC2.
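For comparison, this is roughly the environment setup that the unpackaged ZIP releases demand before any of the tools will run (the paths and key file names below are examples of my own, not values from Amazon's documentation); a proper Debian package makes all of it unnecessary:

```shell
# Environment the unpackaged EC2 API tools expect (example paths only).
export JAVA_HOME=/usr/lib/jvm/default-java
export EC2_HOME=/opt/ec2-api-tools
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=$HOME/.ec2/pk-example.pem
export EC2_CERT=$HOME/.ec2/cert-example.pem
```

A packaged version can install the tools on the standard PATH and find the PEM files in a well-known location, so none of these variables need to be set.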

As I can’t be sure of what Amazon would permit me to do with their code, I have to assume that they don’t want me to publish Debian packages for the benefit of all Debian and Ubuntu users who are (or might become) EC2 customers. So instead I have published the .diff.gz files from my Debian/Lenny packages [4] to allow other people to build identical packages after downloading the source from Amazon. At the moment the packages are a little rough, and as I haven’t actually got an EC2 service running with them yet they may have some really bad bugs. But getting the software to basically work took more time than expected. So even if there happen to be some bugs that make it unusable in its current state (the code for determining where it looks for PEM files at best needs a feature enhancement and at worst may be broken at the moment), it would still save people some time to use my packages and fix whatever needs fixing.