Combat Wasps

One of the many interesting ideas in Peter F. Hamilton’s Night’s Dawn series [1] is that of Combat Wasps. These are robots used in space combat which may be armed with some combination of projectile weapons, MASERs, thermo-nuclear and anti-matter weapons.

In a lot of science fiction, space combat is limited to capital ships. A large part of the reason is technological: the Star Trek approach of building physical models of ships made it too expensive and time consuming to show lots of small craft. Shows such as Babylon 5 [2] have fighters, which make more sense. Sustaining life in space is difficult at the best of times, and battles in space seem likely to have few if any survivors, so sending out fighters gives the capital ships a chance to survive. I suspect that a major motivating factor in the space battles in Babylon 5 was making them fit on a TV screen. Dramatic TV portrayal of small groups of fighters engaging in battle is an art that has been perfected over the course of 80+ years. It’s about showing individuals; whether they are riders on horseback, pilots of biplanes, or space pilots, it’s much the same.

But a reasonable analysis of the facts suggests that without some strange religious motive adopted by all parties in a war (as used in Dune [3]) the trend in warfare is to ever greater mechanisation.

So while a medium sized starship might be able to carry dozens or even hundreds of piloted fighter craft, with small robotic craft it could carry thousands.

So the issue is how to use such robots effectively. It seems likely that an effective strategy would involve large numbers of robots performing different tasks: some would detonate thermo-nuclear weapons to remove enemies from an area while others would prepare to advance into the breach. The result would be a battle lasting seconds that involves large numbers of robots (too many to focus on as a group) while each individual robot matters so little that there’s no interest in following one. Therefore it just wouldn’t work on TV, and in a book it gets a couple of sentences to describe what would have been an epic battle if humans had done anything other than press the launch buttons.

One of the many things I would do if I had a lot more spare time would be to write a Combat Wasp simulator. There are already quite a number of computer games based on the idea of writing a program to control a robot and then having the robots do battle. This would be another variation on the theme but based in space.
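If I ever get around to it, the core of such a simulator might look something like the following. This is purely a hypothetical sketch of the programming interface; every name here is made up:

```python
# A player writes a Controller subclass; the simulator calls step() on
# every tick and applies the returned thrust vector to the wasp.
from dataclasses import dataclass


@dataclass
class Wasp:
    x: float
    y: float
    fuel: float = 100.0
    alive: bool = True


class Controller:
    def step(self, me: Wasp, enemies: list) -> tuple:
        raise NotImplementedError


class ChargeNearest(Controller):
    """Trivial strategy: accelerate towards the nearest enemy."""

    def step(self, me, enemies):
        if not enemies:
            return (0.0, 0.0)
        target = min(enemies, key=lambda e: (e.x - me.x) ** 2 + (e.y - me.y) ** 2)
        return (target.x - me.x, target.y - me.y)


def simulate(wasps, controllers, ticks=100):
    for _ in range(ticks):
        for wasp, ctrl in zip(wasps, controllers):
            if wasp.alive and wasp.fuel > 0:
                others = [w for w in wasps if w is not wasp and w.alive]
                dx, dy = ctrl.step(wasp, others)
                wasp.x += dx * 0.01
                wasp.y += dy * 0.01
                wasp.fuel -= 1.0
```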

In a comment on my previous post about programming and games for children [4], Don Marti suggests that an RTS game could allow programming the units. It seems to me that the current common settings for controlling units in RTS games (attack particular enemies, attack whichever enemies get in range, patrol, move to location, retreat, and defend other units or strategic positions) are about as complex as you can get without reaching the full programming language stage. Of course if you have a real programming language for a unit then changing its program takes more time than an RTS game allows, and if the programming is good then there won’t be much for a human to do during the game anyway. So I can’t imagine much potential for anything between RTS and fully programmed games.
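To illustrate the gap, here is a hypothetical sketch (none of these names come from a real game) of the difference between a fixed set of RTS orders and a per-unit program:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Order(Enum):
    ATTACK_TARGET = auto()    # attack a particular enemy
    ATTACK_IN_RANGE = auto()  # attack whichever enemies get in range
    PATROL = auto()
    MOVE_TO = auto()
    RETREAT = auto()
    DEFEND = auto()


@dataclass
class Unit:
    health: int


# The fully programmed alternative: arbitrary user code runs every tick
# and decides what the unit does, leaving little for the player to do
# during the game itself.
def unit_program(unit: Unit, visible_enemies: list) -> Order:
    if unit.health < 20:
        return Order.RETREAT
    return Order.ATTACK_IN_RANGE if visible_enemies else Order.PATROL
```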

There is some interesting research being conducted by the US military in simulating large numbers of people in combat situations. I think that the techniques in question could be more productively used in determining which of the various science fiction ideas for space combat could be most effectively implemented.

Programming and Games for Children

The design of levels for computer games is a form of programming, particularly for games with deterministic NPCs. It seems to me that for a large portion of the modern computer user-base, designing game levels will be their first experience of programming computers; most of the people who don’t start programming by creating game levels will start by writing spread-sheets. Probably a few people start programming by writing “batch files” and shell scripts, but I expect that they form a minute portion of the user-base.

I believe that learning some type of programming is becoming increasingly important, not just for its own sake (most people can get through their life quite well without doing any form of programming) but because of the sense of empowerment it gives. A computer is not a mysterious magic box that sometimes does things you want and sometimes doesn’t! It’s a complex machine that you can control. Knowing that you can control it gives you more options even if you don’t want to program it yourself; little things like knowing that you can choose different software, or pay someone to write new software, open up significant possibilities for computer use in business environments.

Games which involve strategic or tactical thought seem to have some educational benefit (which may or may not outweigh the negative aspects of games). To empower children and take full advantage of the educational possibilities I think that there are some features that are needed in games.

Firstly, levels that are created by the user need to be first class objects in the game. Having the game menu offer separate options for playing predefined levels and user-defined levels clearly signals to the user that their work is somehow less important than that of the game designer. While the game designer’s work will tend to be of higher quality (by objective measures), in the subjective opinion of the user their own work is usually the most important thing. So when starting a game the user should be given a choice of levels (and/or campaigns) to play, with their own levels listed beside the levels of the game creator. Having the user’s levels displayed at the top of the list (before the levels from the game designer) is also a good thing. Games that support campaigns should allow the user to create their own campaigns.

The KDE game kgoldrunner [1] is the best example I’ve seen of this being implemented correctly (there may be better examples but I don’t recall seeing them).

In kgoldrunner when you start a game the game(s) that you created are at the bottom of the list. While I believe that it would be better to have my own games at the top of the list, having them in the same list is adequate.

When a user is playing the game they should be able to jump immediately from playing a level to editing it. For example, in kgoldrunner you can use the Edit Any Level menu option at any time while playing and it will default to editing the level you are playing (and give you a hint that you have to save it as one of your own levels). This is a tremendous encouragement for editing levels: any time you play a level and find it too hard, too easy, or not aesthetically pleasing you can change it with a single menu selection!

When editing a level every option should have a description. There should be no guessing as to what an item does – it should not be assumed that the user has played the game enough to fully understand how each primary object works. Kgoldrunner provides hover text to describe the building blocks.

Operations that seem likely to be performed reasonably often should have menu options. While it is possible to move a level by loading it and saving it, having a Move Level menu option (as kgoldrunner does) is a really good feature. Kgoldrunner’s Edit Next Level menu option is also a good feature.

Finally, a game should support sharing levels with friends. While kgoldrunner is great, it falls down badly in this area. It’s OK for a game to store a campaign as multiple files under its configuration directory, but it should be able to export a campaign to a single file for sharing. Being able to hook into a MUA so that sending a campaign as an email attachment is a single operation would also be a good feature. I have filed Debian bug #502372 [2] requesting this feature.
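The export feature I have in mind is not complicated; here is a minimal sketch of the idea (the kgoldrunner directory path in the comment is just an illustrative assumption, not a documented location):

```python
# Bundle a campaign's files into one archive for mailing to a friend,
# and unpack a shared campaign on the other end.
import tarfile
from pathlib import Path


def export_campaign(campaign_dir: str, output: str) -> None:
    """Pack every file of the campaign into a single .tar.gz archive."""
    with tarfile.open(output, "w:gz") as tar:
        tar.add(campaign_dir, arcname=Path(campaign_dir).name)


def import_campaign(archive: str, games_dir: str) -> None:
    """Unpack a shared campaign into the local games directory."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(games_dir)


# Example (hypothetical path):
# export_campaign("/home/user/.kde/share/apps/kgoldrunner/mygame", "mygame.tar.gz")
```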

Some RAID Issues

I just read an interesting paper titled An Analysis of Data Corruption in the Storage Stack [1]. It contains an analysis of the data from 1,530,000 disks running at NetApp customer sites. The amount of corruption is worrying, as is the amount of effort that is needed to detect it.

NetApp devices have regular “RAID scrubbing”, which involves reading all data on all disks at some quiet time and making sure that the checksums match. They also store checksums of all written data. For “Enterprise” disks each sector stores 520 bytes, so a 4K data block comprises 8 sectors and has 64 bytes of storage for a checksum. For “Nearline” disks, 9 sectors of 512 bytes are used to store a 4K data block and its checksum. The 64-byte checksum includes the identity of the block in question; the NetApp WAFL filesystem writes a block to a different location every time, which allows the storage of snapshots of old versions and also means that when reading file data, if the location that is read contains data from a different file (or a different version of the same file) then it is known to be corrupt (sometimes writes don’t make it to disk). Page 3 of the document describes this.
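The sector arithmetic behind those two layouts is simple; the following is my own back-of-envelope check, not something taken from the paper:

```python
# Space available for a 4KB data block plus a 64-byte block checksum.
DATA_BLOCK = 4096
CHECKSUM = 64

# "Enterprise" disks use 520-byte sectors: 8 sectors fit the block and
# checksum exactly.
enterprise = 8 * 520                     # 4160 bytes
assert enterprise == DATA_BLOCK + CHECKSUM

# "Nearline" disks use ordinary 512-byte sectors, so a 9th sector is
# needed and most of it is wasted.
nearline = 9 * 512                       # 4608 bytes
assert nearline >= DATA_BLOCK + CHECKSUM
print(f"Nearline wastes {nearline - DATA_BLOCK - CHECKSUM} bytes per 4K block")
```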

Page 13 has an analysis of error location and the fact that some disks are more likely to have errors at certain locations. They suggest configuring RAID stripes to be staggered so that you don’t have an entire stripe covering the bad spots on all disks in the array.

One thing that was not directly stated in the article is the connection between the different layers. On a Unix system with software RAID you have a RAID device and a filesystem layer on top of that, and (in Linux at least) there is no way for a filesystem driver to say “you gave me a bad version of that block, please give me a different one”. Block checksum errors at the filesystem level will often be caused by corruption that leaves the rest of the RAID stripe intact, which means that the stripe’s parity won’t match, but the RAID driver won’t know which disk has the error. If a filesystem did checksums on metadata (or data) blocks, and the chunk size of the RAID was greater than the filesystem block size, then when the filesystem detected an error a different version of the block could be generated from the parity.
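A minimal sketch of that reconstruction idea, using plain RAID-5 style XOR parity (this is an illustration of the concept, not the code of any real RAID driver):

```python
# If a higher layer can say which chunk of a stripe is bad, that chunk
# can be rebuilt from the surviving chunks and the XOR parity.
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-sized byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


# A stripe of three data chunks plus one parity chunk.
chunks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x0a\x0b\x0c\x0d"]
parity = xor_blocks(chunks)

# Suppose the filesystem's checksum shows chunk 1 is corrupt: rebuild it
# from the other chunks and the parity.
rebuilt = xor_blocks([chunks[0], chunks[2], parity])
assert rebuilt == chunks[1]
```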

NetApp produced an interesting guest post on the StorageMojo blog [2]. One point that they make is that Nearline disks try harder to re-read corrupt data from the disk. This means that a bad sector error will result in longer timeouts, but hopefully the data will be returned eventually. This is good if you only have a single disk, but if you have a RAID array it’s often better to just return an error and allow the data to be retrieved quickly from another disk. NetApp also claim that “Given the realities of today’s drives (plus all the trends indicating what we can expect from electro-mechanical storage devices in the near future) – protecting online data only via RAID 5 today verges on professional malpractice”. It’s a strong claim, but they provide evidence to support it.

Another relevant issue is the size of the RAID device. Here is a post that describes the issue of the Unrecoverable Error Rate (UER) and how it can impact large RAID-5 arrays [3]. The implication is that the larger the array (in GB/TB) the greater the need for RAID-6. It has long been accepted that a larger number of disks in an array drives a greater need for RAID-6, but the idea that larger disks in a RAID array also give a greater need for RAID-6 is new (to me at least).
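The arithmetic behind that implication is easy to sketch. The numbers below are my own back-of-envelope assumptions (a UER of one error per 10^14 bits read is a typical nearline spec), not figures from the referenced post:

```python
# Probability of hitting an unrecoverable read error while rebuilding a
# degraded RAID-5 array, which requires reading every surviving disk
# end to end.
import math

UER = 1e-14  # assumed unrecoverable read errors per bit read


def rebuild_failure_probability(disk_tb: float, surviving_disks: int) -> float:
    bits_read = disk_tb * 1e12 * 8 * surviving_disks
    # Chance that at least one of the bits read is unreadable.
    return -math.expm1(bits_read * math.log1p(-UER))


for tb in (0.5, 1, 2):
    p = rebuild_failure_probability(tb, surviving_disks=6)
    print(f"{tb} TB disks in a 7 disk RAID-5: {p:.0%} chance of a read error during rebuild")
```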

Now I am strongly advising all my clients to use RAID-6. Currently the only servers that I run which don’t have RAID-6 are legacy servers (some of which can be upgraded to RAID-6 – HP hardware RAID is really good in this regard) and small servers with two disks in a RAID-1 array.

EC2 Security

One thing that concerns me about using any online service is the security. When that service is a virtual server running in another country the risks are greater than average.

I’m currently investigating the Amazon EC2 service for some clients, and naturally I’m concerned about the security. Firstly, they appear to have implemented a good range of Xen based security mechanisms; their documentation is worth reading by anyone who plans to run a Xen server for multiple users [1]. I think it would be a good thing if other providers would follow their example in documenting the ways that they protect their customers.

Next they seem to have done a good job at securing the access to the service. You use public key encryption for all requests to the service and they generate the keypair. While later in this article I identify some areas that could be improved, I want to make it known that overall I think that EC2 is a good service and it seems generally better than average in every way. But it’s a high profile service which deserves a good deal of scrutiny and I’ve found some things that need to be improved.

The first problem is when it comes to downloading anything of importance (kernel modules for use in a machine image, utility programs for managing AMIs, etc). All downloads are done via http (not https) and the files are not signed in any way. This is an obvious risk: anyone who controls a router could compromise EC2 instances by causing people to download hostile versions of the tools. The solution is to use https for the downloads AND to use GPG to sign the files. Https is the most user-friendly way of authenticating the files (although it could be argued that anyone who lacks the skill needed to use GPG will never run a secure server anyway), while GPG allows end to end verification and would allow me to verify files that a client is using if the signature was downloaded at the same time.
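As an illustration of the kind of check that signed downloads would make possible (the URL below is a placeholder, since Amazon doesn’t currently publish signatures for these files):

```python
# Fetch a tool and its detached signature over https, then have gpg
# verify the signature; a bad or missing signature becomes a hard error.
import subprocess
import urllib.request

BASE = "https://example.com/ec2-downloads"  # placeholder URL

for name in ("ec2-api-tools.zip", "ec2-api-tools.zip.sig"):
    urllib.request.urlretrieve(f"{BASE}/{name}", name)

subprocess.run(
    ["gpg", "--verify", "ec2-api-tools.zip.sig", "ec2-api-tools.zip"],
    check=True,  # raises CalledProcessError if verification fails
)
```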

More likely problems start when it comes to the machine images that they provide. They have images of Fedora Core 4, Fedora Core 6, and Fedora 8 available. Fedora releases are maintained until one month after the release of two subsequent versions [6], so Fedora 8 will end support one month after the release of Fedora 10 (which will be quite soon) and Fedora Core 6 and Fedora Core 4 have been out of support for a long time. I expect that someone who wanted to 0wn some servers that are well connected could get a list of exploits that work on FC4 or FC6 and try them out on machines running on EC2. While it is theoretically possible for Amazon staff to patch the FC4 images for all security holes that are discovered, it would be a lot of work, and it wouldn’t apply to all the repositories of FC4 software. So making FC4 usable as a secure base for an online service really isn’t a viable option.

Amazon’s page on “Tips for Securing Your EC2 Instance” [3] mostly covers setting up ssh; I wonder whether anyone who needs advice on setting up ssh can ever hope to run a secure server on the net. It does have some useful information on managing the EC2 firewall that will be of general interest.

One of the services that Amazon offers is “shared images”, where any Amazon customer can share a machine image with the world. Amazon has a document about AMI security issues [4], but it seems to only be useful against clueless security mistakes by the person who creates an image, not against malice. If a hostile party creates a machine image you can expect that you won’t discover the problem by looking for open ports and checking for strange processes. The Amazon web page says “you should treat shared AMIs as you would any foreign code that you might consider deploying in your own data center and perform the appropriate due diligence”. The difference of course is that most foreign code that you might consider deploying comes from companies and is shipped in shrink-wrap packaging. I don’t count the high quality free software available in a typical Linux distribution in the same category as this “foreign code”.

While some companies have accidentally shipped viruses on installation media in the past, it has been quite rare. But I expect hostile AMIs on EC2 to be considerably more common. Amazon recommends that people know the source of the AMIs that they use. Of course there is a simple way of encouraging this: Amazon could refrain from providing a global directory of AMIs without descriptions (the output of “ec2dim -x all“) and instead force customers to subscribe to channels containing AMIs that have not been approved by Amazon staff (the images that purport to be from Oracle and Red Hat could easily have their sources verified and be listed as trusted images if they are what they appear to be).

There seems to be no way of properly tracking the identity of the person who created a machine image within the Amazon service. The ec2dim command only gives an ID number for the creator (and there seems to be no API tool to get information on a user based on their ID). The web interface gives a name and an Amazon account name.

The next issue is that of the kernel. Amazon notes that they include a “vmsplice root exploit patch” in the 2.6.18 kernel image that they supply [2]; however there have been a number of other Linux kernel security problems found since then, and plenty of security issues for 2.6.18 were patched before the vmsplice bug was discovered – were they patched as well? The date stamp of 20th Feb 2008 on the kernel image and modules files indicates that there are a few kernel security issues which are not patched in the Amazon kernel.

To fix this the obvious solution is to use a modern distribution image. Of course without knowing what other patches they include (they mention a patch for better network performance) this is going to be difficult. It seems that we need distribution packages of kernels designed for EC2, which would incorporate the Amazon patches and the Amazon configuration as well as all the latest security updates. I’ve started looking at the Amazon EC2 kernel image to see what I should incorporate from it to make a Debian kernel image. It would be good if we could get such a package included in an update to Debian/Lenny. Also Red Hat is partnering with Amazon to offer RHEL on EC2 [5], and I’m sure that they provide good kernels as part of that service – but as the cost of RHEL on EC2 more than doubles the cost of the cheapest EC2 instance, I expect that only customers who need the larger instances all the time will use it. The source for the RHEL kernels will of course be good for CentOS (binaries produced from such sources may be in CentOS already, I haven’t checked).

This is not an exhaustive list of security issues related to EC2; I may end up writing a series of posts about this.

Update: Jef Spaleta has written an interesting post that references this one [7]. He is a bit harsher than I am, but his points are all well supported by the evidence.

Updated EC2 API Tools package

I’ve updated my package of the Amazon EC2 API Tools for Debian [1]. Now it uses the Sun JDK. Kaffe doesn’t work due to not supporting annotations; I haven’t filed a bug because Kaffe is known to be incomplete.

OpenJDK doesn’t work – apparently because it doesn’t include trusted root certificates (see Debian bug #501643) [2].

GCJ doesn’t work; I’m not sure why, so I filed Debian bug #501743 [3].

I don’t think that Java is an ideal language choice for utility programs. It seems that Perl might be a better option as it’s supported everywhere and has always been free (the Sun JVM has only just started to become free). The lack of freedom in Java results in lower quality, and in this case several hours of my time wasted.

I Won’t Use Drizzle

A couple of days ago I attended a lecture about the Drizzle database server [1].

Drizzle is a re-write of MySQL for use in large web applications. It is only going to run on 64bit platforms because apparently everyone uses 64bit servers – except of course Amazon EC2 customers, as the $0.10 per hour instances in EC2 are all 32bit. It’s also designed to use large amounts of RAM for more aggressive caching; it’s being optimised for large RAM at the expense of performance on systems with small amounts of RAM. This is OK if you buy one (or many) new servers to dedicate to the task of running a database. But if a database is one of many tasks running on the machine, or if the machine is a Xen instance, then this isn’t going to be good.

There are currently no plans to support replication between MySQL and Drizzle databases (although it would not be impossible to write the support).

The good news is that regular MySQL development will apparently continue in the same manner as before, so people who have small systems, run on Xen instances, or use EC2 can keep using that. Drizzle seems to be aimed only at people who want to run very large sharded databases.

Now I just wish that they would introduce checksums on all data transfers and storage into MySQL. I consider that to be a really significant feature of Drizzle.

Future Video Games

I just watched an interesting TED.com talk about video games [1]. The talk focussed to a large degree on emotional involvement in games, so it seems likely that there will be many more virtual girlfriend services [2] (I’m not sure that “game” is the correct term for such things) in the future. The only reference I could find to a virtual boyfriend was a deleted Wikipedia page for V-Boy, but I expect that they will be developed soon enough. I wonder if such a service could be used by astronauts on long missions. An advantage of a virtual SO would be that there is no need to have a partner who is qualified, and if a human couple got divorced on the way to Mars then it could be a really long journey for everyone on the mission.

VR training has been used for a long time in the airline industry (if you have never visited an airline company and sat in a VR trainer for a heavy passenger jet then I strongly recommend that you do it). It seems that there are many other possible uses for this. The current prison system is widely regarded as a training ground for criminals: people who are sent to prison for minor crimes come out as hardened criminals. I wonder if a virtual environment for prisoners could do some good. Instead of prisoners having to deal with other prisoners they could deal with virtual characters who encourage normal social relationships, and prisoners who didn’t want to meet other prisoners could be given the option of spending their entire sentence in “solitary confinement” with virtual characters, multi-player games, and Internet access if they behave well. Game systems such as the Nintendo Wii [3] would result in prisoners getting adequate exercise, so after being released from a VR prison it seems likely that the ex-con would be fitter, healthier, and better able to fit into normal society than most parolees. Finally it seems likely that someone who gets used to spending almost all their spare time playing computer games will be less likely to commit crimes.

It seems to me that the potential for the use of virtual environments in schools is very similar to that of prisons, for similar reasons.

Update: Currently Google Adsense is showing picture adverts on this page for the “Shaiya” game, the pictures are of a female character wearing a bikini with the caption “your goddess awaits”. This might be evidence to support my point about virtual girlfriends.

The Security Benefits of Being Unimportant

A recent news item is the “hacking” of the Yahoo mailbox used by Sarah Palin (the Republican VP candidate) [1]. It seems most likely that it was a simple social-engineering attack on the password reset process of Yahoo (although we are unlikely to learn the details unless the case comes to trial). The email address in question had been used for some time to avoid government data-retention legislation but had only been “hacked” after she was listed as the VP candidate. The reason of course is that most people don’t care much about who is the governor of one of the least populous US states.

Remote attack on a mailbox (which is what we presume happened) is only one possible problem. Another of course is the integrity of the staff at the ISP. While I know nothing about what happens inside Yahoo, I have observed instances of unethical actions by employees at some ISPs where I have previously worked, and I have no doubt that such people would have read the email of a VP candidate without any thought if they had sufficient access to do so. If an ISP stores unencrypted passwords then the way things usually work is that the helpdesk people are granted read access to the password data so that they can login to customer accounts to reproduce problems – this is a benefit for the customers in terms of convenience. But it also means that they can read the email of any customer at any time. I believe that my account on Gmail (the only webmail service I use) is relatively safe. I’m sure that there are a huge number of people who are more important than me who use Gmail. But if I was ever considered to have a reasonable chance of becoming Prime Minister then I would avoid using a Gmail account as a precaution.

There is a rumoured Chinese proverb and curse in three parts:
May you live in interesting times
May you come to the attention of those in authority
May you find what you are looking for [2]

In terms of your email, everyone who has root access to the machine which stores it (which includes employees of the companies that provide warranty service to all the hardware in the server room) and every help-desk person who can login to your account to diagnose problems is in a position of authority. Being merely one of thousands of customers (or millions of customers for a larger service) is a measure of safety.

As for the “interesting times” issue, the Republican party is trying to keep the issue focussed on the wars instead of on the economy. The problem with basing a campaign on wars is that many people will come to the conclusion that the election is not about people merely losing some money, but people dying. This could be sufficient to convince people that the right thing to do is not to abide by the usual standards for ethical behavior when dealing with private data, but to instead try and find something that can be used to affect the result of an election.

Mail that is not encrypted (most mail isn’t) and which is not transferred with TLS (so few mail servers support TLS that it hardly seems worth the effort of implementing it) can be intercepted at many locations (the sending system and routers are two options). But the receiving system is the easiest location. A big advantage for a hostile party in getting mail from the receiving system is that it can be polled quickly (an external attacker could use an open wireless access-point and move on long before anyone could catch them) and that if it is polled from inside the company that runs the mail server there is almost never any useful audit trail (if a sysadmin logs in to a server 10 times a day for real work reasons, an 11th login to copy some files will not be noticed).

One of the problems with leaks of secret data is that it is often impossible to know whether they have happened. While there is public evidence of one attack on Sarah Palin’s Yahoo account, there is no evidence that it was the first attack. If someone had obtained the password (through an insider in Yahoo, or through a compromised client machine) then they could have been copying all the mail for months without being noticed.

It seems to me that your choice of ISP needs to be partly determined by how many hostile parties will want to access your mail and what resources they may be prepared to devote to it. For a significant political candidate, using a government email address seems like the best option, with the alternative being to use a server owned and run by the political party in question; if you can have the staff fired for leaking your mail then your email will be a lot safer!

RPC and SE Linux

One ongoing problem with TCP networking is the combination of RPC services and port based services on the same host. If you have an RPC service that uses a port less than 1024 then typically it will start at 1023 and try lower ports until it finds one that works. A problem that I have had in the past is that an RPC service used port 631 and I then couldn’t start CUPS (which uses that port). A similar problem can arise in a more insidious manner if you have strange networking devices such as a BMC [1] which uses the same IP address as the host and just snarfs connections for itself (as documented by pantz.org [2]); this means that according to the OS the port in question is not in use, but connections to that port will go to the hardware BMC and the OS won’t see them.
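The port-grabbing behaviour described above looks roughly like this (a simplified sketch, not the code of any real RPC library):

```python
# Walk down from port 1023 until a bind succeeds; binding to ports
# below 1024 needs root (or CAP_NET_BIND_SERVICE). This is how an RPC
# service can end up sitting on port 631, which CUPS expects to use.
import socket


def bind_reserved_port(low: int = 512) -> tuple:
    for port in range(1023, low - 1, -1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("0.0.0.0", port))
        except OSError:
            s.close()
            continue
        return s, port
    raise RuntimeError("no reserved port available")
```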

One solution is to give a SE Linux security context to the port which prevents the RPC service from binding to it. RPC applications seem happy to make as many bind attempts as necessary to get an available port (thousands of attempts if necessary), so reserving a few ports is not going to cause any problems. As far as I recall, my problems with CUPS and RPC services were a motivating factor in some of my early work on writing SE Linux policy to restrict port access.

Of course the best thing to do is to assign IP addresses for IPMI that are different from the OS IP addresses. This is easy to do and merely requires an extra IP address for each port. As a typical server will have two Ethernet ports on the baseboard (one for the front-end network and one for the private network) that means an extra two IP addresses (you want to use both interfaces for redundancy in case the problem which cripples a server is related to one of the Ethernet ports). But for people who don’t have spare IP addresses, SE Linux port labeling could really help.

Getting Started with Amazon EC2

The first thing you need to do to get started using the Amazon Elastic Compute Cloud (EC2) [1] is to install the tools to manage the service. The service is run in a client-server manner. You install the client software on your PC to manage the EC2 services that you use.

There are the AMI tools to manage the machine images [2] and the API tools to launch and manage instances [3].

The AMI tools come as both a ZIP file and an RPM package and contain Ruby code, while the API tools are written in Java and only come as a ZIP file.

There are no clear license documents that I have seen for any of the software in question; I recall seeing one mention on one of the many confusing web pages of the code being “proprietary” but nothing else. While it seems most likely (but far from certain) that Amazon owns the copyright to the code in question, there is no information on how the software may be used – apart from an implied license that if you are a paying EC2 customer then you can use the tools (as there is no other way to use EC2). If anyone can find a proper license agreement for this software then please let me know.

To get software working in the most desirable manner it needs to be packaged for the distribution on which it is going to be used; as I prefer to use Debian, that means packaging it for Debian. Also when packaging the software you can fix some of the silly things that get included in software designed for non-packaged release (such as demanding that environment variables be set to specify where the software is installed). So I have built packages for Debian/Lenny for the benefit of myself and some friends and colleagues who use Debian and EC2.

As I can’t be sure of what Amazon would permit me to do with their code, I have to assume that they don’t want me to publish Debian packages for the benefit of all Debian and Ubuntu users who are (or might become) EC2 customers. So instead I have published the .diff.gz files from my Debian/Lenny packages [4] to allow other people to build identical packages after downloading the source from Amazon. At the moment the packages are a little rough, and as I haven’t actually got an EC2 service running with them yet they may have some really bad bugs. But getting the software to basically work took more time than expected. So even if there happen to be some bugs that make it unusable in its current state (the code for determining where it looks for PEM files at best needs a feature enhancement and at worst may be broken at the moment), it would still save people some time to use my packages and fix whatever needs fixing.