Moving a Mail Server

Nowadays it seems that most serious mail servers (i.e. mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox a large number of messages are stored in a single file; most users would have a single file with all their mail and advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that the entire file had to be read to determine how many messages were stored, and as determining the number of messages is the first thing done in a POP connection this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted, as the Mbox file then needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (i.e. you are not supposed to mess with it) while Maildir is designed to be used by any tools that you wish (e.g. my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the difficulty in performing backups. As a test I used an LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks with 343M of data used (according to “df -h”) and 39,358 inodes in use. As there were 5,000 accounts with Maildir storage that means 25,000 directories for the home directories and Maildir directories, leaving 14,358 files. Creating a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds; 105MB of data was transferred, for a transfer rate of 456KB/s. It seems that tar stores the data in a more space efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison either of the two disks can deliver 40MB/s for the inner tracks. So it seems that unless the amount of used space is less than about 1% of the total disk space it will be faster to transfer a filesystem image.
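
For reference, a minimal sketch of the benchmark (the device and mount point names are examples; the dd is there because GNU tar optimises away writes to /dev/null):
# time a full backup of the mail filesystem, forcing tar to actually read and write the data
time tar cf - /mail | dd of=/dev/null bs=1M
# compare with the raw sequential read speed of the underlying volume
dd if=/dev/vg0/mail of=/dev/null bs=1M count=1000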

If you have disks that are faster than your network (e.g. old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking, and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine. This is usually possible when using IDE disks, but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.
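
As a rough sketch of how a compressed image transfer could work (the device, host, and port names are examples, and netcat option syntax varies between implementations):
# on the destination machine: listen, decompress, and write the image
nc -l -p 5000 | gunzip | dd of=/dev/vg0/mail bs=1M
# on the source machine, with the filesystem unmounted or snapshotted
dd if=/dev/vg0/mail bs=1M | gzip -1 | nc newserver 5000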

The first test I made of compression was on a 1GHz Athlon system which could compress (via gzip -1) 100M of data in four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so the results can’t be directly compared). As 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds, compressing at that rate obviously isn’t a viable option.

However gzip -1 compressed the data to 57% of its original size. The fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worthwhile amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.
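
A minimal sketch of the sort of timing test involved, assuming a sample of the real mail data has been written to /tmp/sample (the file name is an example, and lzop is merely one candidate that I haven’t benchmarked here):
# CPU time and compressed size for gzip -1 on a sample of the data
time gzip -1 -c /tmp/sample | wc -c
# a lower-CPU alternative worth trying if it is installed
time lzop -1 -c /tmp/sample | wc -c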

Without compression the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. That is not a good amount of down-time for a service that runs 24*7, particularly given that some extra time would be spent in getting the new machine to actually use the data.

As for how to design a system to not have these problems, I’ll write a future post with some ideas for alleviating them.

Mobile Facebook

A few of my clients have asked me to configure their routers to block access to Facebook and Myspace. Apparently some employees spend inappropriate amounts of time using those services while at work. Using iptables to block port 80 and configuring Squid to reject access to those sites is easy to do.
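
As a rough sketch of what that involves (the ACL name is mine, and it assumes the clients are configured to use the Squid proxy so that blocking direct port 80 access from the LAN forces everything through it):
# on the router: reject web traffic that tries to bypass the proxy
iptables -A FORWARD -p tcp --dport 80 -j REJECT
# in squid.conf: deny the sites in question
acl timewasters dstdomain .facebook.com .myspace.com
http_access deny timewasters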

So I was interested to see an advertising poster in my local shopping centre promoting the Telstra “Next G Mobile” which apparently offers “Facebook on the go”. I’m not sure whether Telstra has some special service for accessing Facebook (maybe a Facebook client program running on the phone) or whether it’s just net access on the phone which can be used for Facebook (presumably with a version of the site that is optimised for a small screen).

I wonder if I’ll have clients wanting me to firewall the mobile phones of their employees (of course it’s impossible for me to do it – but they don’t know that).

I have previously written about the benefits of a 40 hour working week for productivity and speculated on the possibility that for some employees the optimum working time might be less than 40 hours a week [1]. I wonder whether there are employees who could get more work done by spending 35 hours working and 5 hours using Facebook than they could by working for 40 hours straight.

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known good CD-ROM drives because they have moving parts, so they periodically break, and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf four of my CD-ROM drives have spontaneously died. I expect drives that are used a lot to die from mechanical wear, and I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives, or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality; it was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate, as one of the disks died a couple of hours later (just long enough to load all the data onto the machine). Yay! RAID saved me from losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and there wasn’t much at risk). I had decided to instead make it a RAID-1 of 30G disks, and to save some inconvenience I transferred the LVM data from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine that wasn’t designed for three hard disks, so it was easier to transfer the data in a way that never needs more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could probably have recovered that from the LVM backup data, and even if that hadn’t worked I had only created a few LVs and they were contiguous so I could have worked out where the data was.
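
For reference, a minimal sketch of that sort of migration, assuming the VG is called vg0 and the old and new degraded arrays are /dev/md0 and /dev/md1 (all the names are examples):
# create a degraded RAID-1 on the new disk with the second member missing
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing
# add the new array to the volume group and move all the data across
pvcreate /dev/md1
vgextend vg0 /dev/md1
pvmove /dev/md0 /dev/md1
# then remove the old array from the VG so the old disk can be pulled
vgreduce vg0 /dev/md0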

Instead however I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies of it on two disks in the RAID-1 and another separate disk), but I had a disk die before I got a chance to make a backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction I wouldn’t expect drives to cease working from not being used, I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?

Links May 2008

The Daily WTF has published an interesting essay on why retaining staff is not always a good thing [1]. The main point is that good people get bored and want to move on while mediocre people want to stay, but there are other points and it’s worth reading.

Following the links from that article led me to an article comparing managing IT departments to managing professional sports teams [2]. They chose US football (a sport I know little about and have no interest in) so I probably missed some of the content. But they have some good points.

John Goerzen gave a good presentation of the idea of increasing petrol taxes and decreasing other taxes for a revenue neutral result while also discouraging petrol use [3]. I credit him with presenting the idea rather than inventing it because I have heard similar ideas several times before (but not nearly as well written, and not written from a right-wing perspective). Hopefully some people who read his presentation will be more receptive than they were to the other versions of the same idea.

Craig Venter spoke at TED about his work in creating artificial life [4]. He spent some time talking about the possibilities of creating artificial organisms to produce fuels directly from CO2 and sunlight.

Nick Bostrom published a paper explaining why he hopes that the SETI projects find nothing [5]. His theory is that the fact that our solar system has not been colonised and that we have seen no immediate evidence of extra-terrestrial life indicates that there is a “Great Filter”, a stage of evolution which it is most unlikely that any species will pass. If the Great Filter is in our past (he cites the evolution of multi-celled life as one of the possibilities, and the evolution of prokaryotes into eukaryotes as another) then our future might be more successful than if the Great Filter is something that tends to happen to advanced societies.

Jared Diamond (the author of Collapse), has written an interesting article about vengeance [6]. He focuses on an example in New Guinea and uses it to explain why personal vendettas tend to run wildly out of control and how society is best served by having the state punish criminals.

CPU Capacity for Virtualisation

Today a client asked me to advise him on how to dramatically reduce the number of servers for his business. He needs to go from 18 active servers to 4. Some of the machines in the network are redundant servers; by reducing some of the redundancy I can remove four servers, so now the task is to go from 14 to 4.

To determine the hardware requirements I analysed the sar output from all machines. The last 10 days of data were available, so I took the highest daily average figures for user and system CPU load from each machine and added them up; the result was 221%. So for the average daily CPU use three servers would have enough power to run the entire network. Then I looked at the highest 5 minute averages for user and system CPU load from each machine, which add up to 582%. So if all machines were to have their peak usage at the same time (which doesn’t happen) then the CPU power of six machines would be needed. I conclude that the CPU power requirements are somewhere between 3 and 6 machines, so 4 machines may do an OK job.
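
A rough sketch of how the peak figures can be pulled out of the sar data files (the awk field numbers assume the sysstat 7 output layout that CentOS 5 uses; adjust them if your sar columns differ):
# report the highest 5 minute %user+%system figure from each day of data
for f in /var/log/sa/sa??; do
    sar -u -f "$f" | awk -v day="$f" '$3 == "all" { u = $4 + $6; if (u > peak) peak = u }
        END { printf "%s: peak user+system %.1f%%\n", day, peak }'
done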

The next issue is IO capacity. The current network has 2G of RAM in each machine and I plan to run it all on 4G Xen servers, so it’s a total of 16G of RAM instead of 36G. While some machines currently have unused memory, I expect that the end result of this decrease in total RAM will be more cache misses and more swapping, so the total IO load will increase slightly. Four of the servers (which will eventually become Xen Dom0’s) have significant IO capacity (large RAID arrays – they appear to have 10*72G disks in a RAID-5) and the rest have a smaller IO capacity (they appear to have 4*72G disks in a RAID-10). For the other 14 machines the highest daily averages for iowait add up to 9% and the highest 5 minute averages add up to 105%. I hope that spreading that 105% of the IO capacity of a 4 disk RAID-10 across four sets of 10 disk RAID-5’s won’t give overly bad performance.

I am concerned that there may be some flaw in the methodology that I am using to estimate capacity. I’m particularly doubtful about the utility of measuring iowait, as iowait is the amount of otherwise idle CPU time during which processes are blocked on IO. So if for example you have 100% of CPU time being used then iowait will be zero regardless of how much disk IO is in progress! One check that I performed was to add the maximum CPU time used, the maximum iowait, and the minimum idle time. Most machines gave totals very close to 100% when those columns were added. If the maximum iowait for a 5 minute period plus the maximum CPU use plus the minimum idle time add up to about 100%, and the minimum idle time was not very low, then it seems unlikely that there was any significant overlap between disk IO and CPU use hiding iowait. One machine had a total of 147% for those fields in the 5 minute averages, which suggests that its IO load may be higher than the 66% iowait figure indicates. But if I put that machine in a DomU on the host with the most unused IO capacity then it should be OK.
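
The sanity check can be scripted from the same data; a sketch with the same column assumptions as above:
# max(%user+%system) + max(%iowait) + min(%idle) should be close to 100%
sar -u -f /var/log/sa/sa01 | awk '$3 == "all" {
    cpu = $4 + $6; if (cpu > maxcpu) maxcpu = cpu
    if ($7 > maxio) maxio = $7
    if (minidle == "" || $9 < minidle) minidle = $9
} END { printf "%.1f + %.1f + %.1f = %.1f\n", maxcpu, maxio, minidle, maxcpu + maxio + minidle }'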

I will be interested to read any suggestions for how to proceed with this. But unfortunately it will probably be impossible to consider any suggestion which involves extra hardware or abandoning the plan due to excessive risk…

I will write about the results.

Hosting a Xen Server

Yesterday I wrote about my search for a hosting provider for a Xen DomU [1]. One response was the suggestion to run a Dom0 and sell DomU’s to other people [2]; it was pointed out that Steve Kemp’s Xen-Hosting.org project is an example of how to do this well [3]. Unfortunately Steve’s service is full and he is not planning to expand.

I would be open to the idea of renting a physical machine and running the Xen server myself, but that might be a plan for some other time (of course if a bunch of people want to sign up to such a thing and start hassling me I might change my mind). But at the moment I need to get some services online soon and I don’t want to spend any significant amount of money (I want to keep what I spend on net access below my Adsense revenue).

Also if someone else in the free software community wants to follow Steve’s example then I would be interested in hosting my stuff with them.

Xen Hosting

I’m currently deciding where to get a Xen DomU hosted. It will be used for a new project that I’m about to start which will take more bandwidth than my current ISP is prepared to offer (or at least they would want me to start paying, and serious bandwidth is expensive in Australia). Below is a table of the options I’ve seriously considered so far (I rejected Dreamhost based on their reputation, and some other virtual hosts obviously couldn’t compete with the prices of the ones in the table). For each ISP I listed the two cheapest options; as I want to save money I’ll probably go for the cheapest option at the ISP I choose, but I want the option of upgrading if I need more.

I’m not sure how much storage I need; I think that 4.5G is probably not enough and even 6G might get tight. Also of course it depends on how many friends I share the server with.

Quantact has a reasonably cheap option for $15, but the $25 option is expensive and has little RAM. Probably 192M of RAM would be the minimum if I’m going to share the machine with two or more friends (to share the costs).

VPSland would have rated well if it wasn’t for the fact that they once unexpectedly deleted a DomU belonging to a client (they claimed that the bill wasn’t paid) and had no backups. Disabling a service when a bill is not paid is fair, charging extra for the “service” of reenabling it is acceptable, but just deleting it with no backups is unacceptable. But as I’m planning on serving mostly static data this won’t necessarily rule them out of consideration.

It seems that Linode and Slicehost are the best options (Slicehost seems the most clueful and Linode might be the second most). Does anyone have suggestions about other Xen servers that I haven’t considered?

XenEurope seems interesting. One benefit that they have is being based in the Netherlands which has a strong rule of law (unlike the increasingly corrupt US). A disadvantage is that the Euro is a strong currency and is expected to get even stronger. Services priced in Euros should be expected to cost more in Australian dollars in future, while services priced in US dollars should be expected to cost less.

Gandi.net has an interesting approach: they divide a server into 64 “shares” and you can buy as many as you want (up to 16 shares, or 1/4 of a server). If at any time you run out of bandwidth then you just buy more shares. They also limit bandwidth by guaranteed transfer rate (in multiples of 3Mb/s) instead of limiting the overall data transferred per month (as most providers do). They don’t mention whether you can burst above that 3Mb/s limit – while 3Mb/s running 24*7 is a significant amount of data transfer (nearly 1TB per month) it isn’t that much if you have a 200MB file that will be downloaded a few times a day while interactive tasks are also in progress (something that may be typical usage for my server). Of course other providers generally don’t provide any information on how fast data can be transferred, and the actual speed will often be less than 3Mb/s.

Also if anyone who I know wants to share access to a server then please contact me via private mail.

ISP | RAM | Disk | Bandwidth (per month) | Price ($US)
Linode | 360M | 10G | 200GB | $20
Linode | 540M | 15G | 300GB | $30
Slicehost | 256M | 10G | 100GB | $20
Slicehost | 512M | 20G | 200GB | $38
VPSLand | 192M | 6G | 150GB | $16
VPSLand | 288M | 8G | 200GB | $22
Quantact | 96M | 4.5G | 96GB | $15
Quantact | 128M | 6G | 128GB | $25
RimuHosting | 96M | 4G | 30G | $20
XenEurope | 128M | 10G | 100G | $16 (E10)
XenEurope | 256M | 20G | 150G | $28 (E17.50)
Gandi.net | 256M | 5G | 3Mb/s guaranteed rate | $7.50 (E6)

IPSEC is Pain

I’ve been trying to get IPSEC to work correctly as a basic VPN between two CentOS 5 systems. I set up the IPSEC devices according to the IPSEC section of the RHEL4 security guide [1] (which is the latest documentation available, and it seems that nothing has changed since). The documentation is quite good, but getting IPSEC working is really difficult. One thing I really don’t like about IPSEC is the way that it doesn’t have a network device; I prefer my VPN to have its own device so that I can easily run tcpdump on the encrypted or the unencrypted stream (or two separate tcpdump sessions) if that’s necessary to discover the cause of a problem.

I’ve got IPSEC basically working, and I probably could get it working fully, but it doesn’t seem worth the effort.

While fighting with IPSEC at the more complex site (multiple offices which each have multiple networks, and switching paths to route around unreliable networks) I set up an IPSEC installation at a very simple site (two offices within 200 meters, both with static public IP addresses, and no dynamic routing or anything else complex). The simple site STILL doesn’t work as well as desired. One issue that still concerns me is the arbitrary MTU limits in some routing gear, which (for some reason I haven’t diagnosed yet) loses packets if I use an MTU over 1470 bytes.
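
A quick way to probe for this kind of MTU problem is to send pings with the don’t-fragment flag set; the -s value is the ICMP payload, so add 28 bytes of IP and ICMP headers to get the packet size (the host name is an example):
# a 1470 byte packet (1442 bytes of payload plus headers) should get through
ping -M do -s 1442 remote-office
# a full 1500 byte packet should fail if something in the path can't pass an MTU over 1470
ping -M do -s 1472 remote-office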

So today I set up a test network with OpenVPN [2]. It was remarkably simple; the server config file (/etc/openvpn/server.conf) is as follows:
dev tun
ifconfig 10.8.0.1 10.8.0.2
secret static.key
comp-lzo

This means that the IP address 10.8.0.1 will be used for the “server” end of the tunnel, and 10.8.0.2 is the “client” end. The secret is stored in /etc/openvpn/static.key (which is the same on both machines and is generated by “openvpn --genkey --secret static.key”).

The client config file (/etc/openvpn/client.conf) is as follows:
remote 10.5.0.2
dev tun
ifconfig 10.8.0.2 10.8.0.1
secret static.key
comp-lzo

Then I enable IP forwarding on both VPN machines, open UDP port 1194 (the command “lokkit -q -p 1194:udp” does this) and start the daemon on each end. The script /etc/init.d/openvpn (in Dag Wieers’ package for CentOS 5 – which I believe is the standard script) will just take every file matching /etc/openvpn/*.conf as a service to start.

The end result is a point to point link that I can easily route other packets through, and I can easily get dynamic routing daemons to add routes pointing to it. This is nothing like the IPSEC configuration, where the config file needs to have the IP address ranges hard-coded; with OpenVPN I can just add routes whenever I want.
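
For example, routing an extra network over the tunnel is just an ordinary route on the VPN router (the network and device names are examples):
# send traffic for the remote office LAN to the other end of the tunnel
ip route add 192.168.10.0/24 via 10.8.0.2 dev tun0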

This isn’t necessarily going to be the way I deploy it. The documentation notes that a disadvantage is “lack of perfect forward secrecy — key compromise results in total disclosure of previous sessions”. But what gives me confidence is the fact that it was so easy to get it going; if I have problems in adding further features to the configuration it should be easy to debug. As opposed to IPSEC where it’s all complex and if it doesn’t work then it’s all pain.

Also I tested this out with four Xen DomU’s, two running as VPN routers and two as clients on separate segments of the VPN. They were connected with three bridge devices. I’ll blog about how to set this up if there is interest.

School Bag Weight

Matt Bottrell has written about some issues related to the acceptable weight of laptops for school use [1].

Matt cites a reference from the Victorian government stating that a school bag should not be heavier than 10% of the body weight of the child who carries it [2]. So the next thing we need to do is to calculate what a student can carry without being at risk of spinal injuries.

Firstly we need to determine what children weigh. If we restrict this to the 14+ age range (the older children have more need of computers) then children are almost as heavy as they will be at 18 and can carry significantly heavier bags than those in the 10+ age range. It also seems reasonable to consider the 25th percentile weight (the weight which is exceeded by 75% of children). Of course this means that 25% would be carrying overly heavy bags, but it does give us a bit more weight allowance.

The 25th percentile weight for white girls in the US is 48Kg [3]. The 25th percentile weight for white boys in the US is also 48Kg [4]. When considering the Australian situation it seems that white children in the US will be most similar (in terms of genetics and diet).

The 25th percentile weights at age 18 are 53Kg for girls and 64Kg for boys.

So the acceptable bag weights would be 4.8Kg for 14yos, 5.3Kg for 18yo girls, and 6.4Kg for 18yo boys.

The next step is to determine the weight carried to school. The weight of one of my laptop bags is almost 1Kg, and I think that a school-bag would have a similar weight.

When I was at school I recall that the worst days for carrying baggage were when I had PE (Physical Education – sports); the weight of PE gear was a significant portion of the overall weight I carried. I tried to estimate this by weighing a track-suit top, a t-shirt, and a pair of board-shorts (the only shorts I could find at short notice), and the total was almost 1Kg. While the board-shorts might weigh more than PE uniform shorts, I didn’t include the weight of track-suit pants. Assuming that a female PE uniform has the same weight as a male uniform is the least of my assumptions.

The weight of a good pair of sneakers (suitable for playing basketball and other school sports) that is in my size is just over 1Kg.

To get a rough estimate of the weight of lunch I put six slices of bread and a small apple in a lunch box and found that it weighed 500g. A real lunch would probably include four slices of bread but the other things would weigh at least as much as the extra two slices. If a drink was included then it could be more than 500g.

So the total is 3.5Kg for bare essentials (including PE gear) without including any books!

It seems that it would be impossible to stick to the government recommendations for school bag weight if a full set of PE gear is included. Probably the best thing that could be done would be to allow sneakers to be worn as part of the school uniform, which would remove 1Kg from the overall bag weight.

So a bag with lunch and PE gear (minus sneakers) is about 2.5Kg, leaving 2.3Kg for books etc at age 14. As text books for 14yo children are considerably thinner than those for 18yo children, it seems that this might be achievable. Fitting a couple of text books and an EeePC into the 2.3Kg weight budget should be achievable. But fitting a full size laptop (which seems to start at about 1.8Kg) and some text books into a 2.3Kg budget will be a challenge – it might be possible to do that but wouldn’t leave any budget for the random junk that children tend to carry around.

For an 18yo girl, the weight budget (after the basics have been deducted) is 2.8Kg, and it seems likely that on occasion the weight of year-12 text books will exceed that. Therefore it seems that the only way of avoiding spinal injuries in year-12 girls would be to have text books stored in digital form on a light laptop such as an EeePC. Rather than avoiding the use of laptops because of weight (as some people suggest), laptops with electronic text books should be used to replace traditional text books! An EeePC weighing less than 1Kg leaves a significant amount of extra capacity for any other things that need to be carried. If there is little other stuff to be carried then 75% of 18yo girls should be able to carry a full size laptop plus PE gear without risk of back injury. If digital text books are used and on any given journey two text books (which according to my previous measurements can weigh as much as 1.6Kg [5]) are replaced with an EeePC, then overall something like 600g is saved (depending on the configuration of the EeePC; if one battery was stored at home and another at school then it could save more than that).

It seems that a year 12 girl who has PE and three science subjects scheduled on the same day would be most likely to exceed the recommended weight in the current situation (even without having to carry a spare pair of shoes), and that carrying a laptop with digital text books would be the only way of avoiding back injury.

For an 18yo boy the weight budget is 3.9Kg after the basics have been deducted. So if they don’t carry other random stuff in their bag (I probably had 1Kg of random junk in my bag on occasion) then they could carry PE gear, a full sized laptop, AND a full set of text books.
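
To summarise the arithmetic (10% of the 25th percentile body weight, minus the 2.5Kg of bag, PE clothes, and lunch):
0.10 * 48Kg - 2.5Kg = 2.3Kg for a 14 year old
0.10 * 53Kg - 2.5Kg = 2.8Kg for an 18yo girl
0.10 * 64Kg - 2.5Kg = 3.9Kg for an 18yo boy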

It seems to me that there is no situation where children would be unable to have a laptop within the reasonable weight allowance if digital text books were used. The only way that the weight of a laptop could be a problem would be if it was carried in addition to all the text books.

One final point: it would be good if books from Project Gutenberg were used for studying English literature, as that’s one easy way of reducing weight (and cost). Also it would be good if there was an option for a non-literature based English subject. Knowledge of English literature is of no value to most students; it would be better to teach students how to write using topics that interest them. Maybe have blogging as an English project. ;)

Keith Olbermann on Bush

http://youtube.com/watch?v=TEBpC0GLr6Y

At the above Youtube page there is a video from MSNBC where Keith Olbermann discusses Bush’s record. Before I watched that I thought that it was impossible for me to have a lower opinion of Bush, however Keith’s presentation achieved the seemingly impossible task of making me despise the cretin even more.