The Date Command and Seconds Since 1970-01-01

The man page for the date command says that the %s option will give “seconds since 1970-01-01 00:00:00 UTC”. I had expected that everything date did would give output in my time zone unless I requested otherwise. But in this case the result is in UTC, and the same seems to be true for most programs that log dates as a number of seconds.

In a quick Google search for how to use a shell script to convert from the number of seconds since 1970-01-01 00:00:00 UTC to a more human readable format, the only remotely useful result I found was the command date -d "1970-01-01 1212642879 sec", which in my time zone gives an error of ten hours (at the moment; it would give eleven hours in summer). The error arises because without “utc” the starting point is taken as midnight in the local time zone. The correct method is date -d "1970-01-01 1212642879 sec utc". You can verify this with the command date -d "1970-01-01 $(date +%s) sec utc" (which should give the same result as date).
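As a cross-check that doesn't depend on how date parses its argument, here is a minimal Python sketch of the same mistake and its fix. The function names are mine, and the fixed +10:00 offset is an assumption standing in for my time zone outside of summer (it ignores daylight saving).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch(seconds, tz=timezone.utc):
    """Correct: count from 1970-01-01 00:00:00 UTC, then express the result in tz."""
    return (EPOCH + timedelta(seconds=seconds)).astimezone(tz)

def from_local_midnight(seconds, tz):
    """The mistake: count from 1970-01-01 00:00:00 in the local zone tz."""
    return datetime(1970, 1, 1, tzinfo=tz) + timedelta(seconds=seconds)

# Assumed fixed offset of +10:00 (no daylight saving handling).
local = timezone(timedelta(hours=10))
stamp = 1212642879

right = from_epoch(stamp, local)
wrong = from_local_midnight(stamp, local)
print(right.isoformat())  # 2008-06-05T15:14:39+10:00
print(wrong.isoformat())  # 2008-06-05T05:14:39+10:00
print(right - wrong)      # 10:00:00
```

The ten hour difference is exactly the UTC offset, which is why the error grows to eleven hours when daylight saving is in effect.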

Moving a Mail Server

Nowadays it seems that most serious mail servers (IE mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox you have a large number of messages in a single file; most users would have a single file with all their mail and the advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that it was necessary to read the entire file to determine how many messages were stored. As determining the number of messages was the first thing that was done in a POP connection, this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted, as the Mbox file needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.
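To illustrate why counting messages is so much cheaper with one file per message, here is a simplified Python sketch. The function names are mine. It assumes a well-formed Mbox in which every message starts with a line beginning "From " (and body lines that would clash are quoted as ">From "), and a Maildir with the usual new and cur subdirectories. A real POP server also has to deal with locking and message sizes.

```python
import os

def count_mbox_messages(path):
    """Count messages in an Mbox file by scanning every line for 'From ' separators."""
    count = 0
    with open(path, "rb") as mbox:
        for line in mbox:  # the whole file has to be read
            if line.startswith(b"From "):
                count += 1
    return count

def count_maildir_messages(path):
    """Count messages in a Maildir by listing new/ and cur/ without opening any message."""
    return sum(
        len(os.listdir(os.path.join(path, sub)))
        for sub in ("new", "cur")
    )
```

The Mbox version does work proportional to the total size of the mailbox, while the Maildir version only depends on the number of directory entries.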

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (IE you are not supposed to mess with it) while Maildir is designed to be used by any tools that you wish (EG my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the difficulty in performing backups. As a test I used an LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks with 343M of data used (according to “df -h”) and 39,358 inodes in use. As there were 5000 accounts with Maildir storage, that means 25,000 directories for the home directories and Maildir directories, so there were 14,358 files. To create a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds; 105MB of data was transferred for a transfer rate of 456KB/s. It seems that tar stores the data in a more space efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison either of the two disks can deliver 40MB/s for the inner tracks, so tar was running at roughly 1% of the raw disk speed. So it seems that unless the amount of used space is less than 1% of the total disk space it will be faster to transfer a filesystem image.
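The 1% figure is just the ratio of the two transfer rates. Here is a rough Python sketch of that arithmetic using the numbers measured above. The function names are mine, "used" is measured as the size of the tar stream rather than the df figure, and it assumes an image copy runs at 40MB/s across the whole device.

```python
def rate_mb_per_s(megabytes, seconds):
    """Average transfer rate in MB/s."""
    return megabytes / seconds

def break_even_fraction(tar_rate, image_rate):
    """Used fraction of the device at which tar and an image copy take equal time.

    tar time = used / tar_rate and image time = total / image_rate,
    so they are equal when used / total = tar_rate / image_rate.
    """
    return tar_rate / image_rate

def faster_method(used_mb, total_mb, tar_rate, image_rate):
    """Return 'image' or 'tar', whichever finishes first under this model."""
    image_time = total_mb / image_rate
    tar_time = used_mb / tar_rate
    return "image" if image_time < tar_time else "tar"

tar_rate = rate_mb_per_s(105, 230.6)   # the tar test above
print(f"break-even: {break_even_fraction(tar_rate, 40):.1%}")  # break-even: 1.1%

# Hypothetical 20,000MB device: nearly empty versus 10% full.
print(faster_method(105, 20000, tar_rate, 40))    # tar
print(faster_method(2000, 20000, tar_rate, 40))   # image
```

The break-even point moves if the files are larger (tar gets faster) or the disk is slower, so the result only applies to mail stores that look like this one.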

If you have disks that are faster than your network (EG old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine. This is usually possible when using IDE disks, but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.

The first test I made of compression was on a 1GHz Athlon system which could compress (via gzip -1) 100M of data with four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so it can’t be directly compared). As 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds, compressing at that rate obviously isn’t a viable option.

The gzip -1 compression however compressed the data to 57% of its original size. The fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worth-while amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.

Without compression the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. Not a good amount of down-time for a service that runs 24*7. Particularly given that some time would be spent in getting the new machine to actually use the data.
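To put the numbers from the last few paragraphs side by side, here is a rough Python sketch. It assumes that compression and sending overlap as a pipeline (so the slower stage sets the pace), that Gig-E delivers about 100MB/s in practice, and that the 57% ratio holds for all the data. The function name and those figures are my assumptions for illustration.

```python
def transfer_seconds(size_mb, net_mb_per_s, compress_mb_per_s=None, ratio=1.0):
    """Estimate the time to stream size_mb over the network.

    With a compressor in the pipeline, the slower of the two stages
    (compressing the input, sending the compressed output) dominates.
    """
    if compress_mb_per_s is None:
        return size_mb / net_mb_per_s
    compress_time = size_mb / compress_mb_per_s
    send_time = size_mb * ratio / net_mb_per_s
    return max(compress_time, send_time)

GIGE = 100.0            # assumed usable Gig-E rate in MB/s
P4_GZIP1 = 1000 / 77    # measured above: 1000M in 77 seconds

plain = transfer_seconds(1000, GIGE)
gzipped = transfer_seconds(1000, GIGE, P4_GZIP1, 0.57)
print(f"1000M plain:   {plain:.0f}s")     # 1000M plain:   10s
print(f"1000M gzip -1: {gzipped:.0f}s")   # 1000M gzip -1: 77s

# 500G without compression, in minutes.
print(f"500G plain: {transfer_seconds(500 * 1024, GIGE) / 60:.0f} minutes")  # 500G plain: 85 minutes
```

Under this model compression only helps when the compressor can consume its input faster than the network can carry it uncompressed, which is where the goal of 1G in well under 10 seconds comes from.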

As for how to design a system that doesn’t have these problems, I’ll write a future post with some ideas.

Mobile Facebook

A few of my clients have asked me to configure their routers to block access to Facebook and Myspace. Apparently some employees spend inappropriate amounts of time using those services while at work. Using iptables to block port 80 and configuring Squid to reject access to those sites is easy to do.

So I was interested to see an advertising poster in my local shopping centre promoting the Telstra “Next G Mobile” which apparently offers “Facebook on the go”. I’m not sure whether Telstra has some special service for accessing Facebook (maybe a Facebook client program running on the phone) or whether it’s just net access on the phone which can be used for Facebook (presumably with a version of the site that is optimised for a small screen).

I wonder if I’ll have clients wanting me to firewall the mobile phones of their employees (of course it’s impossible for me to do it – but they don’t know that).

I have previously written about the benefits of a 40 hour working week for productivity and speculated on the possibility that for some employees the optimum working time might be less than 40 hours a week [1]. I wonder whether there are employees who could get more work done by spending 35 hours working and 5 hours using Facebook than they could by working for 40 hours straight.

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known good CD-ROM drives because, as they have moving parts, they periodically break and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf I have had four CD-ROM drives spontaneously die. I expect drives to die from mechanical wear if they are used a lot. I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality. It was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate, as one of the disks died a couple of hours later (just long enough to load all the data on the machine). Yay! RAID saved me losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and I didn’t have much at risk). I had decided to instead make it a RAID-1 of 30G disks, and to save some inconvenience I transferred the LVM from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine that wasn’t designed for three hard disks, so it was easier to transfer the data in a way that didn’t need more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could probably have recovered that from the LVM backup data, and even if that hadn’t worked I had only created a few LVs and they were contiguous so I could have worked out where the data was.

Instead however I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies of it on two disks in the RAID-1 and another separate disk), but I had a disk die before I got a chance to make a backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction I wouldn’t expect drives to cease working from not being used. If anything I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?