I Just Joined SAGE

I’ve just joined SAGE AU – the System Administrators Guild of Australia [1].

I’ve known about SAGE for a long time; in 2006 I presented a paper at their conference [2] (here is the paper [3] – there are still some outstanding issues from that one, which I’ll have to revisit).

They have been doing good things for a long time, but I haven’t felt that there was enough benefit to make it worth spending money (there is a huge variety of free Linux related things that I could do but don’t have time for). But now Matt Bottrell has been promoting Internode and SAGE, and SAGE members get a 15% discount [4]. As I’ve got one home connection through Internode and will soon get another, it seems like time to join SAGE.

BIND Stats

In Debian the BIND server will by default append statistics to the file /var/cache/bind/named.stats when the (apparently undocumented) command rndc stats is run. The default for RHEL4 seems to be /var/named/chroot/var/named/data/named_stats.txt.
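
As a quick recipe, something like the following should dump and display the current statistics (using the default paths mentioned above):

  rndc stats
  tail /var/cache/bind/named.stats    # Debian default location
  # on RHEL4: tail /var/named/chroot/var/named/data/named_stats.txt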

The output will include the time-stamp of each statistics dump as the number of seconds since 1970-01-01 00:00:00 UTC (see my previous post explaining how to convert this to a regular date format [1]).

By default this only logs a summary for all zones, which is not particularly useful if you have multiple zones. If you edit the BIND configuration and put zone-statistics 1; in the options section then it will log separate statistics for each zone. Unfortunately, if you add this and apply the change via rndc reload, I don’t know of any convenient way to determine when the change was made and therefore the period of time that the per-zone statistics cover. So after applying this to my servers I restarted the named processes, so that the process start time makes it obvious when the statistics started.
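
A minimal sketch of the relevant part of named.conf (the directory line just stands in for whatever options you already have):

  options {
          directory "/var/cache/bind";   // existing options stay as they are
          zone-statistics 1;             // log statistics separately for each zone
  };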

The reason I became interested in this is that a member of a mailing list I subscribe to was considering the DNSMadeEasy.com service. That company runs primary DNS servers for $US15 per annum, a package which allows 1,000,000 queries per month, 3 zones, and 120 records (for either a primary or a secondary server). Based on three hours of statistics it seems that my zone coker.com.au is going to get about 360,000 queries a month (between both the primary and the secondary server). So the $15 a year package could accommodate 3 such zones for either primary or secondary service (they each got about half the traffic). I’m not considering outsourcing my DNS, but it is interesting to consider how the various offers add up.
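
The extrapolation itself is trivial; here is a hedged sketch with a made-up sample count (the real number comes from comparing two stats dumps):

  QUERIES=1500   # hypothetical: queries counted over the sample period
  HOURS=3        # length of the sample period
  echo $(( QUERIES * 24 / HOURS * 30 ))   # roughly 360,000 queries per month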

Another possibility for people who are considering DNS outsourcing is Xname.org, which provides free DNS (primary and secondary) but requests contributions from business customers (or anyone else).

Updated because I first published it without getting stats from my secondary server.

The Date Command and Seconds Since 1970-01-01

The man page for the date command says that the %s option will give “seconds since 1970-01-01 00:00:00 UTC“. I had expected that everything date did would give output in my time zone unless I requested otherwise. But it seems that in this case the result is in UTC, and the same seems to be true for most programs that log dates as a number of seconds.

In a quick Google search for how to use a shell script to convert the number of seconds since 1970-01-01 00:00:00 UTC to a more human-readable format, the only remotely useful result I found was the command date -d "1970-01-01 1212642879 sec", which in my timezone gives an error of ten hours (at the moment – it would give eleven hours in summer). The correct method is date -d "1970-01-01 1212642879 sec utc"; you can verify this with the command date -d "1970-01-01 $(date +%s) sec utc" (which should give the same result as date).
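
To summarise the above as commands you can paste (1212642879 is just the example timestamp used above):

  date -d "1970-01-01 1212642879 sec"       # wrong: off by the local UTC offset
  date -d "1970-01-01 1212642879 sec utc"   # right: the count is relative to UTC
  date -d "1970-01-01 $(date +%s) sec utc"  # sanity check: matches plain "date"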

Moving a Mail Server

Nowadays it seems that most serious mail servers (i.e. mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox you have a large number of messages in a single file; most users would have a single file with all their mail and the advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that it was necessary to read the entire file to determine how many messages were stored; as determining the number of messages is the first thing done in a POP connection, this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted, as the Mbox file then needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.
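
For anyone who hasn’t looked inside one, a rough sketch of what a Maildir looks like on disk and why the “how many messages” question becomes cheap (~/Maildir is the usual but configurable location):

  ~/Maildir/
      tmp/    # messages in the process of being delivered
      new/    # delivered messages the mail client hasn't seen yet
      cur/    # messages the mail client has seen
  # one file per message, so counting them is just a directory listing:
  find ~/Maildir/new ~/Maildir/cur -type f | wc -l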

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (i.e. you are not supposed to mess with it) while Maildir is designed to be used by whatever tools you wish (e.g. my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the difficulty in performing backups. As a test I used an LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks with 343M of data used (according to “df -h”) and 39,358 inodes in use (as there were 5,000 accounts with Maildir storage, that means 25,000 directories: a home directory plus Maildir and its tmp, new and cur subdirectories for each account). So there were 14,358 files. Creating a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds; 105MB of data was transferred, for a transfer rate of 456KB/s. It seems that tar stores the data in a more space-efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison, either of the two disks can deliver 40MB/s for the inner tracks. So it seems that unless the amount of used space is less than about 1% of the total disk space it will be faster to transfer a filesystem image.
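
The test was something like the following, timing a tar of the mail store; piping through dd stops GNU tar noticing that the output is /dev/null and skipping the work (/mail-store is a placeholder for wherever the Maildirs live):

  time tar cf - /mail-store | dd of=/dev/null bs=1M
  # dd reports the number of bytes written, which gives the effective data rate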

If you have disks that are faster than your network (e.g. old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking, and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine; this is usually possible when using IDE disks, but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.
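
A sketch of the sort of image transfer I mean, with gzip -1 in the pipe (the device and host names are hypothetical, the filesystem should be unmounted or snapshotted first, and whether the compression stage helps or hurts depends on the CPU and network speeds discussed below):

  dd if=/dev/vg0/mailstore bs=1M | gzip -1 | \
      ssh newserver 'gzip -d | dd of=/dev/vg0/mailstore bs=1M'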

The first test I made of compression was on a 1GHz Athlon system, which could compress (via gzip -1) 100M of data with four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time: 100M in four seconds is 25MB/s of compression throughput, which is faster than 100baseT can move data, so the compressor won’t become the bottleneck (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so it can’t be directly compared); as 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds, that obviously isn’t a viable option.

The gzip -1 compression did however compress the data to 57% of its original size; the fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worthwhile amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.
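
Any candidate is easy to benchmark; a sketch, with /tmp/sample.tar as a hypothetical 1G tar of real mail data:

  time gzip -1 < /tmp/sample.tar | wc -c   # CPU time used and compressed size
  # the target is well under 10 seconds on a 3.2GHz P4, with useful shrinkage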

Without compression, the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. That’s not a good amount of down-time for a service that runs 24*7, particularly given that some additional time would be spent getting the new machine to actually use the data.
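
For the record, the back-of-envelope arithmetic behind “approaching two hours” (119MB/s is roughly the best case for Gig-E after protocol overhead; 80MB/s is an assumption about real-world throughput):

  echo $(( 500 * 1024 / 119 / 60 ))   # ~71 minutes at close to wire speed
  echo $(( 500 * 1024 / 80 / 60 ))    # ~106 minutes at a more realistic 80MB/s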

As for how to design a system that doesn’t have these problems, I’ll write a future post with some ideas on how to alleviate them.

Mobile Facebook

A few of my clients have asked me to configure their routers to block access to Facebook and Myspace. Apparently some employees spend inappropriate amounts of time using those services while at work. Using iptables to block port 80 and configuring Squid to reject access to those sites is easy to do.
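
For anyone wondering what “easy to do” looks like, a minimal sketch; the ACL name and the LAN interface eth1 are assumptions, and the deny line needs to come before whatever http_access line allows the local network:

  # /etc/squid/squid.conf
  acl timewasters dstdomain .facebook.com .myspace.com
  http_access deny timewasters

  # force web traffic through the proxy by rejecting direct port 80 from the LAN
  iptables -A FORWARD -i eth1 -p tcp --dport 80 -j REJECT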

So I was interested to see an advertising poster in my local shopping centre promoting the Telstra “Next G Mobile” which apparently offers “Facebook on the go“. I’m not sure whether Telstra has some special service for accessing Facebook (maybe a Facebook client program running on the phone) or whether it’s just net access on the phone which can be used for Facebook (presumably with a version of the site that is optimised for a small screen).

I wonder if I’ll have clients wanting me to firewall the mobile phones of their employees (of course it’s impossible for me to do it – but they don’t know that).

I have previously written about the benefits of a 40 hour working week for productivity and speculated on the possibility that for some employees the optimum working time might be less than 40 hours a week [1]. I wonder whether there are employees who could get more work done by spending 35 hours working and 5 hours using Facebook than they could by working for 40 hours straight.

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known-good CD-ROM drives because, as they have moving parts, they periodically break, and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf four of my CD-ROM drives have spontaneously died. I expect drives to die from mechanical wear if they are used a lot, and I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives, or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality; it was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate, as one of the disks died a couple of hours later (just long enough to load all the data onto the machine). Yay! RAID saved me from losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and there wasn’t much at risk). I had decided to instead make it a RAID-1 of 30G disks, and to save some inconvenience I transferred the LVM data from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine that wasn’t designed for three hard disks, so it was easier to transfer the data in a way that never needs more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could probably have recovered from that using the LVM backup data, and even if that hadn’t worked I had only created a few LVs and they were contiguous, so I could have worked out where the data was.
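
For reference, the two-disks-at-a-time migration was along these lines; /dev/md0 is the old degraded array, /dev/sdb1 a partition on the new disk and vg0 the volume group (all hypothetical names):

  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing
  pvcreate /dev/md1
  vgextend vg0 /dev/md1
  pvmove /dev/md0 /dev/md1   # the new disk died right after this step completed
  vgreduce vg0 /dev/md0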

Instead, however, I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies on the two disks in the RAID-1 and on another separate disk), but a disk died before I got the chance to make the backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction, I wouldn’t expect drives to cease working from not being used; if anything I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?

Links May 2008

The Daily WTF has published an interesting essay on why retaining staff is not always a good thing [1]. The main point is that good people get bored and want to move on while mediocre people want to stay, but there are other points and it’s worth reading.

Following the links from that article led me to an article comparing managing IT departments to managing professional sports teams [2]. They chose US football (a sport I know little about and have no interest in) so I probably missed some of the content. But they have some good points.

John Goerzen has written a good presentation of the idea of increasing petrol taxes and decreasing other taxes to get a revenue-neutral result while also discouraging petrol use [3]. I credit him with presenting the idea rather than inventing it, because I have heard similar ideas several times before (though not nearly as well written, and not from a right-wing perspective). Hopefully some people who read his presentation will be more receptive than they were to the other versions of the same idea.

Craig Venter spoke at TED about his work in creating artificial life [4]. He spent some time talking about the possibilities of creating artificial organisms to produce fuels directly from CO2 and sunlight.

Nick Bostrom published a paper explaining why he hopes that the SETI projects find nothing [5]. His theory is that the fact that our solar system has not been colonised, and that we have seen no immediate evidence of extra-terrestrial life, indicates that there is a “Great Filter”, a stage of evolution which it is most unlikely that any species will pass through. If the Great Filter is in our past (he cites the evolution of multi-celled life as one of the possibilities, and the evolution of prokaryotes into eukaryotes as another) then our future might be more successful than if the Great Filter were something that tends to happen to advanced societies.

Jared Diamond (the author of Collapse), has written an interesting article about vengeance [6]. He focuses on an example in New Guinea and uses it to explain why personal vendettas tend to run wildly out of control and how society is best served by having the state punish criminals.

CPU Capacity for Virtualisation

Today a client asked me to advise him on how to dramatically reduce the number of servers for his business. He needs to go from 18 active servers to 4. Some of the machines in the network are redundant servers, and by reducing some of the redundancy I can remove four servers, so now the task is to go from 14 to 4.

To determine the hardware requirements I analysed the sar output from all machines. The last 10 days of data were available, so I took the highest daily average numbers from each machine for user and system CPU load and added them up; the result was 221%. So for the average daily CPU use, three servers would have enough power to run the entire network. Then I looked at the highest 5 minute averages for user and system CPU load from each machine, which add up to 582%. So if all machines were to have their peak usage times simultaneously (which doesn’t happen) then the CPU power of six machines would be needed. I conclude that the CPU power requirements are somewhere between 3 and 6 machines, so 4 machines may do an OK job.
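
A rough sketch of how the numbers can be pulled from one day of sar data on each machine; the file path and field positions are assumptions that depend on the sysstat version and locale, so check the header of “sar -u” output first:

  sar -u -f /var/log/sysstat/sa05 | awk '
      /Average/     { print "daily average user+sys:", $3 + $5 }
      /^[0-9].*all/ { if ($3 + $5 > max) max = $3 + $5 }
      END           { print "worst 5 minute user+sys:", max }'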

The next issue is IO capacity. The current network has 2G of RAM in each machine and I plan to run it all on 4G Xen servers, so it will be a total of 16G of RAM instead of 36G. While some machines currently have unused memory, I expect that the end result of this decrease in total RAM will be more cache misses and more swapping, so the total IO capacity in use will increase slightly. Four of the servers (which will eventually become Xen Dom0s) have significant IO capacity (large RAIDs – they appear to have 10*72G disks in a RAID-5) and the rest have a smaller IO capacity (they appear to have 4*72G disks in a RAID-10). For the other 14 machines the highest daily averages for iowait add up to 9% and the highest 5 minute averages add up to 105%. I hope that spreading that 105% of the IO capacity of a 4-disk RAID-10 across four sets of 10-disk RAID-5s won’t give overly bad performance.

I am concerned that there may be some flaw in the methodology that I am using to estimate capacity. One issue is that I’m very doubtful about the utility of measuring iowait: iowait is the amount of idle CPU time while there are processes blocked on IO, so if for example 100% of CPU time is being used then iowait will be zero regardless of how much disk IO is in progress! One check that I performed was to add the maximum CPU time used, the maximum iowait, and the minimum idle time. Most machines gave totals that were very close to 100% when those columns were added. If the maximum iowait for a 5 minute period plus the maximum CPU use plus the minimum idle time add up to 100%, and the minimum idle time was not very low, then it seems unlikely that there was any significant overlap between disk IO and CPU use hiding iowait. One machine had a total of 147% for those fields in the 5 minute averages, which suggests that the IO load may be higher than the 66% iowait figure indicates. But if I put that machine in a DomU on the server with the most unused IO capacity then it should be OK.
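
The check itself can be scripted the same way; this sketch assumes a sar that prints %user %nice %system %iowait %steal %idle in that order (adjust the field numbers otherwise):

  sar -u -f /var/log/sysstat/sa05 | awk '
      /^[0-9].*all/ {
          cpu = $3 + $5
          if (cpu > maxcpu) maxcpu = cpu
          if ($6 > maxwait) maxwait = $6
          if (minidle == "" || $8 < minidle) minidle = $8
      }
      END { print maxcpu + maxwait + minidle, "(close to 100 means little hidden IO)" }'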

I will be interested to read any suggestions for how to proceed with this. But unfortunately it will probably be impossible to consider any suggestion which involves extra hardware or abandoning the plan due to excessive risk…

I will write about the results.

Hosting a Xen Server

Yesterday I wrote about my search for a hosting provider for a Xen DomU [1]. One response was the suggestion to run a Dom0 and sell DomUs to other people [2], and it was pointed out that Steve Kemp’s Xen-Hosting.org project is an example of how to do this well [3]. Unfortunately Steve’s service is full and he is not planning to expand.

I would be open to the idea of renting a physical machine and running the Xen server myself, but that might be a plan for some other time (of course if a bunch of people want to sign up to such a thing and start hassling me I might change my mind). But at the moment I need to get some services online soon and I don’t want to spend any significant amount of money (I want to keep what I spend on net access below my Adsense revenue).

Also if someone else in the free software community wants to follow Steve’s example then I would be interested in hosting my stuff with them.

Xen Hosting

I’m currently deciding where to get a Xen DomU hosted. It will be used for a new project that I’m about to start which will take more bandwidth than my current ISP is prepared to offer (or at least they would want me to start paying, and serious bandwidth is expensive in Australia). Below is a table of the options I’ve seriously considered so far (I rejected Dreamhost based on their reputation, and some other virtual hosts obviously couldn’t compete with the prices of the ones in the table). For each ISP I listed the two cheapest options; as I want to save money I’ll probably go for the cheapest option at the ISP I choose, but I want the option of upgrading if I need more.

I’m not sure how much storage I need; I think that 4.5G is probably not enough and even 6G might get tight. Of course it also depends on how many friends I share the server with.

Quantact has a reasonably cheap option for $15, but the $25 option is expensive and has little RAM. Probably 192M of RAM would be the minimum if I’m going to share the machine with two or more friends (to share the costs).

VPSLand would have rated well if it weren’t for the fact that they once unexpectedly deleted a DomU belonging to a client (they claimed that the bill wasn’t paid) and had no backups. Disabling a service when a bill is not paid is fair, and charging extra for the “service” of re-enabling it is acceptable, but just deleting it with no backups is unacceptable. But as I’m planning on serving mostly static data this won’t necessarily rule them out of consideration.

It seems that Linode and Slicehost are the best options (Slicehost seems the most clueful and Linode might be the second most). Does anyone have suggestions about other Xen servers that I haven’t considered?

XenEurope seems interesting. One benefit that they have is being based in the Netherlands which has a strong rule of law (unlike the increasingly corrupt US). A disadvantage is that the Euro is a strong currency and is expected to get even stronger. Services paid in Euros should be expected to cost more in future when paid in Australian dollars, while services paid in US dollars should be expected to cost less.

Gandi.net has an interesting approach: they divide a server into 64 “shares” and you can buy as many as you want for your server (up to 16 shares, for 1/4 of a server). If at any time you run out of bandwidth then you just buy more shares. They also limit bandwidth by a guaranteed transfer rate (in multiples of 3Mb/s) instead of limiting the overall data transferred per month (as most providers do). They don’t mention whether you can burst above that 3Mb/s limit – while 3Mb/s for 24*7 is a significant amount of data transfer, it isn’t that much if you have a 200MB file that will be downloaded a few times a day while interactive tasks are also in progress (something that may be typical usage for my server). Of course other providers generally don’t provide any information on how fast data can be transferred, and the effective rate will often be less than 3Mb/s.
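
For comparison with the per-month quotas in the table below, a quick calculation of what a fully saturated 3Mb/s link would move in a 30 day month:

  echo $(( 3 * 86400 * 30 / 8 / 1000 ))   # about 972 GB per month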

Also if anyone who I know wants to share access to a server then please contact me via private mail.

ISP          RAM    Disk   Bandwidth (per month)            Price ($US)
Linode       360M   10G    200GB                            $20
Linode       540M   15G    300GB                            $30
Slicehost    256M   10G    100GB                            $20
Slicehost    512M   20G    200GB                            $38
VPSLand      192M   6G     150GB                            $16
VPSLand      288M   8G     200GB                            $22
Quantact     96M    4.5G   96GB                             $15
Quantact     128M   6G     128GB                            $25
RimuHosting  96M    4G     30GB                             $20
XenEurope    128M   10G    100GB                            $16 (E10)
XenEurope    256M   20G    150GB                            $28 (E17.50)
Gandi.net    256M   5G     3Mb/s (rate, no monthly quota)   $7.50 or E6