Dell PowerEdge T105

Today I received a Dell PowerEdge T105 for use by a client. My client had some servers for development and testing hosted in a server room at significant expense. They also needed an offsite backup of critical data. So I suggested that they buy a cheap server-class machine, put it on a fast ADSL connection at their home, and use Xen DomUs on it for development, testing, and backup. My client liked the concept but didn’t like the idea of having a server in his home.

So I’m going to run the server from my home. I selected a Dell PowerEdge tower system because it’s the cheapest server-class machine that can be purchased new. I have a slight preference for HP hardware, but HP gear is probably more expensive and they are not a customer-focussed company (they couldn’t even give me a price).

So exactly a week after placing my order I received my shiny new Dell system, and it didn’t work. I booted a CentOS CD and ran “memtest” and the machine performed a hard reset. When it booted again it informed me that the event log had a message, and the message was “Uncorrectable ECC Error” with extra data of “DIMM 2,2”. While it sucks quite badly to receive a new machine that doesn’t work, that’s about the best result you can hope for when you have a serious error on the motherboard or the RAM. A machine without ECC memory would probably just randomly crash every so often and maybe lose data (see my previous post on the relative merits of ECC RAM and RAID [1]).

So I phoned up Dell to get technical support (it’s a pity that their “Packing Slip” was a low-quality photocopy which didn’t allow me to read their phone number, and that the shipping box also didn’t include the number, so I had to look them up on the web). Once we had established that by removing the DIMMs and reinserting them I had proved that there was a hardware fault, they agreed to send out a technician with a replacement motherboard and RAM.

I’m now glad that I bought the RAM from Dell. Dell’s business model seems to revolve around low base prices for hardware and extremely high prices for extras; for example Dell sells 1TB SATA disks for $818.40 while MSY [1] has them for $215 or $233 depending on brand.

When I get the machine working I will buy two 1TB disks from MSY (or another company with similar prices). Not only does that save some money but it also means that I can get different brands of disk. I believe that having different brands of hard disk in a RAID-1 array will decrease the probability of having them both fail at the same time.
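For reference, building such a mirror with Linux software RAID only takes a couple of commands. This is a sketch, not a tested procedure: the device names /dev/sda1 and /dev/sdb1 and the md device number are assumptions, and everything needs root, so the commands are shown as a listing rather than executed:

```shell
# Build a RAID-1 array from one partition on each disk, then put a
# filesystem on it (run as root; device names are assumptions):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
#   mkfs.ext3 /dev/md0
#   mdadm --detail /dev/md0    # shows both mirror halves and sync progress
```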

One interesting thing about the PowerEdge T105 is that while Dell will only sell two disks for it, it has four SATA connectors on the motherboard; one is used for a SATA DVD player, so it would be easy to support three disks. Four disks could be installed if a PCIe SATA controller was used (one disk in the space for a FDD and another in the space for a second CD/DVD drive), and if you were prepared to go without a CD/DVD drive then five internal disks could probably work. But without any special hardware the space for a second CD/DVD drive is just begging to be used for a third hard disk; most servers only use the primary CD/DVD drive for installing the OS, and I expect that the demand for two CD/DVD drives in a server is extremely low. Personally I would prefer it if servers shipped with external USB DVD drives for installing the OS. Then when I install a server room I could leave one or two drives there in case a system recovery is needed and use the rest for desktop machines.

One thing that they seem to have messed up is the lack of a filter for the air intake fan at the front of the case. The Opteron CPU has a fan that’s about 11cm wide which sucks in air from the front of the machine; in front of that fan there is a 4cm gap which would nicely fit a little sponge filter. Either they messed up the design or somehow my air filter got lost in transit.

Incidentally, if you want to buy from Dell in Australia then you need to configure your OS not to use ECN (Explicit Congestion Notification) [2], as the Dell web servers used for sales reject all connections from hosts with ECN enabled. It’s interesting that the web servers used for providing promotional information work fine with ECN – it’s only if you want to buy that it bites you.
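On Linux the relevant setting is a sysctl. As a sketch, this is how to check it and (as root) turn it off; the permanent entry in /etc/sysctl.conf is an assumption about your distribution’s conventions:

```shell
# Check whether ECN is currently enabled
# (0 = off, 1 = on, 2 = on only when the peer requests it):
cat /proc/sys/net/ipv4/tcp_ecn
# To turn it off (as root) before visiting the Dell store:
#   sysctl -w net.ipv4.tcp_ecn=0
# Or permanently, via /etc/sysctl.conf:
#   net.ipv4.tcp_ecn = 0
```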

But in spite of these issues, I am still happy with Dell overall. Their machine was DOA; that happens sometimes, and the next-day service is good (NB I didn’t pay extra for better service). I expect that they will fix it tomorrow and I’ll buy more of their gear in future.

Update: I forgot to mention that Dell shipped the machine with two power cables. While two power cables is a good thing for the more expensive servers that have redundant PSUs, for a machine with only one PSU it’s a bit of a waste. For some time I’ve been collecting computer power cables faster than I’ve been using them (due to machines dying and due to clients who want machines but already have spare power cables). So I’ve started giving them away at meetings of my local LUG. At the last meeting I gave away a bag of power cables and I plan to keep taking cables to the meetings until people stop accepting them.

The Cost of Owning a Car

There has been a lot of talk recently about the cost of petrol; Colin Charles is one of the few people to consider the issue of wages in this discussion [1]. Unfortunately almost no-one seems to consider the overall cost of running a vehicle.

While I can’t get the figures for Malaysia (I expect Colin will do that) I can get them for Australia. First I chose a car that’s cheap to buy, reasonably fuel efficient (small) and common (cheap parts from the wreckers) – the Toyota Corolla seemed like a good option.

ECC RAM is more useful than RAID

A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming Code [0]) RAM is only a server feature.

The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks for one person. Therefore when purchasing a desktop machine you can decide how much you are willing to spend for the safety and continuity of your work. For a server it’s more difficult as everyone has a different idea of how reliable a server should be in terms of uptime and in terms of data security. When running a server for a business there is the additional issue of customer confidence. If a server goes down occasionally customers start wondering what else might be wrong and considering whether they should trust their credit card details to the online ordering system.

So it is apparent that servers need a different degree of reliability – and it’s easy to justify spending the money.

Desktop machines also need reliability, more so than most people expect. In a business when a desktop machine crashes it wastes employee time. If a crash wastes an hour (which is not unlikely given that previously saved work may need to be re-checked) then it can easily cost the business $100 (the value of the other work that the employee might have done). Two such crashes per week could cost the business as much as $8000 per year. The price difference between a typical desktop machine and a low-end workstation (or deskside server) is considerably less than that (when I investigated the prices almost a year ago desktop machines with server features ranged in price from $800 to $2400 [1]).

Some machines in a home environment need significant reliability. For example when students are completing high-school their assignments have a lot of time invested in them. Losing an assignment due to a computer problem shortly before it’s due in could impact their ability to get a place in the university course that they most desire! Then there is also data which is irreplaceable; one example I heard of was of a woman whose computer had a factory pre-load of Windows – during a storm the machine rebooted and reinstalled itself to the factory defaults, wiping several years of baby photos… In both cases better backups would mostly solve the problem.

For business use the common scenario is to have file servers storing all data and have very little data stored on the PC (ideally have no data on the PC). In this case a disk error would not lose any data (unless the swap space was corrupted and something important was paged out when the disk failed). For home use the backup requirements are quite small. If a student is working on an important assignment then they can back it up to removable media whenever they reach a milestone. Probably the best protection against disk errors destroying assignments would be a bulk purchase of USB flash storage sticks.

Disk errors are usually easy to detect. Most errors are in the form of data which can not be read back; when that happens the OS will give an error message to the user explaining what happened. Then if you have good backups you revert to them and hope that you didn’t lose too much work in the meantime (you also hope that your backups are actually readable – but that’s another issue). The less common errors are lost writes – where the OS writes data to disk but the disk doesn’t store it. This is a little more difficult to discover as the drive will return bad data (maybe an old version of the file data or maybe data from a different file) and claim it to be good.

The general idea nowadays is that a filesystem should check the consistency of the data it returns. Two new filesystems, ZFS from Sun [2] and BTRFS from Oracle [3] implement checksums of data stored on disk. ZFS is apparently production ready while BTRFS is apparently not nearly ready. I expect that from now on whenever anyone designs a filesystem for anything but the smallest machines (EG PDAs and phones) they will include data integrity mechanisms in the design.

I believe that once such features become commonly used the need for RAID on low-end systems will dramatically decrease. A combination of good backups and knowing when your live data is corrupted will often be a good substitute for preserving the integrity of the live data. Not that RAID will necessarily protect your data – with most RAID configurations if a hard disk returns bad data and claims it to be good (the case of lost writes) then the system will not read data from other disks for checksum validation and the bad data will be accepted.

It’s easy to compute checksums of important files and verify them later. One simple way of doing so is to compress the files, every file compression program that I’ve seen has some degree of error detection.
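As a sketch of both approaches (the file names are made up for the example):

```shell
# Record checksums of important files and verify them later:
dir=$(mktemp -d)
echo "assignment draft" > "$dir/essay.txt"
( cd "$dir" && sha1sum essay.txt > SUMS && sha1sum -c SUMS )
# Compression buys a similar check for free: gzip stores a CRC32 of the
# uncompressed data, so "gzip -t" detects corruption of the archive:
gzip -c "$dir/essay.txt" > "$dir/essay.txt.gz"
gzip -t "$dir/essay.txt.gz" && echo "archive verified"
rm -r "$dir"
```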

Now the real problem with RAM which lacks ECC is that it can lose data without the user knowing. There is no possibility of software checks because any software which checks for data integrity could itself be misled by memory errors. I once had a machine which experienced filesystem corruption on occasion; eventually I discovered that it had a memory error (memtest86+ reported a problem). I will never know whether some data was corrupted on disk because of this. Sifting through a large amount of stored data for some files which may have been corrupted due to memory errors is almost impossible, especially when there was a period of weeks of unreliable operation of the machine in question.

Checking the integrity of file data by using the verify option of a file compression utility, fsck on a filesystem that stores checksums on data, or any of the other methods is not difficult.

I have a lot of important data on machines that don’t have ECC. One reason is that machines which have ECC cost more and have other trade-offs (more expensive parts, more noise, more electricity use, and the small supply makes it difficult to get good deals). Another is that there appear to be no laptops which support ECC (I use a laptop for most of my work). On the other hand RAID is very cheap and simple to implement, just buy a second hard disk and install software RAID – I think that all modern OSs support RAID as a standard installation option. So in spite of the fact that RAID does less good than a combination of ECC RAM and good backups (which are necessary even if you have RAID), it’s going to remain more popular in high-end desktop systems for a long time.

The next development that seems interesting is the large portion of the PC market which is designed not to have space for more than one hard disk. Such compact machines (known as Small Form Factor or SFF) could easily be designed to support ECC RAM. Hopefully the PC companies will add reliability features in that area even as they remove the space that RAID would need.

Record Oil Prices

MarketWatch reports that oil prices had the biggest daily gain on record, going up $11 in one day.

They claim that this is due to an impending Israeli attack on Iran and a weak US economy. $150 per barrel is the price that they predict for the 4th of July. That’s an interesting choice of date, I wonder whether they will be talking about “independence from Arabian oil”…

The New York Times has an interesting article on fuel prices [1]. Apparently sales of SUVs are dropping significantly.

The US senate is now debating a cap on carbon-dioxide production. The NY Times article suggests that the new “carbon taxes” could be combined with tax cuts in other areas. If implemented correctly it would allow people who want to save money to reduce their overall tax payments by reducing fuel use. Also as increasing prices will decrease demand (thus decreasing the price at import time) it would to some degree mean transferring some revenue from the governments of the middle east to the US government.

The article also states that the Ford F series of “pickup trucks” was the most popular line of vehicles in the US for more than 20 years! But last month they were beaten by the Toyota Corolla and Camry and the Honda Civic and Accord. Now Ford needs to put more effort into their medium to large cars. With the hybrid Camry apparently already on sale in the US (their web site refuses to provide any information to me because I don’t have Flash installed, so I can’t check) and rumoured to be released soon in other countries, Ford will have to put significant effort into developing fuel-efficient vehicles.

According to a story in the Herald Sun (published on the 23rd of April), survey results show that 1/3 of Victorians would cease using their car to get to work if the petrol price reached $1.75/L [2]. Now the Herald Sun has run a prediction (by the assistant treasurer and the NRMA) that $1.75/L will be reached next week (an increase of just over 10 cents a litre) [3].

The good news is that there will be less pollution in Australia in the near future (even if $1.75 is not reached I am certain that the price will increase enough to encourage some people to use public transport). The bad news is that our public transport is inadequate at the moment and there will be significant levels of overcrowding.

SE Linux Support in GPG

In May 2002 I had an idea for securing access to GNUPG [1]. What I did was to write SE Linux policy to only permit the gpg program to access the secret key (and other files in ~/.gnupg). This meant that the most trivial ways of stealing the secret key would be prevented. However an attacker could still use gpg to encrypt its secret key and write the data to some place that is accessible, for example the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg”. So what we needed was for gpg to either refuse to encrypt such files, or to spawn a child process for accessing such files (which could be granted different access to the filesystem). I filed the Debian bug report 146345 [2] requesting this feature.

In March upstream added this feature. The Debian package is currently not built with --enable-selinux-support so this feature isn’t enabled yet, but hopefully it will be soon. Incidentally the feature as currently implemented is not really SE Linux specific; it seems to me that there are many potential situations where it could be useful without SE Linux. For example if you were using one of the path-name based MAC systems (which I dislike – see what my friend Joshua Brindle wrote about them for an explanation [3]) then you could gain some benefits from this. A situation where there is even smaller potential for benefit is in the case of an automated system which runs gpg and which could allow an attacker to pass bogus commands to it. When exploiting a shell script it might be easier to specify the wrong file to encrypt than to perform more sophisticated attacks.

When the feature in question is enabled the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg” will abort with the following error:
gpg: can’t open `/root/.gnupg/secring.gpg’: Operation not permitted
gpg: symmetric encryption of `/root/.gnupg/secring.gpg’ failed: file open error

Of course the command “gpg --export-secret-keys” will abort with the following error:
gpg: exporting secret keys not allowed
gpg: WARNING: nothing exported

Now we need to determine the correct way of exporting secret keys and modifying the GPG configuration. It might be best to allow exporting the secret keys when not running SE Linux (or other supported MAC systems), or when running in permissive mode (as in those situations merely copying the files will work), although we could have an option in gpg.conf for this for the case where we want to prevent shell-script quoting hacks.

For editing the gpg.conf file and exporting the secret keys we could have a program similar in concept to crontab(1) which has PAM support to determine when it should perform its actions. Also it seems to me that crontab(1) could do with PAM support (I’ve filed Debian bug report 484743 [4] requesting this).

Finally one thing that should be noted is that the targeted policy for SE Linux does not restrict GPG (which runs in the unconfined_t domain). Thus most people who use SE Linux at the moment aren’t getting any benefits from such things. This will change eventually.

The Date Command and Seconds Since 1970-01-01

The man page for the date command says that the %s option will give “seconds since 1970-01-01 00:00:00 UTC”. I had expected that everything date did would give output in my time zone unless I requested otherwise. But it seems that in this case the result is in UTC, and the same seems to be true for most programs that log dates as a number of seconds.

In a quick Google search for a way to use a shell script to convert the number of seconds since 1970-01-01 00:00:00 UTC to a more human-readable format, the only remotely useful result I found was the command date -d "1970-01-01 1212642879 sec", which in my timezone gives an error of ten hours (at the moment – it would be eleven hours in summer). The correct method is date -d "1970-01-01 1212642879 sec utc"; you can verify this with the command date -d "1970-01-01 $(date +%s) sec utc" (which should give the same result as date).
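To illustrate with GNU date (using the epoch value from the example above):

```shell
# The trailing "utc" stops the epoch being interpreted in local time:
date -d "1970-01-01 1212642879 sec utc"
# GNU date also accepts seconds-since-the-epoch directly with @,
# which avoids the pitfall entirely:
date -d @1212642879
# Both commands print the same local time.
```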

Moving a Mail Server

Nowadays it seems that most serious mail servers (IE mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox you have a large number of messages in a single file; most users would have a single file with all their mail, and advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that it is necessary to read the entire file to determine how many messages it stores; as determining the number of messages was the first thing done in a POP connection, this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted, as the Mbox file needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (IE you are not supposed to mess with it) while Maildir is designed to be used by any tools that you wish (EG my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the difficulty in performing backups. As a test I used an LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks, with 343M of data used (according to “df -h”) and 39,358 inodes in use. As there were 5,000 accounts with Maildir storage, that means 25,000 directories for the home directories and Maildir directories, so there were 14,358 files. Creating a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds; 105MB of data was transferred, for a transfer rate of 456KB/s. It seems that tar stores the data in a more space-efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison, either of the two disks can deliver 40MB/s from the inner tracks. So it seems that unless the amount of used space is less than 1% of the total disk space it will be faster to transfer a filesystem image.
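The test harness itself is trivial; this sketch uses a small throw-away tree instead of the real mail spool:

```shell
# tar special-cases an output file of /dev/null, so pipe through dd to
# force every byte to actually be read and written:
testdir=$(mktemp -d)
for i in 1 2 3 4 5; do echo "message body $i" > "$testdir/msg$i"; done
time tar cf - -C "$testdir" . | dd of=/dev/null bs=1M
rm -r "$testdir"
```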

If you have disks that are faster than your network (EG old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine, this is usually possible when using IDE disks but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.

The first test I made of compression was on a 1GHz Athlon system, which could compress (via gzip -1) 100M of data in four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so it can’t be directly compared). As 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds, that obviously isn’t a viable option.

The gzip -1 compression however compressed the data to 57% of its original size; the fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worthwhile amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.
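For what it’s worth, the transfer pipeline is easy to prototype locally. In the real migration the decompress-and-extract half would run on the destination host (via ssh or netcat); here both halves run on one machine against a made-up test tree:

```shell
# Round-trip a test tree through the same pipeline that would run
# between the old and new mail servers:
src=$(mktemp -d); dst=$(mktemp -d)
echo "a sample message" > "$src/msg1"
tar cf - -C "$src" . | gzip -1 | gzip -d | tar xf - -C "$dst"
cmp "$src/msg1" "$dst/msg1" && echo "round trip OK"
rm -r "$src" "$dst"
```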

Without compression the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. That is not a good amount of down-time for a service that runs 24*7, particularly given that some time would be spent getting the new machine to actually use the data.
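The estimate is easy to check, assuming a realistic ~70MB/s of goodput (gigabit wire speed is 125MB/s, but protocol overhead and seek-bound disks eat into that; the 70MB/s figure is my assumption):

```shell
# 500G expressed in MB, divided by MB/s, divided by 60 gives minutes:
echo $(( 500 * 1024 / 70 / 60 ))   # roughly 121 minutes
```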

As for how to design a system to not have these problems, I’ll write a future post with some ideas for how to alleviate that.

Mobile Facebook

A few of my clients have asked me to configure their routers to block access to Facebook and Myspace. Apparently some employees spend inappropriate amounts of time using those services while at work. Using iptables to block port 80 and configuring Squid to reject access to those sites is easy to do.
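As a sketch of the sort of configuration involved (the site list, interface name, and proxy setup are assumptions – real rules need to match the client’s network, so these are shown as a listing rather than executed):

```shell
# Squid side (/etc/squid/squid.conf): deny the named sites before the
# rule that allows the local network:
#   acl timewasters dstdomain .facebook.com .myspace.com
#   http_access deny timewasters
#
# Router side: block direct port 80 from the LAN so the proxy can't be
# bypassed (run as root; eth1 as the LAN interface is an assumption):
#   iptables -A FORWARD -i eth1 -p tcp --dport 80 -j REJECT
```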

So I was interested to see an advertising poster in my local shopping centre promoting the Telstra “Next G Mobile” which apparently offers “Facebook on the go”. I’m not sure whether Telstra has some special service for accessing Facebook (maybe a Facebook client program running on the phone) or whether it’s just net access on the phone which can be used for Facebook (presumably with a version of the site that is optimised for a small screen).

I wonder if I’ll have clients wanting me to firewall the mobile phones of their employees (of course it’s impossible for me to do it – but they don’t know that).

I have previously written about the benefits of a 40 hour working week for productivity and speculated on the possibility that for some employees the optimum working time might be less than 40 hours a week [1]. I wonder whether there are employees who could get more work done by spending 35 hours working and 5 hours using Facebook than they could by working for 40 hours straight.

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known good CD-ROM drives because, as they have moving parts, they periodically break and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf I have had four CD-ROM drives spontaneously die. I expect drives to die from mechanical wear if they are used a lot, and I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives, or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality, it was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate as one of the disks died a couple of hours later (just long enough to load all the data on the machine). Yay! RAID saved me losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and I didn’t have much at risk). I had decided to instead make it a RAID-1 of 30G disks, and to save some inconvenience I transferred the LVM from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine that wasn’t designed for three hard disks, so it was easier to transfer the data in a way that doesn’t need more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could probably have recovered that from the LVM backup data, and even if that hadn’t worked I had only created a few LVs and they were contiguous, so I could have worked out where the data was.

Instead however I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies of it on two disks in the RAID-1 and another separate disk), but I had a disk die before I got a chance to make a backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction I wouldn’t expect drives to cease working from not being used, I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?

Xen Hosting

I’m currently deciding where to get a Xen DomU hosted. It will be used for a new project that I’m about to start which will take more bandwidth than my current ISP is prepared to offer (or at least they would want me to start paying, and serious bandwidth is expensive in Australia). Below is a table of the options I’ve seriously considered so far (I rejected Dreamhost based on their reputation, and some other virtual hosts were obviously not able to compare with the prices of the ones in the table). For each ISP I listed the two cheapest options; as I want to save money I’ll probably go for the cheapest option at the ISP I choose, but I want the option of upgrading if I need more.

I’m not sure how much storage I need, I think that 4.5G is probably not enough and even 6G might get tight. Also of course it depends on how many friends I share the server with.

Quantact has a reasonable cheap option for $15, but the $25 option is expensive and has little RAM. Probably 192M of RAM would be the minimum if I’m going to share the machine with two or more friends (to share the costs).

VPSland would have rated well if it wasn’t for the fact that they once unexpectedly deleted a DomU belonging to a client (they claimed that the bill wasn’t paid) and had no backups. Disabling a service when a bill is not paid is fair, and charging extra for the “service” of re-enabling it is acceptable, but just deleting it with no backups is unacceptable. But as I’m planning on serving mostly static data this won’t necessarily rule them out of consideration.

It seems that linode and slicehost are the best options (Slicehost seems the most clueful and Linode might be the second most). Does anyone have suggestions about other Xen servers that I haven’t considered?

XenEurope seems interesting. One benefit that they have is being based in the Netherlands which has a strong rule of law (unlike the increasingly corrupt US). A disadvantage is that the Euro is a strong currency and is expected to get even stronger. Services paid in Euros should be expected to cost more in future when paid in Australian dollars, while services paid in US dollars should be expected to cost less.

Gandi.net has an interesting approach: they divide a server into 64 “shares” and then you can buy as many as you want (up to 16 shares for 1/4 of a server) for your server. If at any time you run out of bandwidth then you just buy more shares. They also limit bandwidth by guaranteed transfer rate (in multiples of 3Mb/s) instead of limiting the overall data transferred on a per-month basis (as most providers do). They don’t mention whether you can burst above that 3Mb/s limit – while 3Mb/s for 24*7 is a significant amount of data transfer, it isn’t that much if you have a 200MB file that will be downloaded a few times a day while interactive tasks are also in progress (something that may be typical usage for my server). Of course other providers generally don’t provide any information on how fast data can be transferred, and the rate will often be less than 3Mb/s.

Also if anyone who I know wants to share access to a server then please contact me via private mail.

ISP          RAM    Disk   Bandwidth/month   Price ($US)
Linode       360M   10G    200GB             $20
Linode       540M   15G    300GB             $30
Slicehost    256M   10G    100GB             $20
Slicehost    512M   20G    200GB             $38
VPSLand      192M   6G     150GB             $16
VPSLand      288M   8G     200GB             $22
Quantact     96M    4.5G   96GB              $15
Quantact     128M   6G     128GB             $25
RimuHosting  96M    4G     30G               $20
XenEurope    128M   10G    100G              $16 (E10)
XenEurope    256M   20G    150G              $28 (E17.50)
Gandi.net    256M   5G     3Mb/s             $7.50 or E6