etbe - Russell Coker

ECC RAM is more useful than RAID

A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming Code [0]) RAM is only a server feature.

The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks for one person. Therefore when purchasing a desktop machine you can decide how much you are willing to spend for the safety and continuity of your work. For a server it’s more difficult as everyone has a different idea of how reliable a server should be in terms of uptime and in terms of data security. When running a server for a business there is the additional issue of customer confidence. If a server goes down occasionally customers start wondering what else might be wrong and considering whether they should trust their credit card details to the online ordering system.

So it is obviously apparent that servers need a different degree of reliability – and it’s easy to justify spending the money.

Desktop machines also need reliability, more so than most people expect. In a business when a desktop machine crashes it wastes employee time. If a crash wastes an hour (which is not unlikely given that previously saved work may need to be re-checked) then it can easily cost the business $100 (the value of the other work that the employee might have done). Two such crashes per week could cost the business as much as $8000 per year. The price difference between a typical desktop machine and a low-end workstation (or deskside server) is considerably less than that (when I investigated the prices almost a year ago desktop machines with server features ranged in price from $800 to $2400 [1]).

Some machines in a home environment need significant reliability. For example when students are completing high-school their assignments have a lot of time invested in them. Losing an assignment due to a computer problem shortly before it’s due in could impact their ability to get a place in the university course that they most desire! Then there is also data which is irreplaceable, one example I heard of was of a woman who’s computer had a factory pre-load of Windows, during a storm the machine rebooted and reinstalled itself to the factory defaults – wiping several years of baby photos… In both cases better backups would mostly solve the problem.

For business use the common scenario is to have file servers storing all data and have very little data stored on the PC (ideally have no data on the PC). In this case a disk error would not lose any data (unless the swap space was corrupted and something important was paged out when the disk failed). For home use the backup requirements are quite small. If a student is working on an important assignment then they can back it up to removable media whenever they reach a milestone. Probably the best protection against disk errors destroying assignments would be a bulk purchase of USB flash storage sticks.

Disk errors are usually easy to detect. Most errors are in the form of data which can not be read back, when that happens the OS will give an error message to the user explaining what happened. Then if you have good backups you revert to them and hope that you didn’t lose too much work in the mean-time (you also hope that your backups are actually readable – but that’s another issue). The less common errors are lost-writes – where the OS writes data to disk but the disk doesn’t store it. This is a little more difficult to discover as the drive will return bad data (maybe an old version of the file data or maybe data from a different file) and claim it to be good.

The general idea nowadays is that a filesystem should check the consistency of the data it returns. Two new filesystems, ZFS from Sun [2] and BTRFS from Oracle [3] implement checksums of data stored on disk. ZFS is apparently production ready while BTRFS is apparently not nearly ready. I expect that from now on whenever anyone designs a filesystem for anything but the smallest machines (EG PDAs and phones) they will include data integrity mechanisms in the design.

I believe that once such features become commonly used the need for RAID on low-end systems will dramatically decrease. A combination of good backups and knowing when your live data is corrupted will often be a good substitute for preserving the integrity of the live data. Not that RAID will necessarily protect your data – with most RAID configurations if a hard disk returns bad data and claims it to be good (the case of lost writes) then the system will not read data from other disks for checksum validation and the bad data will be accepted.

It’s easy to compute checksums of important files and verify them later. One simple way of doing so is to compress the files, every file compression program that I’ve seen has some degree of error detection.

Now the real problem with RAM which lacks ECC is that it can lose data without the user knowing. There is no possibility of software checks because any software which checks for data integrity could itself be mislead by memory errors. I once had a machine which experienced filesystem corruption on occasion, eventually I discovered that it had a memory error (memtest86+ reported a problem). I will never know whether some data was corrupted on disk because of this. Sifting through a large amount of stored data for some files which may have been corrupted due to memory errors is almost impossible. Especially when there was a period of weeks of unreliable operation of the machine in question.

Checking the integrity of file data by using the verify option of a file compression utility, fsck on a filesystem that stores checksums on data, or any of the other methods is not difficult.

I have a lot of important data on machines that don’t have ECC. One reason is that machines which have ECC cost more and have other trade-offs (more expensive parts, more noise, more electricity use, and the small supply makes it difficult to get good deals). Another is that there appear to be no laptops which support ECC (I use a laptop for most of my work). On the other hand RAID is very cheap and simple to implement, just buy a second hard disk and install software RAID – I think that all modern OSs support RAID as a standard installation option. So in spite of the fact that RAID does less good than a combination of ECC RAM and good backups (which are necessary even if you have RAID), it’s going to remain more popular in high-end desktop systems for a long time.

The next development that seems interesting is the large portion of the PC market which is designed not to have the space for more than one hard disk. Such compact machines (known as Small Form Factor or SFF) could easily be designed to support ECC RAM. Hopefully the PC companies will add reliability features in one area while removing them in another.

Perpetual Motion

It seems that many blog posts related to fuel use (such as my post from yesterday about record oil prices [1]) are getting adverts about perpetual motion [2]. Note that the common usage of the term “Perpetual Motion” does not actually require something to move. A battery that gives out electricity forever would be regarded as fitting the description, as does any power source which doesn’t have an external source of energy.

The most common examples of this are claims about Oxyhydrogen [3], this is a mixture of hydrogen and oxygen in a 2:1 ratio. The wikipedia page is interesting, apparently oxyhydrogen is used for welding metals, glass, and plastics, and it was also used to heat lime to provide theatrical lighting (“lime light”). So a mixture of hydrogen and oxygen does have real-world uses.

The fraud comes in the issue of the claims about magnecules [4]. Magnecules are supposedly the reason for the “atomic” power of HHO gas (AKA Oxyhydrogen) which are repeated on many web sites. In brief, one mad so-called scientist (of course if he was a real scientist he would have experimental evidence to support his claims and such experiments would be repeatable) has invented entirely new areas of science, one of which involves magnetic bonds between atoms. He claims that such chemical species can be used to obtain free energy. The idea is that you start with water, end with water plus energy – then reuse the water in a closed system. Strangely the web sites promoting water fueled cars don’t seem to mention magnecules and just leave the “atomic energy” claim with no support – maybe magnecules are simply too crazy for them.

The water fuelled car wikpedia page is interesting – it lists five different ways that water can actually be used in a car engine (which are based on sound scientific principles and which have been tested) and compares them to the various water fueled car frauds [5].

I’m not accepting any more comments on my blog about perpetual motion solutions to the petrol crisis (they just take up valuable space and distract people who want to discuss science). I’ll allow some comments about such things on this post though.

Record Oil Prices

MarketWatch reports that oil prices had the biggest daily gain on record, going up $11 in one day.

They claim that this is due to an impending Israeli attack on Iran and a weak US economy. $150 per barrel is the price that they predict for the 4th of July. That’s an interesting choice of date, I wonder whether they will be talking about “independence from Arabian oil”…

The New York Times has an interesting article on fuel prices [1]. Apparently sales of SUVs are dropping significantly.

The US senate is now debating a cap on carbon-dioxide production. The NY Times article suggests that if the new “carbon taxes” could be combined with tax cuts in other areas. If implemented correctly it would allow people who want to save money to reduce their overall tax payments by reducing fuel use. Also as increasing prices will decrease demand (thus decreasing the price at import time) it would to some degree mean transferring some revenue from the governments of the middle east to the US government.

The article also states that the Ford F series of “pickup trucks” was the most popular line of vehicles in the US for more than 20 years! But last month they were beaten by the Toyota Corolla and Camry and the Honda Civic and Accord. Now Ford needs to put more effort into their medium to large cars. With the hybrid Camry apparently already on sale in the US (their web site refuses to provide any information to me because I don’t have Flash installed so I can’t check) and rumored to be released soon in other countries Ford needs to put some significant amounts of effort into developing fuel efficient vehicles.

According to a story in the Herald Sun (published on the 23rd of April), survey results show that 1/3 of Victorians would cease using their car to get to work if the petrol price reached $1.75/L [2]. Now the Herald Sun has run a prediction (by the assistant treasurer and the NRMA) that $1.75/L will be reached next week (an increase of just over 10 cents a liter) [3].

The good news is that there will be less pollution in Australia in the near future (even if $1.75 is not reached I am certain that the price will increase enough to encourage some people to use public transport). The bad news is that our public transport is inadequate at the moment and there will be significant levels of overcrowding.

SE Linux Support in GPG

In May 2002 I had an idea for securing access to GNUPG [1]. What I did was to write SE Linux policy to only permit the gpg program to access the secret key (and other files in ~/.gnupg). This meant that the most trivial ways of stealing the secret key would be prevented. However an attacker could still use gpg to encrypt it’s secret key and write the data to some place that is accessible, for example the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg“. So what we needed was for gpg to either refuse to encrypt such files, or to spawn a child process for accessing such files (which could be granted different access to the filesystem). I filed the Debian bug report 146345 [2] requesting this feature.

In March upstream added this feature, the Debian package is currently not built with --enable-selinux-support so this feature isn’t enabled yet, but hopefully it will be soon. Incidentally the feature as currently implemented is not really SE Linux specific, it seems to me that there are many potential situations where it could be useful without SE Linux. For example if you were using one of the path-name based MAC systems (which I dislike – see what my friend Joshua Brindle wrote about them for an explanation [3]) then you could gain some benefits from this. A situation where there is even smaller potential for benefit is in the case of an automated system which runs gpg which could allow an attacker to pass bogus commands to it. When exploiting a shell script it might be easier to specify the wrong file to encrypt than to perform more sophisticated attacks.

When the feature in question is enabled the command “gpg -c --output /tmp/foo.gpg ~/.gnupg/secring.gpg” will abort with the following error:
gpg: can’t open `/root/.gnupg/secring.gpg’: Operation not permitted
gpg: symmetric encryption of `/root/.gnupg/secring.gpg’ failed: file open error

Of course the command “gpg --export-secret-keys” will abort with the following error:
gpg: exporting secret keys not allowed
gpg: WARNING: nothing exported

Now we need to determine the correct way of exporting secret keys and modifying the GPG configuration. It might be best to allow exporting the secret keys when not running SE Linux (or other supported MAC systems), or when running in permissive mode (as in those situations merely copying the files will work). Although we could have an option in gpg.conf for this for the case where we want to prevent shell-script quoting hacks.

For editing the gpg.conf file and exporting the secret keys we could have a program similar in concept to crontab(1) which has PAM support to determine when it should perform it’s actions. Also it seems to me that crontab(1) could do with PAM support (I’ve filed Debian bug report 484743 [4] requesting this).

Finally one thing that should be noted is that the targeted policy for SE Linux does not restrict GPG (which runs in the unconfined_t domain). Thus most people who use SE Linux at the moment aren’t getting any benefits from such things. This will change eventually.

I Just Joined SAGE

I’ve just joined SAGE AU – the System Administrators Guild of Australia [1] .

I’ve known about SAGE for a long time, in 2006 I presented a paper at their conference [2] (here is the paper [3] – there are still some outstanding issues from that one, I’ll have to revisit it).

They have been doing good things for a long time, but I haven’t felt that there was enough benefit to make it worth spending money (there are a huge variety of free things that I can do related to Linux which I don’t have time to do). But now Matt Bottrell has been promoting Internode and SAGE, SAGE members get a 15% discount [4]. As I’ve got one home connection through Internode and will soon get another it seems like time to join SAGE.

BIND Stats

In Debian the BIND server will by default append statistics to the file /var/cache/bind/named.stats when the command rndc stats (which seems to be undocumented) is run. The default for RHEL4 seems to be /var/named/chroot/var/named/data/named_stats.txt.

The output will include the time-stamp of the log in the number of seconds since 1970-01-01 00:00:00 UTC (see my previous post explaining how to convert this to a regular date format [1]).

By default this only logs a summary for all zones, which is not particularly useful if you have multiple zones. If you edit the BIND configuration and put zone-statistics 1; in the options section then it will log separate statistics for each zone. Unfortunately if you add this and apply the change via rndc reload I don’t know of any convenient way that you can determine when this change was made and therefore the period of time for which the per-zone statistics were kept. So after applying this to my servers I restarted the named processes so that it will be obvious from the process start time when the statistics started.

The reason I became interested in this is when a member of a mailing list that I subscribe to was considering the DNSMadeEasy.com service. That company runs primary DNS servers for $US15 per annum which allows 1,000,000 queries per month, 3 zones, and 120 records (for either a primary or a secondary server). Based on three hours of statistics it seems like my zone coker.com.au is going to get about 360,000 queries a month (between both the primary and the secondary server). So the $15 a year package could accommodate 3 such zones for either primary or secondary (they each got about half the traffic). I’m not considering outsourcing my DNS, but it is interesting to consider how the various offers add up.

Another possibility for people who are considering DNS outsourcing is Xname.org which provides free DNS (primary and secondary) but request contributions from business customers (or anyone else).

Updated because I first published it without getting stats from my secondary server.

[1] http://etbe.coker.com.au/2008/06/05/date-1970-01-01/

The Date Command and Seconds Since 1970-01-01

The man page for the date command says that the %s option will give “seconds since 1970-01-01 00:00:00 UTC“. I had expected that everything that date did would give output in my time zone unless I requested otherwise.. But it seems that in this case the result is in UTC, and the same seems to be also true for most programs that log dates with the number of seconds.

In a quick google search for how to use a shell script to convert from the number of seconds since 1970-01-01 00:00:00 UTC to a more human readable format the only remotely useful result I found was the command date -d "1970-01-01 1212642879 sec", which in my timezone gives an error of ten hours (at the moment – it would give eleven hours in summer). The correct method is date -d "1970-01-01 1212642879 sec utc", you can verify this with the command date -d "1970-01-01 $(date +%s) sec utc" (which should give the same result as date).

Moving a Mail Server

Nowadays it seems that most serious mail servers (IE mail servers suitable for running an ISP) use one file per message. In the old days (before about 1996) almost all Internet email was stored in Mbox format [1]. In Mbox you have a large number of messages in a single file, most users would have a single file with all their mail and the advanced users would have multiple files for storing different categories of mail. A significant problem with Mbox is that it was necessary to read the entire file to determine how many messages were stored, as determining the number of messages was the first thing that was done in a POP connection this caused significant performance problems for POP servers. Even more serious problems occurred when messages were deleted as the Mbox file needed to be compacted.

Maildir is a mail storage method developed by Dan Bernstein based around the idea of one file per message [2]. It solves the performance problems of Mbox and also solves some reliability issues (file locking is not needed). It was invented in 1996 and has since become widely used in Unix messaging systems.

The Cyrus IMAP server [3] uses a format similar to Maildir. The most significant difference is that the Cyrus data is regarded as being private to the Cyrus system (IE you are not supposed to mess with it) while Maildir is designed to be used by any tools that you wish (EG my Maildir-Bulletin project [4]).

One down-side to such formats that many people don’t realise (except at the worst time) is the the difficulty in performing backups. As a test I used LVM volume stored on a RAID-1 array of two 20G 7200rpm IDE disks with 343M of data used (according to “df -h” and 39358 inodes in use (as there were 5000 accounts with maildir storage that means 25,000 directories for the home directories and Maildir directories). So there were 14,358 files. To create a tar file of that (written to /dev/null via dd to avoid tar’s optimisation of /dev/null) took 230.6 seconds, 105MB of data was transferred for a transfer rate of 456KB/s. It seems that tar stores the data in a more space efficient manner than the Ext3 filesystem (105MB vs 343MB). For comparison either of the two disks can deliver 40MB/s for the inner tracks. So it seems that unless the amount of used space is less than 1% of the total disk space it will be faster to transfer a filesystem image.

If you have disks that are faster than your network (EG old IDE disks can sustain 40MB/s transfer rates on machines with 100baseT networking and RAID arrays can easily sustain hundreds of megabytes a second on machines with gigabit Ethernet networking) then compression has the potential to improve the speed. Of course the fastest way of transferring such data is to connect the disks to the new machine, this is usually possible when using IDE disks but the vast number of combinations of SCSI bus, disk format, and RAID controller makes it almost impossible on systems with hardware RAID.

The first test I made of compression was on a 1GHz Athlon system which could compress (via gzip -1) 100M of data with four seconds of CPU time. This means that compression has the potential to reduce the overall transfer time (the machine in question has 100baseT networking and no realistic option of adding Gig-E).

The next test I made was on a 3.2GHz Pentium-4 Xeon system. It compressed 1000M of data in 77 seconds (it didn’t have the same data as the Athlon system so it can’t be directly compared), as 1000M would take something like 10 or 12 seconds to transfer at Gig-E speeds that obviously isn’t a viable option.

The gzip -1 compression however compressed the data to 57% of it’s original size, the fact that it compresses so well with gzip -1 suggests to me that there might be a compression method that uses less CPU time while still getting a worth-while amount of compression. If anyone can suggest such a compression method then I would be very interested to try it out. The goal would be a program that can compress 1G of data in significantly less than 10 seconds on a 3.2GHz P4.

Without compression the time taken to transfer 500G of data at Gig-E speeds will probably approach two hours. Not a good amount of down-time for a service that runs 24*7. Particularly given that some time would be spent in getting the new machine to actually use the data.

As for how to design a system to not have these problems, I’ll write a future post with some ideas for how to alleviate that.

Mobile Facebook

A few of my clients have asked me to configure their routers to block access to Facebook and Myspace. Apparently some employees spend inappropriate amounts of time using those services while at work. Using iptables to block port 80 and configuring Squid to reject access to those sites is easy to do.

So I was interested to see an advertising poster in my local shopping centre promoting the Telstra “Next G Mobile” which apparently offers “Facebook on the go“. I’m not sure whether Telstra has some special service for accessing Facebook (maybe a Facebook client program running on the phone) or whether it’s just net access on the phone which can be used for Facebook (presumably with a version of the site that is optimised for a small screen).

I wonder if I’ll have clients wanting me to firewall the mobile phones of their employees (of course it’s impossible for me to do it – but they don’t know that).

I have previously written about the benefits of a 40 hour working week for productivity and speculated on the possibility that for some employees the optimum working time might be less than 40 hours a week [1]. I wonder whether there are employees who could get more work done by spending 35 hours working and 5 hours using Facebook than they could by working for 40 hours straight.

[1] http://etbe.coker.com.au/2007/11/02/increasing-efficiency-through-less-work/

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known good CD-ROM drives because as they have moving parts they periodically break and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf I have had four CD-ROM drives spontaneously die. I expect drives to die if they are used a lot from mechanical wear, I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality, it was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate as one of the disks died a couple of hours later (just long enough to load all the data on the machine). Yay! RAID saved me losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and I didn’t have much risk). I had decided to instead make it a RAID-1 of 30G disks and to save some inconvenience I transfered the LVM from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine and it wasn’t designed for three hard disks so it was easier to transfer the data in a way that doesn’t need to have more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could have probably recovered that from the LVM backup data and even if that hadn’t worked I had only created a few LVs and they were contiguous so I could have worked out where the data was.

Instead however I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies of it on two disks in the RAID-1 and another separate disk), but I had a disk die before I got a chance to make a backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction I wouldn’t expect drives to cease working from not being used, I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?

etbe – Russell Coker

Archives

Categories

ECC RAM is more useful than RAID

Perpetual Motion

Record Oil Prices

SE Linux Support in GPG

I Just Joined SAGE

BIND Stats

The Date Command and Seconds Since 1970-01-01

Moving a Mail Server

Mobile Facebook

Shelf-life of Hardware

Archives

Email and RSS

Archives

Categories

Tags

Archives

Email and RSS