RAID etc

On a closed mailing list someone wrote:
2 X 120gb ide drives installed as slaves on each ide channels. … Presto. A 230’ish GB storage NAS for all my junk.

I’m not going to write a long technical response on a closed list so I’ll blog about it instead.

Firstly I wonder whether by “junk” the poster means stuff that is not important and which won’t be missed if it goes away.

If P is the probability of a drive surviving a given time period (as a number between 0 for certain death and 1 for an immortal drive) then the probability of avoiding serious data loss is P^2 for the configuration in question, as every byte of data depends on both drives surviving.

If P has a value of 0.5 over a period of 7 years (approximately what I’m seeing in production for IDE drives) then your probability of not losing data over that period is 0.25, i.e. there’s a 75% chance that at least one of the drives will die and data will be lost.
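The arithmetic is easy to check, and it also shows what mirroring the same two drives would buy you (the numbers below are the hypothetical P=0.5 figure from above):

```python
# Probability of keeping your data over the period, for two
# independent drives that each survive with probability p_survive.
p_survive = 0.5
p_die = 1 - p_survive

# Concatenated/spanned storage (the NAS described above): every byte
# depends on both drives, so both must survive.
p_no_loss_concat = p_survive ** 2

# RAID-1 mirror of the same two drives: data survives unless both
# drives die (ignoring the chance of a failure during rebuild).
p_no_loss_raid1 = 1 - p_die ** 2

print(p_no_loss_concat)  # 0.25
print(p_no_loss_raid1)   # 0.75
```

So the same pair of drives gives a 75% chance of losing data without mirroring and a 25% chance with it.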

If the data in question really isn’t that important then this might be OK. About half the data on my file server consists of ISO images of Linux distributions and other things which aren’t of particularly great value, as I can download them again at any time. Of course it would be a major PITA if a client had a problem with an old distribution and I had to wait for a 3GB download to finish before fixing it; this factor alone makes it worth my effort to use RAID and backups even for such relatively unimportant data. 300GB IDE and S-ATA disks aren’t that expensive nowadays; if buying a pair of bigger disks saves you one data loss incident and your time has any value greater than $10 per hour then you will probably win by buying disks for RAID-1.

As another approach, LVM apparently has built-in functionality equivalent to RAID-1. One thing I have idly considered is using ATA over Ethernet with LVM or GFS to build some old P3 machines into a storage solution.

P3 machines use 38W of power each (with one disk, maybe as much as 70W with 4 disks but I haven’t checked) and should have the potential to perform well if they each have 4 IDE disks installed. That way a large number of small disks could combine to give a decent capacity with data mirroring. Among other things, having more spindles decreases seek times under heavy load. If you do work that involves large numbers of seeks then this could deliver significant performance benefits. If I had more spare time I would do some research on this; it would probably make for a good paper at a Linux conference.
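For reference, a rough sketch of how such a setup might be strung together (device names, shelf/slot numbers, and sizes are hypothetical, and I haven’t benchmarked any of this):

```
# On each P3 storage box: export a local disk via ATA over Ethernet
# (vblade is the userspace AoE target from the vblade package).
vblade 0 1 eth0 /dev/hdb        # shelf 0, slot 1

# On the machine aggregating the storage (aoetools installed):
modprobe aoe                    # exported disks appear as /dev/etherd/e0.*
pvcreate /dev/etherd/e0.1 /dev/etherd/e0.2
vgcreate storage /dev/etherd/e0.1 /dev/etherd/e0.2

# LVM mirroring (the RAID-1 equivalent mentioned above); --corelog
# keeps the mirror log in memory so no third device is needed.
lvcreate -m1 --corelog -L 100G -n data storage
```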

root-kits on robots

This story on 365tomorrows.com on the topic of rootkits is interesting (note the OSs involved). Also it made me wonder about the other possibilities for a root-kitted robot; the mind boggles at how it might determine whether you need an enlargement to some body part…

365tomorrows is a good site, they post a short sci-fi story every day and it’s all free (paid for by merchandise and AdSense). When you read the stories make sure you check out the AdSense links; it’s sometimes rather amusing when Google comes up with an unusual interpretation of a sci-fi story and supplies adverts to match. I don’t think that AdSense was designed to work well with fiction.

first significant project goes live

One advantage of not being a permanent employee is that I am free to do paid work for other people. This gives me not only a greater income but also a wider scope of work.

I’ve just completed my first significant project since leaving Red Hat. The Inumbers project provides an email address for every mobile phone. If you know someone’s mobile phone number but don’t have an email address then you can send email to NNN@inumbers.com where NNN is the international format mobile phone number. The recipient will receive an SMS advising them how to sign up and collect the email.

It was fun work, I had to learn how to implement SRS (which I had been meaning to do for a few years), write scripts to interface with a bulk SMS service, and do a few other things that were new to me.

SRS development

I’ve been working on a mail forwarding system which required me to implement SRS to allow people who use SPF to be customers of the service (as I use SPF on my domain it’s fairly important to me). Reading the web pages before actually trying to implement it, things seemed quite easy. All over the web you will see instructions to just set up an /etc/aliases entry that pipes mail through the srs utility.

The problem is that none of the srs utility programs actually support piped mail. It seems that the early design idea was to support piped mail but no-one actually implemented it that way. So you can call the srs utility to discover what the munged (cryptographically hash-signed) originator of the email should be, but you have to do the actual mail delivery via something else.
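The core of the rewriting can be sketched in Python. This is a simplified illustration of the SRS0 scheme, not the exact algorithm used by libsrs2 or the srs utilities; the timestamp and hash encodings here are my assumptions:

```python
import base64
import hashlib
import hmac
import time

BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def srs0_rewrite(local, domain, forwarder, secret, now=None):
    """Rewrite local@domain into an SRS0 address at the forwarder.

    The rewritten address embeds a timestamp and a keyed hash so that
    bounces can be validated and the original sender recovered.  Real
    implementations (e.g. libsrs2) use the same overall shape but
    their exact encodings differ from this sketch.
    """
    if now is None:
        now = int(time.time())
    # Timestamp: days since the epoch, modulo 1024, as 2 base32 chars.
    days = (now // 86400) % 1024
    ts = BASE32[(days >> 5) & 31] + BASE32[days & 31]
    # Keyed hash over the timestamp, original domain and local part.
    mac = hmac.new(secret.encode(),
                   (ts + domain + local).lower().encode(),
                   hashlib.sha1).digest()
    h = base64.b64encode(mac)[:4].decode()
    return "SRS0=%s=%s=%s=%s@%s" % (h, ts, domain, local, forwarder)
```

So russell@coker.com.au forwarded through inumbers.com comes out in the same shape as the SRS0 addresses visible in the Received-SPF headers quoted below.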

This wasn’t so much of a problem for me as I use my own custom maildrop agent to forward the mail instead of using /etc/aliases (Postfix doesn’t support what I want to do with /etc/aliases: dynamically changing the email routing as mail is received isn’t something that Postfix handles internally).

However I still have one problem: sometimes I get two or three copies of the Received-SPF header when Postfix performs the check.

In my main.cf file I have a smtpd_recipient_restrictions configuration directive that contains check_policy_service unix:private/spfpolicy and the Postfix master.cf file has the following:

spfpolicy unix - n n - - spawn user=USER argv=/PATH/spf-policy.pl

Does anyone have any ideas why I would get multiple SPF checks and therefore multiple email header lines such as:

Received-SPF: none (smtp.sws.net.au: domain of SRS0=MUyCQ6=CO=coker.com.au=russell@inumbers.com does not designate permitted sender hosts)
Received-SPF: none (smtp.sws.net.au: domain of SRS0=MUyCQ6=CO=coker.com.au=russell@inumbers.com does not designate permitted sender hosts)
[some other headers]
Received-SPF: pass (inumbers: domain of russell@coker.com.au designates 61.95.69.6 as permitted sender)
Received-SPF: pass (inumbers: domain of russell@coker.com.au designates 61.95.69.6 as permitted sender)
Received-SPF: pass (inumbers: domain of russell@coker.com.au designates 61.95.69.6 as permitted sender)

The email went through one mail router and then hit the destination machine, but somehow got 5 SPF checks along the way. Also the pair of identical checks had no other headers between them, and neither did the set of three identical checks, so multiple checks were performed without any forwarding. It seems that a single port 25 connection is giving two or three checks. Both machines run Postfix with SPF checking that is essentially identical (apart from being slightly different versions, Debian/unstable and RHEL4).

Any advice on how to fix this would be appreciated.

Linux on the Desktop

I started using Linux in 1993. I initially used it only in text-mode as I didn’t have enough RAM to run XFree86 on my Linux machine. I ran text-mode Linux server machines from 1993 to 1998. In 1998 I purchased my first laptop and installed Linux with KDE on it. I chose KDE because it had the greatest similarity to OS/2 which I had used as my desktop OS prior to that time. At the same time I purchased an identical laptop for my sister and gave her an identical configuration of Linux and KDE.

Running a Linux laptop in 1998 was a lot harder for a non-technical person than it is today. There was little compatibility with MS file formats and few options for support for Internet connections and third-party hardware and software (most things worked but you needed to know what to do). One advantage of using Linux in this regard is that the remote support options have always been good; I was able to fix my sister’s laptop no matter which country she was in or which country I was in. Her laptop kept working for more than 5 years without the need for a reinstall (try that on Windows).

It was when VMWare first became available (maybe 2000) that I converted my parents to using Linux. At first they complained a bit about it being different and found VMWare less convenient than the OS/2 DOS box for running their old DOS programs. But eventually they broke their dependence on DOS programs and things ran fairly smoothly. There were occasions when they complained about not having the perceived benefits of Windows (such as the supposed ability to plug in random pieces of hardware and have everything work perfectly). The fact that using OS/2 and then Linux has given them 14 years of computer use with no viruses and no trojans tends to get overlooked.

In recent times the only problem that my parents have experienced was when they bought a random cheap printer without asking my advice. The printer in question turned out not to work with Fedora Core 4, but when Fedora Core 5 came out the printer worked. Waiting 6 months for a printer to be supported isn’t really a serious problem (the old printer, which had worked for 6+ years, was still going strong).

My parents and my sister now have second-hand P3 desktop machines running Fedora. P3 CPUs dissipate significantly less heat than P4 and Athlon CPUs, which significantly reduces the risk of hard drives dying when machines are left on in unairconditioned rooms, as well as saving money on electricity. For the typical home user who doesn’t play 3D games there is no real need for a CPU that’s more powerful than a 1GHz P3. This of course means that there is less need for me to reinstall on newer hardware, which also means more reliability.

I always find it strange when people claim that Linux isn’t ready for the desktop. I provide all the support for three non-technical users of Linux on the desktop and it really doesn’t take much work because things just work. Corporate desktops are even easier, in a company you install what people need for their work and don’t permit them to do anything different.

It seems to me that Linux has been ready for the desktop since 1998.

more on anti-spam

In response to my last entry about anti-spam measures and the difficulty of blocking SPAM at the SMTP protocol level I received a few responses. Brian May pointed out that the exiscan-acl feature of Exim allows such blocking, and Johannes Berg referred me to his web site http://johannes.sipsolutions.net/Projects for information on how he implemented Exim SPAM blocking at the SMTP level.
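For reference, SMTP-time spam rejection in Exim looks something like the following. This is a hedged sketch based on the Exim 4 content-scanning ACL conditions (which is what the exiscan-acl patch provides); it assumes SpamAssassin’s spamd is running locally and the threshold is arbitrary:

```
# In the main configuration: run this ACL after the message data has
# been received but before it is accepted.
acl_smtp_data = acl_check_data

begin acl

acl_check_data:
  # Hand the message to SpamAssassin (scanned as user "nobody") and
  # reject high-scoring mail with a 5xx error, so the sending MTA is
  # responsible for notifying the sender and we never bounce anything.
  deny  message   = Rejected as spam (score $spam_score)
        spam      = nobody
        condition = ${if >{$spam_score_int}{100}{yes}{no}}

  accept
```

($spam_score_int is the SpamAssassin score multiplied by 10, so 100 corresponds to a score of 10.)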

It seems that this is not possible in Postfix at this time. The only way I know of to do this in Postfix would be to have a SMTP proxy in front of the Postfix server that implements the anti-SPAM features. I have considered doing this in the past but not had enough time.

Also a comment on my blog criticises SORBS for blocking Tor (an anonymous Internet system). As I don’t want to receive anonymous email, and none of the companies I work for want to receive it either, this is something I consider a feature, not a bug!

blocking spam

There are two critical things that any anti-spam system must do: it must not lose email and it must not cause damage to the rest of the net.

To avoid losing email every message must be either accepted for delivery or the sender must be notified.

To avoid causing damage to the rest of the net, spam should not be bounced to innocent third parties. Accepting mail, processing it, and then bouncing messages that appear to be spam will result in spam being bounced to innocent third parties, because spammers almost always forge the sender address.

The only exception to these two conditions is virus email, which can be positively identified as bad and therefore silently discarded. For any other category of unwanted mail there is always a possibility of a false positive, and therefore the sender should be notified if the mail will not be accepted.

Therefore the only acceptable method of dealing with spam is to reject it at the SMTP protocol level. Currently I am not aware of any software that supports Bayesian filtering while the message is being received so that it can be rejected if it appears to be spam. It would be possible to do this (I could write the code myself if I had enough spare time) but AFAIK no-one has done it.
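To illustrate what that would involve, here is a toy Bayesian scorer. The training data and the scoring formula are deliberate simplifications of mine, nothing like a production filter:

```python
import math
from collections import Counter

def train(docs):
    """Count token occurrences across a set of training messages."""
    counts = Counter()
    for d in docs:
        counts.update(d.lower().split())
    return counts

def spam_probability(text, spam_counts, ham_counts):
    """Crude Bayesian score: sum per-token log-odds of spam vs ham,
    with +1 smoothing so unseen tokens don't zero anything out."""
    log_odds = 0.0
    for tok in text.lower().split():
        s = spam_counts.get(tok, 0) + 1
        h = ham_counts.get(tok, 0) + 1
        log_odds += math.log(s) - math.log(h)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training corpora.
spam_counts = train(["buy cheap pills now", "cheap pills online"])
ham_counts = train(["meeting agenda for monday", "lunch after the meeting"])
```

An MTA doing this at SMTP time would compute the score while receiving the DATA phase and return a 5xx rejection for high scores, so the sending server is responsible for notifying the sender and no bounce is ever generated.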

The most popular methods of recognising SPAM before it is accepted are through DNSBL lists (DNS based lists of IP addresses known to send SPAM), RHSBL lists (DNS based lists identifying domains that are known to be run by spammers), and Gray-listing (giving a transient error condition in the expectation that many spammers won’t try again).
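Gray-listing is simple enough to sketch. This is a toy illustration with hypothetical parameters, not production code (real implementations such as postgrey also expire stale entries and auto-whitelist hosts that retry properly):

```python
import time

class Greylist:
    """Defer the first delivery attempt from an unknown
    (client IP, sender, recipient) triple and accept retries that
    arrive after a minimum delay.  Legitimate MTAs retry after a
    4xx transient error; much spamware doesn't bother."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds before a retry is accepted
        self.seen = {}               # triple -> time of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        if now is None:
            now = time.time()
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "DEFER"           # MTA sends a 4xx transient error
        return "ACCEPT"
```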

Gray-listing is not effective enough to be used on its own, therefore DNSBL and RHSBL systems are required for a usable email system. Quickly reviewing the logs of some of my clients’ mail servers suggests that the DNSBL dnsbl.sorbs.net alone is stopping an average of 20 SPAMs per user per day! The SORBS system is designed to block open relays, machines that send mail to spam-trap addresses, and some other categories of obviously inappropriate use. The number of false-positives is very small. On average I add about one white-list entry per month, which isn’t much for the email of a dozen small companies. For every white-list entry I have added I have known that the sender has had a SPAM problem; I have not had to add a white-list entry because of a DNSBL making a mistake, just because people want to receive mail from a system that also sends SPAM.

I was prompted to write about anti-spam measures by an ill-informed and unsubstantiated comment on my blog regarding DNSBL services.

If anyone wants to comment on this please feel free. But keep in mind that I have a lot of experience running mail servers including large ISPs with more than a million customers. The advice I give in terms of anti-spam measures concerns techniques that I have successfully used on ISPs of all sizes and that I have found to work well even when both ends use them. Make sure that you substantiate any comments you make and explain them clearly. Saying that something is stupid is not going to impress me when I’ve seen it work for over a million users.

wasted votes

In a mailing list to which I subscribe there is currently a discussion on US politics with the inevitable discussion of wasted votes. As I don’t want to waste my writing on this topic on a closed list I’m posting to my blog.

There is ongoing discussion on the topic of wasted votes. As a matter of principle, if a vote can be considered wasted then that should be regarded as a failure of the electoral system.

Having representatives for regions makes some sense in that a regional representative will have more interest in the region than a central government with no attachment to it. I expect that representatives of regions were initially used because it was not feasible for people to vote for candidates who weren’t geographically local. Now there is no real requirement for geographical locality (only a very small fraction of voters get to meet the person they vote for anyway) but having a representative for a region still makes sense.

The requirement for a regional representative means that if you live in a region mostly filled with people who disagree with you then your vote won’t change much. For example I live in a strong Labor region so the REAL fight for the lower house seat (both state and federal) occurs in the Labor party room.

My vote for the senate counts, as that is done on a state-wide basis. So of the two votes I cast in one election, one can be considered not to be wasted.

For the US system, the electoral college was developed in a time when it was impossible for the majority of voters to assess the presidential candidates, and it solved the requirements of those times reasonably well. Today it is quite easy to add up all the votes and use either a simple majority or the “Australian ballot”.

Currently there is some controversy over the actions of Senator Joe Lieberman, who lost the support of his party and then immediately declared that he would stand as an independent candidate. I believe that this illustrates a failure of the electoral system. It should be possible to have multiple candidates from each party on the list. In the Australian system it is possible to do that, but as candidates appear in random order on the voting cards no-one would be sure which candidate of the winning party would get the seat unless voters had actual reasons for preferring one candidate over another (which sadly often isn’t the case). This is good for voters (the minority of voters who care enough about internal party policies to prefer one party candidate over another should make the decision) but not good for candidates who want a better chance of winning without actually demonstrating that they can represent their voters better than other candidates.

The Australian government system has nothing equivalent to the US presidential election. The prime minister is voted in by the members of parliament, so there is little chance of getting multiple candidates from one party contesting one position. For the US presidential election I think that the best thing to do would be to have an “Australian ballot” and permit multiple candidates from each party. For example you could have Bush and Cheney running as candidates for president, with each promising to make the other their VP if elected. With the Australian ballot, even if you put Bush and Cheney as the last two preferences on your ticket, the relative order you give them would still matter.

I think that with the US presidential and state governor elections there is enough knowledge of the candidates among the voters to make it worthwhile for each of the major parties to run multiple candidates.

One of many advantages of having multiple candidates is that you might have real debates. If the main candidates from the two big parties have a set of strict rules for their debate that prevents any surprise then the people who are the less likely candidates from those parties (and who therefore have less to lose) could go for a no-holds-barred debate with a selection of random members of the public asking questions.

Of course none of this is likely to happen. Any serious change would have the potential to adversely affect at least one of the major parties, and any improvement would necessarily have a negative impact on most of the current politicians. Votes ARE being wasted, and most politicians seem to like it that way.