etbe - Russell Coker

Five ways SE Linux may surprise you

Frank Mayer of Tresys has written a great article on the techtarget.com site about SE Linux.

It seems aimed mostly at managers and novice users. It explains that SE Linux isn’t really that difficult to use, and that it is a foundation technology needed for secure systems.

Check it out!

permalinks in wordpress, Apache redirection, and other blog stuff

When I first put my new blog online I didn’t think to set the custom permalinks option to avoid having /index.php in all URLs (which wastes a few bytes and looks nasty).

So I decided to change to better URLs, but unfortunately many people had already bookmarked the bad ones. I wanted to give an HTTP 301 redirection when someone uses the old index.php version (so that bookmarks get updated) and then rewrite internally to the PHP file. Unfortunately having a redirection from ^/index.php to a version without it and then a local rewrite to include index.php again doesn’t seem to work (any advice would be appreciated). So I put the following in my /etc/wordpress/htaccess file (the location for such things in Debian) so that foo.php is used instead, where foo.php is a symlink to index.php. I’m wondering whether I should file a bug report against the Debian package requesting that such a symlink be included to facilitate such things – if it’s not possible to do what I desire without the symlink.

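RewriteEngine On
RewriteBase /
# Old settings, for use without the permalink-redirect plugin:
#RewriteCond %{REQUEST_URI} ^/index.php/?(.*$) [NC]
#RewriteRule . /%1 [R=301,L]
# Leave robots.txt alone instead of passing it to WordPress
RewriteRule ^robots\.txt$ - [L]
# Anything that is not an existing file or directory goes to WordPress
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
#RewriteRule . /foo.php%1 [L]
RewriteRule . /index.php%1 [L]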

Update: I am now using the permalink-redirect plugin (thanks for the tip Method) which solves the problem of the obsolete URLs as well as solving the problem of having two representations of the URL (with and without a trailing slash). I have updated the above htaccess file sample to reflect my new configuration (with the old settings commented out for the benefit of people who don’t want the permalink-redirect plugin).

The way WordPress allows the table prefix to be stored in the MySQL configuration section is very handy. Some time ago I asked for advice on a blog server for multiple users and WordPress-MU was recommended, but it seems that for most situations where you want multiple blogs the non-MU version of WordPress will do the job. It seems that the main benefit of WordPress-MU is that setting up multiple blogs doesn’t require running shell scripts, which for the cases I’m most interested in doesn’t compete with the benefit that the non-MU version has of being packaged in Debian.

On the topic of WordPress in Debian, it’s a pity that none of the plugins are packaged in Debian. I plan to create a repository for plugins and themes that I use if no-one else has started such a repository. I believe that a repository of Debian packages for such things will provide significant benefits to users, including updates for security reasons and having plugins that are known to work (some of the plugins appear to only work on Windows).

Also there are a few issues that I would like to improve in WordPress. One is that the Uncategorised category is selected by default so if I select another category and forget to de-select Uncategorised then it’s a little confusing. Another is that the categories are displayed in the side-bar without mentioning the number of matching posts. The way blogger lists the number of posts per category (and sorts the categories in order) is much more convenient. Also another advantage of blogger is the handling of archives where you can click on a month to see a list of the names of all posts in that month. I’m not about to go back, but it would be nice to have those features. Does anyone have any ideas how to solve these problems?

Update2:
I have added a rule to make robots.txt not redirect. Before adding this rule /robots.txt was redirected to /index.php/robots.txt, which caused a WordPress page to load. This wasted a lot of bandwidth (robots.txt is hit often) and probably caused some spiders to ignore my site.

lemonup.com – pirates

The URL http://linuxresource.lemonup.com/ currently has a mirror of my blog. Disregarding the DMCA take-down notice I sent them a week ago (which is also mirrored on their own site) they have again copied the content from my site without permission (I only allow non-commercial use). But this time they go even further and claim copyright over my text!

This is going way too far. Now I’m going to ask their ISP to deal with them.

Update: Their site is now offline. Their ISP acted quite quickly: less than 3.5 hours after my complaint the entire site was offline (not only the section that had my posts). I suspect that the fast response was because they had mirrored blog posts such as this one, which made it appear to be willful infringement – but the only response I got from the ISP was to say that they would do what seemed right and would not comment to me about it for privacy reasons.

This is not an ideal outcome. I would much rather have had them respect my license terms without such measures. I only contacted their ISP because the first take-down request took four days to complete (after receiving a response on day 0 so it wasn’t four days of holiday for the operator) and because they then mirrored my site again under a different URL. I am still unsure of whether this was a genuine mistake (as claimed by the operator) due to lack of communication between multiple people involved in running the site, or whether they just didn’t think I would catch them.

I don’t have any malice towards the operators of lemonup. I have already offered some suggestions that may help them in future business ventures and would be happy to make some more suggestions if asked.

In response to a comment: the traditional meaning of the word piracy is violent acts at sea that don’t have state sponsorship. This usually involves armed robbery, but the main criterion is violence without state sponsorship. The slang use covers anything which goes against the wishes of a copyright holder.

school rating

The web site http://au.ratemyteachers.com/ allows Australian students to rate their teachers. Ratings are anonymous and give teachers a score out of 5 as well as allowing students to comment on teachers.

The Sydney Morning Herald has an article about the site that describes the actions that the NSW Department of Education and the NSW Teachers Federation are taking to block the site.

The solution to this however is really quite simple. There needs to be a formal method for students to rate their teachers which will be used when it comes time to give pay rises to good teachers and dismiss or transfer to non-teaching duties the teachers who can’t do their job.

I encourage students to submit essays and debate topics about the anonymous newspapers published in the Soviet Union and other repressive states, why they were necessary (because criticism of the government was prohibited) and why they were morally right (a system with no method of correction will inevitably do bad things). Then teachers will have a choice of supporting the actions of the Soviet Union or the use of ratemyteachers.com. It will be interesting to see which option they choose. I think that it’s most likely that they will take the hypocritical path and support anonymous newspapers in the Soviet Union while attacking such free speech in supposedly free countries.

It’s interesting that an article on the failures of Mentone Grammar has just been published. Maybe if Mentone had been listed on the ratemyteachers.com site the Taylors would not have made the mistake of sending their son there. Or maybe if the Mentone senior staff had been reading that site they would have been able to correct the problems before they became cause for a legal dispute.

DMCA etc

A few days ago I wrote my first DMCA take-down notice, following the instructions on the Wikipedia page. The reason for this was that someone was mirroring my blog and putting Google adverts on the copy. Before I started putting Google adverts on my web sites I wouldn’t have been bothered about this. But now that I’m making a small amount of money from Google advertising I don’t want someone else just mirroring my content and taking the money away from me.

The person who managed the site in question took a surprisingly large amount of time to comply with the request (a discussion of several messages plus a couple of reminders over the course of a few days).

The most recent news about DMCA abuse is the case of trying to prevent the distribution of a code used for decrypting HD DVD. It is widely believed that copyright was used to prevent the distribution. Strangely many people who otherwise have a good understanding of technology have been saying “you can’t copyright a number”. What precisely is a program binary if not a long series of numbers (or a single large number depending on how you look at it)? For that matter a JPEG file or the ASCII representation of a book is also either a very large number or a series of small numbers. Also, apparently the code is not protected under copyright but under the anti-circumvention clause of the DMCA.
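To make the point about numbers concrete, here is a minimal Python sketch (an illustration only; the sample text is arbitrary) showing that any sequence of bytes can be read as a single large integer and converted back without loss.

```python
# Any byte string can be viewed as one large integer, and back again.
data = "a few days ago".encode("ascii")

# Interpret the bytes as a big-endian unsigned integer.
number = int.from_bytes(data, byteorder="big")

# Convert the integer back into the original bytes.
restored = number.to_bytes(len(data), byteorder="big")

print(number)
print(restored.decode("ascii"))
```

The same round trip works for a program binary or a JPEG file read from disk; only the size of the number changes.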

If it was a matter of copyright it would not be an issue of whether a number can be copyrighted, but what defines such a number. One criterion for copyright is that it has to be on something non-trivial (EG I couldn’t copyright the use of “a few days ago” as an introduction) so length is a criterion. Another is that it has to be a creative expression (so an encryption key can’t be copyrighted). However in many jurisdictions there are separate laws regarding distributing passwords without permission. Such laws are designed for preventing people from granting unauthorised access to computers but I believe that they can be used more generally (I have been advised that such laws exist in the state of Pennsylvania in the US – I’m not sure what the law is in other regions but expect that something so useful would be copied).

Another breaking story is that the RIAA has created an organisation with a US government mandate to collect royalties on ALL music that is played over Internet radio. This includes music for which the copyright owner is not an RIAA member and does not consent to have the royalties applied. You can create your own music, grant free access to everyone out of philanthropy, and then have the RIAA tax the music!

It’s unfortunate that only the down-side of this dramatic change in copyright law has been discussed. Compulsory licenses have a lot of potential in other areas of copyright material. Recently people have been complaining that government sponsored scientific research is often only published in journals that cost large amounts of money. Why not have a compulsory license for journals at a fair price that everyone can afford? Software is often unreasonably expensive (Windows Vista with the latest version of MS Office can cost up to twice as much as a new PC), let’s have compulsory licenses for software at a reasonable fee! Software vendors often cease selling old versions of software to force customers to upgrade, a compulsory license scheme would permit us to buy MS-DOS 3.30 at a reasonable price regardless of whether MS wants to sell it.

Finally there is at least one evil cult that claims its “religious” texts are copyright as a way of preventing the public from seeing what a drug-addled second-rate sci-fi author produces. Let’s have a compulsory license for them so everyone can read them!

The only thing that’s wrong with the RIAA scheme is that there is no option for copyright owners to directly license their material to the users (including granting a free license if they so desire). The up-side of this is that it proves beyond all doubt that the RIAA is not representing copyright owners.

Update: I initially accepted the claims about the DMCA take-down notices being based on copyright rather than anti-circumvention. Since learning of my mistake I modified this post to reflect the fact that it was not a copyright issue.

LUG talks today

Today I gave three talks at my local LUG. The first was my latest SE Linux talk (I’ll put the notes online soon). The second was a talk about voting.

I asked for a show of hands from people who had already decided which party they will vote for at the next federal election (about 12 people put their hands up). I then asked people to put their hands down if they were not a member of the party that they intend to vote for. Only two hands remained raised in the room (including mine)!

With the way party politics works nowadays the major parties are not very interested in representing their core voters. Why try to please people who will vote for you anyway? Instead they try to appeal to swinging voters and pressure groups. If you have decided to vote for a party they have no reason to try to impress you. Therefore you should join the party and try to influence the policy decision making process from within.

The issues that I believe are most important to the Linux community are free software use in government, sane intellectual property laws, the right to a fair trial, and not pandering to the US (which is related to the previous two points).

If you have already decided who to vote for then you should join that party and make your vote count in the party room.

One member of the audience said that he had been a member of one of the major parties but that the internal politics turned him off. If that is your experience then I think you should ask yourself whether you want to vote for a group of people that you can’t work with.

The final talk I gave was about getting speakers for Linux Users’ Groups. There is always difficulty in finding speakers for clubs. Ideally we would have meetings planned a few months ahead of time so that they could be advertised in various ways. Newspapers often have columns dedicated to providing information about public meetings but the lead time is usually at least a week (and the meeting would have to be advertised at least two weeks in advance – so more than a month’s planning ahead is required).

Getting a larger number and variety of speakers will attract new members, encourage existing members to attend more meetings, and inspire members in their Linux work.

Talks can be given by almost anyone. There is a constant demand for speakers who have expert knowledge in the topic, but anyone who is a decent speaker and has the confidence to stand up at the podium can give a good talk. For expert speakers possibilities include academics, industry leaders, leaders of free software development projects, and journalists. But that’s not all, anyone who wants to spend the time researching a topic can give a talk on it. For example I’ve been learning about MySQL recently for my own servers and will probably offer a talk about MySQL aimed at sys-admins who don’t want to become DBAs but who just want to get a database running. I’m not a MySQL expert (and don’t plan to become one) but I believe that there are many people who want to do the things I do with MySQL and who could benefit from a talk that I might give.

The best place to find speakers is a conference or trade-show. If they give a talk that works well you can suggest that they give it again for your local LUG. You can also find speakers at conferences that you can’t attend. If someone visits your country for a trade-show in a different city you could send them an email saying “unfortunately I can’t attend your talk, but if you are interested in visiting my city in the same trip then there will be an audience of X people interested in seeing you”.

There’s no harm in asking; the worst that they can do is decline. Ask everyone who you think can do a good job. Also make sure that you don’t make any commitment (unless you are a member of the LUG committee).

more about Heartbeat

In a comment on my blog post “a Heartbeat developer comments on my blog post” Alan Robertson writes:
I got in a hurry on my math because of the emergency. So, there are even more assumptions (errors?) than I documented.
In particular, the probability model I gave was for a particular node to fail. So the probability of either of two failing would be double that, and either of three failing would be triple that.
Note that the probability of multiple simultaneous failures goes up as a power, but the probability of “either of” only goes up linearly.
I really need to sit down and do the math carefully – but the idea of the simultaneous failures going up as a power is true. And the “any of” probability goes up linearly. That’s also true. This is why people can actually use larger HA clusters ;-).
The 5 years figure is the industry standard quoted figure for an average Intel-based server to fail.
The four hours to repair is a common high-quality of service response time from a hardware vendor. I admit that’s not the same as actual repair time, but if some “repairs” are just reboots, then it’s not a horrible number to start with – if your vendor has cached some spares nearby. I suppose I should sit down and do the math right, and make a spreadsheet of it. (I wonder if I remember that much math?)
I assume disk failures are taken care of by hot swap disks, RAID, etc. and so in effect they “never fail” (at least not totally) so that these failures don’t have to be accounted for by the overall availability model.
Here’s an intuitive way of thinking about it “from your gut”…
If I took your whole data center and made a cluster out of it, what’s the chance that at least half of your servers would fail at once?
Pretty darn small, is the short answer ;-). If it’s not pretty darn small, you need to buy better servers, and IBM has just the servers for you ;-). Or maybe they need to hire a better SysAdmin ;-)
If you ask yourself “when is the last time at least half my machines in my data center couldn’t communicate with the other half”, then hopefully that’s also a “pretty darn small” chance too. If not, there are well-known methods for making networks highly reliable too.
[I’m still ignoring “catastrophes” that you haven’t accounted for in your HA architecture].
I’m not saying this is free, and it can be pricey. One of my other favorite sayings is “Paranoia is an expensive hobby”. How much do you want to spend?
You tell me how much you want to spend, and you can figure out how to spend it.
I’ll make a separate comment on quorum models later. It’s getting late here.

My only comment in response to that is to say that I still believe the calculations of probability to be correct in my original post and I am interested to see someone prove otherwise.
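The two claims in Alan’s comment (“any of” grows linearly, simultaneous failures grow as a power) can be sketched numerically. This is a rough illustration only: it assumes node failures are independent, uses the 5 year and 4 hour figures quoted above, and measures the fraction of time a node is down rather than modelling failure events in detail. The function names are my own.

```python
# Rough availability arithmetic, assuming independent node failures.
# The figures are the ones quoted above: a server fails about once in
# 5 years and takes about 4 hours to repair.
HOURS_PER_YEAR = 365 * 24
MTBF_HOURS = 5 * HOURS_PER_YEAR   # mean time between failures
MTTR_HOURS = 4                    # mean time to repair

# Fraction of time a single node is expected to be down.
p_down = MTTR_HOURS / (MTBF_HOURS + MTTR_HOURS)


def p_any_down(n: int) -> float:
    """Probability that at least one of n nodes is down at a given moment."""
    return 1 - (1 - p_down) ** n


def p_all_down(k: int) -> float:
    """Probability that k given nodes are all down at the same moment."""
    return p_down ** k


for n in (1, 2, 3):
    print(n, p_any_down(n), p_all_down(n))
```

Under these assumptions the first column of results grows almost exactly in proportion to n, while the second shrinks by a factor of p_down for every extra node that has to be down at the same time.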

Another comment by Alan:
Data corruption, no doubt, is almost always much worse than loss of availability. And some kinds of data corruption are worse than others. For example, mounting a non-clustered shared disk filesystem twice simultaneously is usually much worse than updating two replicas of the data simultaneously. In the first case, you have to restore to your previous backups and lose all data since then. In the second case, you only lose updates that were made to one of the sides, and you instantly have a working copy of the data which is nearly always much newer than your last backup (with the possibility of recovering them by significant effort). Typically you would only lose a few minutes of updates at worst – and depending on the kind of networking failure, you might not lose anything.
Heartbeats certainly aren’t enough. You need to monitor the health of your servers and the health of your applications. Heartbeat monitors applications and can easily be informed of and act on the health of your servers (with release 2 style Linux-HA Heartbeat configurations).

Followed by:
Since data corruption is so serious, this is why cluster designers worry so much about split-brain, which is managed using the ideas of quorum and its sibling, fencing.
This is all about keeping bad things from happening.
This post is really about quorum, since Russ had expressed interest in it.
Quorum is the idea that you can uniquely choose a subcluster to represent the whole cluster in those cases where communication failure has caused the cluster to split into separate sub-clusters which cannot properly communicate with each other. In this way, only one of the subclusters continues on, and the others will sit on their hands and do nothing waiting for a person to fix things.
Some of the kinds of quorum mentioned below are better than others. But, most importantly, they can be used in combination as described later.
The most common kind of quorum is that Russ mentioned in his earlier post – the majority quorum. In this method, for a cluster of n nodes, you grant quorum to a sub-cluster which has more than INT(n/2) members. This means that if you have a 3-node cluster, you have to have two nodes to continue. If you have 4 nodes, you have to have 3 nodes to continue. For 5 nodes, you have to have 3 nodes, and so on.
Other basic methods include disk reserve, so that you have to reserve a disk to have quorum. In this case, if only one node survives and it can reserve the disk, it continues to run. However, the disk becomes a single point of failure. This may not be a problem if this single disk is required to run any of the cluster services, since they would fail without it anyway. [Heartbeat does not support this method].
An analogous method is to implement a software resource which grants quorum to one subcluster in a fashion analogous to the disk reserve method. This has the advantage of not requiring disk reserves, or a shared disk, but it has the same SPOF disadvantage as the disk reserve method. Heartbeat does support this method using the quorum daemon. It’s incredibly useful for those cases (like split-site clusters) where you cannot use fencing.
Another method is to grant quorum to any subcluster which can ping a certain set of nodes, and not grant it to any which can’t access those nodes. This isn’t a wonderful method, and has obvious disadvantages with respect to uniqueness, and single points of failure. (Heartbeat doesn’t yet implement this one).
Another method is to grant quorum to any node which is a member of a 2-node cluster. This is better than losing quorum and stopping when one node stops, but obviously completely ignores the uniqueness requirement of quorum.
Another method is to ask a human being if you have quorum. This is hardly an ideal circumstance, but useful in some contexts as described below. (Heartbeat doesn’t yet implement this one).
Perhaps you say, really the only one of these that’s really good is the first one – the majority vote method.
And, I would generally agree with you. But, Heartbeat has the ability to use these in combination which makes some of those methods that seem flaky to be much more reasonable.
Heartbeat has the ability to have multiple quorum modules declared, and they’re used in this way: Any module can return HAVEQUORUM, NOQUORUM, or TIE. If they return HAVEQUORUM or NOQUORUM, then no further quorum modules are consulted. However, if they return TIE, then the next quorum module is consulted for its opinion. If the last quorum module returns TIE, it is treated the same as NOQUORUM.
This enables you to use one quorum module to break the tie declared by a previous quorum module.
You could then use the quorumd to break the tie created by a voting module. Or you could use the quorumd instead of the “two-node” module. Or you could use the “pingable” module instead of the “two-node” module. Or you could at the end always tack on a “human” module, in case all else returns TIE.
This is kind of cool, actually. My favorites for next implementation are the pingable and consult human modules.
And, of course, if your cluster loses quorum due to real server failures, there are always ways to work around it, with a little human intervention. One method is to tell Heartbeat to ignore quorum. Another is to tell Heartbeat to remove certain nodes from the cluster, after you verify that they’re really dead. And, I’m sure that in a pinch, some new methods will be invented. And some of them might actually work ;-).
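The majority rule and the chaining of quorum modules described in the comment above can be sketched in a few lines of Python. This is not Heartbeat code: the function names are hypothetical, and having the majority module return TIE on an exact half split is an assumption drawn from the remark about breaking “the tie created by a voting module”.

```python
from enum import Enum
from typing import Callable, Sequence


class Quorum(Enum):
    HAVEQUORUM = "HAVEQUORUM"
    NOQUORUM = "NOQUORUM"
    TIE = "TIE"


def majority_module(members: int, cluster_size: int) -> Quorum:
    """Majority vote: quorum needs more than INT(n/2) members.

    Returning TIE for an exact half split is an assumption made here.
    """
    if members > cluster_size // 2:
        return Quorum.HAVEQUORUM
    if members * 2 == cluster_size:
        return Quorum.TIE
    return Quorum.NOQUORUM


def decide(modules: Sequence[Callable[[], Quorum]]) -> Quorum:
    """Consult modules in order; only a TIE passes on to the next one."""
    for module in modules:
        result = module()
        if result is not Quorum.TIE:
            return result
    # A TIE from the last module is treated the same as NOQUORUM.
    return Quorum.NOQUORUM


# Hypothetical tie-breaker standing in for a quorumd or "human" module.
def tie_breaker_says_yes() -> Quorum:
    return Quorum.HAVEQUORUM


print(decide([lambda: majority_module(2, 3)]))
print(decide([lambda: majority_module(2, 4)]))
print(decide([lambda: majority_module(2, 4), tie_breaker_says_yes]))
```

With only the majority module, two nodes out of four end in a tie and so get no quorum; adding a tie-breaker after it lets one half continue.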

Regarding the quorumd, it seems that this is an extra server that will generally run on another machine separate from the rest of the cluster. So if we had a two-node cluster with a quorumd then it would effectively be a three-node cluster where one node is not configured to run any resources. It seems that the simpler approach in many cases would be to merely have a three-node cluster with resources not configured to run on one of the nodes.

For example if I was running a mail server cluster for an ISP I might configure a three node cluster of the two mail server back-end machines and one other machine that is lightly loaded (EG a DNS server) and have it configured not to run MTA resources on the DNS machine.

comment spam

The war on comment-spam has now begun. It appears that Blogger might have some anti-spam measures of which I was unaware. Otherwise it’s a strange coincidence that I get a huge number of comment spams for extremely hard-core porn from the Ukraine so soon after starting a WordPress blog.

About 24 hours before the spam attack there was a strange blog comment that linked to Google (with no offensive or spammy content). It appears that leaving it online was my mistake: when I left it online for a day the spammer decided that I might also leave porn spam online. I arrived home this evening to find almost 100 spams in the form of comments and track-backs, and more arriving by the minute. So I used iptables to block a /20 related to the spam and things are quiet now.
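For anyone wondering how much address space a /20 covers, here is a small Python sketch using the standard ipaddress module. The address is a documentation address used as a stand-in, not the real source of the spam.

```python
import ipaddress

# 198.51.100.77 is a documentation address used as a stand-in; the real
# spam source is not given in the post.
sample = ipaddress.ip_address("198.51.100.77")

# strict=False masks off the host bits to give the enclosing /20.
block = ipaddress.ip_network(f"{sample}/20", strict=False)

print(block)                 # the network that a /20 rule would cover
print(block.num_addresses)   # how many addresses that is
print(sample in block)
```

A /20 leaves 12 host bits, so one rule of this kind covers 4096 addresses.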

The moral of the story is to delete anything unusual ASAP in case it encourages the idiots.

I’ve also tightened the anti-spam measures on my blog.

Update:

From now on any short comment that does not add significant meaning will not be accepted on my blog. To the person who submitted many dozens of comments with variants of “nice site” with the idea that the URL listed for the comment author will be visited by readers of my site – nice try. If you genuinely want to send me a message saying “nice blog” then email will work.

In the future I may remove the display of URLs for the comment authors entirely.

Several comments suggested using Akismet to block comment spam. Akismet is free for non-commercial use and charges for commercial use (a suggested threshold being $500 per month in blog revenue).

For the moment I am going to moderate all comments. The number of genuine comments is quite small and this is no great effort for me. I check the moderation list at least twice a day so there shouldn’t be an excessive delay either.

new blog

I am starting to move my blog to my own WordPress server. Here is the new URL for my main blog (feed), and here is the new URL for my Source-Dump blog (feed) which is now named just “dump”.

WordPress gives me the power to change all aspects of my blog’s operation (including adding plug-ins). It also allows me to correctly display greater-than and less-than characters (the Perl script I use for converting them is at this post – it’s short now but will probably grow).
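The conversion itself is simple. As an illustration (this is a Python sketch of the same idea, not the Perl script mentioned above, and the function name is hypothetical), the characters that HTML treats specially are replaced with entities before posting:

```python
import html


def escape_for_post(text: str) -> str:
    """Replace &, < and > with HTML entities so they display literally."""
    # quote=False leaves quotation marks alone; only &, < and > change.
    return html.escape(text, quote=False)


print(escape_for_post("if (a < b && b > c)"))
```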

Hopefully the new blog will also solve the date problems that some Planet readers have been complaining about.

I will briefly put the same content on both the old and new blogs. When I’m fully confident in the new blog I’ll stop updating the old one and try to get all Planet installations changed. Anyone who wants to convert their Planet installation to my new blog now is welcome to do so.
