DMCA etc

A few days ago I wrote my first DMCA take-down notice, following the instructions on the Wikipedia page. The reason was that someone was mirroring my blog and putting Google adverts on the copy. Before I started putting Google adverts on my own web sites I wouldn't have been bothered by this, but now that I'm making a small amount of money from Google advertising I don't want someone else mirroring my content and taking that money away from me.

The person who managed the site in question took a surprisingly long time to comply with the request (an exchange of several messages plus a couple of reminders over the course of a few days).

The most recent news about DMCA abuse is the case of trying to prevent the distribution of a key used for decrypting HD DVD discs. It was widely believed that copyright was being used to prevent the distribution. Strangely many people who otherwise have a good understanding of technology have been saying "you can't copyright a number". What precisely is a program binary if not a long series of numbers (or a single large number, depending on how you look at it)? For that matter a JPEG file or the ASCII representation of a book is also either a very large number or a series of small numbers. In any case, it turns out that the key is not protected under copyright but under the anti-circumvention provisions of the DMCA.

If it were a matter of copyright the question would not be whether a number can be copyrighted, but what defines such a number. One criterion for copyright is that the work has to be non-trivial (EG I couldn't copyright the use of "a few days ago" as an introduction), so length is a factor. Another is that it has to be a creative expression (so an encryption key can't be copyrighted). However in many jurisdictions there are separate laws about distributing passwords without permission; such laws are designed to prevent people from granting unauthorised access to computers, but I believe that they can be used more generally (I have been advised that such laws exist in the state of Pennsylvania in the US – I'm not sure what the law is in other regions but expect that something so useful would be copied).

Another breaking story is that the RIAA has created an organisation with a US government mandate to collect royalties on ALL music that is played over Internet radio. This includes music whose copyright owner is not an RIAA member and has not consented to having royalties collected. You can create your own music, grant free access to everyone out of philanthropy, and then have the RIAA tax the music!

It's unfortunate that only the down-side of this dramatic change in copyright law has been discussed. Compulsory licenses have a lot of potential in other areas of copyrighted material. Recently people have been complaining that government-sponsored scientific research is often only published in journals that cost large amounts of money. Why not have a compulsory license for journals at a fair price that everyone can afford? Software is often unreasonably expensive (Windows Vista with the latest version of MS Office can cost up to twice as much as a new PC); let's have compulsory licenses for software at a reasonable fee! Software vendors often cease selling old versions of software to force customers to upgrade; a compulsory license scheme would let us buy MS-DOS 3.30 at a reasonable price regardless of whether MS wants to sell it.

Finally there is at least one evil cult that claims its "religious" texts are copyrighted as a way of preventing the public from seeing what a drug-addled second-rate sci-fi author produced. Let's have a compulsory license for them so everyone can read them!

The only thing that’s wrong with the RIAA scheme is that there is no option for copyright owners to directly license their material to the users (including granting a free license if they so desire). The up-side of this is that it proves beyond all doubt that the RIAA is not representing copyright owners.

Update: I initially accepted the claims about the DMCA take-down notices being based on copyright rather than anti-circumvention. Since learning of my mistake I modified this post to reflect the fact that it was not a copyright issue.

LUG talks today

Today I gave three talks at my local LUG. The first was my latest SE Linux talk (I’ll put the notes online soon). The second was a talk about voting.

I asked for a show of hands: who has already decided which party they will vote for at the next federal election? About 12 people put their hands up. I then asked people to put their hands down if they were not a member of the party that they intend to vote for; including mine there were only two hands left raised in the room!

With the way party politics works nowadays the major parties are not very interested in representing their core voters. Why try to please people who will vote for you anyway? Instead they try to appeal to swinging voters and pressure groups. If you have already decided to vote for a party they have no reason to try to impress you. Therefore you should join the party and try to influence the policy-making process from within.

The issues that I believe are most important to the Linux community are free software use in government, sane intellectual property laws, the right to a fair trial, and not pandering to the US (which is related to the previous two points).

If you have already decided who to vote for then you should join that party and make your vote count in the party room.

One member of the audience said that he had been a member of one of the major parties but that the internal politics turned him off. If that is your experience then I think you should ask yourself whether you want to vote for a group of people that you can’t work with.

The final talk I gave was about getting speakers for Linux Users’ Groups. There is always difficulty in finding speakers for clubs. Ideally we would have meetings planned a few months ahead of time so that they could be advertised in various ways. Newspapers often have columns dedicated to providing information about public meetings but the lead time is usually at least a week (and the meeting would have to be advertised at least two weeks in advance – so more than a month’s planning ahead is required).

Getting a larger number and variety of speakers will attract new members, encourage existing members to attend more meetings, and inspire members in their Linux work.

Talks can be given by almost anyone. There is a constant demand for speakers who have expert knowledge in the topic, but anyone who is a decent speaker and has the confidence to stand up at the podium can give a good talk. For expert speakers possibilities include academics, industry leaders, leaders of free software development projects, and journalists. But that’s not all, anyone who wants to spend the time researching a topic can give a talk on it. For example I’ve been learning about MySQL recently for my own servers and will probably offer a talk about MySQL aimed at sys-admins who don’t want to become DBAs but who just want to get a database running. I’m not a MySQL expert (and don’t plan to become one) but I believe that there are many people who want to do the things I do with MySQL and who could benefit from a talk that I might give.

The best place to find speakers is a conference or trade-show. If they give a talk that works well you can suggest that they give it again for your local LUG. You can also find speakers at conferences that you can’t attend. If someone visits your country for a trade-show in a different city you could send them an email saying “unfortunately I can’t attend your talk, but if you are interested in visiting my city in the same trip then there will be an audience of X people interested in seeing you”.

There's no harm in asking; the worst that they can do is decline. Ask everyone who you think can do a good job. Also make sure that you don't make any commitment (unless you are a member of the LUG committee).

more about Heartbeat

In a comment on my blog post “a Heartbeat developer comments on my blog post” Alan Robertson writes:
I got in a hurry on my math because of the emergency. So, there are even more assumptions (errors?) than I documented.
In particular, the probability model I gave was for a particular node to fail. So the probability of either of two failing would be double that, and any of three failing would be triple that.
Note that the probability of multiple simultaneous failures goes up as a power, but the probability of any of them failing only goes up linearly.
I really need to sit down and do the math carefully – but the idea of the simultaneous failures going up as a power is true. And the “any of” probability goes up linearly. That’s also true. This is why people can actually use larger HA clusters ;-).
The 5 years figure is the industry standard quoted figure for an average Intel-based server to fail.
The four hours to repair is a common high-quality of service response time from a hardware vendor. I admit that’s not the same as actual repair time, but if some “repairs” are just reboots, then it’s not a horrible number to start with – if your vendor has cached some spares nearby. I suppose I should sit down and do the math right, and make a spreadsheet of it. (I wonder if I remember that much math?)
I assume disk failures are taken care of by hot swap disks, RAID, etc. and so in effect they “never fail” (at least not totally) so that these failures don’t have to be accounted for by the overall availability model.
Here’s an intuitive way of thinking about it “from your gut”…
If I took your whole data center and made a cluster out of it, what’s the chance that at least half of your servers would fail at once?
Pretty darn small, is the short answer ;-). If it’s not pretty darn small, you need to buy better servers, and IBM has just the servers for you ;-). Or maybe they need to hire a better SysAdmin ;-)
If you ask yourself “when is the last time at least half my machines in my data center couldn’t communicate with the other half”, then hopefully that’s also a “pretty darn small” chance too. If not, there are well-known methods for making networks highly reliable too.
[I’m still ignoring “catastrophes” that you haven’t accounted for in your HA architecture].
I’m not saying this is free, and it can be pricey. One of my other favorite sayings is “Paranoia is an expensive hobby”. How much do you want to spend?
You tell me how much you want to spend, and you can figure out how to spend it.
I’ll make a separate comment on quorum models later. It’s getting late here.

My only comment in response is that I still believe the probability calculations in my original post to be correct, and I am interested to see someone prove otherwise.
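
For reference, the standard back-of-envelope model under the assumptions Alan describes (independent failures, an average of one failure per five years, and four hours to repair) is roughly as follows – this is only my summary of the general model, not a repeat of either calculation:

p = 4 hours / (5 * 365 * 24 hours) ≈ 0.00009 (the fraction of time a given node is down)
P(at least one of n nodes down at a given moment) ≈ n * p, which grows linearly with n
P(a particular set of k nodes all down at the same moment) ≈ p^k, which shrinks as a power of p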

Another comment by Alan:
Data corruption, no doubt, is almost always much worse than loss of availability. And some kinds of data corruption are worse than others. For example, mounting a non-clustered shared disk filesystem twice simultaneously is usually much worse, than updating two replicas of the data simultaneously. In the first case, you have to restore to your previous backups and lose all data since then. In the second case, you only lose updates that were made to one of the sides, and you instantly have a working copy of the data which is nearly always much newer than your last backup (with the possibility of recovering them by significant effort). Typically you would only lose a few minutes of updates at worst – and depending on the kind of networking failure, you might not lose anything.
Heartbeats certainly aren’t enough. You need to monitor the health of your servers and the health of your applications. Heartbeat monitors applications and can easily be informed of and act on the health of your servers (with release 2 style Linux-HA Heartbeat configurations).

Followed by:
Since data corruption is so serious, this is why cluster designers worry so much about split-brain, which is managed using the ideas of quorum and its sibling, fencing.
This is all about keeping bad things from happening.
This post is really about quorum, since Russ had expressed interest in it.
Quorum is the idea that you can uniquely choose a subcluster to represent the whole cluster in those cases where communication failure has caused the cluster to split into separate sub-clusters which cannot properly communicate with each other. In this way, only one of the subclusters continues on, and the others will sit on their hands and do nothing waiting for a person to fix things.
Some of the kinds of quorum mentioned below are better than others. But, most importantly, they can be used in combination as described later.
The most common kind of quorum is that Russ mentioned in his earlier post – the majority quorum. In this method, for a cluster of n nodes, you grant quorum to a sub-cluster which has more than INT(n/2) members. This means that if you have a 3-node cluster, you have to have two nodes to continue. If you have 4 nodes, you have to have 3 nodes to continue. For 5 nodes, you have to have 3 nodes, and so on.
Other basic methods include disk reserve, so that you have to reserve a disk to have quorum. In this case, if only one node survives and it can reserve the disk, it continues to run. However, the disk becomes a single point of failure. This may not be a problem if this single disk is required to run any of the cluster services, since they would fail without it anyway. [Heartbeat does not support this method].
An analogous method is to implement a software resource which grants quorum to one subcluster in a fashion analogous to the disk reserve method. This has the advantage of not requiring disk reserves, or a shared disk, but it has the same SPOF disadvantage as the disk reserve method. Heartbeat does support this method using the quorum daemon. It's incredibly useful for those cases (like split-site clusters) where you cannot use fencing.
Another method is to grant quorum to any subcluster which can ping a certain set of nodes, and not grant it to any which can’t access those nodes. This isn’t a wonderful method, and has obvious disadvantages with respect to uniqueness, and single points of failure. (Heartbeat doesn’t yet implement this one).
Another method is to grant quorum to any node which is a member of a 2-node cluster. This is better than losing quorum and stopping when one node stops, but obviously completely ignores the uniqueness requirement of quorum.
Another method is to ask a human being if you have quorum. This is hardly an ideal circumstance, but useful in some contexts as described below. (Heartbeat doesn’t yet implement this one).
Perhaps you say, really the only one of these that’s really good is the first one – the majority vote method.
And, I would generally agree with you. But, Heartbeat has the ability to use these in combination which makes some of those methods that seem flaky to be much more reasonable.
Heartbeat has the ability to have multiple quorum modules declared, and they’re used in this way: Any module can return HAVEQUORUM, NOQUORUM, or TIE. If they return HAVEQUORUM or NOQUORUM, then no further quorum modules are consulted. However, if they return TIE, then the next quorum module is consulted for its opinion. If the last quorum module returns TIE, it is treated the same as NOQUORUM.
This enables you to use one quorum module to break the tie declared by a previous quorum module.
You could then use the quorumd to break the tie created by a voting module. Or you could use the quorumd instead of the “two-node” module. Or you could use the “pingable” module instead of the “two-node” module. Or you could at the end always tack on a “human” module, in case all else returns TIE.
This is kind of cool, actually. My favorites for next implementation are the pingable and consult human modules.
And, of course, if your cluster loses quorum due to real server failures, there are always ways to work around it, with a little human intervention. One method is to tell Heartbeat to ignore quorum. Another is to tell Heartbeat to remove certain nodes from the cluster, after you verify that they’re really dead. And, I’m sure that in a pinch, some new methods will be invented. And some of them might actually work ;-).
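
To make the tie-breaking chain concrete, here is a rough sketch in shell of the decision loop that Alan describes (my own illustration, not Heartbeat code – the three module scripts at the end are hypothetical):

decide_quorum() {
  # consult each quorum module in turn; the first definite answer wins
  for module in "$@"; do
    case "$("$module")" in
      HAVEQUORUM) echo HAVEQUORUM; return;;
      NOQUORUM)   echo NOQUORUM; return;;
      TIE)        continue;;
    esac
  done
  # a TIE from the last module is treated as NOQUORUM
  echo NOQUORUM
}
# EG majority vote first, then quorumd, then ask a human
decide_quorum ./majority_vote ./quorumd_check ./ask_human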

Regarding the quorumd, it seems that this is an extra server that will generally run on another machine separate from the rest of the cluster. So if we had a two-node cluster with a quorumd then it would effectively be a three-node cluster where one node is not configured to run any resources. It seems that the simpler approach in many cases would be to merely have a three-node cluster with resources not configured to run on one of the nodes.

For example if I was running a mail server cluster for an ISP I might configure a three-node cluster consisting of the two mail server back-end machines and one other lightly loaded machine (EG a DNS server), and configure the MTA resources not to run on the DNS machine.
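
A rough sketch of the kind of location constraint that would keep the MTA resources off the DNS machine in a release 2 style (CIB XML) Heartbeat configuration – untested, and the resource and node names ("mta_group", "dns1") are made up for the example:

<rsc_location id="mta_not_on_dns" rsc="mta_group">
  <rule id="mta_not_on_dns_rule" score="-INFINITY">
    <expression id="mta_not_on_dns_expr" attribute="#uname" operation="eq" value="dns1"/>
  </rule>
</rsc_location>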

comment spam

The war on comment-spam has now begun. It appears that Blogger might have some anti-spam measures of which I was unaware. Otherwise it’s a strange coincidence that I get a huge number of comment spams for extremely hard-core porn from the Ukraine so soon after starting a WordPress blog.

About 24 hours before the spam attack there was a strange blog comment that linked to Google (with no offensive or spammy content). Leaving it online was my mistake; when it stayed up for a day the spammer apparently decided that I might leave porn spam online too. I arrived home this evening to find almost 100 spams in the form of comments and track-backs, and more arriving by the minute. So I used iptables to block a /20 related to the spam and things are quiet now.
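
For reference, blocking a /20 needs only a single rule like the following (the address range here is just an example, not the network I actually blocked):

# drop all traffic from the offending /20
iptables -I INPUT -s 198.51.96.0/20 -j DROP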

The moral of the story is to delete anything unusual ASAP in case it encourages the idiots.

I've also tightened the anti-spam measures on my blog.

Update:

From now on any short comment that does not add significant meaning will not be accepted on my blog. To the person who submitted many dozens of comments with variants of “nice site” with the idea that the URL listed for the comment author will be visited by readers of my site – nice try. If you genuinely want to send me a message saying “nice blog” then email will work.

In the future I may remove the display of URLs for the comment authors entirely.

Several comments suggested using Akismet to block comment spam. Akismet is free for non-commercial use and charges for commercial use (a suggested threshold being $500 per month in blog revenue).

For the moment I am going to moderate all comments; the number of genuine comments is quite small so this is no great effort for me. I check the moderation list at least twice a day so there shouldn't be an excessive delay either.

new blog

I am starting to move my blog to my own WordPress server. Here is the new URL for my main blog (feed), and here is the new URL for my Source-Dump blog (feed) which is now named just “dump”.

WordPress gives me the power to change all aspects of my blog’s operation (including adding plug-ins). It also allows me to correctly display greater-than and less-than characters (the Perl script I use for converting them is at this post – it’s short now but will probably grow).
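
The core of such a conversion is tiny – something like the following one-line filter, which escapes the characters as HTML entities (this is only a sketch of the idea, not the exact script linked above):

# escape & first so the other entities aren't double-escaped
perl -pe 's/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g;' post.txt > post.html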

Hopefully the new blog will also solve the date problems that some Planet readers have been complaining about.

For a while I will put the same content on both the old and new blogs; when I'm fully confident in the new blog I'll stop updating the old one and try to get all Planet installations changed. Anyone who wants to convert their Planet installation to my new blog now is welcome to do so.

more on presentations

Here’s an amusing video about how not to do presentations.

paper about ZCAV

This paper by Rodney Van Meter about ZCAV (Zoned Constant Angular Velocity) in hard drives is very interesting. It predates my work by about four years and includes some interesting methods of collecting data that I never considered.

One interesting thing is that apparently on some SCSI drives you can get the drive to tell you where the zones are. If I get enough spare time I would like to repeat such tests and see how the data returned by disks compares to benchmark results.

It's also interesting to note that Rodney's paper shows a fairly linear drop in performance at higher sector numbers (while he notes that it would be expected to fall off more quickly at higher sector numbers). One of my recent tests with a 300G disk did show a greater-than-linear performance drop (see my ZCAV results page for more details). It might require modern large disks to show this performance characteristic.
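
A quick way to see the effect without a full benchmark run is to time raw reads at the start and the end of the disk (a rough sketch – /dev/sdb and the skip value are examples for an idle 300G disk, and reading the device directly requires root access):

# read 1000MB from the start of the disk
dd if=/dev/sdb of=/dev/null bs=1M count=1000
# read 1000MB from near the end of the disk (skip is in units of bs)
dd if=/dev/sdb of=/dev/null bs=1M count=1000 skip=285000

GNU dd reports the transfer rate when it finishes, and on a ZCAV drive the first command should show a noticeably higher rate than the second.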

I also found it very interesting to see that a modified version of Bonnie was used for some of the tests and that it gave consistent results! I had assumed that any filesystem-based tests of ZCAV performance would introduce unreasonable amounts of variance, so I instead wrote my ZCAV test program to read the disk directly and measure performance.

It’s times like this that I wish for a “groundhog day” so that I could spend a year doing nothing but reading technical papers.

moving this blog

I’m going to move this blog. The content is now at http://dump.coker.com.au/ .

The aim is to fix the problems documented here (among other things) by moving to a site that I control.

MySQL security in Debian

Currently there is a problem with the default MySQL install in Debian/Etch (and probably other distributions too). It sets up "root" with DBA access and no password by default. The following mysql command gives a list of all MySQL accounts with Grant_priv access (one of the capabilities that gives great access to the database server) and shows their hashed passwords (as a matter of procedure I truncated the hash for my debian-sys-maint account). As you can see the "root" and "debian-sys-maint" accounts have such access. The debian-sys-maint account is used by the Debian package management tools and its password is stored in the /etc/mysql/debian.cnf file.

$ echo "select Host,User,Password from user where Grant_priv='y'" | mysql -u root mysql
Host    User    Password
localhost       root
aeon    root
localhost       debian-sys-maint        *882F90515FCEE65506CBFCD7

It seems likely that most people who have installed MySQL won't realise this problem and will continue to run their machines in that manner; this is a serious issue for multi-user machines. There is currently Debian bug #418672 about this issue. In my tests this issue affects Etch machines as well as machines running Unstable.
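
Until the packaging is fixed, a minimal work-around (sketched from memory, so adjust to taste) is to set a password on every "root" account yourself; the debian-sys-maint account already has a random password stored in /etc/mysql/debian.cnf:

$ echo "UPDATE user SET Password=PASSWORD('choose-a-strong-password') WHERE User='root'; FLUSH PRIVILEGES;" | mysql -u root mysql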

booting from USB for security

Sune Vuorela asks about how to secure important data such as GPG keys on laptops.

I believe that the ideal solution involves booting from a USB device with an encrypted root filesystem to make subversion of the machine more difficult (note that physically subverting the machine is still possible – EG through monitoring the keyboard hardware).

The idea is that you boot from the USB device which contains the kernel, initrd, and the decryption key for the root filesystem. The advantage of having the key on a USB device is that it can be longer and more random than anything you might memorise.
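
A sketch of how this can be set up with LUKS (assuming cryptsetup, an encrypted root on /dev/sda2, and the USB device mounted at /media/usb – the device names and paths are examples):

# generate a long random key on the USB device and add it to a spare LUKS key slot
dd if=/dev/urandom of=/media/usb/root.key bs=256 count=1
chmod 400 /media/usb/root.key
cryptsetup luksAddKey /dev/sda2 /media/usb/root.key
# at boot the initrd can then unlock the root filesystem with
cryptsetup --key-file /media/usb/root.key luksOpen /dev/sda2 cryptroot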

In my previous posts about a good security design for an office, more about securing an office, and biometrics and passwords I covered some of the details of this.

My latest idea however is to have the root filesystem encrypted with both a password that is entered at boot and a key stored on the USB device. This means that someone who steals both my laptop and my USB key will still have some difficulty getting at my data, while someone who steals just the laptop will find that it is encrypted with a key that cannot be brute-forced by any hardware short of a quantum computer.
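
LUKS key slots are alternatives rather than a combination, so to require both factors the real passphrase has to be derived from both – for example by hashing the typed password together with the keyfile and feeding the result to cryptsetup on stdin. This is only a sketch (with the same example paths as above), and the same derivation has to be used when the key slot is first created:

read -s -p "Password: " typed; echo
# combine the typed password with the keyfile and use the hash as the LUKS passphrase
{ printf '%s' "$typed"; cat /media/usb/root.key; } | sha256sum | awk '{print $1}' | \
  cryptsetup --key-file=- luksOpen /dev/sda2 cryptroot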

Coincidentally, on the same day Michael Prokop also documented on Planet Debian how to solve some of the problems relating to booting from a USB flash device.