
Other Reasons for not Censoring the Net

Currently there is a debate about censoring the Internet in Australia, although “debate” might not be the correct word for a dispute in which one party provides no facts and refuses to talk to any experts (Senator Conroy persistently refuses all requests to talk to anyone who knows anything about the technology, or to have his office address any such questions). The failures of the technology are obvious to anyone who has worked with computers; here is an article about it in the Sydney Morning Herald [1] (one of many similar articles in the mainstream media). I don’t plan to mention the technological failures again because I believe that the only people who read my blog and don’t understand the technology are a small number of my relatives – I gave up on teaching my parents about IP protocols a long time ago.

One of the fundamental problems with the current censorship idea is that the government doesn’t seem to have decided what it wants to filter and who it wants to filter it from. The actions needed to stop pedophiles from exchanging files are quite different from those needed to stop children accidentally accessing porn on the net. I get the impression that they just want censorship and will say whatever they think will impress people.

I have previously written about the safety issues related to mobile phones [2]. In that document I raised the issue of teenagers making their own porn (including videos of sexual assault). About four months after I wrote it, a DVD movie was produced showing a gang of teenagers sexually assaulting a girl (they sold copies at their school). It seems that the incidence of teenagers making porn with mobile phones is only going to increase, while no-one has any plans to address the problem.

The blog www.somebodythinkofthechildren.com has some interesting information on this issue.

Two final reasons for opposing net censorship have been provided by the Sydney Anglicans [3]. They are:

  1. Given anti-vilification laws, could religious content be deemed “illegal” and be filtered out? Could Sydneyanglicans.net be blocked as “illegal” if it carries material deemed at some point now or in the future as vilifying other religions? If it’s illegal in Vic say, and there isn’t state-based filtering (there wont be), will the govt be inclined to ban it nation wide?
  2. Given anti-discrimination laws, if Sydneyanglicans.net runs an article with the orthodox line on homosexuality, will that be deemed illegal, and the site blocked? You can imagine it wouldn’t be too hard for someone to lobby Labor via the Greens, for instance.

So the Sydney Anglicans seem afraid that their religious rights to discriminate against others (seriously – religious organisations do have such rights) will be under threat if filtering is imposed.

I was a bit surprised when I saw this article; the Anglican church in Melbourne seems reasonably liberal and I had expected the Anglican church in the rest of Australia to be similar. But according to this article Peter Jensen (Sydney’s Anglican Archbishop) regards himself as one of the “true keepers of the authority of the Bible” [4]. It seems that the Anglican church is splitting over issues related to the treatment of homosexuals and women (Peter believes that women should not be appointed to leadership positions in the church, to avoid “disenfranchising” men who can’t accept them [5]).

It will be interesting to see the fundamentalist Christians who want to protect their current legal rights to vilify other religions and discriminate against people on the basis of gender and sexual preference fighting the other fundamentalist Christians who want to prevent anyone from seeing porn. But not as interesting as it will be if the Anglican church finally splits and then has a fight over who owns the cathedrals. ;)

A comment on my previous post about the national cost of slow net access suggests that Germany (where my blog is now hosted) has better protections for individual freedom than most countries [6]. If you want unrestricted net access then it is worth considering the options for running a VPN to another country (I have previously written a brief description of how to set up a basic OpenVPN link [7]).
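
For anyone who wants to try it, here is a minimal sketch of a static-key OpenVPN link of the kind described in that earlier post (the server name and key file name are placeholders, and the matching server end is omitted):

  # generate a shared static key once and copy it securely to both ends
  openvpn --genkey --secret static.key
  # client side: tunnel to the remote server and route all traffic through it
  openvpn --remote vpn.example.com --dev tun --secret static.key \
          --ifconfig 10.8.0.2 10.8.0.1 --redirect-gateway def1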


The National Cost of Slow Internet Access

Australia has slow Internet access when compared to other first-world countries. The cost of hosting servers is higher, and residential access costs more while coming with smaller download quotas. I read news reports of people in other countries complaining about having their home net connection restricted after transferring 300G in one month; of my two current net connections, the big (expensive) one allows me 25G of downloads per month. I use Internode, here are their current prices [1] (which are typical for Australia – they weren’t the cheapest last time I compared, but they offer a good service and I am quite happy with them).

Most people in Australia don’t want to pay $70 per month for net access; I believe that the plans with limits of 10G of downloads or less are considerably more popular.

Last time I investigated hosting servers in Australia I found that it would be totally impractical. The prices offered for limits such as 10G per month (for a server!) were comparable to prices offered by Linode [2] (and other ISPs in the US) for hundreds of gigs of transfer per month. I have recently configured a DomU at Linode for a client; Linode conveniently offers a choice of server rooms around the US, so I chose one in the same region as my client’s other servers – giving 7 hops according to traceroute and a ping time as low as 2.5ms!

Currently I am hosting www.coker.com.au and my blog in Germany thanks to the generosity of a German friend. An amount of bandwidth that would be rather expensive for hosting in Australia is, by German standards, unused capacity in a standard hosting plan. So I get to host my blog in Germany with higher speeds than my previous Australian hosting (which was bottlenecked due to overuse of its capacity) and no bandwidth quotas that I am likely to hit in the near future. This also allows me to do new and bigger things; for example, one of my future plans is to assemble a collection of Xen images of SE Linux installations – a set of archives that are each about 100MB in size. Even with BitTorrent, transferring 100MB files from a server in Australia would be unusable.

Most Australians who access my blog and have reasonably fast net connections (cable or ADSL2+) will notice a performance improvement. Australians who use modems might notice a performance drop due to longer latencies of connections to Germany (an increase of about 350ms in ping times). But if I could have had a fast cheap server in Australia then all Australians would have benefited. People who access my blog and my web site from Europe (and to a slightly lesser extent from the US) should notice a massive performance increase, particularly when I start hosting big files.

It seems to me that the disadvantages of hosting in Australia due to bandwidth costs are hurting the country in many ways. For example, I run servers in the US (both physical and Xen DomUs) for clients. My clients pay the US companies for managing the servers, and these companies employ skilled staff in the US (who pay US income tax). It seems that the career opportunities for system administrators in the US and Europe are better than in Australia – which is why so many Australians choose to work in the US and Europe. Not only does this cost the country the tax money those people might pay if employed here, it also costs us the training they would give to others. It is impossible to estimate the cost of having some of the most skilled and dedicated people (the ones who desire the career opportunities that they can’t get at home) working in another country, contributing to users’ groups and professional societies, and sharing their skills with citizens of the country where they work.

Companies based in Europe and the US have an advantage in that they can pay for hosting in their own currency and not be subject to currency variations. People who run Australian based companies that rent servers in the US get anxious whenever the US dollar goes up in value.

To quickly investigate the hosting options chosen for various blogs I used the command “traceroute -T -p80” to do SYN traces to port 80 for some of the blogs syndicated on Planet Linux Australia [3]. Of the blogs I checked there were 13 hosted in Australia, 11 hosted independently in the US, and 5 hosted with major US based blog hosting services (WordPress.com, Blogspot, and LiveJournal). While this is a small fraction of the blogs syndicated on that Planet, and blog hosting is also a small fraction of the overall Internet traffic, I think it does give an indication of what choices people are making in terms of hosting.
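
For anyone who wants to repeat the test, a trivial shell loop does the job (the host names below are placeholders; note that TCP SYN traces generally require root):

  # SYN-trace port 80 for each blog host; the last hops usually make
  # the hosting location obvious
  for host in blog1.example.com blog2.example.org ; do
    echo "== $host"
    traceroute -T -p 80 $host | tail -3
  done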

Currently the Australian government is planning to censor the Internet with the aim of stopping child porn. Their general plan is to spend huge amounts of money filtering HTTP traffic in the hope that pedophiles don’t realise that they can use encrypted email, HTTPS, or even a VPN to transfer files without them getting blocked. If someone wanted to bring serious amounts of data to Australia, getting a tourist to bring back a few terabyte hard disks in their luggage would probably be the easiest and cheapest way to do it. Posting DVDs is also a viable option.

Given that the Internet censorship plan is doomed to failure, it would be best if they could spend the money on something useful. Getting better Internet infrastructure in the country would be one option to consider. The cost of Internet connections to other countries is determined by the cost of the international cables – which can not be upgraded quickly or cheaply. But even within Australia bandwidth is not as cheap as it could be. If the Telstra monopoly on the local loop were broken and the highest possible ADSL speeds were offered to everyone, it would be a good start towards improving Australia’s Internet access.

Australia and NZ seem to be in a unique position on the Internet: first-world countries that are a long way from the nearest major net connections and which therefore have slow net access to the rest of the world. It seems that the development of Content Delivery Network [4] technology could potentially provide more benefits for Australia than for most countries. CDN-enabling some common applications (such as WordPress) would not require a huge investment, but has the potential to decrease international data transfer while improving performance for everyone. For example, if I could have a WordPress slave server in Australia which directed all writes to my server in Germany, and have my DNS server return the IP address of whichever server matches the region the request came from, then I could give better performance to the 7% of my blog readers who appear to reside in Australia while decreasing international data transfer by about 300MB per month.
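
As a sketch of how the DNS side might work (the address ranges and zone file names are hypothetical), BIND views can return different answers depending on where the query comes from:

  acl au-clients { 203.0.113.0/24; };  // placeholder for Australian resolver ranges
  view "australia" {
      match-clients { au-clients; };
      // this zone file points the blog name at the Australian slave
      zone "example.com" { type master; file "example.com.au-view"; };
  };
  view "world" {
      match-clients { any; };
      // everyone else gets the address of the German server
      zone "example.com" { type master; file "example.com.world-view"; };
  };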


Support Gay Marriage in case You Become Gay

A common idea among the less educated people who call themselves “conservative” seems to be that they should oppose tax cuts for themselves and support tax cuts for the rich because they might become rich and they want to prepare for that possibility.

The US census data [1] shows that less than 1% of males aged 15+ earn $250K or more. For females it’s less than 0.2%.

On the Wikipedia page about homosexuality [2] it is claimed that 2%-7% of the population are gay (and 12% of Norwegians have at least tried it out). Apparently homosexuality can strike suddenly; you never know when a right-wing politician or preacher will suddenly and unexpectedly be compelled to hire gay whores (as Ted Haggard [3] did) or come out of the closet (as Jim Kolbe [4] did).

So it seems that, based on the percentages, you are more likely to become gay than to become rich. It would therefore be prudent to prepare for that possibility and lobby for gay marriage in case your sexual preference ever changes.

But on a serious note, among people who earn $250K or more (an income level that has been suggested for higher tax rates) there will be a strong correlation between level of education and an early start to a career. Go to a good university and earn more than the median income in your first job, and you will be well on track to earning $250K. A common misconception is that someone who has not had a great education can still be successful by starting their own company. While there are a few people who have done that, the vast majority of small companies fail in the first few years. Working hard doesn’t guarantee success; for a company to succeed you need to have the right product at the right time – which often depends on factors that you can’t predict (such as the general state of the economy and any new products released by larger companies).

Basics of EC2

I have previously written about my work packaging the tools to manage Amazon EC2 [1].

First you need to log in and create a certificate (you can upload your own certificate – but this is probably only beneficial if you have two EC2 accounts and want to use the same certificate for both). Download the X509 private key file (named pk-X.pem) and the public key (named cert-X.pem). My Debian package of the EC2 API tools will look for the key files in the ~/.ec2 and /etc/ec2 directories and will take the first one it finds by default.

To override that certificate search (when using my Debian package), or to just have things work when using the code without my package, you set the variables EC2_PRIVATE_KEY and EC2_CERT.
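
Setting the variables is just a matter of exporting them in your shell, with the X parts of the file names being whatever your downloaded key files are called:

  export EC2_PRIVATE_KEY=$HOME/.ec2/pk-XXXXXXXX.pem
  export EC2_CERT=$HOME/.ec2/cert-XXXXXXXX.pem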

This Amazon page describes some of the basics of setting up the client software and RSA keys [2]. I will describe some of the most important things now:

The command “ec2-add-keypair gsg-keypair > id_rsa-gsg-keypair” creates a new keypair for logging in to an EC2 instance. The public key goes to Amazon and the private key can be used by any ssh client to login as root when you create an instance. To create an instance with that key you use the “-k gsg-keypair” option, so it seems to be a requirement to use the same working directory for creating all instances. Note that gsg-keypair could be replaced by any other string; if you are doing something really serious with EC2 you might use one account to create instances that are run by different people with different keys, but for most people I think that a single key is all that is required. Strangely they don’t provide a way of getting access to the public key – you have to create an instance and then copy the /root/.ssh/authorized_keys file from it.
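
A minimal session for creating and protecting the key looks like this (ssh will refuse to use a private key file that other users can read):

  # the private key is written to stdout, so redirect it to a file
  ec2-add-keypair gsg-keypair > id_rsa-gsg-keypair
  chmod 600 id_rsa-gsg-keypair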

This Amazon page describes how to set up sample images [3].

The first thing it describes is the command “ec2-describe-images -o self -o amazon” which gives a list of all images owned by yourself and all public images owned by Amazon. It’s fairly clear that Amazon doesn’t expect you to use their images. The i386 OS images that they have available are Fedora Core 4 (four configurations with two versions of each) and Fedora 8 (a single configuration with two versions), as well as three other demo images that don’t indicate the version. The AMD64 OS images that they have available are Fedora Core 6 and Fedora 8. Obviously if they wanted customers to use Amazon-provided images (which seems like a really good idea to me) they would provide images of CentOS (or one of the other recompiles of RHEL) and Debian. I have written about why I think that running such outdated releases is a bad idea for security [4] – please make sure that you don’t use the ancient Amazon images for anything other than testing!

For your first test, choose an i386 image from Amazon’s list – i386 is best for testing because it allows the cheapest instances (currently $0.10 per hour).

Before launching an instance, allow ssh access to it with the command “ec2-authorize default -p 22”. Note that this command permits access from the entire world. There are options to limit access to certain IP address ranges, but at this stage it’s best to focus on getting something working. Of course you don’t want to actually use your first attempt at creating an instance; I think that setting up an instance to run in a secure and reliable manner would require many attempts and tests. As all the storage of the instance is wiped when it terminates (as we aren’t using S3 yet) and you won’t have any secret data online, security doesn’t need to be the highest priority.
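
For reference, when you do want to restrict access the -s option takes a CIDR range (the address range below is a placeholder for your own network):

  # permit ssh from a single network instead of the whole world
  ec2-authorize default -p 22 -s 203.0.113.0/24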

A sample command to run an instance is “ec2-run-instances ami-2b5fba42 -k gsg-keypair” where ami-2b5fba42 is a public Fedora 8 image available at this moment. This will give output similar to the following:

RESERVATION r-281fc441 999999999999 default
INSTANCE i-0c999999 ami-2b5fba42 pending gsg-keypair 0 m1.small 2008-11-04T06:03:09+0000 us-east-1c aki-a71cf9ce ari-a51cf9cc

The field after the word INSTANCE is the instance ID. The command “ec2-describe-instances i-0c999999” will provide information on the instance; once it is running (which may be a few minutes after you request it) you will see output such as the following:

RESERVATION r-281fc441 999999999999 default
INSTANCE i-0c999999 ami-2b5fba42 ec2-10-11-12-13.compute-1.amazonaws.com domU-12-34-56-78-9a-bc.compute-1.internal running gsg-keypair 0 m1.small 2008-11-04T06:03:09+0000 us-east-1c aki-a71cf9ce ari-a51cf9cc

The command “ssh -i id_rsa-gsg-keypair root@ec2-10-11-12-13.compute-1.amazonaws.com” will then grant you root access. The part of the name such as 10-11-12-13 is the public IP address. Naturally you won’t see 10.11.12.13; you will instead see public addresses in the Amazon range – I replaced the addresses to avoid driving bots to their site.

The name domU-12-34-56-78-9a-bc.compute-1.internal is listed in Amazon’s internal DNS and returns the private IP address (in the 10.0.0.0/8 range) which is used by the instance. The instance has no public IP address of its own; all connections (both inbound and outbound) run through some sort of NAT. This shouldn’t be a problem for HTTP, SMTP, and most protocols that are suitable for running on such a service, but for FTP or UDP based services it might be a problem. The part of the name such as 12-34-56-78-9a-bc is the MAC address of the instance’s eth0 device.

To halt an instance you can run shutdown or halt as root inside it, or run the ec2-terminate-instances command and give it the instance ID that you want to terminate. It seems to me that the best way of terminating an instance would be to run a script that produces a summary of whatever the instance did (you might not want to preserve all the log data, but some summary information would be useful), and gives all operations that are in progress time to stop before running halt. A script could run on the management system to launch such an orderly shutdown script on the instance and then use ec2-terminate-instances if the instance does not terminate quickly enough.
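
A minimal sketch of such a management-side script (the log summarising command and the five minute timeout are my assumptions, not anything Amazon provides):

  #!/bin/sh
  # usage: stop-instance.sh INSTANCE-ID PUBLIC-HOSTNAME
  ID=$1 ; HOST=$2
  # ask the instance to summarise its logs and shut itself down
  ssh -i id_rsa-gsg-keypair root@$HOST '/usr/local/bin/summarise-logs; halt'
  sleep 300
  # if it is still running after five minutes, terminate it forcibly
  if ec2-describe-instances $ID | grep -q running ; then
    ec2-terminate-instances $ID
  fi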

In the near future I will document many aspects of using EC2. This will include dynamic configuration of the host, dynamic DNS, and S3 storage among other things.


Some RAID Issues

I just read an interesting paper titled An Analysis of Data Corruption in the Storage Stack [1]. It contains an analysis of the data from 1,530,000 disks running at NetApp customer sites. The amount of corruption is worrying, as is the amount of effort needed to detect it.

NetApp devices have regular “RAID scrubbing”, which involves reading all data on all disks at some quiet time and making sure that the checksums match. They also store checksums of all written data. For “Enterprise” disks each sector stores 520 bytes, which means that a 4K data block comprises 8 sectors and has 64 bytes of storage for a checksum. For “Nearline” disks, 9 sectors of 512 bytes are used to store a 4K data block and its checksum. The 64-byte checksum includes the identity of the block in question. The NetApp WAFL filesystem writes a block to a different location every time; this allows the storage of snapshots of old versions, and also means that if the location that is read contains data from a different file (or a different version of the same file) then it is known to be corrupt (sometimes writes don’t make it to disk). Page 3 of the document describes this.
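
For comparison, Linux software RAID can do a similar scrub on request, which is worth running from cron on any md array (run as root; /dev/md0 is a placeholder for your array):

  # read every block of the array and verify parity/mirror consistency
  echo check > /sys/block/md0/md/sync_action
  # after the check completes this reports how many stripes disagreed
  cat /sys/block/md0/md/mismatch_cnt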

Page 13 has an analysis of error location and the fact that some disks are more likely to have errors at certain locations. They suggest configuring RAID stripes to be staggered so that you don’t have an entire stripe covering the bad spots on all disks in the array.

One thing that was not directly stated in the article is the connection between the different layers. On a Unix system with software RAID you have a RAID device with a filesystem layer on top of it, and (in Linux at least) there is no way for a filesystem driver to say “you gave me a bad version of that block, please give me a different one”. Block checksum errors at the filesystem level will often be caused by corruption that leaves the rest of the RAID stripe intact, which means the stripe will have a mismatching checksum – but the RAID driver won’t know which disk has the error. If a filesystem did checksums on metadata (or data) blocks, and the chunk size of the RAID was greater than the filesystem block size, then when the filesystem detected an error a different version of the block could be generated from the parity.

NetApp produced an interesting guest post on the StorageMojo blog [2]. One point that they make is that Nearline disks try harder to re-read corrupt data from the disk. This means that a bad sector error will result in longer timeouts, but hopefully the data will be returned eventually. This is good if you only have a single disk, but if you have a RAID array it’s often better to just return an error and allow the data to be retrieved quickly from another disk. NetApp also claim that “Given the realities of today’s drives (plus all the trends indicating what we can expect from electro-mechanical storage devices in the near future) – protecting online data only via RAID 5 today verges on professional malpractice”. It’s a strong claim, but they provide evidence to support it.

Another relevant issue is the size of the RAID device. Here is a post that describes the issue of the Unrecoverable Error Rate (UER) and how it can impact large RAID-5 arrays [3]. The implication is that the larger the array (in GB/TB) the greater the need for RAID-6. It has long been accepted that a larger number of disks in an array increases the need for RAID-6, but the idea that larger disks in a RAID array also increase the need for RAID-6 is a new one (to me at least).
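
A rough worked example shows why. The figures below are illustrative, using the commonly quoted UER of one unreadable sector per 10^14 bits for consumer disks:

  # rebuilding a degraded 6 disk RAID-5 of 1TB disks reads the 5 surviving
  # disks in full, which is 5 * 8 * 10^12 = 4 * 10^13 bits
  echo 'scale=2; (5 * 8 * 10^12) / (10^14)' | bc
  # => .40 - so you expect roughly 0.4 unrecoverable read errors during
  # the rebuild, any one of which fails a RAID-5 rebuild entirely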

Now I am strongly advising all my clients to use RAID-6. Currently the only servers that I run which don’t have RAID-6 are legacy servers (some of which can be upgraded to RAID-6 – HP hardware RAID is really good in this regard) and small servers with two disks in a RAID-1 array.


My Prediction for the iPhone

I have previously written about how I refused an offer of a free iPhone [1] (largely due to its closed architecture). The first Google Android phone has just been announced, and the TechCrunch review is interesting – while the built-in keyboard is a nice feature, the main thing that stands out is the open platform [2]. TechCrunch says “From now on, phones need to be nearly as capable as computers. All others need not apply”.

What I want is a phone that I control, and although most people don’t understand the issues enough to say the same, I think that they will agree in practice.

In the ’80s the Macintosh offered significant benefits over PCs, but utterly lost in the marketplace because it was closed (less available software and less freedom). Due to being used mainly in Macs and similar machines, the Motorola 68000 CPU family also died out, and while it’s being used in games consoles and some other niche markets, the PPC CPU family (the next CPU used by Apple) also has an uncertain future. The IBM PC architecture evolved along with its CPU from a 16-bit system to a 64-bit system and took over the market because it does what users want it to do.

I predict that the iPhone will be just as successful as the Macintosh OS and for the same reasons. The Macintosh OS still has a good share of some markets (it has traditionally been well accepted for graphic design and has always provided good hardware and software support for such use), and is by far the most successful closed computer system, but it has a small part of the market.

I predict that the iPhone will maintain only a small share of the market. There will be some very low-end phones that have the extremely closed design that currently dominates the market, and the bulk of the market will end up going with Android or some other open phone platform that allows users to choose how their phone works. One issue that I think will drive user demand for control over their own phones is the safety issues related to child use of phones (I’ve written about this previously [3]). Currently phone companies don’t care about such things – the safety of customers does not affect their profits. But programmable phones allow the potential for improvements to be made without involving the phone company – while with the iPhone you have Apple as the roadblock.

Now having a small share of the mobile phone market could be very profitable, just as the small share of the personal computer market is quite profitable for Apple. But it does mean that I can generally ignore them as they aren’t very relevant in the industry.


RSS Aggregation Software

The most commonly installed software for aggregating RSS feeds seems to be Planet and Venus (two forks of the same code base). The operation is that a cron job runs the Python program which syndicates a list of RSS feeds and generates a static web page. Of course the problems start if you have many feeds as polling each of them (even the ones that typically get updated at most once a week) can take a while. My experience with adding moderate numbers of feeds (such as all the feeds used by Planet Debian [1]) is that it can take as much as 30 minutes to poll them all – which will be a problem if you want frequent updates.

Frequent polling is not always desired; it means more network load and a greater incidence of transient failures. Any error in updating a feed is (in a default configuration) going to result in an error message being displayed by Planet, which will in turn result in cron sending an email to the sysadmin. Even with an RSS feed being checked every four hours (which is what I do for my personal Planet installations) it can still be annoying to get the email when someone’s feed is offline for a day.
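
For reference, the polling is typically driven by a crontab entry along these lines (the exact Planet invocation and paths vary between installations):

  # poll all feeds every four hours rather than every 15 minutes
  0 */4 * * * cd $HOME/planet && python planet.py config.ini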

Now while there is usually no benefit in polling every 15 minutes (the most frequent poll time that is commonly used), there is one good reason for doing so if polling is your only mechanism. The fact that some people want to click reload on the Planet web page every 10 minutes to look for new posts is not a good reason (it’s like looking in the fridge every few minutes and hoping that something tasty will appear). The good reason for polling frequently is to allow timely retraction of posts. It’s not uncommon for bloggers to fail to adequately consider the privacy implications of their posts (let’s face it – professional journalists have a written code of ethics about this, formal training, and an editorial board, and they still get it wrong on occasion – it’s not easy). So when a mistake is made about what personal data should be published in a blog post, it’s best for everyone if the post can be amended quickly. The design of Planet is that when a post disappears from the RSS feed it also disappears from the Planet web page; I believe that this was deliberately done for the purpose of removing such posts.

The correct solution to the problem of amending or removing posts is to use the “Update Services” part of the blog server configuration to have it send an XML RPC to the syndication service. That can give an update rapidly (in a matter of seconds) without any polling.
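
The common form of that RPC is the weblogUpdates.ping XML-RPC call; here is a sketch of sending one by hand with curl (the endpoint URL and blog details are placeholders):

  curl -s -H 'Content-Type: text/xml' http://planet.example.com/rpc \
    -d '<?xml version="1.0"?>
  <methodCall>
    <methodName>weblogUpdates.ping</methodName>
    <params>
      <param><value>My Blog</value></param>
      <param><value>http://blog.example.com/</value></param>
    </params>
  </methodCall>'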

I believe that a cron job is simply the wrong design for a modern RSS syndication service. This is no criticism of Planet (which has been working well for years for many people) but is due to the more recent requirements of more blogs, more frequent posting, and greater importance attached to blogs.

I believe that the first requirement for a public syndication service is that every blogger gets to specify the URL of their own feed to save the sysadmin the effort of doing routine URL changes. It should be an option to have the server act on HTTP 301 codes and record the new URL in the database. Then the sysadmin would only have to manage adding new bloggers (approving them after they have created an account through a web-based interface) and removing bloggers.

The problem of polling frequency can be mostly solved by using RPC pings to inform the server of new posts, provided the RPC mechanism supports removing posts. If removing posts is not supported by the RPC, then every blog which has an active post would have to be polled frequently. Even so, this would reduce the amount of polling considerably. For example, there are 319 blogs currently syndicated on Planet Debian, there are 60 posts in the feed, and those posts were written by 41 different people. So if frequent polling to detect article removal was performed only for blogs with active articles (given that you poll the blogger’s feed URL, not the individual article), that would mean only 41 polls instead of 319 – reducing the polling by a factor of more than 7!

Now even with support for RPC pings there is still a need to poll feeds. One issue is that a blog may experience temporary technical difficulty in sending the RPC, as we don’t want to compel the authors of blog software to make the ping as reliable a process as sending email (if that was the requirement then a ping via email might be the best solution). The polling frequency could be set on a per-blog basis, based on the request of the blogger and on the blog’s availability and posting frequency. Someone whose blog has been down for a day (which is not uncommon when considering a population of 300 bloggers) could have their blog polled on a daily basis. Apart from that, the polling frequency could be based on the time since the last post. It seems to be a general pattern that hobby bloggers (who comprise the vast majority of bloggers syndicated in Planet installations) often go for weeks at a time with no posts and then release a series of posts when they feel inspired.

In terms of software which meets these requirements, the nearest option seems to be the Advogato software mod_virgule [2]. Advogato [3] supports managing accounts with attached RSS feeds and also supports ranking blogs for a personalised view. A minor modification of that code to limit who gets to have their blog archived, and fixing it so that a modified post only has the latest version stored (not both versions as Advogato does), would satisfy some of these requirements. One problem is that Advogato’s method of syndicating blogs is to keep an entire copy of each blog (and all revisions). This goes against the demands of many bloggers who insist that Planet installations not keep copies of their content for a long period and not have any permanent archives. Among other things, if there are two copies of a blog post then Google might get the wrong idea as to which is the original.

Does anyone know of a system which does better than Advogato in meeting these design criteria?


Software has No Intrinsic Value

In a comment on my Not All Opinions Are Equal [1] post AlphaG said “Anonymous comments = free software, no intrinsic value as you got it for nothing”.

After considering the matter I came to the conclusion that almost all software has no intrinsic value (unless you count not being sued for copyright infringement as intrinsic value). When you buy software you generally don’t get a physical item (maybe a CD or DVD); to increase profit margins, manuals aren’t printed for most software (it used to be that hefty manuals were shipped to give an impression that you were buying a physical object). Software usually can’t be resold (both due to EULA provisions and sites such as eBay not wanting to accept software for sale), and recently MS has introduced technical measures that prevent even using it on a different computer (which force legitimate customers to purchase more copies of Windows when they buy new hardware but don’t stop pirates from using it without paying). Even when software could be legally resold there were always new versions coming out which reduced the sale price to almost zero in a small amount of time.

The difference between free software and proprietary software in terms of value is that when you pay for free software you are paying for support. This therefore compels the vendor to provide good support that is worth the money. Vendors of proprietary software have no incentive to provide good support – at least not unless they are getting paid a significant amount of money on top of the license fees. This is why Red Hat keeps winning in the CIO Vendor Value Studies from CIO Insight [2]. Providing value is essential to the revenue of Red Hat, they need to provide enough value in RHEL support that customers will forgo the opportunity to use CentOS for free.

Thinking of software as having intrinsic value leads to the error of thinking of software purchases as investments. Software is usually outdated in a few years, as is the hardware that is used to run it. Money spent on software and hardware should be considered as being a tax on doing business. This doesn’t mean that purchases should be reduced to the absolute minimum (if systems run slowly they directly decrease productivity and also cause a loss of morale). But it does mean that hardware purchases should not be considered as investments – the hardware will at best be on sale cheap at an auction site in 3-5 years, and purchases of proprietary software are nothing but a tax.


Google Chrome – the Security Implications

Google have announced a new web browser – Chrome [1]. It is not available for download yet, currently there is only a comic book explaining how it will work [2]. The comic is of very high quality and will help in teaching novices about how computers work. I think it would be good if we had a set of comics that explained all the aspects of how computers work.

One noteworthy feature is the process model of Chrome. Most browsers seem to aim to have all tabs and windows in the same process which means that they can all crash together. Chrome has a separate process for each tab so when a web site is a resource hog it will be apparent which tab is causing the performance problem. Also when you navigate from site A to site B they will apparently execute a new process (this will make the back-arrow a little more complex to implement).

A stated aim of the process model is to execute a new process for each site to clear out the memory address space. This is similar to the design feature of SE Linux where a process execution is needed to change security context so that a clean address space is provided (preventing leaks of confidential data and attacks on process integrity). The use of multiple processes in Chrome is just begging to have SE Linux support added. Having tabs opened with different security contexts based on the contents of the site in question and also having multiple stores of cookie data and password caches labeled with different contexts is an obvious development.
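
As a very rough sketch of the idea (the domain names and the renderer command are entirely hypothetical – no such policy exists), the runcon utility already demonstrates launching processes in chosen SE Linux contexts:

  # run one renderer per trust level so a compromised tab can't read
  # the cookies or passwords belonging to another context
  runcon -t chrome_banking_t renderer --url https://bank.example.com/
  runcon -t chrome_untrusted_t renderer --url http://random.example.org/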

Without having seen the code I can’t guess at how difficult it will be to implement such features. But I hope that with a clean code base written by a group of good programmers (Google has hired some really good people) the result will be a program that is extensible.

They describe Chrome as having a sandbox based security model (as opposed to the Vista model which is based on the Biba Integrity Model [3]).

It’s yet to be determined whether Chrome will live up to the hype (although I think that Google has a good record of delivering what they promise). But even if Chrome isn’t as good as I hope, they have set new expectations of browser features and facilities that will drive the market.

Update: Chrome is now released [4]!

Thanks to Martin for pointing out that I had misread the security section. It’s Vista not Chrome that has the three-level Biba implementation.


AppArmor is Dead

For some time there have been two mainstream Mandatory Access Control (MAC) [1] systems for Linux. SE Linux [2] and AppArmor [3].

In late 2007 Novell laid off almost all the developers of AppArmor [4] with the aim of having the community do all the coding. Crispin Cowan (the founder and leader of the AppArmor project) was later hired by Microsoft, which probably killed the chances for ongoing community development [5]. Crispin has an MSDN blog, but with only one post so far (describing UAC) [6], hopefully he will start blogging more prolifically in future.

Now SUSE is including SE Linux support in OpenSUSE 11.1 [7]. They say that they will not ship policies and SE Linux specific tools such as “checkpolicy”, but instead they will be available from “repositories”. Maybe this is some strange SUSE thing, but for most Linux users when something is in a “repository” then it’s shipped as part of the distribution. The SUSE announcement also included the line “This is particularly important for organizations that have already standardized on SELinux, but could not even test-drive SUSE Linux Enterprise before without major work and changes”. The next step will be to make SE Linux the default and AppArmor the one that exists in a repository, and the step after that will be to remove AppArmor.

In a way it’s a pity that AppArmor is going away so quickly. The lack of competition is not good for the market, and homogeneity isn’t good for security. But OTOH this means more resources will be available for SE Linux development, which will be a good thing.

Update: I’ve written some more about this topic in a later post [8].