Fixing execmod (textrel) Problems in Lenny

I’ve just updated my repository of SE Linux related packages for Lenny [1] to include a set of ffmpeg packages modified to not need text relocations (execmod access under SE Linux). I haven’t checked to make sure that I fixed all issues in those packages, but I have fixed all the issues that prevented Mplayer from working in a default configuration of SE Linux.

I had to patch the file libswscale/rgb2rgb.c to disable the MMX assembly code as the --disable-mmx option doesn’t work for that file. I changed the build script so that when it generates the code for the shared and cmov targets in i386 mode it adds -DPIC and -DBROKEN_RELOCATIONS to the CFLAGS and also added LIBOBJFLAGS=-fPIC to the ./configure run. There might have been a better way of doing this, but the current implementation basically works.

Long term I think that the ideal solution to this would be to have separate versions of the library packages for people who prefer extra security to a possible 15% performance benefit.

While using these libraries on an EeePC 701 (the least powerful of all the machines I own which could be used to play video) I was able to play full-screen video downloaded from ted.com without any glitches so it seems that a 15% performance loss is not a problem.

Noise in Computer Rooms

Some people think that you can recognise a good restaurant by the presence of obscure dishes on the menu or having high prices. The reality is that there are two ways of quickly identifying a good restaurant, one is the Michelin Guide [1] (or a comparable guide – if such a thing exists), the other is how quiet the restaurant is.

By a quiet restaurant I certainly don’t mean a restaurant with no customers (which may become very noisy once customers arrive). I mean a restaurant which when full will still be reasonably quiet. Making a restaurant quiet is not in itself a sufficient criterion to be a good restaurant – but it’s something that is usually done after the other criteria (such as hiring good staff and preparing a good menu) are met.

The first thing to do to make a room quiet is to have good carpet. Floor boards are easy to clean and the ratio of investment to lifetime is very good (particularly for hard wood), but they reflect sound and the movement of chairs and feet makes noise. A thick carpet with a good underlay is necessary to absorb sound. Booths are also good for containing sound if the walls extend above head height. Decorations on the walls such as curtains and thick wallpaper also absorb sound. A quiet environment allows people to talk at a normal volume which improves the dining experience.

It seems to me that the same benefits apply to server rooms and offices, with the benefit being more efficient work. I found it exciting when I first had my desk in a server room (surrounded by tens of millions of pounds worth of computer gear). But as I got older I found it less interesting to work in that type of environment just as I found it less interesting to have dinner in a noisy bar – and for the same reasons.

For a server room there is no escaping the fact that it will be noisy. But if the noise can be minimised then it will allow better communication between the people who are there and less distraction which should result in higher quality of work – which matters if you want good uptime! One thing I have observed is that physically larger servers tend to make less noise per volume and per compute power. For example a 2RU server with four CPUs seems to always make less noise than two 1RU servers that each have two CPUs. I believe that this is because a fan with a larger diameter can operate at a lower rotational speed which results in less bearing noise and the larger fans also give less turbulence. It’s obvious that using fewer servers via virtualisation has the potential to avoid noise (both directly through fans and disks and indirectly through the cooling system for the server room [2]). A less obvious way of reducing noise is to swap two 1RU servers for one 2RU server – although my experience is that for machines in a similar price band, a 2RU server often has comparable compute power (in terms of RAM and disk capacity) to three or four 1RU servers.

To reduce noise both directly and indirectly it is a requirement to increase disk IO capacity (in terms of the number of random IOs per second) without increasing the number of spindles (disks). I just read an interesting Sun blog covering some concepts related to using Solid State Disks (SSDs) on ZFS for best performance [3]. It seems that using such techniques is one way of significantly increasing the IO capacity per server (and thus allowing more virtual servers on one physical machine) – it’s a pity that we currently don’t have access to ZFS or a similar filesystem for Linux servers (ZFS has license issues and the GPL alternatives are all in a beta state AFAIK). Another possibility that seems to have some potential is the use of NetApp Filers [4] for the main storage of virtual machines. A NetApp Filer gives a better ratio of IO requests per second to the number of spindles used than most storage array products due to the way they use NVRAM caching and their advanced filesystem features (which also incidentally gives some good options for backups and for detecting and correcting errors). So a set of 2RU servers that have the maximum amount of RAM installed and which use a NetApp Filer (or two if you want redundancy) for the storage with the greatest performance requirements should give the greatest density of virtual machines.

Blade servers also have potential to reduce noise in the server room. The most significant way that they do this is by reducing the number of power supplies: instead of having one PSU per server (or two if you want redundancy) you might have three or five PSUs for a blade enclosure that has 8 or more blades. HP blade enclosures support shutting down some PSUs when the blades are idling and don’t need much power (I don’t know whether blade enclosures from other vendors do this – I expect that some do).

A bigger problem however is the noise in offices where people work. It seems that the major cause of this is the cheap cubicles that are used in most offices (and almost all computer companies). More expensive cubicles that are at almost head-height (for someone who is standing) and which have a cloth surface absorb sound better and significantly improve the office environment, and separate offices are better still. One thing I would like to see is more use of shared desktop computers. It’s not difficult to set up a desktop machine with multiple video cards, so with appropriate software support (which is really difficult) you could have one desktop machine for two, or even four users which would save electricity and reduce noise.

Better quality carpet on the floors would also be a good thing. While office carpet wears out fast, adding some underlay would not increase the long-term cost (it can remain as the top layer gets replaced).

Better windows in offices are necessary to provide a quiet working environment. The use of double-glazed windows with reflective plastic film significantly decreases the amount of heating and cooling that is required in the office. This would permit a lower speed of air flow for heating and cooling which means less noise. Also an office in a central city area will have a noise problem outside the building, again double (or even triple) glazed windows help a lot.

Some people seem to believe that an operations room should have no obstacles (one ops room where I once worked had all desks facing a set of large screens that displayed network statistics and the desks were like school desks with no dividers). I think that even for an ops room there should be some effort made to reduce the ambient noise. If the room is generally reasonably quiet then it should be easy to shout the news of an outage so that everyone can hear it.

Let’s assume for the sake of discussion that a quieter working environment can increase productivity by 5% (I think this is a conservative assumption). For an office full of skilled people who are doing computer work the average salary may be about $70,000, and it’s widely regarded that to factor in the management costs etc you should double the salary – so the average cost of an employee would be about $140,000. If there are 50 people in the office then the work of those employees has a cost of $7,000,000 per annum. A 5% increase in that would be worth $350,000 per annum – you could buy a lot of windows for that!
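The arithmetic above can be checked with a short Python sketch. The figures are the ones assumed in the paragraph (the 5% gain in particular is an assumption, not a measurement), and the function and variable names are made up for illustration.

```python
# Figures taken from the paragraph above; all of them are assumptions
# made for the sake of discussion.
AVERAGE_SALARY = 70_000      # dollars per annum
OVERHEAD_MULTIPLIER = 2      # doubling the salary to cover management costs etc
STAFF = 50                   # people in the office
PRODUCTIVITY_GAIN_PERCENT = 5


def annual_value_of_gain(salary, overhead_multiplier, staff, gain_percent):
    """Return the yearly value of a productivity gain across the whole office."""
    cost_per_employee = salary * overhead_multiplier
    total_cost = cost_per_employee * staff
    return total_cost * gain_percent / 100


value = annual_value_of_gain(
    AVERAGE_SALARY, OVERHEAD_MULTIPLIER, STAFF, PRODUCTIVITY_GAIN_PERCENT
)
print(f"${value:,.0f} per annum")
```

Changing any one input (for example a smaller office or a 2% gain) shows how sensitive the budget for windows and carpet is to the productivity assumption.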

Islamophobia

I recently wasted a bit of time reading some right-wing blogs. One thing I noted was the repeated references to news reports about young women from an Islamic background being beaten (and in some cases killed) by their fathers (and other male relatives) for not conforming to some weird cultural ideas that some people associate with Islam. These are spun as examples of Islam being bad and therefore as reasons to oppose immigration policies that allow Muslims into countries identified as “Christendom” or “The West” (never mind the fact that the vast majority of the population in “Christendom” don’t even attend church twice a year and the fact that Australia is directly south of China, Russia, and North Korea).

It seems to me that when young people follow the cultural standards of the country where they live rather than the standards of the country that their parents came from then it’s evidence of “multiculturalism” working. When young Muslim women are beaten by their fathers, whether it’s considered an example of Muslims being bad (who should therefore be excluded) or an example of Muslims as victims who should be protected is a matter of interpretation. It’s not as if there is any shortage of domestic violence cases from any religious or cultural group.

It’s often claimed that fundamentalist Muslims hate our culture, strangely the same people seem to claim that our culture will be destroyed by radical Islam. These two ideas seem to conflict, if our culture (the pro-science, free-speech, few inhibitions on clothing standards, do what you want but don’t hurt others culture that most readers of my blog enjoy) can be destroyed by radical Islam then they wouldn’t hate it. I think that the reason why fundamentalist religious people (Christians and Muslims) dislike our culture is because it is so strong. Our culture offers a way of life that is simply better than that which fundamentalist religious groups offer. Any religious person can choose to take a liberal approach to their religion (emphasising the positive aspects of giving to charity, being nice to others, etc) and enjoy our culture. Our culture is based around wide-spread communication, mass media, mobile phones, the Internet, custom clothing design, etc. It can do to religions what the sea does to rocks.

It seems that the strongest efforts at attacking our culture come from Christian groups. For example the Exclusive Brethren [1] runs a high school in my area; according to a local paper it distinguishes itself by having no students enter a university course! The Exclusive Brethren (and some other radical Christian groups) have a deliberate policy of keeping children stupid with the idea that people who think may decide to change their religion.

Some time ago I had a taxi driver start an unsolicited discussion of religion by telling me how much he hated Muslims. I pointed out the fact that there are Muslims of all races and asked why he thought that I was not a Muslim. After that the rest of the journey was very quiet.

The mainstream media would have us believe that Muslims have some sort of monopoly on terrorism. Noam Chomsky’s paper “Terror and Just Response” [2] is one of many that he has written on this issue. I realise that many people don’t want to acknowledge the involvement of the US government (and its allies such as Australia) in international terrorism. But please read Noam’s position (which is compelling) or read his Wikipedia page which lists his extensive accomplishments [3] (if it’s the background of an author that impresses you).

Execmod and SE Linux – i386 Must Die

I have previously written about the execmod permission check in SE Linux [1] and in a post about SE Linux on the desktop I linked to some bug reports about it [2] (which probably won’t be fixed in Debian).

One thing I didn’t mention is the proof of the implication of this. When running a program in the unconfined_t domain on a SE Linux system (the domain for login sessions on a default configuration), if you set the boolean allow_execmod then the following four tests from paxtest will be listed as vulnerable:

Executable bss (mprotect)
Executable data (mprotect)
Executable shared library bss (mprotect)
Executable shared library data (mprotect)

This means that if you have a single shared object which uses text relocations and therefore needs the execmod permission then the range of possible vectors for attack against bugs in the application has just increased by four. This doesn’t necessarily require that the library in question is actually being used either! If a program is linked against many shared objects that it might use, then even if it is not going to ever use the library in question it will still need execmod access to start and thus give extra possibilities to the attacker.

For reference when comparing a Debian system that doesn’t run SE Linux (or has SE Linux in permissive mode) to a SE Linux system with execmod enabled the following tests fail (are reported as vulnerable):

Executable anonymous mapping (mprotect)
Executable heap (mprotect)
Executable stack (mprotect)
Writable text segments

If you set the booleans allow_execstack and allow_execheap then you lose those protections. But if you use the default settings of all three booleans then a program running in the unconfined_t domain will be protected against 8 different memory based attacks.

Based on discussions with other programmers I get the impression that fixing all the execmod issues on i386 is not going to be possible. The desire for a 15% performance boost (the expected result of using an extra register) is greater than the desire for secure systems among the people who matter most (the developers).

Of course we could solve some of these issues by using statically linked programs and have statically linked versions of the libraries in question which can use the extra register without any issues. This does of course mean that updates to the libraries (including security updates) will require rebuilding the statically linked applications in question – if a rebuild was missed then this could reduce the security of the system.

To totally resolve that issue we need to have i386 machines (the cause of the problem due to their lack of registers) go away. Fortunately in the mainstream server, desktop, and laptop markets that is already pretty much done. I’m still running a bunch of P3 servers (and I know many other people who have similar servers), but they are not used for tasks that involve running programs that are partially written in assembly code (mplayer etc).

One problem is that there are still new machines being released with the i386 ISA as the only instruction set. For example the AMD Geode CPU [2] is used by the One Laptop Per Child (OLPC) project [3] and the new Intel Atom line of CPUs [4] apparently only supports the AMD64 ISA on the “desktop” models and the versions used in ultra-mobile PCs are i386 only.

I think that these issues are particularly difficult in regard to the OLPC. It’s usually not difficult to run “yum upgrade” or “apt-get dist-upgrade” on an EeePC or similar ultra-mobile PC. But getting an OLPC machine upgraded in some of the remote places where they may be deployed might be more difficult. Past experience has shown that viruses and trojans can get to machines that are supposed to be on isolated networks, so it seems that malware can get access to machines that cannot communicate with servers that contain security patches… One mitigating factor however is that the OLPC OS is based on Fedora, and Fedora seems to be taking the strongest efforts to improve security of any mainstream distribution; a choice between 15% performance and security seems to be a no-brainer for the Fedora developers.

Efficiency of Cooling Servers

One thing I had wondered was why home air-conditioning systems are more efficient than air-conditioning systems for server rooms. I received some advice on this matter from the manager of a small server room (which houses about 30 racks of very powerful and power hungry servers).

The first issue is terminology: the efficiency of a “chiller” is regarded as the number of Watts of heat energy removed divided by the number of Watts of electricity consumed by the chiller. For example when using a 200% efficient air cooling plant, a 100W light bulb is rated as being a 150W heat source: 100W to heat it and 50W for the cooling plant to cool it.

For domestic cooling I believe that 300% is fairly common for modern “split systems” (it’s the specifications for the air-conditioning on my house and the other air-conditioners on display had similar ratings). For high-density server rooms with free air cooling I have been told that a typical efficiency range is between 80% and 110%! So it’s possible to use MORE electricity on cooling than on running the servers!
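The efficiency definition above can be turned into a small Python sketch that shows how much electricity the cooling plant draws for a given heat load. The function names are hypothetical and the efficiency figures are just the ones quoted in this post.

```python
def cooling_power_watts(heat_watts, efficiency_percent):
    """Electricity the chiller consumes to remove heat_watts of heat.

    Efficiency is heat removed divided by electricity consumed,
    expressed as a percentage, as defined in the text.
    """
    return heat_watts * 100 / efficiency_percent


def total_power_watts(heat_watts, efficiency_percent):
    """Power for the equipment itself plus the power needed to cool it."""
    return heat_watts + cooling_power_watts(heat_watts, efficiency_percent)


# 300%: domestic split system, 200%: the light bulb example,
# 110% and 80%: the range quoted for high-density server rooms.
for efficiency in (300, 200, 110, 80):
    cooling = cooling_power_watts(100, efficiency)
    total = total_power_watts(100, efficiency)
    print(f"{efficiency}%: {cooling:.1f}W cooling, {total:.1f}W total per 100W of load")
```

At any efficiency below 100% the cooling figure exceeds the load itself, which is the case where more electricity goes to cooling than to running the servers.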

One difficulty in cooling a server room is that the air often can’t flow freely (unlike a big open space such as the lounge room of your house). Another is the range of temperatures and the density of heat production in some parts (a 2RU server can dissipate 1000W of heat in a small space). These factors can be minimised by extracting hot air at the top and/or rear of racks and forcing cold air in the bottom and/or the front and by being very careful when planning where to place equipment. HP offers some services related to designing a server room to increase cooling efficiency, one of the services is using computational fluid dynamics to simulate the air-flow in the server-room [1]! CFD is difficult and expensive (the complete package from HP for a small server room costs more than some new cars), I believe that the fact that it is necessary for correct operation of some server rooms is an indication of the difficulty of the problem.

The most effective ways of cooling servers involve tight coupling of chillers and servers. This often means using chilled water or another liquid to extract the heat. Chilled water refrigeration systems largely remove the problem of being unable to extract the heat from the right places, but instead you have some inefficiency in pumping the water and the servers are fixed in place. I have not seen or heard of chilled water being used for 2RU servers (I’m not saying that it doesn’t get used or that it wouldn’t make sense – merely that I haven’t seen it). When installing smaller servers (2RU and below) there is often a desire to move them and attaching a chilled-water cooling system would make such a move more difficult and expensive. When a server weighs a ton or more then you aren’t going to move it in a hurry (big servers have to be mostly disassembled before the shell can be moved, and the shell might require the efforts of four men to move it). Another issue related to water cooling is the weight. Managing a moderate amount of water involves a lot of heavy pipes (a leak would be really bad) and the water itself can weigh a lot. A server room that is based around 20Kg servers might have some issues with the extra weight of water cooling (particularly the older rooms), but a server room designed for a single rack that weighs a ton can probably cope.

I have been told that the cooling systems for low density server rooms are typically as efficient as those used for houses, and may even be more efficient. I expect that the engineering trade-offs when designing an air-conditioner for home use favor a low purchase price. But someone who approves the purchase of an industrial cooling system will be more concerned about the overall cost of operations and will be prepared to spend some extra money up-front and recover it over the course of a few years. The fact that server rooms run 24*7 also gives more opportunity to recover the money spent on the purchase (my home A-C system runs for about 3 months a year for considerably less than 24 hours a day).

So it seems that the way to cool servers efficiently is to have low density server rooms (to the largest extent possible). One step towards this goal would be to have servers nearer the end users. For example having workgroup servers near the workgroup (instead of in the server room). Of course physical security of those servers would be more challenging – but if all the users have regular desktop PCs that can be easily 0wned then having the server for them in the same room probably doesn’t make things any worse. Modern tower servers are more powerful than rack mounted servers that were available a few years ago while also being very quiet. A typical rack-mounted server is not something you would want near your desk, but one of the quiet tower servers works quite well.

SE Linux in Lenny status – Achieved Level 1

I previously described the goals for SE Linux development in Lenny and assigned numbers to the levels of support [1]. I have just uploaded a new policy to unstable which I hope to get in Lenny that will solve all the major issues for level 1 of support (default configuration with the unconfined_t domain for all user sessions – much like the old “targeted” policy). The policy in question is in my Lenny SE Linux repository [2] (for those who don’t want to wait for it to get into Unstable or Lenny).

Random Opinions, Expert Opinions, and Facts about AppArmor

My previous post titled AppArmor is Dead [1] has inspired a number of reactions. Some of them have been unsubstantiated opinions; everyone has an opinion so this doesn’t mean much. I believe that opinions of experts matter more. Crispin responded to my post and made some reasonable points [2] (although I believe that he is overstating the ease of use case). I take Crispin’s response a lot more seriously than most of the responses because of his significant experience in commercial computer security work. The opinion of someone who has relevant experience in the field in question matters a lot more than the opinion of random computer users!

Finally there is the issue of facts. Of the people who don’t agree with me, Crispin seems to be the first to acknowledge that Novell laying off AppArmor developers and adding SE Linux support are both bad signs for AppArmor. The fact that Red Hat and Tresys have been assigning more people to SE Linux development in the same time period that SUSE has been laying people off AppArmor development seems to be a clear indication of the way that things are going.

One thing that Crispin and I understand is the amount of work involved in maintaining a security system. You can’t just develop something and throw it to the distributions. There is ongoing work required in tracking kernel code changes, and when there is application support there is also a need to track changes to application code (and replacements of system programs). Also there is a need to add new features. Currently the most significant new feature development in SE Linux is related to X access controls – this is something that every security system for Linux needs to do (currently none of them do it). It’s a huge amount of work, but the end result will be that compromising one X client that is running on your desktop will not automatically grant access to all the other windows.

The CNET article about Novell laying off the AppArmor developers [3] says ‘“An open-source AppArmor community has developed. We’ll continue to partner with this community,” though the company will continue to develop aspects of AppArmor‘ and attributes that to Novell spokesman Bruce Lowry.

Currently there doesn’t seem to be an AppArmor community. The Freshmeat page for AppArmor still lists Crispin as the owner and has not been updated since 2006 [4]; it also links to hosting on Novell’s site. The Wikipedia page for AppArmor also lists no upstream site other than Novell [5].

The AppArmor development list hosted by SUSE is getting less than 10 posts per month recently [6]. The AppArmor general list had a good month in January with a total of 23 messages (including SPAM) [7], but generally gets only a few messages a month.

The fact that Crispin is still listed as the project leader [8] says a lot about how the project is managed at Novell!

So the question is, how can AppArmor’s prospects be improved? A post on linsec.ca notes that Mandriva is using AppArmor, getting more distribution support would be good [9], but the most important thing in that regard will be contributing patches back and dedicating people to do upstream work (Red Hat does a huge amount of upstream development for SE Linux and a significant portion of my Debian work goes upstream).

It seems to me that the most important thing is to have an active community. Have a primary web site (maybe hosted by Novell, maybe SourceForge or something else) that is accurate and current. Have people giving talks about AppArmor at conferences to promote it to developers. Then try to do something to get some buzz about the technology, my SE Linux Play Machines inspired a lot of interest in the SE Linux technology [10]. If something similar was done with AppArmor then it would get some interest.

I’m not interested in killing AppArmor (I suspect that Crispin’s insinuations were aimed at others). If my posts on this topic inspire more work on AppArmor and Linux security in general then I’m happy. As Crispin notes the real enemy is his employer (he doesn’t quite say that – but it’s my interpretation of his post).

Google Chrome – the Security Implications

Google have announced a new web browser – Chrome [1]. It is not available for download yet, currently there is only a comic book explaining how it will work [2]. The comic is of very high quality and will help in teaching novices about how computers work. I think it would be good if we had a set of comics that explained all the aspects of how computers work.

One noteworthy feature is the process model of Chrome. Most browsers seem to aim to have all tabs and windows in the same process which means that they can all crash together. Chrome has a separate process for each tab so when a web site is a resource hog it will be apparent which tab is causing the performance problem. Also when you navigate from site A to site B they will apparently execute a new process (this will make the back-arrow a little more complex to implement).

A stated aim of the process model is to execute a new process for each site to clear out the memory address space. This is similar to the design feature of SE Linux where a process execution is needed to change security context so that a clean address space is provided (preventing leaks of confidential data and attacks on process integrity). The use of multiple processes in Chrome is just begging to have SE Linux support added. Having tabs opened with different security contexts based on the contents of the site in question and also having multiple stores of cookie data and password caches labeled with different contexts is an obvious development.

Without having seen the code I can’t guess at how difficult it will be to implement such features. But I hope that when a clean code base is provided by a group of good programmers (Google has hired some really good people) then the result would be a program that is extensible.

They describe Chrome as having a sandbox based security model (as opposed to the Vista model which is based on the Biba Integrity Model [3]).

It’s yet to be determined whether Chrome will live up to the hype (although I think that Google has a good record of delivering what they promise). But even if Chrome isn’t as good as I hope, they have set new expectations of browser features and facilities that will drive the market.

Update: Chrome is now released [4]!

Thanks to Martin for pointing out that I had misread the security section. It’s Vista not Chrome that has the three-level Biba implementation.

Links August 2008

Michael Janke is writing a series of posts about estimating availability of systems, here is a link to the introduction [1]. He covers lots of things that people often miss (such as cooling). If you aren’t about to implement a system for reliability then it’s an interesting read. If you are about to implement a system where reliability is required and you have control of the system (not paying someone else to run it and hope for the best) then it’s an essential read. It will probably also be good to give this URL to managers who make decisions about such things.

Interesting summary of the connections between the Iraq war and the oil industry in the Reid Report [2]. The suggestion made by one of the sources she cites is that the intention of the war was to reduce the supply of Iraqi oil to increase prices. Sam Varghese has written an essay about this which summarises where the Iraqi oil goes [3]. It seems that half of Iraq’s oil goes to US military use, the other half is used domestically, and some oil is imported as well! So because of the US occupation the country with the second largest known oil reserves is importing petroleum products! If the US military was to cease operations world-wide then the oil price would drop significantly, this doesn’t just mean the occupation of Iraq and the various actions in South America, but also the bases in Germany and Japan.

Interesting paper by Alexander Sotirov and Mark Dowd about Bypassing Browser Memory Protection in Windows [4]. This paper is good for people who are interested in computer security but don’t generally use Windows (such as me), if you want to learn about the latest things happening in Windows land then this is a good place to start.

A well researched article by Rick Moen about the unintended effects of anti-gay-marriage laws [5]. Maybe some of the “conservatives” who advocate such laws should get themselves and their spouses tested. It would be amusing if someone like Rush Limbaugh turned out to be involved in a “gay marriage”.

What Sysadmins should know about exposure to hazardous materials [6]. High-level overview of the issues, probably a good start for some google searches to get the details.

Diamond John McCain is an interesting blog about the 73 year old (who was born in Panama) candidate in the US presidential election [7].

Update: Corrected my statement about Iraq’s oil reserves based on a comment by Sam.

Improving Blog Latency to Benefit Readers

I just read an interesting post about latency and how it affects web sites [1]. The post has some good ideas but unfortunately mixed information on some esoteric technologies such as infiniband that are not generally applicable with material that is of wide use (such as ping times).

The post starts by describing the latency requirements of Amazon and stock broking companies. It’s obvious that stock brokers have a great desire to reduce latency, it’s also not surprising that Google and Amazon analyse the statistics of their operations and make changes to increase their results by a few percent. But it seems to be a widely held belief that personal web sites are exempt from such requirements. The purpose of creating content on a web site is to have people read it, if you can get an increase in traffic of a few percent by having a faster site and if those readers refer others then it seems likely to have the potential to significantly improve the result. Note that an increase in readership through a better experience is likely to be exponential, and an exponential increase of a few percent a year will eventually add up (an increase of 4% a year will double the traffic in 18 years).
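The doubling claim above follows from compound growth, and a minimal Python sketch can confirm it. The function name is made up for illustration and the 4% figure is the one used in the text.

```python
import math


def doubling_time_years(annual_growth_percent):
    """Years for traffic to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_percent / 100)


years = doubling_time_years(4)
print(f"4% a year doubles traffic in about {years:.1f} years")

# Cross-check by compounding directly: 17 years is not quite enough, 18 is.
print(1.04 ** 17 < 2 < 1.04 ** 18)
```

The exact figure is a little under 18 years, so 18 is the first whole year in which the traffic has at least doubled.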

I have been considering hosting my blog somewhere else for a while. My blog is currently doing about 3G of traffic a month, which averages out to just over 1KB/s. Peaks will of course be a lot greater than that, and the 512Kb/s of the Internet connection would probably be a limit even if it weren’t for the other sites on the same link. The link in question is being used for serving about 8G of web data per month, and there is some mail server use which also takes bandwidth. So performance is often unpleasantly slow.
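Here is a rough sketch of how those averages work out. It assumes a 30-day month, that "G" means 10^9 bytes, and that 512Kb/s means 512,000 bits per second; the names are hypothetical and only there for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_bytes_per_sec(bytes_per_month: float, days: int = 30) -> float:
    """Average transfer rate for a monthly traffic total."""
    return bytes_per_month / (days * SECONDS_PER_DAY)


blog_rate = average_bytes_per_sec(3e9)   # about 1157 bytes/s for the blog
web_rate = average_bytes_per_sec(8e9)    # about 3086 bytes/s of web data on the link
link_capacity = 512_000 / 8              # 64000 bytes/s for a 512Kb/s link

print(round(blog_rate), round(web_rate), round(link_capacity))
```

The averages are a small fraction of the link capacity, so the slowness comes from the peaks (and the mail server) and not from the monthly totals.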

For a small site such as mine the most relevant issues seem to be based around available bandwidth, swap space use (or the lack thereof), disk IO (for when things don’t fit in cache) and available CPU power exceeding the requirements.

For hosting in Australia (as I do right now) bandwidth is a problem. Internet connectivity is not cheap in any way and bandwidth is always limited. Also the latency of connections from Australia to other parts of the world often is not as good as desired (especially if using cheap hosting as I currently do).

According to Webalizer only 3.14% of the people who access my blog are from Australia. They will get better access to my site if it is hosted in Australia, and maybe the 0.15% of people who access my blog from New Zealand will also benefit from the locality of sites hosted in Australia. But the 37% of readers who are described as “US Commercial” (presumably .com) and the 6% described as “United States” (presumably .us) will benefit from US hosting, as will most of the 30% who are described as “Network” (.net I guess).

For getting good network bandwidth it seems that the best option is to choose what seems to be the best ISP in the US that I can afford, where determining what is “best” is largely based on rumour.

One of the comments on my post about virtual servers and swap space [2] suggested just not using swap and referenced the Amazon EC2 (Elastic Compute Cloud) service and the Gandi.net hosting (which is in limited beta and not generally available).

The Amazon EC2 cloud service [3] has a minimum offering of 1.7G of RAM, 1 EC2 Compute Unit (equivalent to a 1.0-1.2GHz 2007 Opteron or 2007 Xeon processor), and 160G of “instance storage” (local disk for an instance), running 32bit software. Currently my server is using 12% of a Celeron 2.4GHz CPU on average (which includes a mail server with lots of anti-spam measures, Venus, and other things). Running just the web sites on 1 EC2 Compute Unit should use significantly less than 25% of a 1.0GHz Opteron. I’m currently using 400M of RAM for my DomU (although the MySQL server is in a different DomU). 1.7G of RAM for my web sites is heaps even when including a MySQL server. Currently a MySQL dump of my blog is just under 10M of data, so with 1.7G of RAM the database should stay entirely in RAM, which will avoid the disk IO issues. I could probably use about 1/3 of that much RAM and still not swap.

The cost of EC2 is $US0.10 per hour of uptime (for a small server), so that’s $US74.40 for a 31 day month. The cost for data transfer is 17 cents per gigabyte for sending and 10 cents per gigabyte for receiving (bulk discounts are available for multiple terabytes per month).
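To make the arithmetic explicit, here is a small sketch of the monthly bill using the prices quoted above. It works in whole cents to avoid floating point rounding, ignores bulk discounts, and the function name is hypothetical. The 3 gigabytes sent is my current blog traffic; I have not measured received traffic so it is left at zero.

```python
HOURLY_CENTS = 10           # $US0.10 per hour of uptime for a small server
SEND_CENTS_PER_GB = 17      # data sent from the instance
RECEIVE_CENTS_PER_GB = 10   # data received by the instance


def monthly_cost_cents(days: int, sent_gb: int, received_gb: int) -> int:
    """Uptime plus data transfer cost for one month, in US cents."""
    uptime = days * 24 * HOURLY_CENTS
    transfer = sent_gb * SEND_CENTS_PER_GB + received_gb * RECEIVE_CENTS_PER_GB
    return uptime + transfer


print(monthly_cost_cents(31, 0, 0) / 100)  # 74.4 (uptime only, 31 day month)
print(monthly_cost_cents(31, 3, 0) / 100)  # 74.91 (plus 3 gigabytes sent)
```

The data transfer charge is tiny compared to the uptime charge at this traffic level.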

I am not going to pay $74 per month to host my blog. But sharing that cost with other people might be a viable option. An EC2 instance provides up to 5 “Elastic IP addresses” (public addresses that can be mapped to instances) which are free when they are being used (there is a cost of one cent per hour for unused addresses – not a problem for me as I want 24*7 uptime). So it should be relatively easy to divide the costs of an EC2 instance among five people by accounting for data transfer per IP address. Hosting five web sites that use the same software (MySQL and Apache for example) should reduce memory use and allow more effective caching. A small server on EC2 costs about five times more than one of the cheap DomU systems that I have previously investigated [4] but provides ten times the RAM.
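A minimal sketch of how such a cost split might work is below. It assumes the uptime charge is divided evenly and that each person pays for the data sent from their own IP address at 17 cents per gigabyte. The site names and traffic figures are made up for illustration, and received traffic is ignored to keep it short.

```python
def split_costs(uptime_cents: int, sent_gb_by_site: dict,
                send_cents_per_gb: int = 17) -> dict:
    """Divide the uptime charge evenly and bill each site for its own
    outbound data transfer. All amounts are in US cents."""
    even_share = uptime_cents / len(sent_gb_by_site)
    return {
        site: even_share + gb * send_cents_per_gb
        for site, gb in sent_gb_by_site.items()
    }


# Hypothetical traffic (gigabytes sent per month) for five sites
usage = {"site-a": 3, "site-b": 1, "site-c": 1, "site-d": 2, "site-e": 1}
bills = split_costs(7440, usage)  # 7440 cents is 31 days of uptime

for site, cents in bills.items():
    print(site, cents / 100)
```

With these made-up numbers each person pays between about $15 and $16 per month, so the even split of the uptime charge dominates each bill.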

While the RAM is impressive, I have to wonder about CPU scheduling and disk IO performance. I guess I can avoid disk IO on the critical paths by relying on caching and not doing synchronous writes to log files. That just leaves CPU scheduling as a potential area where it could fall down.

Here is an interesting post describing how to use EC2 [5].

Another thing to consider is changing blog software. I currently use WordPress which is more CPU intensive than some other options (due to being written in PHP), is slightly memory hungry (PHP and MySQL), and doesn’t have the best security history. It seems that an ideal blog design would use a language such as Java or PHP for comments and use static pages for the main article (with the comments in a frame or loaded by JavaScript). Then the main article would load quickly and comments (which probably aren’t read by most users) would get loaded later.