I just read an interesting post about latency and how it affects web sites [1]. The post has some good ideas, but unfortunately it mixes information on esoteric technologies that are not generally applicable (such as InfiniBand) with material that is of wide use (such as ping times).
The post starts by describing the latency requirements of Amazon and stock broking companies. It’s obvious that stock brokers have a great desire to reduce latency, and it’s not surprising that Google and Amazon analyse the statistics of their operations and make changes to improve their results by a few percent. But it seems to be a widely held belief that personal web sites are exempt from such requirements. The purpose of creating content on a web site is to have people read it. If a faster site gets a few percent more traffic, and if those readers refer others, then it has the potential to significantly improve the result. Note that an increase in readership through a better experience is likely to be exponential, and an exponential increase of a few percent a year will eventually add up (an increase of 4% a year will double the traffic in about 18 years).
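The doubling figure above can be checked with a few lines of Python. This is only a sketch of the standard compound growth formula; the function name `doubling_time_years` is made up for illustration, and it assumes the growth rate stays constant from year to year.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    Solves (1 + annual_growth) ** years == 2 for years.
    """
    return math.log(2) / math.log(1 + annual_growth)

# 4% a year, the figure used in the paragraph above
print(f"{doubling_time_years(0.04):.1f} years")
```

At 4% the result is a little under 18 years, which matches the rough figure quoted above.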
I have been considering hosting my blog somewhere else for a while. My blog is currently doing about 3G of traffic a month, which averages out to just over 1KB/s. Peaks will of course be a lot greater than that, and the 512Kb/s Internet connection would probably be a limit even if it wasn’t for the other sites on the same link. That link serves about 8G of web data per month, and there is some mail server use which also takes bandwidth. So performance is often unpleasantly slow.
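For anyone who wants to check the averages, here is a rough sketch of the arithmetic. It assumes a 30-day month, 1G = 1024^3 bytes and 1KB = 1024 bytes; the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

def average_rate_kb_per_s(gb_per_month: float) -> float:
    """Average transfer rate in KB/s for a given monthly volume in GB."""
    bytes_per_month = gb_per_month * 1024 ** 3
    return bytes_per_month / SECONDS_PER_MONTH / 1024

blog_rate = average_rate_kb_per_s(3)   # the blog alone
link_rate = average_rate_kb_per_s(8)   # all web data on the link
link_capacity = 512 / 8                # 512Kb/s expressed in KB/s

print(f"blog: {blog_rate:.2f} KB/s, link: {link_rate:.2f} KB/s, "
      f"capacity: {link_capacity:.0f} KB/s")
```

The averages are a small fraction of the link capacity, so it is the peaks rather than the monthly totals that make the site slow.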
For a small site such as mine the most relevant issues seem to be available bandwidth, swap space use (or the lack thereof), disk IO (for when things don’t fit in cache), and having available CPU power that exceeds the requirements.
For hosting in Australia (as I do right now) bandwidth is a problem. Internet connectivity is not cheap in any way and bandwidth is always limited. Also the latency of connections from Australia to other parts of the world often is not as good as desired (especially if using cheap hosting as I currently do).
According to Webalizer only 3.14% of the people who access my blog are from Australia. They will get better access to my site if it is hosted in Australia, and maybe the 0.15% of people who access my blog from New Zealand will also benefit from the locality of sites hosted in Australia. But the 37% of readers who are described as “US Commercial” (presumably .com) and the 6% described as “United States” (presumably .us) will benefit from US hosting, as will most of the 30% who are described as “Network” (.net I guess).
For getting good network bandwidth it seems that the best option is to choose what seems to be the best ISP in the US that I can afford, where determining what is “best” is largely based on rumour.
One of the comments on my post about virtual servers and swap space [2] suggested just not using swap and referenced the Amazon EC2 (Elastic Compute Cloud) service and the Gandi.net hosting (which is in limited beta and not generally available).
The Amazon EC2 cloud service [3] has a minimum offering of 1.7G of RAM, 1 EC2 Compute Unit (equivalent to a 1.0-1.2GHz 2007 Opteron or 2007 Xeon processor), and 160G of “instance storage” (local disk for an instance), running 32bit software. Currently my server is using 12% of a Celeron 2.4GHz CPU on average (which includes a mail server with lots of anti-spam measures, Venus, and other things). Running just the web sites on 1 EC2 Compute Unit should use significantly less than 25% of a 1.0GHz Opteron. I’m currently using 400M of RAM for my DomU (although the MySQL server is in a different DomU), so 1.7G of RAM for my web sites is heaps even when including a MySQL server. Currently a MySQL dump of my blog is just under 10M of data. With 1.7G of RAM the database should stay entirely in RAM, which will avoid the disk IO issues. I could probably use about 1/3 of that much RAM and still not swap.
The cost of EC2 is $US0.10 per hour of uptime (for a small server), so that’s $US74.40 for a 31-day month. The cost for data transfer is 17 cents per gigabyte sent and 10 cents per gigabyte received (bulk discounts are available for multiple terabytes per month).
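The monthly figure is simple to reproduce. The sketch below uses only the prices quoted above and ignores the bulk discounts; the function name and the example of 3G sent per month (my blog's current traffic, assumed here to be all outbound) are illustrative assumptions.

```python
HOURLY_RATE_USD = 0.10      # small instance, per hour of uptime
SEND_USD_PER_GB = 0.17      # data sent from the instance
RECEIVE_USD_PER_GB = 0.10   # data received by the instance

def monthly_cost_usd(days: int = 31, sent_gb: float = 0.0,
                     received_gb: float = 0.0) -> float:
    """Uptime plus data transfer cost for one month of 24*7 operation."""
    uptime = HOURLY_RATE_USD * 24 * days
    transfer = sent_gb * SEND_USD_PER_GB + received_gb * RECEIVE_USD_PER_GB
    return uptime + transfer

print(f"uptime only: ${monthly_cost_usd():.2f}")
print(f"with 3G sent: ${monthly_cost_usd(sent_gb=3):.2f}")
```

At this traffic level the data transfer charge is small change compared to the uptime charge.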
I am not going to pay $74 per month to host my blog. But sharing that cost with other people might be a viable option. An EC2 instance provides up to 5 “Elastic IP addresses” (public addresses that can be mapped to instances) which are free when they are being used (there is a cost of one cent per hour for unused addresses – not a problem for me as I want 24*7 uptime). So it should be relatively easy to divide the costs of an EC2 instance among five people by accounting for data transfer per IP address. Hosting five web sites that use the same software (MySQL and Apache for example) should reduce memory use and allow more effective caching. A small server on EC2 costs about five times more than one of the cheap DomU systems that I have previously investigated [4] but provides ten times the RAM.
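One way the sharing could work is sketched below. The scheme is my assumption rather than anything Amazon provides: the uptime charge is split evenly and each person pays for the data sent from their own IP address. The addresses and per-IP traffic figures are made-up examples.

```python
def split_costs(instance_fee_usd: float, sent_gb_by_ip: dict,
                send_usd_per_gb: float = 0.17) -> dict:
    """Split the instance fee evenly and bill transfer per IP address."""
    even_share = instance_fee_usd / len(sent_gb_by_ip)
    return {ip: even_share + gb * send_usd_per_gb
            for ip, gb in sent_gb_by_ip.items()}

# Hypothetical monthly outbound traffic in GB for five shared sites
usage = {
    "192.0.2.1": 3.0,
    "192.0.2.2": 1.0,
    "192.0.2.3": 0.5,
    "192.0.2.4": 2.0,
    "192.0.2.5": 1.5,
}

bills = split_costs(74.40, usage)
for ip, amount in bills.items():
    print(f"{ip}: ${amount:.2f}")
```

With five people each share of the uptime charge is under $15 a month, and the transfer charges barely change that.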
While the RAM is impressive, I have to wonder about CPU scheduling and disk IO performance. I guess I can avoid disk IO on the critical paths by relying on caching and not doing synchronous writes to log files. That just leaves CPU scheduling as a potential area where it could fall down.
Here is an interesting post describing how to use EC2 [5].
Another thing to consider is changing blog software. I currently use WordPress which is more CPU intensive than some other options (due to being written in PHP), is slightly memory hungry (PHP and MySQL), and doesn’t have the best security history. It seems that an ideal blog design would use a language such as Java or PHP for comments and use static pages for the main article (with the comments in a frame or loaded by JavaScript). Then the main article would load quickly and comments (which probably aren’t read by most users) would get loaded later.
- [1] http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
- [2] http://etbe.coker.com.au/2008/08/27/killing-servers-virtualisation-and-swap/
- [3] http://www.amazon.com/b/?node=201590011
- [4] http://etbe.coker.com.au/2008/05/28/xen-hosting/
- [5] http://www.protocolostomy.com/2008/08/27/more-adventures-in-amazon-ec2-and-ebs/
For that kind of blog I would recommend ikiwiki.
Why not use LiveJournal’s code, and tweak it as necessary? They put a ton of work into scaling up, and managing accounts. It could handle you and five friends easily, I think.
Amazon EC2 is extremely cool, but there are cheaper options around:
* the domU sites you mention. I use slicehost, and am able to fit my website, version control repositories, email (Postfix+Cyrus IMAP+SpamAssassin, not a particularly light setup), and several other things into a 512 slice at $38/month. For a website alone, even a dynamic MySQL-based one, you might be able to squeeze into a 256. It’s also a lot simpler to use – you don’t need to use Elastic Block Store or SimpleDB or S3 for persistent storage. Everything on a slicehost instance is persistent.
* Google App Engine. You wouldn’t be able to use WordPress, but if you’re thinking of changing blog software anyway, it’s worth considering.
* Blogger. You’d be giving up a lot of control, but it’s easy and free.
Hi Russell,
Dunno if you’ve tried this already, but I found that moving from mod_php to mod_fcgid (libapache2-mod-fcgid in Debian) significantly reduced the memory requirements of PHP for me.
cheers,
Chris
I agree with other commenters that statically sized and priced virtual servers (Slicehost and Linode seem to have the best mindshare: I use Linode and like it) are better than EC2 for this. EC2 seems to be more designed for a site that you might want to scale up in an awful hurry without developing more infrastructure, which is more true for commercial than personal websites.
Regarding software, I don’t have much of a horse in the blog software race, but if you stick with WordPress, WP Super Cache is designed to allow serving static pages as much as possible.
The Yahoo performance team has an interesting article on best practices for a fast website (http://developer.yahoo.com/performance/rules.html). The YSlow Firefox extension will do their analysis for you.
Anon: ikiwiki is an interesting project, but doesn’t seem to compete well on features.
jldugger: Yes, I could also write my own blog server from scratch. But I have more than enough free software projects to keep busy and can’t start any more. I really don’t think that I need LiveJournal scalability either.
Scott: My understanding of the Amazon offering is that if you don’t choose to get the “elastic store” then you just end up with a regular system image. Is my understanding incorrect?
As for slicehost etc, the amount of bang for the buck seems a lot better with EC2 IF you have the need for it.
App Engine is an interesting service. But I don’t particularly like Python and really don’t want to write a big application in it.
I started blogging with Blogger. It’s OK for newbies (IMHO), but when you do more you want more control.
Chris: That’s an interesting idea. I guess that using the threaded version of Apache (which mod-fcgid permits) would save some extra memory as well. Is it hard to set up?
Umm, I don’t think it was hard to do, but that was back on, er, Sarge, maybe, so it was a while ago since I did the transition. ;-)
Register another vote for wp_super_cache here, it’s got some nice features, the main one being that for visitors who have a WP cookie set a page is cached as a straight HTML page meaning there’s no need to invoke the PHP engine to produce it, so Apache can just sendfile() it to them.
The status with EC2 storage is this: you get an image, and as long as you don’t terminate that instance, your storage is persistent. But if you terminate the instance (or it’s terminated for you by, say, machine failure), the data is lost. If you’re just going to leave an EC2 instance up 24/7 it’s much like any other virtual host, although it’s not backed up or anything like that.
Regarding the best way to store comments, it strikes me that the best way is to just write them into a static page and serve that as well. Most of your accesses are going to be of the “show me all the comments to this post” kind anyway. (You can always also have a database that allows searching, but I’d write the main comment page to disk and just serve that puppy.)
Chris: Thanks for the advice, I’ll have to check that out.
Eric: Oh, that is not how I understood it – but on reviewing the web site it seems to be explained reasonably clearly. It seems that my preconceptions of what a Unix system is overrode Amazon’s description of the service.
The storage cost for EBS is a mere 10 cents per month per GB (which is nothing for the amount of data I have). The cost for IO is an issue. I’ll have to try and measure the amount of IO for a database to determine what it would be. While 10 cents for a million IO requests doesn’t sound like much, I really don’t know how much database operation I can get for a million IO requests.
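To get a feel for the numbers, here is a rough sketch that turns a sustained IO rate into a monthly EBS bill using the two prices above. The 30-day month, the function name, and the example of 1G stored at 10 IO requests per second are assumptions for illustration, not measurements of my database.

```python
STORAGE_USD_PER_GB_MONTH = 0.10
IO_USD_PER_MILLION_REQUESTS = 0.10
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

def ebs_monthly_cost_usd(stored_gb: float, io_per_second: float) -> float:
    """Storage plus IO request charges for one month at a steady IO rate."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    requests = io_per_second * SECONDS_PER_MONTH
    io = requests / 1_000_000 * IO_USD_PER_MILLION_REQUESTS
    return storage + io

# A million requests spread evenly over a month is well under one per second
print(f"{1_000_000 / SECONDS_PER_MONTH:.2f} requests/s per million per month")
print(f"1G stored, 10 IO/s sustained: ${ebs_monthly_cost_usd(1, 10):.2f}")
```

The IO charge dominates the storage charge as soon as the disk is doing more than a few requests per second, so the measurement that matters is the sustained IO rate of the database.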
What does ikiwiki lack that you need? You might consider reporting a feature request.
I do NOT like Webalizer.
Main reasons:
1) The stats are not very accurate
2) Referrer spam through Webalizer logs
Referrer spam is popular in my country, and if it becomes more frequent a site running Webalizer may be DDoSed.
That’s what I think.
http://www.phpmyvisites.us/
Bubba: I believe that Webalizer is as good as you can get by analysing web logs. Javascript is supposed to give more accurate results (the above URL has an alternative to the popular Google offering).
As for referrer spam, this is why the default configuration is that Webalizer will not generate output with links to referrers.
Of course it’s best to password protect access to your web stats anyway.