I’ve just been considering when it’s best to drive and when it’s best to take public transport to save money. My old car (1999 VW Passat) uses 12.8L/100km, which at $1.65 per liter means 21.1 cents per km on fuel. A new set of tires costs $900 and, assuming that they last 20,000km, will cost 4.5 cents per km. A routine service every 10,000km will cost about $300, so that’s another 3 cents per km. While it’s difficult to estimate the cost per kilometer of replacing parts that wear out, it seems reasonable to assume that over 100,000km of driving at least $20,000 will be spent on parts and the labor required to install them; this adds another 20 cents per km.
The total then would be 48.6 cents per km. The tax deduction for my car is 70 cents per km of business use, so if my estimates are correct then the tax deductions exceed the marginal costs of running a vehicle (the costs of registration, insurance, and asset depreciation however make the car significantly more expensive than that – see my previous post about the costs of owning a small car for more details [1]). So for business use the marginal cost after tax deductions are counted is probably about 14 cents per km.
Now a 2 hour ride on Melbourne’s public transport costs $2.76 (if you buy a 10 trip ticket). For business use that’s probably the equivalent cost of 20km of driving. The route I take when driving to the city center is about 8km; that gets me to the nearest edge of the CBD (Central Business District) and doesn’t count the driving needed to find a place to park. This means the absolute minimum distance I would drive when going to the CBD would be 16km. The distance I would drive on a return trip to the furthest part of the CBD would be almost exactly 20km. So on a short visit to the central city area I might save money by using my car if it’s a business trip and I tax-deduct the distance driven. A daily ticket for the public transport is equivalent to two 2 hour tickets (if you have a 10 trip ticket and use it outside the two hour period it becomes a daily ticket and uses a second credit). If I could park my car for an out of pocket expense of less than $2.76 (while I can tax-deduct private parking, it’s so horribly expensive that it would cost at least $5 after deductions are counted) then I could possibly save money by driving. There were some 4 hour public parking spots that cost $2.
So it seems that for a basic trip to the CBD it’s more expensive to use a car than to take a tram even when car expenses are tax deductible. For personal use a 5.7km journey would cost as much as a 2 hour ticket for public transport and an 11.4km journey would cost as much as a daily ticket. The fact that public transport is the economical way to travel for such short distances is quite surprising. In the past I had thought of using a tram ticket as an immediate cost while considering a short car drive as costing almost nothing (probably because the expense comes days later for petrol and years later for servicing the car).
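The figures above are easy to check with a short script. The 48.5% marginal tax rate in the last step is my assumption; the post only gives the roughly 14 cent result:

```python
# Marginal cost per km of running the car, using the figures from the post.
fuel = 12.8 / 100 * 1.65 * 100   # 12.8L/100km at $1.65/L -> 21.12 cents/km
tyres = 900 / 20000 * 100        # $900 tires over 20,000km -> 4.5 cents/km
service = 300 / 10000 * 100      # $300 service every 10,000km -> 3 cents/km
parts = 20000 / 100000 * 100     # $20,000 parts over 100,000km -> 20 cents/km
total = fuel + tyres + service + parts   # ~48.6 cents/km

# Break-even distances against Melbourne public transport fares.
two_hour = 2.76                  # dollars per trip on a 10 trip ticket
daily = 2 * two_hour             # a daily ticket equals two 2 hour tickets
breakeven_2h = two_hour * 100 / total    # ~5.7 km
breakeven_daily = daily * 100 / total    # ~11.4 km

# After-tax marginal cost for business use. The 48.5% marginal tax rate
# is an assumption made for this sketch, not a figure from the post.
deduction = 70                   # cents/km tax deduction for business use
after_tax = total - deduction * 0.485    # ~14.7 cents/km
```

Rounding the fuel figure to 21.1 cents gives the 48.6 cent total quoted above.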
Also while there is a lot of media attention recently about petrol prices, it seems that for me at least petrol is still less than half the marginal cost of running a car. Cars are being advertised on the basis of how little fuel they use to save money, but cars that require less service might actually save more money. There are many cars that use less fuel than a VW Passat, and also many cars that are less expensive to repair. It seems that perhaps the imported turbo-Diesel cars which are becoming popular due to their fuel use may actually be more expensive than locally manufactured small cars which have cheap parts.
Update: Changed “Km” to “km” as suggested by Lars Wirzenius.
Paul Graham has recently published an essay titled How To Disagree [1]. One form that he didn’t mention is to claim that a disagreement is a matter of opinion. Describing a disagreement about an issue which can be proved as a matter of opinion is a commonly used method of avoiding the need to offer any facts or analysis.
Sam Varghese published an article about the Debian OpenSSL issue and quoted me [2].
The Basic AI Drives [3] is an interesting paper about what might motivate an AI and how AIs might modify themselves to better achieve their goals. It also has some insights into addiction and other vulnerabilities in human motivation.
It seems that BeOS [4] is not entirely dead. The Haiku OS project aims to develop an open source OS for desktop computing based on BeOS [5]. It’s not nearly usable for end-users yet, but they have VMware snapshots that can be used for development.
On my Document Blog I have described how to debug POP problems with the telnet command [6]. Some users might read this and help me fix their email problems faster. I know that most users won’t be able to read this, but the number of people who can use it will surely be a lot greater than the number of people who can read the RFCs…
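The document describes typing the commands into telnet; the same conversation can also be scripted. This is a sketch using a raw socket and the standard POP3 commands from RFC 1939 (the host name and credentials shown are placeholders):

```python
import socket

def pop3_check(host, user, password, port=110):
    """Log in to a POP3 server, issue STAT, and return (message count,
    total size in bytes), printing each exchange as seen in telnet."""
    sock = socket.create_connection((host, port), timeout=10)
    f = sock.makefile("rwb")

    def cmd(line):
        # line=None means just read the server greeting.
        if line is not None:
            f.write(line.encode("ascii") + b"\r\n")
            f.flush()
        reply = f.readline().decode("ascii").rstrip()
        print(line or "(banner)", "->", reply)
        if not reply.startswith("+OK"):
            raise RuntimeError(reply)
        return reply

    cmd(None)              # server greeting
    cmd("USER " + user)
    cmd("PASS " + password)
    stat = cmd("STAT")     # reply looks like "+OK 2 320"
    cmd("QUIT")
    sock.close()
    count, size = stat.split()[1:3]
    return int(count), int(size)

# pop3_check("pop.example.com", "test", "secret")  # hypothetical account
```

A "-ERR" reply at the PASS step is the quickest way to distinguish a bad password from a connectivity problem.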
Singularity tales is an amusing collection of short stories [7] about the Technological Singularity [8].
A summary of the banana situation [9]. Briefly describes how “banana republics” work and the fact that a new variety of the Panama disease is spreading through banana producing countries. Given the links between despotic regimes and banana production it’s surprising that no-one is trying to spread the disease faster. Maybe Panama disease could do for South America what the Boll weevil did for the south of the US [10].
Jeff Dean gives an interesting talk about the Google server architecture [11]. One thing I wonder about is whether they have experimented with increasing the chunk size over the years. It seems that the contiguous IO performance of disks has been steadily increasing while the seek performance has stayed much the same, and the dramatic increases in the amount of RAM you can get for any given amount of money over the last few years have been amazing. So it seems that now it’s possible to read larger chunks of data in the same amount of time and more easily store such large chunks in memory.
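The trade-off can be made concrete with rough figures: assume a fixed ~10ms seek and 100MB/s streaming (illustrative assumptions, not anything published by Google) and compare effective throughput at different chunk sizes:

```python
def effective_throughput(chunk_mb, seek_ms=10.0, transfer_mb_s=100.0):
    """Effective MB/s when each chunk read pays one seek up front."""
    seconds = seek_ms / 1000.0 + chunk_mb / transfer_mb_s
    return chunk_mb / seconds

# Larger chunks amortize the (roughly constant) seek time, so as
# transfer rates grow the optimal chunk size grows with them.
for chunk in (1, 64, 256, 1024):
    print(chunk, "MB chunks:", round(effective_throughput(chunk), 1), "MB/s")
```

With these numbers a 1MB chunk wastes half the disk's streaming rate on seeking, while a 64MB chunk already achieves over 98% of it.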
Solving Rubik’s Cube by treating disk as RAM: Gene Cooperman gave an interesting talk at Google about how he proved that Rubik’s Cube can be solved in 26 moves and how treating disk as RAM was essential for this. The Google talk is on YouTube [1]. I recommend that you read the ACM paper he wrote with Daniel Kunkle before watching the talk. Incidentally, given the resolution of YouTube it would have been good if the notes had fewer than 10 lines per screen.
Here is the main page for the Rubik’s Cube project with source and math [2]. Note that I haven’t been interested enough to read the source but I’m including the link for reference.
The main concept is that modern disks can deliver up to 100MB/s (I presume that’s from the outer tracks, I suspect that the inner tracks wouldn’t deliver that speed) for contiguous IO. Get 50 disks running at the same speed and you get 5GB/s for contiguous IO which is a typical speed for RAM. Of course that RAM speed is for a single system while getting 50 disks running at that speed will require either a well-tuned system from SGI (who apparently achieved such speeds on a single process on a single system many years ago – but I can’t find a reference) or 5+ machines from anyone else. The configuration that Gene describes apparently involves a network of machines with one disk each, he takes advantage of hardware purchased for other tasks (where the disks are mostly wasted).
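A back-of-envelope check of those figures also shows why this only works for contiguous IO (the 10ms seek time and 4KB record size are assumed typical figures for this sketch):

```python
disks = 50
stream_mb_s = 100.0                  # per-disk contiguous rate from the talk
aggregate = disks * stream_mb_s      # 5000 MB/s, roughly single-system RAM speed

# Random access is a different story: at ~10ms per seek each disk
# manages about 100 operations/s, so reading 4KB records at random
# positions delivers a tiny fraction of the streaming rate.
seek_s = 0.010
record_mb = 4 / 1024.0
random_mb_s = disks * record_mb / seek_s   # under 20 MB/s across all 50 disks

print(aggregate, "MB/s streaming vs", round(random_mb_s, 1), "MB/s random")
```

So the disk-as-RAM technique depends entirely on structuring the computation so that the working set is read and written in large sequential runs.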
I believe that SGI sells Altix machines which can have enough RAM to store all that data. It is NUMA RAM, even the “slow” access to RAM on another NUMA node should be a lot faster in most cases for sequential access and when there are seeks the benefits of NUMA RAM over disk will be dramatic. Of course the cost of a large NUMA installation is also significant, while a set of 50 quad-core machines with 500G disks is affordable by some home users.
I recently joined the community based around the TED conference [1]. The TED conference is expensive ($6000US) and has a long waiting list (the 2009 conference is sold out) so it seems quite unlikely that I will ever attend one. But signing up to the web site is easy and might offer some benefit.

One thing that interested me was that part of the sign-up process requests that you select up to 10 words from the list above to describe yourself. Some of the words seem almost mandatory for anyone who is interested in what TED has to offer (I find it difficult to imagine someone declaring that they are not an “activist” or a “change agent” while wanting to be involved with TED in any way). The range of words also seems quite strange: there are some professions mixed with educational status, marital status, and religion. The way it is laid out would tend to encourage people to make a decision as to which aspects of their life are more important: career, marital status, or religion?
Given the nature of TED I’m wondering whether they intentionally did a bad job of that part of the site design to encourage people to think about these issues.
It seems to me that a better way of doing this would be to provide a few suggestions and allow people to fill in text fields with their own values. Even defining marital status can require many choices and there is no limit to the number of religions and careers. If you try to make a comprehensive list then you will end up doing what British Airways did with their frequent flyer membership application page [2]. Even disregarding the choices of spelling (EG Admiral vs Admiraal and Brig Gen vs Brig General vs Brigadier General) the British Airways list is unreasonably long, and I doubt that anyone who deserves the title “Her Magesty” or “His Holyness” is going to be interested in frequent flyer points.
Also I wonder which of the entries in the TED list would be most commonly accepted by the free software community. It seems that activist and technologist would be quite popular.
Here is the list in text form for those who can’t get the picture above:
If you want a reliable network then you need to determine an appropriate level of redundancy. When servers were small and there was no well accepted virtual machine technology there were always many points at which redundancy could be employed.
A common example is a large mail server. You might have MX servers to receive mail from the Internet, front-end servers to send mail to the Internet, database or LDAP servers (of which there is one server for accepting writes and redundant slave servers for allowing clients to read data), and some back-end storage. The back-end storage is generally going to lack redundancy to some degree (all the common options involve mail being stored in one location). So the redundancy would start with the routers which direct traffic to redundant servers (typically a pair of routers in a failover configuration – I would use OpenBSD boxes running CARP if I was given a choice in how to implement this [1], in the past I’ve used Cisco devices).
The next obvious place for redundancy is for the MX servers (it seems that most ISPs have machines with names such as mx01.example.net to receive mail from the Internet). The way that MX records are used in the DNS means that there is no need for a router to direct traffic to a pair of servers, and even a pair of redundant routers is another point of failure so it’s best to avoid them where possible. A smaller ISP might have two MX machines that are used for both sending outbound mail from their users (which needs to go through a load-balancing router) as well as inbound mail. A larger ISP will have two or more machines dedicated to receiving mail and two or more machines dedicated to sending mail (when you scan for viruses on both sent and received mail it can take a lot of compute power).
Now the database or LDAP servers used for storing user account data is another possible place for redundancy. While some database and LDAP servers support multi-master operation a more common configuration is to have a single master and multiple slaves which are read-only. This means that you want to have more slaves than are really required so that you can lose one without impacting the service.
There are several ways of losing a server. The most obvious is a hardware failure. While server class machines will have redundant PSUs, RAID, ECC RAM, and a general high quality of hardware design and manufacture, they still have hardware problems from time to time. Then there are a variety of software related ways of losing a server, most of which stem from operator error and bugs in software. Of course the problem with the operator errors and software bugs is that they can easily take out all redundant machines. If an operator mistakenly decides that a certain command needs to be run on all machines they will often run it on all machines before realising that it causes things to go horribly wrong. A software bug will usually be triggered by the same thing on all machines (EG I’ve had bad data written to a master LDAP server cause all slaves to crash and had a mail loop between two big ISPs take out all front-end mail servers).
Now if you have a mail server running on a virtual platform such that the MX servers, the mail store, and the database servers all run on the same hardware then redundancy is very unlikely to alleviate hardware problems. It’s difficult to imagine a situation where a hardware failure takes out one DomU while leaving others running.
It seems to me that if you are running on a single virtual server there is no benefit in having redundancy. However there is benefit in having an infrastructure which supports redundancy. For example if you are going to install new software on one of the servers there is a possibility that the software will fail. Doing upgrades and then having to roll them back is one of the least pleasant parts of sys-admin work, not only is it difficult but it’s also unreliable (new software writes different data to shared files and you have to hope that the old version can cope with them).
To implement this you need to have a Dom0 that can direct traffic to multiple redundant servers for services which only have a single server. Then when you need to upgrade (be it the application or the OS) you can configure a server on the designated secondary address, get it running, and then disable traffic to the primary server. If there are any problems you can direct traffic back to the primary server (which can be done much more quickly than downgrading software). Also if configured correctly you could have the secondary server be accessible from certain IP addresses only. So you could test the new version of the software using employees as test users while customers use the old version.
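A minimal user-space sketch of that traffic direction follows. The addresses are placeholders and the staff network prefix is an assumption; a real Dom0 would more likely do this with routing rules than a Python proxy, but the routing decision is the same:

```python
import socket
import threading

# Placeholder addresses: in a real setup these would be the primary
# server and the designated secondary running the new software version.
PRIMARY = ("192.0.2.10", 8025)    # current production version
SECONDARY = ("192.0.2.11", 8025)  # new version under test
EMPLOYEE_PREFIX = "10.1."         # assumed internal staff network

def pick_backend(client_ip):
    """Send staff to the new server for testing, customers to the old one.
    Rolling back is just a matter of routing everyone to PRIMARY again."""
    return SECONDARY if client_ip.startswith(EMPLOYEE_PREFIX) else PRIMARY

def relay(src, dst):
    """Copy bytes one way until EOF."""
    while data := src.recv(4096):
        dst.sendall(data)
    dst.close()

def proxy(listen_port):
    lsock = socket.socket()
    lsock.bind(("0.0.0.0", listen_port))
    lsock.listen()
    while True:
        client, (ip, _) = lsock.accept()
        backend = socket.create_connection(pick_backend(ip))
        threading.Thread(target=relay, args=(client, backend), daemon=True).start()
        threading.Thread(target=relay, args=(backend, client), daemon=True).start()
```

The point is that cutting everyone over, or cutting everyone back, is a one-line routing change rather than a software downgrade.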
One advantage of a virtual machine environment for load balancing is that you can have as many virtual Ethernet devices as you desire and you can configure them using software (without changing cables in the server room). A limitation on the use of load-balancing routers is that traffic needs to go through the router in both directions. This is easy for the path from the Internet to the server room and the path from the server room to the customer network, but when going between servers in the server room it’s a problem (not insurmountable, merely painful and expensive). Of course there will be a cost in CPU time for all the extra routing. If instead of having a single virtual Ethernet device for all redundant nodes you have a virtual Ethernet device for every type of server and use the Dom0 as a router, you will end up doubling the CPU requirements for networking without even considering the potential overhead of the load balancing router functionality.
Finally there is a significant benefit in virtual machines for reliability of services. That is the ability to perform snapshot backups. If you have sufficient disk space and IO capacity you could have a snapshot of your server taken every day and store several old snapshots. Of course doing this effectively would require some minor changes to the configuration of machines to avoid unnecessary writes, this would include not compressing old log files and using a ram disk for /tmp and any other filesystem with transient data. When you have snapshots you can then run filesystem analysis tools on the snapshots to detect any silent corruption that may be occurring and give the potential benefit of discovering corruption before it gets severe (but I have yet to see a confirmed report of this saving anyone). Of course similar snapshot facilities are available on almost every SAN and on many NAS devices, but there are many sites that don’t have the budget to use such equipment.
It’s a common practice when hosting email or web space for large numbers of users to group the accounts by the first letter. This is due to performance problems on some filesystems with large directories, and due to the fact that often a 16-bit signed integer is used for the hard link count, making it impossible to have more than 32767 subdirectories.
I’ve just looked at a system I run (the Bluebottle anti-spam email service [1]) which has about half a million accounts and counted the incidence of each first letter. It seems that S is the most common at almost 10%, with M and A not far behind. Most of the clients have English as their first language; naturally the distribution of letters would be different for other languages.
Now if you were to have a server with fewer than 300,000 accounts then you could probably split them based on the first letter. With more than 300,000 accounts you would risk having too many account names starting with S. See the table below for the incidence of each first letter.
The two letter prefix MA comprised 3.01% of the accounts. So when faced with a limit of 32767 sub-directories, splitting by the first two letters might be expected to cause no problems until you approach 1,000,000 accounts. There were a number of other common two-letter prefixes which also had more than 1.5% of the total number of accounts.
Next I looked at the three character prefixes and found that MAR comprised 1.06% of all accounts. This indicates that splitting on the first three characters will only save you from the 32767 limit if you have 3,000,000 users or fewer.
Finally I observed that the four character prefix JOHN (which incidentally is my middle name) comprised 0.44% of the user base. That indicates that if you have more than 6,400,000 users then splitting them up among four character prefixes is not necessarily going to avoid the 32767 limit.
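Those thresholds follow directly from the measured percentages. Taking the most common prefix at each split depth, these are rough estimates of where each scheme hits a 32767-subdirectory limit:

```python
LIMIT = 32767  # the signed 16-bit hard link count limit discussed above

# Most common prefix at each split depth, from the measured account data.
worst_prefix_pct = {"s": 9.85, "ma": 3.01, "mar": 1.06, "john": 0.44}

def users_before_trouble(pct, limit=LIMIT):
    """How many total accounts fit before the most common bucket overflows."""
    return int(limit / (pct / 100.0))

for prefix, pct in worst_prefix_pct.items():
    print(f"split on {len(prefix)} chars: trouble above "
          f"~{users_before_trouble(pct):,} users")
```

With the S bucket at 9.85% the one-letter split runs into trouble a little above 330,000 users, matching the 300,000 figure above; the two and three character splits give roughly the 1,000,000 and 3,000,000 figures.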
It seems to me that the benefits of splitting accounts by the first characters are not nearly as great as you might expect. Having directories for each combination of the first two letters is practical; I’ve seen directory names such as J/O/JOHN or JO/JOHN (or J/O/HN or JO/HN if you want to save directory space). But it becomes inconvenient to have J/O/H/N, and the form JOH/N will have as many as 17,576 subdirectories for the first three letters, which may be bad for performance.
This issue is only academic in that most sys-admins won’t ever touch a system with more than a million users. But in terms of how you would provision so many users, in the past the limits of server hardware were reached long before these issues mattered. For example in 2003 I was running some mail servers on 2RU rack mounted systems with four disks in a RAID-5 array (plus one hot-spare) – each server had approximately 200,000 mailboxes. The accounts were split based on the first two letters, but even if they had been split on only one letter it would probably have worked. Since then performance has improved in all aspects of hardware. Instead of a 2RU server having five 3.5″ disks it will have eight 2.5″ disks – and as a rule of thumb increasing the number of disks tends to increase performance. Also the CPU performance of servers has dramatically increased: instead of having two single-core 32bit CPUs in a 2RU server you will often have two quad-core 64bit CPUs – more than four times the CPU performance. 4RU machines can have 16 internal disks as well as four CPUs and therefore could probably serve mail for close to 1,000,000 users.
While for reliability it’s not the best idea to have all the data for 1,000,000 users on internal disks on a single server (which could be the topic of an entire series of blog posts), I am noting that it’s conceivable to do so and provide adequate performance. Also if you use one of the storage devices that supports redundant operation (exporting data over NFS, iSCSI, or Fibre Channel) and things are configured correctly, then you can achieve considerably more performance and therefore have a greater incentive to keep the data for a larger number of users in one filesystem.
Hashing directory names is one possible way of alleviating these problems, though it would be a little inconvenient for sys-admin tasks as you would have to hash the account name to discover where it was stored. I guess you could have a shell script or alias to do this.
Here is the list of frequency of first letters in account names:
| First Letter | Percentage |
|--------------|------------|
| a | 7.65 |
| b | 5.86 |
| c | 5.97 |
| d | 5.93 |
| e | 2.97 |
| f | 2.85 |
| g | 3.57 |
| h | 3.19 |
| i | 2.21 |
| j | 6.09 |
| k | 3.92 |
| l | 3.91 |
| m | 8.27 |
| n | 3.15 |
| o | 1.44 |
| p | 4.82 |
| q | 0.44 |
| r | 5.04 |
| s | 9.85 |
| t | 5.2 |
| u | 0.85 |
| v | 1.9 |
| w | 2.4 |
| x | 0.63 |
| y | 0.97 |
| z | 0.95 |
There has been a lot of talk recently about the cost of petrol, Colin Charles is one of the few people to consider the issue of wages in this discussion [1]. Unfortunately almost no-one seems to consider the overall cost of running a vehicle.
While I can’t get the figures for Malaysia (I expect Colin will do that) I can get them for Australia. First I chose a car that’s cheap to buy, reasonably fuel efficient (small) and common (cheap parts from the wreckers) – the Toyota Corolla seemed like a good option.
Colin Charles writes about a woman who is selling advertising space on herself [1]. Like Colin I haven’t bought a t-shirt in about 9 years (apart from some Cafepress ones I designed myself). So it seems that the price for getting some significant advertising at a computer conference is to buy a few hundred t-shirts (they cost $7 each when buying one at a time from Cafepress, I assume that the price gets lower than $3 each when buying truck-loads). I have been given boxer-shorts and socks with company logos on them (which I never wore), I think that very few people will show their underwear to enough people to make boxer-shorts a useful advertising mechanism, socks would probably work well in Japan though.
It seems to me that many people regard accepting free t-shirts as an exception to all the usual conventions regarding advertising. Accepting gifts from companies that you do business with is generally regarded as a bad idea – except of course for t-shirts and other apparel, which are somehow OK. Being paid to wear a placard advertising a product is regarded as degrading by many people, but accepting a free t-shirt (effectively being paid $7 to wear advertising) is regarded as OK by almost everyone.
I don’t mind being a walking advert for a company such as Google. I use many Google products a lot and I can be described as a satisfied customer. There are some companies that have given me shirts which I only wear in winter under a jumper. The Oracle Unbreakable Linux [2] shirt is one that I wear in winter.
Now I would not consider accepting an offer to have advertising on my butt (although I’m pretty sure that it doesn’t get enough attention that anyone would make such an offer). I would however be happy to talk with someone who wants to pay me to wear a t-shirt with advertising when giving a lecture at a conference. I am not aware of any conference which has any real dress requirement for speakers (apart from the basic idea of not offending the audience). The standard practice is that if your employer pays you to give a lecture as part of their marketing operation then they give you a shirt to wear (polo more often than t-shirt). I am currently working on some things which could end up as papers for presentation at Linux conferences. If someone wanted to sponsor my work on one of those free software related projects and then get the recognition of having me wear their shirt while giving a lecture and have me listed as being sponsored by that company in the conference proceedings then that seems like a reasonable deal for everyone.
One thing that you need to keep in mind when accepting or soliciting for advertising is the effect it has on your reputation. Being known as someone who wants advertising on their butt probably wouldn’t be fun for very long.
On the Internet advertising seems to be almost everywhere. It seems that more than half the content on the net (by the number of pages or by the number of hits) either has an option to donate (as Wikipedia does and some blogs are starting to do), has Google advertising (or a similar type of adverts from another company), is a sales site (IE you can buy online), or is a marketing site (IE provides background information and PR to make you want to buy at some other time). Note that my definition of advertising is quite broad, for example the NSA web site [3] has a lot of content that I regard as advertising/marketing – with the apparent aim of encouraging skilled people to apply for jobs. Not that I’m complaining, I’ve visited the National Cryptologic Museum [4] several times and learned many interesting things!
I think that Internet advertising that doesn’t intrude on the content (IE no pop-ups, page diversions, or overly large adverts) is fine. If the advertising money either entirely pays people to produce useful content or simply encourages them to do so (as in the case of all the blogs which earn $10 a month) then I’m happy with that. I have previously written about some of my experience advertising on my blog [5] and how I encourage others to do the same.
I don’t think that space on a t-shirt is any more or less appropriate for advertising than space on a web site hosting someone’s blog.
Finally there is one thing I disagree with in Colin’s post: the use of the word “whore”. It’s not uncommon to hear the term “whoring” used as a slang term for doing unreasonable or unworthy things to make money (where “unreasonable” and “unworthy” often merely mean doing something that the speaker wouldn’t be prepared to do). But using the term when talking about a woman is quite likely to cause offense and is quite unlikely to do any good. The Wikipedia page about prostitution [6] has some interesting background information.
There has been a lot of fear-mongering about nanotech. The idea is that little robots will eat people (or maybe eat things that we depend on such as essential food crops). It’s unfortunate that fear-mongering has replaced thought and there seems to have been little serious discussion about the issues.
If (as some people believe) nanotech has the potential to be more destructive than nuclear weapons then it’s an issue that needs to be discussed in debates before elections and government actions to alleviate the threat need to be reported on the news – as suggested in the Accelerating Future blog [0].
I predict that there will be three things which could be called nanotech in the future:
- Artificial life forms as described by Craig Venter in his talk for ted.com [1]. I believe that these should be considered along with nanotech because the boundary between creatures and machines can get fuzzy when you talk about self-replicating things devised by humans which are based on biological processes.
I believe that artificial life forms and tweaked versions of current life forms have significant potential for harm. The BBC has an interesting article on health risks of GM food which suggests that such foods should be given the same level of testing as pharmaceuticals [2]. But that’s only the tip of the iceberg, the potential use of Terminator Gene technology [3] in biological warfare seems obvious.
But generally this form of nanotech has the same potential as bio-warfare (which currently has significantly under-performed when compared to other WMDs) and needs to be handled in the same way.
- The more commonly discussed robotic nanotech: self-replicating machines which can move around to do things (EG work inside a human body). I doubt that tiny robots can ever be as effective at adapting to their environment as animals, and I also doubt that they can self-replicate in the wild. Currently we create CPU cores (the most intricate devices created by humans) from very pure materials in “clean rooms”. Making tiny machines in clean-rooms is not easy; making them in dirty environments is going to be almost impossible. Robots as we know them are based around environments that are artificially clean, not natural environments. Robots that can self-replicate in a clean-room when provided with pure supplies of the necessary raw materials are a solvable problem, but I predict that self-replication in the wild will remain science-fiction.
- Tiny robots manufactured in factories to work as parts of larger machines. This is something that we are getting to today. It’s not going to cause any harm as long as the nano-bots can’t be manufactured on their own and can’t survive in the wild.
In summary, I think that the main area that we should be concerned about in regard to nano-bot technology is as a new development on the biological warfare theme. This seems to be a serious threat which deserves the attention of major governments.
Over the last week I have received five phone calls from Wyndham Resorts asking if I would like to be surveyed. Every time I tell them that I am not going to do their survey, and on all but one call I had to spend more than two minutes repeatedly stating that I would not do the survey before they would go away.
The advantage of phone spam over email spam is that the caller pays, I guess that they have a time limit of three minutes when calling a mobile phone to save on calling costs.
There have been a number of proposals for making people pay for email to discourage spam. Even a cost of a few cents a message would make spam cease being economically viable for a mass audience (a smaller number of targeted spams would be easier to block or delete). But such plans to entirely change the way email works have of course failed totally.
But for phones it could work. I’d like to have a model where anyone who calls me has to pay an extra dollar per minute which gets added on to their phone bill. When people who I want to talk to call me I could reimburse them (or maybe be able to give my phone company a list of numbers not to bill).
This could also seamlessly transition to commercial use. I would be happy to accept calls from people asking for advice about Linux and networking issues for $1 per minute. With all the people who call me about such things for free already it would be good to answer some questions for money.