Controlling a STONITH and Upgrading a Cluster

One situation that you will occasionally encounter when running a Heartbeat cluster is a need to prevent a STONITH of a node. As documented in my previous post about testing STONITH, the ability to STONITH nodes is very important in an operating cluster. However, when the sys-admin is performing maintenance on the system or programmers are working on a development or test system, it can be rather annoying.

One example of where STONITH is undesired is when upgrading packages of software related to the cluster services. If during a package upgrade the data files and programs related to the OCF script are not synchronised (EG you have two programs that interact and upgrading one requires upgrading the other) at the moment that the status operation is run then an error may occur which may trigger a STONITH. Another possibility is that if using small systems for testing or development (EG running a cluster under Xen with minimal RAM assigned to each node) then a package upgrade may cause the system to thrash which might then cause a timeout of the status scripts (a problem I encounter when upgrading my Xen test instances that have 64M of RAM).

If a STONITH occurs during the process of a package upgrade then you are likely to have consistency problems with the OS due to RPM and DPKG not correctly calling fsync(). This can cause the OCF scripts to always fail to run the status command, which can cause an infinite loop of the cluster nodes in question being STONITHed. Incidentally the best way to test for this (given the problems of a STONITH sometimes losing log data) is to boot the node in question without Heartbeat running and then run the OCF status commands manually (I previously documented three ways of doing this).

Of course the ideal (and recommended) way of solving this problem is to migrate all services from a node using the crm_resource program. But in a test or development situation you may forget to migrate all services or simply forget to run the migration before the package upgrade starts. In that case the best thing to do is to remove the ability to call STONITH. For my testing I use Xen and have the nodes ssh to the Dom0 to call STONITH, so all I have to do to remove the STONITH ability is to stop the ssh daemon on the Dom0. For a more serious test network (EG using IPMI or an equivalent technology to perform a hardware STONITH as well as ssh for OS level STONITH on a private network) a viable option might be to shut down the switch port used for such operations. Shutting down switch ports is not a nice thing to do, but to allow you to continue work on a development environment without hassle it’s a reasonable hack.

When choosing your method of STONITH it’s probably worth considering what the possibilities are for temporarily disabling it – preferably without having to walk to the server room.
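
As a rough illustration of the ssh-based setup described above, the following Python sketch checks whether the ssh port on the Dom0 is still reachable before a package upgrade is started. It assumes that STONITH is only ever triggered via ssh to the Dom0 on TCP port 22; the function names are hypothetical and this is not part of Heartbeat.

```python
import socket


def stonith_path_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Assumption: STONITH is called over ssh to the Dom0, so a closed port
    means the nodes currently have no way of fencing each other.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False


def safe_to_upgrade(host: str, port: int = 22) -> bool:
    """Hypothetical pre-upgrade check: proceed only while STONITH is unreachable."""
    return not stonith_path_open(host, port)
```

Running something like safe_to_upgrade("dom0.example.com") (a placeholder host name) on each node before starting the upgrade would confirm that the ssh daemon really has been stopped. It says nothing about other STONITH paths such as IPMI.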


ARP

In the IP protocol stack the lowest level protocol is ARP (the Address Resolution Protocol). ARP is used to request the Ethernet hardware (MAC) address of the host which owns a particular IP address.

# arping 192.168.0.43
ARPING 192.168.0.43
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=0 time=339.031 usec
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=1 time=12.967 msec
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=2 time=168.800 usec
--- 192.168.0.43 statistics ---
3 packets transmitted, 3 packets received, 0% unanswered

One creative use of this is the program arping which will send regular ARP request packets for an IP address and give statistics on the success of getting responses. The above is the result of an arping command which shows that the machine in question can respond in 12.9msec or less. One of the features of arping (when compared to the regular ping which uses an ICMP echo) is that it will operate when the interface has no IP address assigned or when the IP address does not match the netmask for the network in question.
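
To make the "12.9msec or less" reading concrete, here is a small Python sketch that extracts the reply times from the output shown above and converts them to a common unit. The regular expression and the unit table are my assumptions based only on the three reply lines in this post, other versions of arping may format their output differently.

```python
import re
from typing import List

# Reply lines copied from the arping output above.
SAMPLE = """\
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=0 time=339.031 usec
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=1 time=12.967 msec
60 bytes from 00:60:b0:3c:62:6b (192.168.0.43): index=2 time=168.800 usec
"""

# Multipliers that convert each unit to microseconds ("sec" is assumed).
UNIT_TO_USEC = {"usec": 1.0, "msec": 1000.0, "sec": 1000000.0}

REPLY_TIME = re.compile(r"time=([0-9.]+) (usec|msec|sec)\b")


def reply_times_usec(output: str) -> List[float]:
    """Return the round-trip time of every reply line, in microseconds."""
    return [float(value) * UNIT_TO_USEC[unit]
            for value, unit in REPLY_TIME.findall(output)]


times = reply_times_usec(SAMPLE)
worst_msec = max(times) / 1000.0  # the slowest of the three replies
```

Note that the replies mix usec and msec, so comparing the raw numbers without converting units would give the wrong answer about which reply was slowest.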

This means that if you have a network which lacks DHCP and you want to find a spare IP address in the range that is used then you can use arping without assigning yourself an IP address first. If you wanted to use ping in that situation then you would have to first assign an IP address in which case you may have already broken the network!

Another useful utility is arpwatch. This program listens to ARP traffic and will notify the sys-admin when new machines appear. The notification message will include the Ethernet hardware address and the name of the manufacturer of the device (if it’s known). When you use arpwatch you can say “who added the device with the Intel Ethernet card to the network at lunch time?” instead of “who did something recently to the network that made it break?”. The more specific question is more likely to get an accurate answer.


IT Recruiting Agencies – Advice for Contract Workers

I read an interesting post on Advogato about IT recruiting agencies (along with an interesting preface about medical treatment for broken ribs).

Their report closely mirrored my experience in many ways. Here are what I consider to be the main points for a job applicant dealing with recruiters:

  1. Ask more than you believe that you are worth – the worst they can do is say “no” (and you will feel like a fool if the agency pays you less than half what the client pays because you didn’t ask for enough).
  2. Put lots of terms in your CV that will work for grep or other searches. A human who reads your CV will know that if you describe 3 years of Linux sys-admin experience that you can do BASH shell scripting and sys-admin work on other versions of Unix. But if a search doesn’t match it then the typical recruiting agent won’t offer you the position. I have idly considered saying things like “Perl (not Pearl) experience” to catch mis-spelled grep operations.
  3. Recruiting agents will frequently demand that you re-write your CV to match a position that they have open. They will say things such as “you claim 3 years of shell scripting and Perl experience but I don’t see that on your CV” and insist that you re-write it to give more emphasis to that area.
  4. Most recruiting agents are compulsive liars and don’t understand computers. You have to deal with the fact that to get most of the better paying positions you need to have an incompetent liar represent you. Avoid the stupid liars though. For example I once refused to deal with an agent who told me about his plans for stealing the CV database from the agency he worked for and selling it to another agency – not because he was shifty in every possible way, but because he was so stupid as to boast about such things immediately after meeting me on a train.
  5. Expect that recruiting agents won’t understand the technology. If you politely and subtly offer to assist them in writing a letter to a client recommending you then they will often accept. Why would they go to the effort of assessing your skills and writing a short letter to the client describing how good you are when you can do that for them? On one particularly amusing occasion I was applying for a position with IBM and the recruiting agent had been supplied with a short quiz of technical skills to assess all applicants – they gave me the answer sheet and asked me to self-assess (I got 100% – but it was an easy test and I would have got the same result anyway).
  6. Some levels of stupidity are so great that you should avoid dealing with the agent (and possibly the agency that employs them). Being unable to view an HTML file is one criterion I have used since 1999 (every OS since about 1998 came with a web browser built in). Another example is an agent who tried to convince me that “.au” is not a valid suffix for an email address (I was applying for a sys-admin job with an ISP). Job adverts that mis-spell terms (such as Perl spelled as Pearl) are also a warning sign.
  7. Gossip is important to your business! Some agencies will pay you what you earn and merely terminate your contract when things go wrong. Other agencies will refuse to pay you when things go bad, or even demand that paid money be returned and threaten legal action. Talk to other contract workers in your region and learn the goss about the bad agencies. Also track agency name changes; when a bad agency changes name don’t be fooled.

When applying for a position advertised by an agency you will ideally start by seeing an advert with a phone number and an email address. The best strategy in that case seems to be to send your CV with a brief cover letter and then about 5 minutes after your mail server sends the message to their mail server you phone them. I found that I got a significantly higher success rate (in terms of having the agent send my CV to the client) if I phoned them when my CV arrived.

Sometimes a fax number is advertised. Unless there is some problem that prevents sending a document via email (such as the agency having a broken mail server), do not fax them. A faxed document will have to be faxed on to the client, will look bad after the double-fax operation, and will prevent the agent from grepping it. Rumor has it that agents will often post fake adverts for the purpose of collecting CVs (so that they can boast to clients

In most situations a recruiting agent should insist on meeting you for an interview before sending your CV to a client. The only exception is if you are applying for a job in another country. Meeting an agent at a restaurant or other public place is not uncommon (often they want to meet you while travelling between other locations and sometimes their main office is not in a good location). I suspect that some agencies start with a “virtual office” and perform all their interviews in public places (this doesn’t mean that they will do a worse job than the more established agencies). If an agent is prepared to recommend you to a client without meeting you then they are not doing their job properly. It used to be that there were enough agencies pretending to do their job that you could ignore the agencies that will recommend any unseen candidate. But now an increasing number of agencies do this and if you want a contract you may have to deal with them.

When an agency has a fancy office keep in mind that they paid for it by taking money from people like you! For contract work a recruiting agent is not your friend. They make their money by getting you to accept less money than the client pays them – the less they pay you the more money they make. A common claim is “we only take a fixed percentage of what the client pays”, but when you ask what that percentage is they refuse to answer – I guess that the fixed percentage is 50% or as close to it as they can manage.


Ethernet Bonding and a Xen Bridge

After getting Ethernet Bonding working (see my previous post) I tried to get it going with a bridge for Xen.

I used the following in /etc/network/interfaces to configure the bond0 device and to make the Xen bridge device xenbr0 use the bond device:

iface bond0 inet manual
    pre-up modprobe bond0
    pre-up ifconfig bond0 up
    hwaddress ether 00:02:55:E1:36:32
    slaves eth0 eth1

auto xenbr0
iface xenbr0 inet static
    pre-up ifup bond0
    address 10.0.0.199
    netmask 255.255.255.0
    gateway 10.0.0.1
    bridge_ports bond0

But things didn’t work well. A plain bond device worked correctly in all my tests, but when I had a bridge running over it I had problems every time I tried pulling cables. My test for a bond is to boot the machine with a cable in eth0, then when it’s running switch the cable to eth1. This means there are a few seconds of no connectivity and then the other port becomes connected. In an ideal situation at least one port would work at all times – but redundancy features such as bonding are not for an ideal situation! When doing the cable switching test I found that the bond device would often get into a state where every two seconds (the configured ARP ping time for the bond) it would change its mind about the link status and have the link down half the time (according to the logs – according to ping results it was down all the time). This made the network unusable.

Now I have decided that Xen is more important than bonding so I’ll deploy the machine without bonding.

One thing I am considering for next time I try this is to use bridging instead of bonding. The bridge layer will handle multiple Ethernet devices, and if they are both connected to the same switch then the Spanning Tree Protocol (STP) is designed to work in this way and should handle it. So instead of having a bond of eth0 and eth1 and running a bridge over that I would just bridge eth0, eth1, and the Xen interfaces.


Ethernet Bonding on Debian Etch

I have previously blogged about Ethernet bonding on Red Hat Enterprise Linux. Now I have a need to do the same thing on Debian Etch – to have multiple Ethernet links for redundancy so that if one breaks the system keeps working.

The first thing to do on Debian is to install the package ifenslave-2.6 which provides the utility to manage the bond device. Then create the file /etc/modprobe.d/aliases-bond with the following contents for a network that has 10.0.0.1 as either a reliable host or important router. Note that this will use ARP to ping the router every 2000ms, you could use a lower value for a faster failover or a higher value
alias bond0 bonding
options bond0 mode=1 arp_interval=2000 arp_ip_target=10.0.0.1

If you want to monitor link status then you can use the following options line instead, however I couldn’t test this because the MII link monitoring doesn’t seem to work correctly on my hardware (there are many Ethernet devices that don’t work well in this regard):
options bond0 mode=0 miimon=100

Then edit the file /etc/network/interfaces and insert something like the following (as a replacement for the configuration of eth0 that you might currently be using). Note that XX:XX:XX:XX:XX:XX must be replaced by the hardware address of one of the interfaces that are being bonded or by a locally administered address (see this Wikipedia page for details). If you don’t specify the Ethernet address then it will default to the address of the first interface that is enslaved. This might not sound like a problem, however if the machine boots and a hardware failure is experienced which makes the primary Ethernet device not visible to the OS (IE the PCI card is dead but not killing the machine) then the hardware address of the bond would change. This might cause problems with other parts of your network infrastructure.
auto bond0
iface bond0 inet static
    pre-up modprobe bond0
    hwaddress ether XX:XX:XX:XX:XX:XX
    address 10.0.0.199
    netmask 255.255.255.0
    gateway 10.0.0.1
    up ifenslave bond0 eth0 eth1
    down ifenslave -d bond0 eth0 eth1
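
If you choose a locally administered address for the hwaddress line, it is worth checking that the value really is one. A locally administered address has the second-lowest bit (0x02) of the first octet set, and a unicast address has the lowest bit (0x01) clear. The following Python sketch tests both bits; the function names are my own and 02:00:00:00:00:01 is just an example value.

```python
def first_octet(mac: str) -> int:
    """Validate a colon-separated MAC address and return its first octet."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("expected six colon-separated octets: %r" % mac)
    octets = [int(part, 16) for part in parts]  # raises ValueError on bad hex
    if any(octet > 0xFF for octet in octets):
        raise ValueError("octet out of range: %r" % mac)
    return octets[0]


def is_locally_administered(mac: str) -> bool:
    """True if the locally administered bit (0x02) is set."""
    return bool(first_octet(mac) & 0x02)


def is_multicast(mac: str) -> bool:
    """True if the group/multicast bit (0x01) is set."""
    return bool(first_octet(mac) & 0x01)


def usable_local_address(mac: str) -> bool:
    """True for a unicast, locally administered address."""
    return is_locally_administered(mac) and not is_multicast(mac)
```

For example usable_local_address("02:00:00:00:00:01") is true, while an address taken from a real card such as 00:02:55:E1:36:32 is a universally administered one and returns false (which is fine if it belongs to one of the bonded interfaces).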

There is some special support for bonding in the Debian ifup and ifdown utilities. The following will give the same result as the above in /etc/network/interfaces:
auto bond0
iface bond0 inet static
    pre-up modprobe bond0
    hwaddress ether 00:02:55:E1:36:32
    address 10.0.0.199
    netmask 255.255.255.0
    gateway 10.0.0.1
    slaves eth0 eth1

The special file /proc/net/bonding/bond0 can be used to view the current configuration of the bond0 device.
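
For monitoring it can be handy to read that file from a script. The following Python sketch splits the "Key: value" lines into one dictionary for the bond itself and one per enslaved interface. The sample text is illustrative only (it is not output captured from my machine, and the second hardware address is made up), and the exact set of fields depends on the kernel version.

```python
from typing import Dict, Tuple

# Illustrative sample in the general layout of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
ARP Polling Interval (ms): 2000

Slave Interface: eth0
MII Status: up
Permanent HW addr: 00:02:55:e1:36:32

Slave Interface: eth1
MII Status: down
Permanent HW addr: 00:02:55:e1:36:33
"""


def parse_bond_status(text: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return (bond-wide fields, per-slave fields) from the status text."""
    bond: Dict[str, str] = {}
    slaves: Dict[str, Dict[str, str]] = {}
    current = None  # name of the slave section being read, if any
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, hardware addresses contain more.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is None:
            bond[key] = value
        else:
            slaves[current][key] = value
    return bond, slaves


bond, slaves = parse_bond_status(SAMPLE)
```

On a real system you would pass in the contents of /proc/net/bonding/bond0 instead of SAMPLE, and could then alert when bond["Currently Active Slave"] changes or when any slave reports a status other than up.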

In theory it should be possible to use bonding on a workstation with DHCP, but in my brief attempts I have not got it working – any comments from people who have this working would be appreciated. The first pre-requisite of doing so is to use either MII monitoring or broadcast (mode 3). I experimented with using options bond0 mode=3 in /etc/modprobe.d/aliases-bond but found that it took too long to get the bond working and dhclient timed out.

Thanks for the howtoforge.com article and the linuxhorizon.ro article that helped me discover some aspects of this.

Update: Thanks to Guus Sliepen on the debian-devel mailing list for giving an example of the slaves directive as part of an example of bridging and bonding in response to this question.


Porn For Children

James Purser writes about the current plans for Internet filtering in Australia and concentrates on the technical issues (whether it will degrade the ISP service) and the issue of whose moral standards should be enforced for the entire country.

But the fact is that children have never had any problem accessing porn. When I was in grade 4 at primary school (~9yo) a group of boys decided to walk to the local shopping centre at lunch-time and I joined them. At the shopping centre the other boys read Playboy (that was before such magazines were required to be displayed in sealed plastic bags). I didn’t read Playboy because there were some electronics magazines that were more interesting. When in grade 6 (~11yo) a friend told me about his parents’ video collection which featured fellatio and sodomy. I don’t recall whether he offered to show me the videos but being a good friend I’m sure he would have done so if I had asked. In the early years of high school some boys ran a black-market for second-hand porn magazines (ick). They also sold new magazines that were significantly more expensive. When I was in year 12 digital porn was just becoming popular and the exchange of porn on floppy disk began.

I’m sure that now children use USB sticks to exchange porn that they get from the Internet or other sources.

When I was in year 10 a female dancing instructor ceased working for the school after an up-skirt picture of her was stuck on a notice-board (I guess that her resignation was related to the picture but can’t be sure).

The evidence that I witnessed while at school is that 15yo boys are prepared to photograph unwilling women and exchange the pictures, and that the exchange and sale of all manner of porn is not uncommon at school (including primary school). I don’t think that the schools I attended were in any way unusual in this regard.

When I was at school cameras were large. Unless you had a polaroid camera (which was even larger) the film had to be developed – and the staff at the photo company were potential witnesses. I expect that these factors significantly decreased the amount of such activity.

Now a significant portion of children have a mobile phone and it seems that a built-in camera is a standard feature in all new phones. Digital cameras (which have much better quality than phone-cameras) are becoming quite cheap. It’s widely regarded that giving a teenager a mobile phone is good for their safety (and it certainly makes it easier to discover where children claim to be) and it’s also widely regarded that a digital camera is a good toy (babies as young as 2 are often given the old camera when their parents get a new one). We should expect the number of children who have digital cameras to rapidly approach 100% of children who desire them.

Given these factors it seems to me that it would be a good idea to allow teenage boys access to better quality porn than they are able to produce (with either willing or unwilling subjects). It has already been shown that increased access to porn reduces the incidence of rape. I expect that the same also applies to the issue of making porn: people who have good access to porn will be less inclined to make their own.

There is some nasty porn out there. If they were to try and prevent access to porn that is illegal under Australian law (IE pictures of children, animals, rape, etc) then I don’t think that anyone would object. But preventing access to soft porn such as Playboy (which is so tame that it’s hardly porn by modern standards) is a really bad idea if it will increase the risk of up-skirt photos and the production of child rape movies.

Let’s be sensible and accept the fact that children who want to see porn will see it and focus our attention on what type of porn will be seen by children and whether the “actors” are consenting adults.

PS I spent several years living in Amsterdam and working as a sys-admin for ISPs there.


Two Questions for All Serious Free Software Contributors

What do you think is the most important single-sentence of advice that you can offer to someone who wants to contribute to the free software community? I intentionally didn’t mention what area or type of advice or what “contribute” means, interpret it how you wish and give multiple answers for different interpretations if that seems appropriate to you.

If you had the opportunity to say one sentence to someone who knows about computers and free software (EG they have used both Linux and Windows and done a small amount of programming) to convince them that they should join the free software team, what would it be?

Writing an essay about your thoughts is fine (and I’m sure that many readers of my blog could easily write an interesting essay on each of those topics). But please preface it with what you consider to be the most important sentence.

Please either track-back to this blog post or post a comment with a URL of your post (comments are moderated but I usually approve them in less than 12 hours and often much faster – I approve all sensible non-spam comments). If you only offer two sentences (and decide not to write an essay) then the comment section can contain your entire answer.

Note that by Serious Free Software Contributors I am referring to people who feel that they are serious about it. If free software matters to you and you go out of your way to help the cause in the way that best suits your abilities then it means you.

I will write another post with a summary of what I consider to be the most interesting responses (including links to any blog posts with long answers).

PS This post is not what I consider to be a “meme”.


Blog Memes

A common pattern in blog communication is referred to as a Meme. Here is one example of the commonly used definition of the term as applied to blogs. One common factor that doesn’t seem to get directly mentioned much when people define the term (but which always seems to be mentioned in passing) is the idea of tagging people. So the definition of a meme as applied to blogs seems to be a silly question that you answer in a blog post and then request that some other bloggers (usually 5) answer as well.

At the end of this post I have included the dictionary definition of the term (here is the Wikipedia definition).

I believe that it is incorrect to call a question such as “which superhero do you most identify with” a meme. Instead I think that there is a Memeplex associated with such posts. One meme is that when someone “tags” you (requests that you answer a question) it should be considered an honour (someone in the blog-sphere likes you enough to ask you random questions in a public forum). Another meme is that such discussion is a good thing (although an increasing number of people in the more serious part of the blog-sphere oppose this). A final one that is apparent to me (I’m sure that there are others) is that so-called memes and lazyweb posts are the same thing (I believe this to be wrong).

I believe that lazyweb posts if written about interesting topics can contribute significantly to the community knowledge base. I also believe that chain lazyweb posts (here is a link to the only such post I’ve made so far) can also contribute if created in a sensible manner. Chain posts that don’t require any thought or input from the person re-posting them (EG “please post this message to all your friends so that they can know of the terrible war/famine/earthquake/whatever in some foreign part of the world”) are of course quite useless (you can make a post of links once a month if you want to spread the news about such things).

Now I agree that some amount of conversation among bloggers in a community that is personal and not directly related to the main topics of discussion is good for building the community.

From Jargon File (4.4.4, 14 Aug 2003) [jargon]:

meme
/meem/, n.

[coined by analogy with `gene’, by Richard Dawkins] An idea considered as a {replicator}, esp. with the connotation that memes parasitize people into propagating them much as viruses do. Used esp. in the phrase meme complex denoting a group of mutually supporting memes that form an organized belief system, such as a religion. This lexicon is an (epidemiological) vector of the `hacker subculture’ meme complex; each entry might be considered a meme. However, meme is often misused to mean meme complex. Use of the term connotes acceptance of the idea that in humans (and presumably other tool- and language-using sophonts) cultural evolution by selection of adaptive ideas has superseded biological evolution by selection of hereditary traits. Hackers find this idea congenial for tolerably obvious reasons.

PS This evening I had planned to go to a LUV meeting and see my friend Andy Fitzsimon (blog) give a talk about Inkscape (for which he is famous). I also had a day off work, so it was going to be a day of non-stop fun. But instead I got some sort of cold/flu, stayed in bed for much of the day, missed the meeting, and was late in my blog post. This sucks.


Translation

I’ve created a page about translating my blog. Currently it has the following text:

If you would like to translate any posts from my blog to a language other than English then please feel free to do so. I demand that any translations correctly cite me as the author of the original English version and give a permanent link to the original post, but I don’t expect that this will cause any inconvenience.
I also request that anyone who translates one of my posts gives me permission to do whatever I wish with the translated text (I want to mirror all translations of my work on my own site). I am unsure of what legal rights I have to demand this and have not yet considered whether I have a moral right to demand it. But I believe that it is the nice thing for a translator to do and hope that everyone who translates one of my posts will do so.
Also I may grant permission for translations of my posts to appear on sites with Google advertising or other commercial use. I won’t rule out the possibility of assigning monopoly rights on commercial use of the translations of my posts to specific individuals or organisations.

Does anyone have suggestions for improvements?

One of my multi-lingual friends suggested that I should be concerned about the risk of bad translations. But I desire to have people read my posts and I believe that this is a risk I just have to accept – I’m sure that there are enough multi-lingual people in the blog space to find such errors and help the translator correct them.

Also I have to consider the best way to mirror the translations. I could add them to the same permalink page (producing long pages with multiple translations of my best posts), I could create a new post (resulting in English-language Planet installations getting posts that most people can’t read), or I could use a separate blog installation for the translations.

Please comment if you have any suggestions. I’ll write another post in future with the solutions that I select and some analysis of the issues.

Update: Thanks for the nice post Victor. It was seeing my blog in the list of “Enlaces Interesantes” on Victor’s blog that inspired my post about translation.


Housing Prices

The Sydney Morning Herald has an article about pre-fabricated houses from Ikea and suggests that they could solve the housing price problems. The article states that in the UK the pre-fab houses would be more than 25% cheaper than regular houses.

Let’s assume for the sake of discussion that the introduction of Ikea pre-fabricated houses in the Australian market (which they currently don’t plan to do) will reduce house prices by 25%. This won’t solve the problem. According to the Australian Bureau of Statistics in 2006 the median weekly income for males was $600-799 and for females was $250-399. A couple who are both at the high-end of that scale would gross $1200 per week, which would possibly allow them to pay $1600 per month towards a mortgage. If they were in the middle of that scale then they would gross $1000 per week which might allow paying $1000 per month towards a mortgage. With current interest rates that would make any mortgage greater than $200K unreasonably difficult for a couple at the high end of the median income range to repay and a mortgage greater than $120K would be unreasonably difficult for a couple in the middle of the range.

The ABS median house prices show that the cheapest places to buy houses in 2005 were Adelaide, Hobart, and Darwin with median prices of $208K, $209K, and $216K respectively. So a couple who are both at the high end of the median income range can only afford to buy a house in the three cheapest cities in Australia (which are also not densely populated). The two largest cities are Sydney and Melbourne which had median house prices of $363K and $300K respectively. They would not be affordable to a couple who are anywhere near the median income – not even with a 25% reduction in price!
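
The arithmetic above can be sketched with the standard formula for the present value of a series of fixed monthly repayments. The post does not state the interest rate or loan term, so the 8% annual rate and 25 year term below are my assumptions; with those values the results come out close to the $200K and $120K limits mentioned above.

```python
# Assumptions (not stated in the post): 8% annual interest, 25 year term.
ASSUMED_ANNUAL_RATE = 0.08
ASSUMED_YEARS = 25


def max_loan(monthly_payment: float,
             annual_rate: float = ASSUMED_ANNUAL_RATE,
             years: int = ASSUMED_YEARS) -> float:
    """Largest loan that the given monthly repayment can fully repay."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly repayments
    return monthly_payment * (1 - (1 + r) ** -n) / r


high_end = max_loan(1600)  # couple at the high end of the median range
middle = max_loan(1000)    # couple in the middle of the median range

# 2005 median house prices quoted above.
MEDIAN_PRICES = {
    "Adelaide": 208000,
    "Hobart": 209000,
    "Darwin": 216000,
    "Sydney": 363000,
    "Melbourne": 300000,
}

# Prices after the hypothetical 25% reduction.
discounted = {city: price * 0.75 for city, price in MEDIAN_PRICES.items()}

# Cities that remain out of reach for the high-end couple even after the cut.
still_unaffordable = sorted(city for city, price in discounted.items()
                            if price > high_end)
```

Under these assumptions Sydney and Melbourne remain in still_unaffordable after the 25% cut. A different rate or term will move the numbers, but the reduction would have to be much larger than 25% to change the conclusion for those two cities.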

I believe that people on a median income should be able to afford a median-priced house – the majority of Australian families should be able to afford the majority of houses!

The above analysis only covers families wanting to purchase a house with two incomes. The “traditional” Australian idea of having the man earn the majority of all the money that is required while his wife looks after the children (which is a bad thing for other reasons) is obviously dead. A man who earns 50% more than the median income will have trouble paying for a house while supporting children if his wife doesn’t also work. It is generally accepted that anyone who doesn’t purchase a house before having children will never own a house. It seems strange that the major political parties talk about wanting to support families and to support “the Aussie battler”, but won’t do anything serious about house prices (which is the most significant issue for such people). Giving a first home owners grant of $7000 (which is less than 3% of the required money) is not a serious measure.

One possible way of alleviating this problem would be to remove Negative Gearing (or at least modify it to encourage construction rather than buying existing properties). Then the price of properties that are rented out would reflect the rent value instead of being significantly over-priced.

Another possibility would be to make public transport more efficient and with a wider scope. The desirability of a location is to a large extent determined by how much time/money/effort is required to get to the centre of the nearest city for work. Making the mass public transport support larger numbers of people (by larger and more frequent trains, trams, and buses), have shorter journeys (by a more frequent service to reduce waiting and express and connecting services where possible), and more routes (by building new train or tram lines every time a new major road is built) would significantly increase the desirability of properties that are further from the centre of cities. That would decrease the market pressure on prices of properties that are currently 30 minutes or one hour travel from the centre. It would also be a significant benefit if people who currently spend 30 minutes commuting could spend only 20.

If public transport was improved and negative gearing was abolished then I expect that there would be increased demand for new houses that are further away from city centres, and that pre-fabricated houses would make a significant difference in the price. But while the majority of the value of a house is contained in the land that it rests on I can’t see that making a difference.