etbe - Russell Coker

2 node vs 3+ node clusters

A comment on my post about the failure probability of clusters suggested that a six node cluster that has one node fail should become a five node cluster.

The problem with this is what to do when nodes recover from a failure. For example if a six node cluster had a node fail and became a five node cluster, then became a three node cluster after another two nodes had failed, then you would have half the cluster that was disconnected. If the three nodes that appeared to have failed became active again but unable to see the other three nodes then you would have a split-brain situation.

As noted in the comment, the special case of a two node cluster does have different failure situations. If the connection between nodes goes down and the router can still be pinged then you can have a split-brain situation. To avoid this you will generally have a direct connection between the two nodes (either a null-modem cable or a crossover Ethernet cable); such cables are more reliable than networking which involves a switch or hub. Also the network interface that connects to the router in question will ideally also be used as a method of maintaining cluster status – it seems unlikely that two nodes will both be able to ping the router but be unable to send data to each other.

For best reliability you need to use multiple network interfaces between cluster nodes. One way of doing this is to have a pair of Ethernet ports bonded for providing the service (connected to two switches and pinging a router to determine which switch is best to use). The Heartbeat software supports encrypted data so it should be safe to run it on the same interface as used for providing the service (of course if you provide a service to the public Internet then you want a firewall to prevent machines on the net from trying to attack it).

Heartbeat also supports using multiple interfaces for maintaining the cluster data, so you can have one network dedicated to cluster operations and the network that is used for providing the service can be a backup network for cluster data. The pingd service allows Heartbeat to place services on nodes that have good connectivity to the net. So you could have multiple nodes that each have one Ethernet port for providing the service and one port as a backup for Heartbeat operations; if pingd indicated that the service port was not functioning correctly then the services would be moved to other nodes.

If you want to avoid having private Heartbeat data going over the service interface then in the two-node case you need a minimum of two Ethernet ports for Heartbeat and one port for providing the service if you use pingd. If you don’t use pingd then you need two bonded ports for providing the service and two ports (either bonded or independently configured in Heartbeat) for Heartbeat, giving a total of four ports.

When there are more than two nodes in the cluster the criterion for cluster membership is that a majority of nodes are connected. This makes split-brain impossible and reduces the need to have reliable Ethernet interfaces. A cluster with three or more nodes could have a single service port and a single private port for Heartbeat, or if you trust the service interface you could do it all on one Ethernet port.
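The majority rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Heartbeat implements it; the function name has_quorum is made up for this example.

```python
def has_quorum(visible_nodes: int, cluster_size: int) -> bool:
    """Return True when strictly more than half of the configured nodes are visible.

    cluster_size is the number of nodes the cluster was configured with,
    not the number that happen to be alive right now.
    """
    return 2 * visible_nodes > cluster_size


# A six-node cluster split into two halves of three: neither half has a
# majority, so neither half may run services and split-brain cannot occur.
print(has_quorum(3, 6))

# A three-node cluster partitioned 2/1: the pair keeps running, the single
# node shuts down its services.
print(has_quorum(2, 3), has_quorum(1, 3))
```

Note that the cluster size stays fixed at the configured value. If a six-node cluster were allowed to shrink to five and then to three, both halves of a 3/3 split could believe they had a majority.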

In summary, three nodes is better than two, but requires more hardware. Five nodes is better than three, but as I wrote in my previous post four nodes is not much good. I recommend against any even number of nodes other than two for the same reason.

failure probability and clusters

When running a high-availability cluster of two nodes it will generally be configured such that if one node fails then the other runs. Some common operation (such as accessing a shared storage device or pinging a router) will be used by the surviving node to determine that the other node is dead and that it’s not merely a networking problem. Therefore if you lose one node then the system keeps operating until you lose another.

When you run a three-node cluster the general configuration is that a majority of nodes is required. So if the cluster is partitioned then one node on its own will shut down all services while two nodes that can talk to each other will continue operating as normal. This means that to lose the cluster you need to lose all inter-node communication or have two nodes fail.

If the probability of a node surviving for the time interval required to repair a node that’s already died is N (where N is a number between 0 and 1 – 1 means 100% chance of success and 0 means it is certain to fail) then for a two node cluster the probability of the second node surviving long enough for a dead node to be fixed is N. For a three node cluster the probability that both of the surviving two nodes will survive is N^2. This is significantly less; therefore a three node cluster is more likely to experience a critical second failure than a two node cluster.

For a four node cluster you need three active nodes to have quorum. Therefore the probability that a second node won’t fail is N^3 – even worse again!

For a five node cluster you can lose two nodes without losing the cluster. If you have already lost a node the probability that you won’t lose another two is N^4+(1-N)*N^3*4. As long as N is greater than 0.8 the probability of keeping three nodes out of four is greater than the probability of a single node not failing.
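As a quick check of the 0.8 figure, here is a small Python sketch (the function name is made up for this example). Setting N^4+(1-N)*N^3*4 equal to N and dividing out the roots at N=0 and N=1 leaves 3N^2-N-1=0, so the exact crossover is (1+sqrt(13))/6, which is about 0.768. That makes 0.8 a safe round number.

```python
import math


def keep_three_of_four(n: float) -> float:
    """Probability that at least three of the four remaining nodes survive."""
    return n**4 + 4 * (1 - n) * n**3


# Positive root of 3n^2 - n - 1 = 0, where the five-node figure equals N.
crossover = (1 + math.sqrt(13)) / 6

for n in (0.7, crossover, 0.8, 0.9, 0.99):
    print(f"N={n:.4f}  three-of-four={keep_three_of_four(n):.4f}")
```

Below the crossover a single surviving node (the two node case) has the better odds, above it the five node cluster does.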

To see the probabilities of four and five node clusters experiencing a catastrophic failure after one node has died run the following shell script for different values of N (0.9 and 0.99 are reasonable values to try). You might hope that the probability of a second node remaining online while the first node is being repaired is significantly higher than 0.9, however when you consider that the first node’s failure might have been partially caused by the ambient temperature, power supply problems, vibration, or other factors that affect multiple nodes I don’t think it’s impossible for the probability to be as low as 0.9.

N=0.9
echo "$N^4+(1-$N)*$N^3*4" | bc -l ; echo "$N^3" | bc -l

So it seems that if reliability is your aim in having a cluster then your options are two nodes (if you can be certain of avoiding split-brain) or five nodes. Six nodes is not a good option as the probability of losing three nodes out of six is greater than the probability of losing three nodes out of five. Seven and nine node clusters would also be reasonable options.
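The per-size figures above can be generalised with a short Python sketch. It assumes what the post assumes: one node has already died, each remaining node independently survives the repair interval with probability N, a two node cluster keeps running on one node, and larger clusters need a strict majority of the original node count. The function names are made up for this example.

```python
from math import comb


def nodes_needed(cluster_size: int) -> int:
    """Nodes that must stay up: one for a two node cluster, else a strict majority."""
    if cluster_size == 2:
        return 1
    return cluster_size // 2 + 1


def survival_after_first_failure(cluster_size: int, n: float) -> float:
    """Probability the cluster keeps running while the first dead node is repaired."""
    remaining = cluster_size - 1
    needed = nodes_needed(cluster_size)
    # Binomial sum over "exactly k of the remaining nodes survive".
    return sum(
        comb(remaining, k) * n**k * (1 - n) ** (remaining - k)
        for k in range(needed, remaining + 1)
    )


for size in (2, 3, 4, 5, 6, 7, 9):
    print(size, round(survival_after_first_failure(size, 0.9), 5))
```

For sizes two to five this reproduces N, N^2, N^3 and N^4+(1-N)*N^3*4, and it shows six nodes coming out worse than five for the same N.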

But it’s not surprising that a Google search for “five node” cluster high-availability gives about a tenth as many results as a search for “four node” cluster high-availability. Most people in the computer industry like powers of two more than they like maths.

Debian/Etch release party in Melbourne – Australia

We are having a release party on Saturday the 14th of April. We meet at mid-day under the clocks at Flinders Street Station and then go somewhere convenient and not too expensive for lunch.

All welcome.

Update:

The event was moderately successful. There were only six people including me – that was quite a bit smaller than the Debian 10th birthday party we had in Melbourne, but it was still enough to have fun.

Everyone there had a good knowledge of Linux and Debian and many interesting things were discussed. We had lunch at a Japanese stone-grill restaurant – their specialty is serving raw ingredients along with a stone that’s at 400C (or so they claim – I would expect a 400C stone to radiate more heat than I experienced on my previous visit). As it was a warm day we skipped the stone grill and ordered from the lunch menu (which was also a lot cheaper). Some of the guys had never tried Sake or Plum Wine before and they seemed to like it. Strangely the waitress always wanted to deliver alcohol to a 15yo in preference to almost anyone else.

One of the topics of discussion was Linux meetings and the ability to attend them. A point was made that if you are <18yo and rely on your parents’ permission to do things then a meeting that finishes at 9PM isn’t a viable option. It has previously been noted that for people from regional areas an evening meeting is also inconvenient.

Maybe we should have occasional LUG meetings on a Saturday afternoon to cater for the needs of such people?

Spooks and GConf

Jeff Waugh wrote an amusing post about SE Linux and GConf support. It’s good to see SE Linux being promoted to the GNOME community.

presentations about SE Linux

I have just read the Presentation Zen blog post about PowerPoint.

One of the interesting suggestions was that it’s not effective to present the same information twice, so you don’t have notes covering what you say. Having a diagram that gives the same information is effective though because it gives a different way of analyzing the data. I looked at a couple of sets of slides that I have written and noticed that the ratio of text slides to diagram slides was 6:1 and 3:1 in favor of text, and that wasn’t counting the first and last slides that have the title of the talk and a set of URLs respectively.

So it seems that I need more and better diagrams. I’ll include most of the diagrams I use in my current SE Linux talks in this post with some ideas on how to improve them. I would appreciate any suggestions that may be offered (either through blog comments or email).

The above diagram shows how the SE Linux identity limits the roles that may be selected, and how the role limits the domains that may be entered. Therefore the identity controls what the user may do and in this example the identity “root” means that the user has little access to the machine (a Play Machine configuration). I think that the above is reasonably effective and have been using it for a few years. I have considered a more complex diagram with the “staff_r” role included as well and possibly including the way that “newrole” can be used to change between roles. So I could have the above as slide #1 about identities and roles with a more detailed diagram following to replace a page of text about role transition.

The above diagram shows the domain transitions used in a typical system boot and login process. It includes the names of the types and a summary of the relevant policy rules used to implement the transitions. I also have another diagram that I have used which is the same but without the file types and policy. In the past I have never used both in the one talk – just used one of the two and had text to describe the information content of the other. To make greater use of diagrams I could start with the simple diagram and then have the following slide have all the detail.

The above diagram simply displays the MCS security model with ellipses representing processes and rectangles representing files.

The above diagram shows a simplified version of the MMCS policy. With MMCS each process has a range with the low level representing the minimum category set of files to which it is permitted to write and the high level representing the files that it may read and write. So to write to a file with the “HR” category the process must have a low level that’s no higher than “HR” and a high level that is equal or greater than “HR”. The full set of combinations of two categories with low and high levels means 10 different levels of access for processes which makes for a complex diagram. I need something other than plain text for this but the above diagram is overly complex and a full set is even more so. Maybe a table with process contexts on one axis, file contexts on another and access granted being one of “R”, “RW” or nothing?
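The table idea could be prototyped with a few lines of Python. This sketch only encodes the rule as described in the paragraph above (read when the file's categories are within the high level, write when they are also at or above the low level); it is not the real policy. The “Finance” category, the function name and the sample process ranges are made up for this example.

```python
def access(low: frozenset, high: frozenset, file_cats: frozenset) -> str:
    """Return "RW", "R" or "" for a process range (low, high) and a file's categories."""
    can_read = file_cats <= high                # file is within the high level
    can_write = can_read and low <= file_cats   # and not below the low level
    if can_write:
        return "RW"
    return "R" if can_read else ""


NONE = frozenset()
HR = frozenset({"HR"})
FIN = frozenset({"Finance"})  # hypothetical second category
BOTH = HR | FIN

files = [NONE, HR, FIN, BOTH]
processes = [(NONE, NONE), (NONE, HR), (HR, HR), (HR, BOTH), (NONE, BOTH)]

# One row per process range, one column per file category set.
for low, high in processes:
    row = [access(low, high, f) or "-" for f in files]
    print(sorted(low), sorted(high), row)
```

Each printed row is one line of the proposed table, with "-" standing in for no access.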

I also have a MLS diagram in the same manner, but I now think it’s too awful to put on my blog. Any suggestions on how to effectively design a diagram for MLS? For those of you who don’t know how MLS works the basic concept is that every process has an “Effective Clearance” (AKA low level) which determines what it can write. It can’t write to anything below that level because it might have read data from a file at its own level, and it can’t read from a level higher than its own. MLS also uses a high level for ranged processes and filesystem objects (but that’s when it gets really complex).
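The basic concept can be sketched in Python as follows. This covers only the single-level case described above and ignores ranged processes and categories, and it is not the actual SE Linux policy logic. The level names and function names are made up for this example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity levels, lowest first."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(clearance: Level, file_level: Level) -> bool:
    """No reading from a level higher than the process's own."""
    return file_level <= clearance


def can_write(clearance: Level, file_level: Level) -> bool:
    """No writing below the process's own level, since it may hold data read at that level."""
    return file_level >= clearance


proc = Level.SECRET
for f in Level:
    print(f.name, "read" if can_read(proc, f) else "-", "write" if can_write(proc, f) else "-")
```

Under this simplified rule a process can both read and write only at its own level.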

This last one is what I consider my most effective diagram. It shows the benefits of SE Linux in confining daemons in a clear and effective manner. Any suggestions for improvement (apart from fixing the varying text size which is due to a bug in Dia) would be appreciated.

The above diagrams are all on my SE Linux talks page, along with the Dia files that were used to create them. They may be used freely for non-commercial purposes.

If anyone has some SE Linux diagrams that they would like to share then please let me know, either through a blog comment, email, or a blog post syndicated on Planet SE Linux.

Xen and SE Linux – EWeek review of RHEL5

The online magazine EWeek has done a review of RHEL5. It’s quite a positive review which can be summarised as “good support for Xen as service (not an appliance), better value than previous versions with the licenses for multiple guests included, and SE Linux briefly got in the way but the Troubleshooting tool fixed it quickly and easily”.

The problem they had is that the SE Linux policy expects Xen images to be in /var/lib/xen/images, but the Xen configuration tools apparently didn’t adequately encourage them to use that directory. They stored the images somewhere else and SE Linux stopped it from working. The Troubleshooting tool did something that they didn’t describe and then it all worked.

Generally a very positive review of RHEL5 and a moderately positive review of SE Linux in RHEL5.

PS You might have to turn off JavaScript to view the link, the page has broken JavaScript code that takes an unreasonable amount of CPU time.

what is a BOF?

BOF stands for Birds Of a Feather. It’s an informal session run at a conference, usually without any formal approval by the people who run the conference.

Often conferences have a white-board, wiki, or other place where conference delegates can leave notes for any reason. It is used for many purposes including arranging BOFs. To arrange a BOF you will usually write the title for the BOF and the name of the convener (usually yourself if it’s your idea) and leave a space for interested people to sign their names. Even though there is usually no formal involvement of the conference organizers they will generally reserve some time for BOFs. Depending on the expected interest they will usually offer one or two slots of either 45 minutes or one hour. They will also often assist in allocating BOFs to rooms. But none of this is needed. All that you need to do is find a notice-board, state your intention to have a BOF at a time when not much else is happening and play it by ear!

My observation is that about half the ideas for BOFs actually happen; the rest don’t get enough interest. This is OK, as one of the reasons for a BOF is to have a discussion about an area of technology that has an unknown level of interest. If no-one is interested then you offer the same thing the next year. If only a few people are interested then you discuss it over dinner. But sometimes you get 30+ people. You never know what to expect as many people don’t sign up – or have their first choice canceled and attend the next on the list!

To run a BOF you firstly need some level of expert knowledge in the field. I believe that the best plan is for a BOF to be a panel discussion where you have a significant portion of the people in the audience (between 5 and 15 people) speaking their opinions on the topic and the convener moderating the discussion. If things work in an ideal manner then the convener will merely be one member of the panel. However it’s generally expected that the person running the BOF can give an improvised lecture on the topic in case things don’t happen in an ideal manner. It’s also expected that the convener will have an agenda for a discussion drawn up so that if the panel method occurs they can ask a series of questions for members of the BOF to answer. My experience is that 8 simple questions will cover most of an hour.

One requirement for convening a BOF is that you be confident in speaking to an audience of unknown size, knowledge, and temperament. Although I haven’t seen it done it would be possible to have two people acting as joint conveners of a BOF. One person with the confidence to handle the audience and manage the agenda and another with the technical skills needed to speak authoritatively on the topic.

Some of the BOFs I have attended have had casual discussions, some have had heated arguments, and some ended up as lectures with the convener just talking about the topic. Each of these outcomes can work in terms of entertaining and educating the delegates.

But don’t feel afraid. One of the advantages of a BOF is that it’s a very casual affair, not only because of the nature of the event but also because it usually happens at the end of a long conference day. People will want to relax, not have a high-intensity lecture. One problem that you can have when giving a formal lecture to an audience is nervous problems such as hyper-ventilating. This has happened to me before and it was really difficult to recover while continuing the lecture. If that happens during a BOF then you can just throw a question to the audience such as “could everyone in the room please give their opinion on X”. That will give you time for your nerves to recover while also allowing the audience to get to know each other a bit – it’s probably best to have at least one such question on your agenda in case it’s needed.

Note that the above is my personal opinion based on my own experience. I’m sure that lots of other people will disagree with me and write blog posts saying so. ;)

The facts which I expect no-one to dispute are:

BOFs are informal
Anyone can run one
You need an agenda
You need some level of expert knowledge of the topic

A Strange Interpretation of the US Constitution About Copyright

In a blog on infoworld the following strange statement appeared:

The US Constitution is clear that the reason for copyright/patent/etc. is to benefit creators of property, not users of property. I appreciate the reason: give creators a reasonable return on their investment.

Actually the US Constitution seems to clearly say the opposite. Here is a link to section 8 of the US Constitution. The important phrase is “To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries”.

There is no mention of providing benefits to creators of written works and inventions. The aim is clearly stated to promote the progress of science and useful arts and the exclusive right which operates for limited times is merely a method of achieving that aim.

BSD vs GPL licences

James Dumay writes about Theo’s latest flame-war.

One interesting part of the debate was Theo’s response to this comment:
> We can dual license our code though and that is an
> acceptable license for Linux, the kernel.

Theo:
We? Sure, you can. But Reyk will not dual license his code, and most of the other people in the BSD community won’t either, because then they receive the occasional patch from a GPL-believer which is ONLY under the GPL license, and then they are no less screwed than they would be from the code granted totally freely to companies.

The difference of course is that when you give code to companies under the BSD license you will never know what is done to it, but GPL-only patches can still be used as inspiration for new code development. Sure GPL-only code can’t be copied into BSD-only code, but once you know where the bugs are they are easy to fix.

Towards the end of the debate Theo asks the following question:
David, if you found a piece of your code in some other tree, under a different license, would your first point of engatement be a public or private mail?

I can’t speak for David, but after reading the discussion I would probably start by blogging about such an issue.

New Debian release and new DPL

Ingo Juergensmann has blogged in detail about the new release and the new DPL.

Sam Hocevar ran for DPL on a platform based on some significant new changes. It will be interesting to see what happens over the next year.

The release of Etch is an exciting milestone in Debian development. Among other things it has SE Linux working!

I’m going to try and arrange a party in Melbourne, Australia to celebrate. We also have a mailing list for Debian people in Melbourne, to subscribe send a message to debian-melb-request@taz.net.au with the subject subscribe. I’ll use that list for arranging the party, send me private email if you are not subscribed but want to attend.
