
Kernel Security vs Uptime

For best system security you want to apply kernel security patches as soon as possible. For an attacker, gaining root access to a machine is often a two step process: the first step is to exploit a weakness in a non-root daemon or take over a user account, and the second is to compromise the kernel to gain root access. So even if a machine is not used for providing public shell access or any other task which involves giving access to potentially hostile people, having a secure kernel is an important part of system security.

One thing that gets little consideration is the effect of applying security updates on overall uptime. Over the last year there have been 14 security related updates (I count a silent data loss bug along with the security issues) to the main Debian Etch kernel package. Of those 14, it seems that if you don’t use DCCP, NAT for CIFS or SNMP, IA64, or the dialout group, then you only need to patch for issues 2, 3 (for SMP machines), 4, 5, 7 (sound drivers get loaded on all machines by default), 9, 10, 11, 12, 13, and 14.

This means 11 reboots a year for SMP machines and 10 a year for uni-processor machines. If a reboot takes three minutes (which is an optimistic assumption) then that would be 30 or 33 minutes of downtime a year due to kernel upgrades. In terms of uptime we talk about the number of “nines”, where the ideal is generally regarded as “five nines” or 99.999% uptime. 33 minutes of downtime a year for kernel upgrades means that you get 99.993% uptime (which is “four nines”). If a reboot takes six minutes (which is not uncommon for servers) then it’s 99.987% uptime (“three nines”).
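
To make the arithmetic concrete, here is a trivial calculation (a sketch using the reboot counts above; there are 365*24*60 = 525600 minutes in a year):

#!/bin/sh
# Uptime percentage for 11 reboots a year at 3 and 6 minutes each.
awk 'BEGIN {
    for (mins = 3; mins <= 6; mins += 3)
        printf "11 reboots of %d mins: %.4f%% uptime\n",
               mins, 100 * (1 - (11 * mins) / 525600)
}'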

While it doesn’t seem likely to affect the number of “nines” you get, not using SMP has the potential to avoid future security issues. So it seems that when using Xen (or another virtualisation technology), assigning only one CPU to the DomUs that don’t need more could improve uptime for them.

For a Xen Dom0 which doesn’t have local users or daemons and doesn’t use DCCP, NAT for CIFS or SNMP, wireless, CIFS, JFFS2, PPPoE, bluetooth, H.323, or SCTP connection tracking, only issue 11 applies. However for “five nines” you need to have 5 minutes of downtime a year or less. It seems unlikely that a busy Xen server can be rebooted in 5 minutes, as all the DomUs either need to have their memory saved to disk (writing the data out and reading it back after the reboot will probably take at least a couple of minutes) or need to be shut down and booted again after the Dom0 is rebooted (which is a good procedure if the security fix affects both Dom0 and DomU use), and such shutdowns and reboots of DomUs take a lot of time.

Based on the past year, it seems that a system running as a basic server might get “four nines” if configured for a fast boot (it’s surprising that no-one seems to be talking about recent improvements in boot speed as high-availability features); if the boot is slower then you are looking at “three nines”. For a Xen server, unless you have some sort of cluster, it seems that “five nines” is unattainable due to reboot times if there is even one issue a year, but “four nines” should be easy to get.

Now while the 14 kernel issues over the last year seem likely to be a pattern that will continue, the one issue which affects Xen may not be representative (small numbers are not statistically significant). I feel confident in predicting a need for between 5 and 20 kernel updates next year due to kernel security issues, but I would not be prepared to bet on whether the number of issues affecting Xen will be 0, 1, or 4 (it seems unlikely that there would be 5 or more).

I will write a future post about some strategies for mitigating these issues.

Here is my summary of the security updates to the Debian kernel package linux-image-2.6.18-6-686 (the Etch kernel) according to its changelog. They are not in chronological order but in the order of the changelog file.


ISP Redundancy and Virtualisation

If you want a reliable network then you need to determine an appropriate level of redundancy. When servers were small and there was no well-accepted virtual machine technology there were always many points at which redundancy could be employed.

A common example is a large mail server. You might have MX servers to receive mail from the Internet, front-end servers to send mail to the Internet, database or LDAP servers (with a single master accepting writes and redundant slaves for clients to read from), and some back-end storage. The back-end storage is generally going to lack redundancy to some degree (all the common options involve mail being stored in one location). So the redundancy would start with the routers which direct traffic to redundant servers (typically a pair of routers in a failover configuration – I would use OpenBSD boxes running CARP if I was given a choice in how to implement this [1], in the past I’ve used Cisco devices).

The next obvious place for redundancy is for the MX servers (it seems that most ISPs have machines with names such as mx01.example.net to receive mail from the Internet). The way that MX records are used in the DNS means that there is no need for a router to direct traffic to a pair of servers, and even a pair of redundant routers is another point of failure so it’s best to avoid them where possible. A smaller ISP might have two MX machines that are used for both sending outbound mail from their users (which needs to go through a load-balancing router) as well as inbound mail. A larger ISP will have two or more machines dedicated to receiving mail and two or more machines dedicated to sending mail (when you scan for viruses on both sent and received mail it can take a lot of compute power).
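
As an illustration, a pair of MX records with equal preference (a hypothetical zone file fragment) is all it takes for sending hosts to spread the load between two machines with no router involved:

example.net.    IN  MX  10  mx01.example.net.
example.net.    IN  MX  10  mx02.example.net.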

Now the database or LDAP servers used for storing user account data are another possible place for redundancy. While some database and LDAP servers support multi-master operation, a more common configuration is to have a single master and multiple read-only slaves. This means that you want to have more slaves than are really required so that you can lose one without impacting the service.

There are several ways of losing a server. The most obvious is a hardware failure. While server class machines will have redundant PSUs, RAID, ECC RAM, and a general high quality of hardware design and manufacture, they still have hardware problems from time to time. Then there are a variety of software related ways of losing a server, most of which stem from operator error and bugs in software. Of course the problem with the operator errors and software bugs is that they can easily take out all redundant machines. If an operator mistakenly decides that a certain command needs to be run on all machines they will often run it on all machines before realising that it causes things to go horribly wrong. A software bug will usually be triggered by the same thing on all machines (EG I’ve had bad data written to a master LDAP server cause all slaves to crash and had a mail loop between two big ISPs take out all front-end mail servers).

Now if you have a mail server running on a virtual platform such that the MX servers, the mail store, and the database servers all run on the same hardware then redundancy is very unlikely to alleviate hardware problems. It’s difficult to imagine a situation where a hardware failure takes out one DomU while leaving others running.

It seems to me that if you are running on a single virtual server there is no benefit in having redundancy. However there is benefit in having an infrastructure which supports redundancy. For example if you are going to install new software on one of the servers there is a possibility that the software will fail. Doing upgrades and then having to roll them back is one of the least pleasant parts of sys-admin work, not only is it difficult but it’s also unreliable (new software writes different data to shared files and you have to hope that the old version can cope with them).

To implement this you need to have a Dom0 that can direct traffic to multiple redundant servers for services which only have a single server. Then when you need to upgrade (be it the application or the OS) you can configure a server on the designated secondary address, get it running, and then disable traffic to the primary server. If there are any problems you can direct traffic back to the primary server (which can be done much more quickly than downgrading software). Also if configured correctly you could have the secondary server be accessible from certain IP addresses only. So you could test the new version of the software using employees as test users while customers use the old version.
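
A minimal sketch of how the Dom0 might direct that traffic on Linux (the addresses are hypothetical, and a real setup would manage the rules less crudely than flushing the whole chain):

#!/bin/sh
# Direct the public service address to whichever back-end DomU is live.
SERVICE=192.0.2.80      # address that customers connect to
PRIMARY=10.0.0.2        # DomU running the old software
SECONDARY=10.0.0.3      # DomU running the new software

iptables -t nat -F PREROUTING
# Let a staff network (hypothetical range) test the new server ...
iptables -t nat -A PREROUTING -s 203.0.113.0/24 -d $SERVICE \
  -j DNAT --to-destination $SECONDARY
# ... while everyone else keeps using the primary.
iptables -t nat -A PREROUTING -d $SERVICE -j DNAT --to-destination $PRIMARY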

One advantage of a virtual machine environment for load balancing is that you can have as many virtual Ethernet devices as you desire and you can configure them using software (without changing cables in the server room). A limitation on the use of load-balancing routers is that traffic needs to go through the router in both directions. This is easy for the path from the Internet to the server room and the path from the server room to the customer network. But when going between servers in the server room it’s a problem (which is not insurmountable, merely painful and expensive). Of course there will be a cost in CPU time for all the extra routing. If instead of having a single virtual Ethernet device for all redundant nodes you have a virtual Ethernet device for every type of server and use the Dom0 as a router, you will end up doubling the CPU requirements for networking without even considering the potential overhead of the load balancing router functionality.

Finally there is a significant benefit in virtual machines for reliability of services. That is the ability to perform snapshot backups. If you have sufficient disk space and IO capacity you could have a snapshot of your server taken every day and store several old snapshots. Of course doing this effectively would require some minor changes to the configuration of machines to avoid unnecessary writes, this would include not compressing old log files and using a ram disk for /tmp and any other filesystem with transient data. When you have snapshots you can then run filesystem analysis tools on the snapshots to detect any silent corruption that may be occurring and give the potential benefit of discovering corruption before it gets severe (but I have yet to see a confirmed report of this saving anyone). Of course similar snapshot facilities are available on almost every SAN and on many NAS devices, but there are many sites that don’t have the budget to use such equipment.
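
As a sketch of the snapshot idea (assuming LVM backed DomUs, with hypothetical volume group and volume names):

#!/bin/sh
# Snapshot a DomU disk and check the filesystem on the snapshot for
# silent corruption without touching the live volume.
lvcreate --snapshot --size 2G --name mail-snap /dev/vg0/mail-disk
fsck.ext3 -n /dev/vg0/mail-snap    # -n: report problems, never write
lvremove -f /dev/vg0/mail-snap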


ECC RAM is more useful than RAID

A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming Code [0]) RAM is only a server feature.

The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks for one person. Therefore when purchasing a desktop machine you can decide how much you are willing to spend for the safety and continuity of your work. For a server it’s more difficult as everyone has a different idea of how reliable a server should be in terms of uptime and in terms of data security. When running a server for a business there is the additional issue of customer confidence. If a server goes down occasionally customers start wondering what else might be wrong and considering whether they should trust their credit card details to the online ordering system.

So it is apparent that servers need a greater degree of reliability – and it’s easy to justify spending the money.

Desktop machines also need reliability, more so than most people expect. In a business when a desktop machine crashes it wastes employee time. If a crash wastes an hour (which is not unlikely given that previously saved work may need to be re-checked) then it can easily cost the business $100 (the value of the other work that the employee might have done). Two such crashes per week could cost the business as much as $8000 per year. The price difference between a typical desktop machine and a low-end workstation (or deskside server) is considerably less than that (when I investigated the prices almost a year ago desktop machines with server features ranged in price from $800 to $2400 [1]).

Some machines in a home environment need significant reliability. For example when students are completing high-school their assignments have a lot of time invested in them. Losing an assignment due to a computer problem shortly before it’s due in could impact their ability to get a place in the university course that they most desire! Then there is also data which is irreplaceable; one example I heard of was a woman whose computer had a factory pre-load of Windows, and during a storm the machine rebooted and reinstalled itself to the factory defaults – wiping several years of baby photos. In both cases better backups would mostly solve the problem.

For business use the common scenario is to have file servers storing all data and have very little data stored on the PC (ideally have no data on the PC). In this case a disk error would not lose any data (unless the swap space was corrupted and something important was paged out when the disk failed). For home use the backup requirements are quite small. If a student is working on an important assignment then they can back it up to removable media whenever they reach a milestone. Probably the best protection against disk errors destroying assignments would be a bulk purchase of USB flash storage sticks.

Disk errors are usually easy to detect. Most errors are in the form of data which can not be read back; when that happens the OS will give an error message to the user explaining what happened. Then if you have good backups you revert to them and hope that you didn’t lose too much work in the meantime (you also hope that your backups are actually readable – but that’s another issue). The less common errors are lost writes – where the OS writes data to disk but the disk doesn’t store it. This is a little more difficult to discover as the drive will return bad data (maybe an old version of the file data or maybe data from a different file) and claim it to be good.

The general idea nowadays is that a filesystem should check the consistency of the data it returns. Two new filesystems, ZFS from Sun [2] and BTRFS from Oracle [3], implement checksums of the data stored on disk. ZFS is apparently production ready while BTRFS is apparently not nearly ready. I expect that from now on whenever anyone designs a filesystem for anything but the smallest machines (EG PDAs and phones) they will include data integrity mechanisms in the design.

I believe that once such features become commonly used the need for RAID on low-end systems will dramatically decrease. A combination of good backups and knowing when your live data is corrupted will often be a good substitute for preserving the integrity of the live data. Not that RAID will necessarily protect your data – with most RAID configurations if a hard disk returns bad data and claims it to be good (the case of lost writes) then the system will not read data from other disks for checksum validation and the bad data will be accepted.

It’s easy to compute checksums of important files and verify them later. One simple way of doing so is to compress the files; every file compression program that I’ve seen has some degree of error detection.
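
For example (a sketch with a hypothetical directory and manifest location):

#!/bin/sh
# Record checksums of the important files now ...
find /data/important -type f -print0 | xargs -0 sha1sum > /root/manifest
# ... and verify them later, printing only the files that changed.
sha1sum -c /root/manifest | grep -v ': OK$'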

Now the real problem with RAM which lacks ECC is that it can lose data without the user knowing. There is no possibility of software checks because any software which checks for data integrity could itself be misled by memory errors. I once had a machine which experienced occasional filesystem corruption; eventually I discovered that it had a memory error (memtest86+ reported a problem). I will never know whether some data was corrupted on disk because of this. Sifting through a large amount of stored data for some files which may have been corrupted due to memory errors is almost impossible, especially when there was a period of weeks of unreliable operation of the machine in question.

Checking the integrity of file data by using the verify option of a file compression utility, fsck on a filesystem that stores checksums on data, or any of the other methods is not difficult.

I have a lot of important data on machines that don’t have ECC. One reason is that machines which have ECC cost more and have other trade-offs (more expensive parts, more noise, more electricity use, and the small supply makes it difficult to get good deals). Another is that there appear to be no laptops which support ECC (I use a laptop for most of my work). On the other hand RAID is very cheap and simple to implement, just buy a second hard disk and install software RAID – I think that all modern OSs support RAID as a standard installation option. So in spite of the fact that RAID does less good than a combination of ECC RAM and good backups (which are necessary even if you have RAID), it’s going to remain more popular in high-end desktop systems for a long time.

The next development that seems interesting is the large portion of the PC market which is designed not to have space for more than one hard disk. Such compact machines (known as Small Form Factor or SFF) could easily be designed to support ECC RAM. Hopefully the PC companies will add reliability features in that area (ECC RAM) while removing them in another (the space for a second disk).

Controlling a STONITH and Upgrading a Cluster

One situation that you will occasionally encounter when running a Heartbeat cluster is a need to prevent a STONITH of a node. As documented in my previous post about testing STONITH, the ability to STONITH nodes is very important in an operating cluster. However when the sys-admin is performing maintenance on the system, or when programmers are working on a development or test system, it can be rather annoying.

One example of where STONITH is undesired is when upgrading packages of software related to the cluster services. If the data files and programs related to the OCF script are not synchronised at the moment that the status operation is run (EG you have two programs that interact and upgrading one requires upgrading the other) then an error may occur which may trigger a STONITH. Another possibility is that if you use small systems for testing or development (EG running a cluster under Xen with minimal RAM assigned to each node) then a package upgrade may cause the system to thrash, which might then cause a timeout of the status scripts (a problem I encounter when upgrading my Xen test instances that have 64M of RAM).

If a STONITH occurs during a package upgrade then you are likely to have consistency problems with the OS due to RPM and DPKG not correctly calling fsync(). This can cause the OCF scripts to always fail the status command, which can cause an infinite loop of the cluster nodes in question being STONITHed. Incidentally the best way to test for this (given that a STONITH sometimes loses log data) is to boot the node in question without Heartbeat running and then run the OCF status commands manually (I previously documented three ways of doing this).

Of course the ideal (and recommended) way of solving this problem is to migrate all services off a node using the crm_resource program. But in a test or development situation you may forget to migrate all services or simply forget to run the migration before the package upgrade starts. In that case the best thing to do is to remove the ability to call STONITH. For my testing I use Xen and have the nodes ssh to the Dom0 to call STONITH, so all I have to do to remove the STONITH ability is to stop the ssh daemon on the Dom0. For a more serious test network (EG using IPMI or an equivalent technology to perform a hardware STONITH as well as ssh for OS level STONITH on a private network) a viable option might be to shut down the switch port used for such operations – shutting down switch ports is not a nice thing to do, but it’s a reasonable hack to let you continue work on a development environment without hassle.

When choosing your method of STONITH it’s probably worth considering what the possibilities are for temporarily disabling it – preferably without having to walk to the server room.

Starting a Heartbeat Resource Without Heartbeat

The command crm_resource allows you to do basic editing of resources in the Heartbeat configuration database. But sometimes you need to do things that it doesn’t support, and the tool xmlstarlet is a good option for those.

The below script can be used for testing Heartbeat OCF resource scripts. It uses the Heartbeat management program cibadmin to get the XML configuration data and then uses xmlstarlet to process it. The sel option of xmlstarlet selects data from an XML file; the -t -m options instruct it to match data from a template, where the template is the /resources/primitive part. The --value-of expression prints the values of some labels from the XML. The script will concatenate the name and value tags and export them as environment variables (see my post about Configuring a Heartbeat Service for an explanation of the use of the variables). The TYPE variable is the name of the script under the /usr/lib/ocf/resource.d/heartbeat directory.

In recent versions of Heartbeat (2.1.x) the OCF_ROOT environment variable must be set before an OCF script is called. Setting it on older versions of Heartbeat doesn’t do any harm so I unconditionally set it in this script (which should work for all 2.x.x versions of Heartbeat).

The first parameter for the script is the id of the service to be operated on and the second parameter is the operation to perform (start, stop, and status are the only interesting values). The script will echo the exit code to the screen (0 means success, 7 means that the service is not running or the operation failed, and any other number means a serious error that will trigger a STONITH if Heartbeat gets it).

#!/bin/sh
# Get the resource configuration from the CIB, turn each nvpair into an
# "export OCF_RESKEY_name=value" line (plus a TYPE= line for the script
# name) and evaluate the result in the current shell.
$(cibadmin -Q -o resources | xmlstarlet sel -t -m \
"/resources/primitive [@id='$1']/instance_attributes/attributes/nvpair" \
--value-of "concat('export OCF_RESKEY_',@name,'=',@value,'
','TYPE=',../../../@type,'
')")
# Run the OCF script with the requested operation and report its exit code.
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/$TYPE $2
echo $?

Below is another version of the same script that instead uses crm_resource to get the XML data. The output of crm_resource has a couple of lines of non-XML data at the start (removed by the grep) and also only gives the XML tree related to the primitive in question (so the /resources part is removed from the xmlstarlet command-line).

#!/bin/sh
# As above, but query a single resource via crm_resource. The grep strips
# the non-XML lines that crm_resource prints before the XML tree.
$(crm_resource -r $1 -x | grep -v '^[a-z]' | xmlstarlet sel -t -m \
  "/primitive/instance_attributes/attributes/nvpair" --value-of \
  "concat('export OCF_RESKEY_',@name,'=',@value,'
','TYPE=',../../../@type,'
')")
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/$TYPE $2
echo $?

The problem with both of those scripts is that they rely on Heartbeat being operational. Performing any operations other than a status check while Heartbeat is running is a risky thing to do. If Heartbeat starts a service at the same time as you start it via such a script then the results will probably be undesired. One situation where it is safe to run this is when a service fails to start. After it has failed repeatedly Heartbeat may stop trying to restart it (depending on the configuration), in which case it will be safe to try to start it. Also you can put in temporary constraints to stop the resource from running by repeatedly running crm_resource -M -r ID until all nodes have been prohibited from running it (make sure you run crm_resource -U -r ID afterwards to remove the temporary constraints).
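
For example (with a hypothetical resource id, on a two node cluster):

# Add temporary constraints until no node may run the resource:
crm_resource -M -r web-10.1.2.3
crm_resource -M -r web-10.1.2.3
# ... test the resource manually with the script above ...
# Then remove the temporary constraints:
crm_resource -U -r web-10.1.2.3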

The following script does the same thing but directly reads the XML file for the Heartbeat configuration. This is designed to be used when Heartbeat is not running. For example you could copy the XML file from a running cluster to a test machine and then test your OCF resource scripts.

#!/bin/sh
# Read the resource configuration directly from the CIB XML file, for use
# when Heartbeat is not running.
$(cat /var/lib/heartbeat/crm/cib.xml | xmlstarlet sel -t -m \
  "/cib/configuration/resources/primitive [@id='$1']/instance_attributes/attributes/nvpair" \
  --value-of \
  "concat('export OCF_RESKEY_',@name,'=',@value,'
','TYPE=',../../../@type,'
')")
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/$TYPE $2
echo $?

Testing STONITH

One problem that I have had in configuring Heartbeat clusters is in performing a STONITH that originates outside the Heartbeat system.

STONITH was designed for the Heartbeat system to know when a node is not operating correctly (this can either be determined by the node itself or by other nodes in the network) and then force a hardware reset so that the non-functional node will not interfere with another node that is designated to take over the service.

However sometimes code that is called by Heartbeat will have more information about the state of the system than Heartbeat can access. For example if I have a service that accesses a filesystem on an external RAID then it’s common for the RAID to track who is accessing it. In some situations the RAID hardware has the ability to “fence” the access (so that when machine B mounts the filesystem machine A can no longer access it). In other situations the RAID may only be capable of informing the system that another machine is registered as the owner of the device. To solve this problem a machine that is to mount such a device must either prohibit the previous owner from accessing the device (which may be impossible or unreasonably difficult) or reset the previous owner.

Until recently I had been doing this by writing some code to extract the STONITH configuration from the CIB and call the stonith utility. The problem with this is that there is no requirement that every node be capable of performing a STONITH on every other node, and even if every node is designed to be capable of rebooting every other node, a partial failure condition may restrict the set of nodes that are capable of performing a STONITH on the target.

Currently the recommended way of doing this is via the apitest test program. Below is an example of the command used to reset the node node-1 with a timeout of 20000ms, and the result of it being successfully completed. I have suggested that the Heartbeat developers make an official interface for doing this (rather than a test of the API) and I believe that this is being considered. In the meantime the following is the only way of doing it:

# /usr/lib/heartbeat/stonithdtest/apitest 1 node-1 20000 0
optype=1, node_name=node-1, result=0, node_list=node-0


Xen and Heartbeat

Xen (a system for running multiple virtual Linux machines) has some obvious benefits for testing Heartbeat (the clustering system) – the cheapest new machine that is on sale in Australia can be used to simulate a four node cluster. I’m not sure whether there is any production use for a cluster running under Xen (I look forward to seeing some comments or other blog posts about this).

Most cluster operations run on a Xen virtual machine in the same way as they would under physically separate machines, and Xen even supports simulating a SAN or fiber-channel shared storage device if you use the syntax phy:/dev/VG/LV,hdd,w! in the Xen disk configuration line (the exclamation mark means that the volume is writable even if someone else is writing to it).

The one missing feature is the ability to STONITH a failed node. This is quite critical as the design of Heartbeat is that a service on a node which is not communicating will not be started on another node until the failed node comes up after a reboot or the STONITH sub-system states that it has rebooted it or turned it off. This means that the failure of a node implies the permanent failure of all services on it until/unless the node can be STONITH’d.

To solve this problem I have written a quick Xen STONITH module. The first issue is how to communicate between the DomU’s (Xen virtual machines) and the Dom0 (the physical host). It seemed that the best way to do this is to ssh to special accounts on the Dom0 and then use sudo to run a script that calls the Xen xm utility to actually restart the node. That way the Xen virtual machine gets limited access to the Dom0, and the shell script could even be written to allow each VM to only manage a sub-set of the VMs on the host (so you could have multiple virtual clusters on the one physical host and prevent them from messing with each other through accident or malice).

xen ALL=NOPASSWD:/usr/local/sbin/xen-stonith

Above is the relevant section from my /etc/sudoers file. It allows user xen to execute the script /usr/local/sbin/xen-stonith as root to do the work.
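
The script itself can be quite small. Below is a sketch of the sort of thing that works (the node names are hypothetical, and a real version might support operations other than reset):

#!/bin/sh
# /usr/local/sbin/xen-stonith - reset the named DomU.
# Only manage a fixed list of DomUs so that a compromised cluster node
# can't touch other virtual clusters on the same host.
case "$1" in
  node-0|node-1|node-2|node-3) ;;
  *) echo "not permitted to manage $1" >&2; exit 1 ;;
esac
xm destroy "$1" && xm create "/etc/xen/$1.cfg"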

One thing to note is that from root on each of the DomUs you must be able to ssh to the specified account for the Xen STONITH service without using a password and without any unreasonable delay (IE put UseDNS no in /etc/ssh/sshd_config).

In the section below (which isn’t in the feed) there are complete scripts for configuring this.



Configuring a Heartbeat Service

In my last post about Heartbeat I gave an example of a script to start and stop a cluster service. In that post I omitted to mention that the script goes in the directory /usr/lib/ocf/resource.d/heartbeat.

To actually use the script you need to write some XML configuration to tell Heartbeat which parameters should be passed to it via environment variables and which nodes may be candidates to run it.

In the below example the type of web means that the script /usr/lib/ocf/resource.d/heartbeat/web will be called to do the work. The id attributes are all arbitrary, but you want to decide on some sort of consistent naming scheme. I have decided to name web server instances web-X where X is the IP address used for providing the service.

The nvpair element contains a configuration option that will be passed to the script as an environment variable. The name of ip means that the environment variable will be named OCF_RESKEY_ip. Naming of such variables is arbitrary and a script may take many variables. A well written script (which incidentally does not describe my previous blog post) will support a meta-data operation giving XML output that describes all the variables it accepts. An example of this can be seen by running the command /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.

In the XML the resources section (as specified by --obj_type resources on the cibadmin command-line) describes resources that the Heartbeat system will run. The constraints section specifies a set of rules that determine where they will run. If the symmetric-cluster attribute in the cluster_property_set is set to true then resources will be permitted to run anywhere, if it is set to false then a resource will not run anywhere unless there is a constraint specifying that it should do so – which means that there must be at least one constraint rule for every resource that is permitted to run.

In the below example I have constraint rules giving node-0 and node-1 a priority of 9000 for running the service.
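
A rough sketch of the XML described above (with a hypothetical IP address, and id names following my naming scheme – not the exact example from below the fold) looks something like this:

<primitive id="web-10.1.2.3" class="ocf" provider="heartbeat" type="web">
  <instance_attributes id="web-10.1.2.3-attrs">
    <attributes>
      <nvpair id="web-10.1.2.3-ip" name="ip" value="10.1.2.3"/>
    </attributes>
  </instance_attributes>
</primitive>

<rsc_location id="web-10.1.2.3-on-node-0" rsc="web-10.1.2.3">
  <rule id="web-10.1.2.3-rule-node-0" score="9000">
    <expression id="web-10.1.2.3-expr-node-0" attribute="#uname" operation="eq" value="node-0"/>
  </rule>
</rsc_location>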

In a future post I will describe the cluster_property_set and how it affects calculations of where resources should run.


Heartbeat service scripts

A service script for Heartbeat needs to support at least three operations: start, stop, and status. The operations will return 0 on success, 7 on failure (which in the case of the status operation means that the service is not running), and any other value to indicate that something has gone wrong.

In the second half of this post (not in the feed) I have included an example service script. It is a very brief script and does not support some of the optional operations (monitor, validate-all, and meta-data). So this script is not of a quality that would be accepted for inclusion in a Heartbeat release but is adequate to demonstrate the concepts.

The XML configuration for a service can have an arbitrary set of name-value pairs, and they are passed to the script as environment variables. For example the below script expects that the XML configuration item named ip will have the IP address used by the service, my script receives this as a variable named OCF_RESKEY_ip. My script doesn’t use the address, it merely allows it to be inherited by the IPaddr2 script (which is part of the Heartbeat distribution) and that script assigns the address to an Ethernet interface.

The script is for testing Heartbeat, it mounts a filesystem and starts Apache (which is configured to serve web pages from the filesystem in question on the IP address supplied by the ip parameter).

For test purposes the script looks for a file named /root/fail, if this file exists then the status check will always abort. An aborting status script means that Heartbeat can not be certain that the node in question has released all resources that it was using for the service. This means that Heartbeat will have to kill the node via the STONITH service. Such test scripts are the only way to test that STONITH works, and I believe that it’s a necessary part of pre-production testing of a Heartbeat cluster.
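
A rough sketch of such a script (not the exact one from below the fold – the device, mount point, and Apache commands are hypothetical stand-ins):

#!/bin/sh
# Test OCF script: mount a filesystem and run Apache on it.
FS=/dev/sdb1            # shared storage device (hypothetical)
MP=/var/www/service     # mount point served by Apache (hypothetical)

case "$1" in
start)
  mount $FS $MP || exit 1
  # IPaddr2 inherits OCF_RESKEY_ip from our environment.
  /usr/lib/ocf/resource.d/heartbeat/IPaddr2 start || exit 1
  apache2ctl start || exit 1
  ;;
stop)
  apache2ctl stop
  /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop
  umount $MP || exit 1
  ;;
status)
  [ -e /root/fail ] && exit 1      # force a failure, and thus a STONITH
  mountpoint -q $MP || exit 7      # 7 means the service is not running
  ;;
*)
  exit 1
  ;;
esac
exit 0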

Update: Made it display error messages in all cases and also reformatted it for better cut/paste.


Another Heartbeat 2.0 STONITH example configuration

In a Heartbeat cluster installation it may not be possible to have one STONITH device be used to reboot all nodes. To support this it is possible to have multiple STONITH devices configured, each of which will be used to reboot different nodes in the cluster. In the following code section there is an example of how to configure STONITH for two separate ssh instances. Of course this is not useful except as an example of how to configure STONITH. It would be quite easy to change one of those ssh configuration entries to use IPMI or some more effective method of managing machines. My previous post on this topic has an example of a simpler ssh STONITH configuration.

It is convenient that the ssh method for rebooting nodes is available both as a shared object (which is used by the following example XML) and as a shell script (type external/ssh). The shell script gives the same functionality as the shared object (with reduced performance) but has the benefit of serving as an example of how to write external plugins. For a client I have just written an IPMI module that works with machines that have two Ethernet ports: when a server has two Ethernet ports you want to send an IPMI reset command to both of them, in case the failure which requires a STONITH was triggered by a broken Ethernet cable. Unfortunately I can’t release the IPMI code at this time.
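
The general shape of such a configuration (id and node names hypothetical, with only one of the two stonith resources shown in full) is roughly as follows – each ssh stonith resource has a hostlist parameter naming the node it can reset, and a constraint keeps each device off the node it manages:

<primitive id="st-node-0" class="stonith" type="ssh">
  <instance_attributes id="st-node-0-attrs">
    <attributes>
      <nvpair id="st-node-0-hosts" name="hostlist" value="node-0"/>
    </attributes>
  </instance_attributes>
</primitive>

<rsc_location id="st-node-0-placement" rsc="st-node-0">
  <rule id="st-node-0-rule" score="-INFINITY">
    <expression id="st-node-0-expr" attribute="#uname" operation="eq" value="node-0"/>
  </rule>
</rsc_location>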
