If you want a reliable network, you need to determine an appropriate level of redundancy. When servers were small and there was no well-accepted virtual machine technology, there were always many points at which redundancy could be employed.
A common example is a large mail server. You might have MX servers to receive […]
A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming code) RAM is only a server feature.
The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks […]
One situation that you will occasionally encounter when running a Heartbeat cluster is a need to prevent a STONITH of a node. As documented in my previous post about testing STONITH, the ability to STONITH nodes is very important in an operating cluster. However, when the sys-admin is performing maintenance on the system or programmers […]
The command crm_resource allows you to do basic editing of resources in the Heartbeat configuration database. When you need to make changes that it does not cover, the tool xmlstarlet is a good option.
The script below can be used for testing Heartbeat OCF resource scripts. It uses the Heartbeat management program cibadmin to get the XML […]
One problem that I have had in configuring Heartbeat clusters is in performing a STONITH that originates outside the Heartbeat system.
STONITH was designed for the Heartbeat system to know when a node is not operating correctly (this can either be determined by the node itself or by other nodes in the network) and then […]
Xen (a system for running multiple virtual Linux machines) has some obvious benefits for testing Heartbeat (the clustering system) – the cheapest new machine that is on sale in Australia can be used to simulate a four node cluster. I’m not sure whether there is any production use for a cluster running under Xen […]
In my last post about Heartbeat I gave an example of a script to start and stop a cluster service. In that post I omitted to mention that the script goes in the directory /usr/lib/ocf/resource.d/heartbeat.
To actually use the script you need to write some XML configuration to tell Heartbeat which parameters should be passed […]
A service script for Heartbeat needs to support at least three operations: start, stop, and status. Each operation returns 0 on success, 7 on failure (which in the case of the monitor operation means that the service is not running), and any other value to indicate that something has gone wrong.
In the second […]
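The return-code convention described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the script from the post: it is written in Python rather than shell, the "service" is a hypothetical marker file rather than a real daemon, the constant names follow the OCF naming convention, and the use of 1 as the generic "something has gone wrong" value is an assumption.

```python
import os
import tempfile

# Return codes from the text: 0 on success, 7 when the service is not running.
# The value 1 for a generic error is an assumption made for this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

# Hypothetical stand-in for a real service: the marker file existing means "running".
STATE_FILE = os.path.join(
    tempfile.gettempdir(), "demo-service.%d.state" % os.getpid()
)


def status():
    """Report 0 if the stand-in service is running, 7 if it is not."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start():
    """Start the stand-in service; starting a running service is a success."""
    if status() == OCF_SUCCESS:
        return OCF_SUCCESS
    with open(STATE_FILE, "w") as handle:
        handle.write("running\n")
    return OCF_SUCCESS if status() == OCF_SUCCESS else OCF_ERR_GENERIC


def stop():
    """Stop the stand-in service; stopping a stopped service is a success."""
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS if status() == OCF_NOT_RUNNING else OCF_ERR_GENERIC


def dispatch(operation):
    """Map an operation name to its handler and return the exit code."""
    handlers = {"start": start, "stop": stop, "status": status, "monitor": status}
    handler = handlers.get(operation)
    if handler is None:
        # Any value other than 0 or 7 signals that something went wrong.
        return OCF_ERR_GENERIC
    return handler()
```

A real script would end by exiting with the value of dispatch() for the operation named in its first argument, so that the cluster software sees the code as the process exit status.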
In a Heartbeat cluster installation it may not be possible to use a single STONITH device to reboot all nodes. To support this, you can configure multiple STONITH devices, each of which is used to reboot different nodes in the cluster. In the following code section there is an example of […]
Below is a sample script to configure the ssh STONITH agent for the Heartbeat system. STONITH will reboot nodes when things go wrong to restore the integrity of the cluster.
The STONITH test program supports the -n option to list parameters and the -l option to list nodes. The following is an example of using […]