Some Notes on DRBD

DRBD is a system for replicating a block device across multiple systems. It’s most commonly used for having one system write to the DRBD block device such that all writes are written to a local disk and a remote disk. In the default configuration a write is not complete until it’s committed to disk locally […]

Hetzner Failover Konfiguration

The Wiki documenting how to configure IP failover for Hetzner servers [1] is closely tied to the Linux HA project [2]. This is OK if you want a Heartbeat cluster, but if you want manual failover or an automatic failover from some other form of script then it’s not useful. So I’ll provide the simplest […]

Why Clusters Usually Don’t Work

It’s widely regarded that to solve reliability problems you can just install a cluster. It’s quite obvious that if instead of having one system of a particular type you have multiple systems of that type and a cluster configured such that broken systems aren’t used then reliability will increase. Also in the case of routine […]

A Basic IPVS Configuration

I have just configured IPVS on a Xen server for load balancing between multiple virtual hosts. The benefit is not load balancing but management. With two virtual machines providing a service I can gracefully shut one down for maintenance and have the other take the load. When there are two machines providing a service a […]

Kernel Security vs Uptime

For best system security you want to apply kernel security patches ASAP. For an attacker gaining root access to a machine is often a two step process, the first step is to exploit a weakness in a non-root daemon or take over a user account, the second step is to compromise the kernel to gain […]

ISP Redundancy and Virtualisation

If you want a reliable network then you need to determine an appropriate level of redundancy. When servers were small and there was no well accepted virtual machine technology there were always many points at which redundancy could be employed.

A common example is a large mail server. You might have MX servers to receive […]

ECC RAM is more useful than RAID

A common myth in the computer industry seems to be that ECC (Error Correcting Code – a Hamming Code [0]) RAM is only a server feature.

The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks […]

Controlling a STONITH and Upgrading a Cluster

One situation that you will occasionally encounter when running a Heartbeat cluster is a need to prevent a STONITH of a node. As documented in my previous post about testing STONITH the ability to STONITH nodes is very important in an operating cluster. However when the sys-admin is performing maintenance on the system or programmers […]

Starting a Heartbeat Resource Without Heartbeat

The command crm_resource allows you to do basic editing of resources in the Heartbeat configuration database. But sometimes you need to do different things and the tool xmlstarlet is a good option.

The below script can be used for testing Heartbeat OCF resource scripts. It uses the Heartbeat management program cibadmin to get the XML […]


One problem that I have had in configuring Heartbeat clusters is in performing a STONITH that originates outside the Heartbeat system.

STONITH was designed for the Heartbeat system to know when a node is not operating correctly (this can either be determined by the node itself or by other nodes in the network) and then […]