<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>etbe - Russell Coker &#187; Ha</title>
	<atom:link href="http://etbe.coker.com.au/category/ha/feed/" rel="self" type="application/rss+xml" />
	<link>http://etbe.coker.com.au</link>
	<description>Linux, politics, and other interesting things</description>
	<lastBuildDate>Thu, 18 Mar 2010 09:13:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A Basic IPVS Configuration</title>
		<link>http://etbe.coker.com.au/2008/08/07/basic-ipvs-configuration/</link>
		<comments>http://etbe.coker.com.au/2008/08/07/basic-ipvs-configuration/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 13:10:19 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/?p=691</guid>
		<description><![CDATA[I have just configured IPVS on a Xen server for load balancing between multiple virtual hosts.  The benefit is not load balancing but management.  With two virtual machines providing a service I can gracefully shut one down for maintenance and have the other take the load.  When there are two machines providing [...]]]></description>
			<content:encoded><![CDATA[<p>I have just configured IPVS on a Xen server for load balancing between multiple virtual hosts.  The benefit is not load balancing but management: with two virtual machines providing a service I can gracefully shut one down for maintenance and have the other take the load.  When two machines provide a service, a load balancing configuration is much better than a hot spare.  One reason is that application scaling issues may prevent one machine with twice the resources from giving as much performance as two smaller machines.  Another is that if a machine is configured but never used, there will always be some doubt as to whether it would work&#8230;</p>
<p>The first thing to do is to assign the IP address of the service to the front-end machine so that other machines on the segment (i.e. routers) will be able to send data to it.  If the address for the service is 10.0.0.5 then the command &#8220;<b>ip addr add dev eth0 10.0.0.5/24 broadcast +</b>&#8221; will make it a secondary address on the <b>eth0</b> interface.  On a Debian system you would add the line &#8220;<b>up ip addr add dev eth0 10.0.0.5/24 broadcast + || true</b>&#8221; to the appropriate section of <b>/etc/network/interfaces</b>; on a Red Hat system <b>/etc/rc.local</b> seems to be the best place for it.  I expect that it would be possible to merely advertise the IP address via ARP without adding it to the interface, but the ability to ping the IPVS server on the service address seems useful and there seems to be no benefit in leaving the address unassigned.</p>
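<p>The <b>broadcast +</b> argument asks <b>ip</b> to derive the broadcast address by setting all the host bits of the prefix.  A minimal Python sketch of that calculation, using the standard <b>ipaddress</b> module (the function name is hypothetical and not part of iproute2):</p>

```python
import ipaddress


def broadcast_plus(address_with_prefix):
    """Return the address that `ip addr add ... broadcast +` would derive:
    the interface's network address with every host bit set to 1."""
    interface = ipaddress.ip_interface(address_with_prefix)
    return str(interface.network.broadcast_address)


print(broadcast_plus("10.0.0.5/24"))
```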
<p>There are three methods used by IPVS for forwarding packets: gatewaying/routing (the default), IPIP encapsulation (tunneling), and masquerading.  The gatewaying/routing method requires the back-end server to respond to requests on the service address.  That would mean assigning the address to the back-end server without advertising it via ARP (which seems likely to cause some issues for managing the system).  The IPIP encapsulation method requires setting up IPIP, which seemed like it would be excessively difficult (although maybe no harder than setting up masquerading).  The masquerading option (which I initially chose) rewrites the packets to have the IP address of the real server.  So for example if the service address is 10.0.0.5 and the back-end server has the address 10.0.1.5 then the back-end will see packets addressed to 10.0.1.5.  A benefit of masquerading is that it allows you to use different ports, so for example you could have a non-virtualised mail server listening on port 25 and a back-end server for a virtual service listening on port 26.  While there is no practical limit to the number of private IP addresses that you might use, it seems easier to manage servers listening on different ports with the same IP address &#8211; and there is the issue of server programs that are not written to support binding to a specific IP address.</p>
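<p>To make the masquerading method concrete, here is a hypothetical Python model (not IPVS code) of the two address rewrites it performs, using the example addresses above and the port 26 variant; the client address 192.0.2.10 is made up.  The reply rewrite only happens if the reply actually passes back through the IPVS server, which is the problem discussed below.</p>

```python
# Hypothetical model of IPVS masquerading (NAT) for one virtual service.
# Packets are plain dicts; real IPVS works on kernel packet buffers.
VIRTUAL = ("10.0.0.5", 25)  # service address and port seen by clients
REAL = ("10.0.1.5", 26)     # back-end server address and port


def to_real_server(packet):
    """Rewrite a client packet for the virtual service so it reaches the back-end."""
    if (packet["dst"], packet["dport"]) == VIRTUAL:
        return {**packet, "dst": REAL[0], "dport": REAL[1]}
    return packet


def to_client(packet):
    """Rewrite a back-end reply so the client sees it coming from the service."""
    if (packet["src"], packet["sport"]) == REAL:
        return {**packet, "src": VIRTUAL[0], "sport": VIRTUAL[1]}
    return packet


request = {"src": "192.0.2.10", "sport": 40000, "dst": "10.0.0.5", "dport": 25}
forwarded = to_real_server(request)
reply = {"src": "10.0.1.5", "sport": 26, "dst": "192.0.2.10", "dport": 40000}
print(forwarded, to_client(reply))
```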
<p><b>ipvsadm -A -t 10.0.0.5:25 -s lblc -p<br />
ipvsadm -a -t 10.0.0.5:25 -r 10.0.1.5 -m</b></p>
<p>The above two commands create an IPVS configuration that listens on port 25 of IP address 10.0.0.5 and then masquerades connections to 10.0.1.5 on port 25 (the default is to use the same port).</p>
<p>Now the problem is getting the reply packets to return via the IPVS server.  If the IPVS server happens to be your default gateway then it&#8217;s not a problem and it will already be working after the above two commands (if a service is listening on 10.0.1.5 port 25).</p>
<p>If the IPVS server is not the default gateway and you have only one IP address on the back-end server, then you need to use netfilter to mark the packets and then route based on the mark.  Marking via netfilter also seems to be the only well documented way of doing similar things.  I spent some time working on this and didn&#8217;t get it working.  However, having multiple IP addresses per server is a recommended practice anyway (a back-end interface for communication between servers as well as a front-end interface for public data).</p>
<p><b>ip rule add from 10.0.1.5 table 1<br />
ip route add default via 10.0.0.1 table 1</b></p>
<p>I use the above two commands to set up a new routing table for the data for the virtual service.  The first line causes any packets from <b>10.0.1.5</b> to be looked up in routing table 1 (I currently have a rough plan to have table numbers match Ethernet device numbers; the data in question is going out device eth1).  The second line adds a default route to table 1 which sends all packets to 10.0.0.1 (the private IP address of the IPVS server).</p>
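<p>The effect of the two commands can be modelled as a simplified policy-routing lookup.  This Python sketch is an illustration only: it ignores the kernel&#8217;s <b>local</b> table, rule priorities and non-default routes, and the main-table gateway 10.0.1.254 is a hypothetical value.</p>

```python
import ipaddress

# Simplified model of policy routing after the two commands above.
RULES = [
    (ipaddress.ip_network("10.0.1.5/32"), 1),  # ip rule add from 10.0.1.5 table 1
]
DEFAULT_GATEWAYS = {
    1: "10.0.0.1",         # ip route add default via 10.0.0.1 table 1
    "main": "10.0.1.254",  # hypothetical existing default route
}


def next_hop(source_ip):
    """Pick the default gateway for a packet based on its source address."""
    source = ipaddress.ip_address(source_ip)
    for network, table in RULES:
        if source in network:
            return DEFAULT_GATEWAYS[table]
    return DEFAULT_GATEWAYS["main"]  # no rule matched, use the main table


print(next_hop("10.0.1.5"), next_hop("10.0.1.6"))
```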
<p>Then it SHOULD all be working, but in the network that I&#8217;m using (RHEL4 DomU, RHEL5 Dom0 and IPVS) it doesn&#8217;t.  For some reason the data packets from the DomU are not seen as part of the same TCP stream (either by Netfilter connection tracking or by the TCP code in the kernel).  So I get an established connection (3-way handshake completed) but no data transfer.  The server sends the SMTP greeting repeatedly but nothing is received.  At this stage I&#8217;m not sure whether there is something missing in my configuration or whether there&#8217;s a bug in IPVS.  I would be happy to send tcpdump output to anyone who wants to try to figure it out.</p>
<p>My next attempt at this was via routing.  I removed the &#8220;<b>-m</b>&#8221; option from the <b>ipvsadm</b> command and added the service IP address to the back-end with the command &#8220;<b>ifconfig lo:0 10.0.0.5 netmask 255.255.255.255</b>&#8221; and configured the mail server to bind to port 25 on address 10.0.0.5.  Success at last!</p>
<p>Now I just have to get Piranha working to remove back-end servers from the list when they fail.</p>
<p>Update:  It&#8217;s quite important that when adding a single IP address to device <b>lo:0</b> you use a netmask of <b>255.255.255.255</b>.  If you use the same netmask as the front-end device (which would seem like a reasonable thing to do) then (with RHEL4 kernels at least) you get proxy ARPs by default.  For example, if you used netmask 255.255.255.0 to add address 10.0.0.5 to device lo:0 then on device eth0 the machine will start answering ARP requests for 10.0.0.6 etc.  Havoc then ensues.</p>
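<p>A quick way to see why the netmask matters is to list which neighbouring addresses fall inside the prefix assigned to <b>lo:0</b>: with a /24 mask the whole 10.0.0.0/24 range, including 10.0.0.6, is covered, which is consistent with the ARP behaviour described above.  A minimal Python sketch using the standard <b>ipaddress</b> module (the helper name is hypothetical):</p>

```python
import ipaddress


def covered(address, netmask, candidates):
    """Return the candidate addresses inside the prefix given by address/netmask."""
    network = ipaddress.ip_interface(f"{address}/{netmask}").network
    return [c for c in candidates if ipaddress.ip_address(c) in network]


neighbours = ["10.0.0.5", "10.0.0.6", "10.0.0.200"]
print(covered("10.0.0.5", "255.255.255.255", neighbours))
print(covered("10.0.0.5", "255.255.255.0", neighbours))
```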
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2008/08/07/basic-ipvs-configuration/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Kernel Security vs Uptime</title>
		<link>http://etbe.coker.com.au/2008/06/27/kernel-security-vs-uptime/</link>
		<comments>http://etbe.coker.com.au/2008/06/27/kernel-security-vs-uptime/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 21:00:08 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Best Posts]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/?p=622</guid>
		<description><![CDATA[For best system security you want to apply kernel security patches ASAP.  For an attacker gaining root access to a machine is often a two step process, the first step is to exploit a weakness in a non-root daemon or take over a user account, the second step is to compromise the kernel to [...]]]></description>
			<content:encoded><![CDATA[<p>For best system security you want to apply kernel security patches ASAP.  For an attacker, gaining root access to a machine is often a two-step process: the first step is to exploit a weakness in a non-root daemon or take over a user account, and the second step is to compromise the kernel to gain root access.  So even if a machine is not used for providing public shell access or any other task which involves giving user access to potentially hostile people, having a secure kernel is an important part of system security.</p>
<p>One thing that gets little consideration is the effect of applying security updates on overall uptime.  Over the last year there have been 14 security related updates (I count a silent data loss along with security issues) to the main Debian Etch kernel package.  Of those 14, it seems that if you don&#8217;t use DCCP, NAT for CIFS or SNMP, IA64, or the dialout group, then you will only need to patch for issues 2, 3 (for SMP machines), 4, 5, 7 (sound drivers get loaded on all machines by default), 9, 10, 11, 12, 13, and 14.</p>
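<p>The filtering in the previous paragraph can be written down as a small Python sketch.  The feature names are hypothetical labels; the mapping encodes only the exclusions stated above (issue 1 for DCCP and NAT for CIFS or SNMP, issue 3 for SMP, issue 6 for IA64 and issue 8 for the dialout group), and every other issue is treated as needing a reboot.</p>

```python
# Issues (numbered as in the changelog summary) that only need a reboot
# if one of the listed features is in use; feature names are hypothetical.
FEATURE_ONLY = {
    1: {"dccp", "cifs_snmp_nat"},
    3: {"smp"},
    6: {"ia64"},
    8: {"dialout"},
}
TOTAL_ISSUES = 14


def issues_to_patch(features_in_use):
    """Return the issue numbers that require a kernel update and reboot."""
    needed = []
    for issue in range(1, TOTAL_ISSUES + 1):
        triggers = FEATURE_ONLY.get(issue)
        # Issues without a feature condition always apply.
        if triggers is None or triggers & features_in_use:
            needed.append(issue)
    return needed


print(len(issues_to_patch({"smp"})), len(issues_to_patch(set())))
```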
<p>This means 11 reboots a year for SMP machines and 10 a year for uni-processor machines.  If a reboot takes three minutes (which is an optimistic assumption) then that would be 30 or 33 minutes of downtime a year due to kernel upgrades.  In terms of uptime we talk about the number of &#8220;nines&#8221;, where the ideal is generally regarded as &#8220;five nines&#8221; or 99.999% uptime.  33 minutes of downtime a year for kernel upgrades means that you get about 99.994% uptime (which is &#8220;four nines&#8221;).  If a reboot takes six minutes (which is not uncommon for servers) then that is 66 minutes a year and about 99.987% uptime (&#8220;three nines&#8221;).</p>
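<p>The arithmetic behind these figures, as a Python sketch that assumes a 365-day year (525,600 minutes) and counts &#8220;nines&#8221; as the number of leading 9s in the uptime percentage:</p>

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def uptime_percent(downtime_minutes):
    """Percentage of the year the system is up."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


def nines(downtime_minutes):
    """Number of leading nines, e.g. 99.99x% counts as four."""
    if downtime_minutes <= 0:
        raise ValueError("downtime must be positive")
    unavailability = downtime_minutes / MINUTES_PER_YEAR
    return math.floor(-math.log10(unavailability))


for reboots, minutes_each in [(10, 3), (11, 3), (11, 6)]:
    down = reboots * minutes_each
    print(f"{down} min: {uptime_percent(down):.4f}% ({nines(down)} nines)")
```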
<p>While it doesn&#8217;t seem likely to affect the number of &#8220;nines&#8221; you get, not using SMP has the potential to avoid future security issues.  So it seems that when using Xen (or another virtualisation technology), assigning only one CPU to the DomUs that don&#8217;t need more could improve uptime for them.</p>
<p>For Xen Dom0s which don&#8217;t have local users or daemons and don&#8217;t use DCCP, NAT for CIFS or SNMP, wireless, CIFS, JFFS2, PPPoE, bluetooth, H.323 or SCTP connection tracking, only issue 11 applies.  However for &#8220;five nines&#8221; you need to have about 5 minutes of downtime a year or less.  It seems unlikely that a busy Xen server can be rebooted in 5 minutes.  Either all the DomUs need to have their memory saved to disk (writing out the data to disk and reading it back in after a reboot will probably take at least a couple of minutes), or they need to be shut down and booted again after the Dom0 is rebooted (which is a good procedure if the security fix affects both Dom0 and DomU use), and such shutdowns and reboots of DomUs will take a lot of time.</p>
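<p>To put the five-minute budget next to the cost of saving DomU memory, here is a rough Python estimate.  The 16 GiB of DomU memory and 100 MB/s of disk throughput are hypothetical figures, and the time for the Dom0 itself to boot is not included.</p>

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_budget(nines):
    """Minutes of downtime per year allowed for the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


def save_restore_minutes(domu_memory_gib, disk_mb_per_s):
    """Minutes to write all DomU memory to disk and read it back after the
    Dom0 reboot, assuming the same throughput in both directions."""
    total_bytes = domu_memory_gib * 1024 ** 3
    one_way_seconds = total_bytes / (disk_mb_per_s * 1_000_000)
    return 2 * one_way_seconds / 60


budget = downtime_budget(5)
needed = save_restore_minutes(16, 100)
print(f"budget {budget:.2f} min, save and restore alone {needed:.2f} min")
```

<p>With those figures the memory transfer alone takes about 5.7 minutes, which already exceeds the &#8220;five nines&#8221; budget of roughly 5.26 minutes a year.</p>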
<p>Based on the past year, it seems that a system running as a basic server might get &#8220;four nines&#8221; if configured for a fast boot (it&#8217;s surprising that no-one seems to be talking about recent improvements to the speed of booting as high-availability features), and if the boot is slower then you are looking at &#8220;three nines&#8221;.  For a Xen server, unless you have some sort of cluster, it seems that &#8220;five nines&#8221; is unattainable due to reboot times even with one issue a year, but &#8220;four nines&#8221; should be easy to get.</p>
<p>Now while the 14 issues over the last year for the kernel seem likely to be a pattern that will continue, the one issue which affects Xen may not be representative (small numbers are not statistically significant).  I feel confident in predicting a need for between 5 and 20 kernel updates next year due to kernel security issues, but I would not be prepared to bet on whether the number of issues affecting Xen will be 0, 1, or 4 (it seems unlikely that there would be 5 or more).</p>
<p>I will write a future post about some strategies for mitigating these issues.</p>
<p>Here is my summary of the Debian kernel linux-image-2.6.18-6-686 (Etch kernel) security updates according to its changelog.  They are not in chronological order but in the order of the changelog file:<br />
<span id="more-622"></span></p>
<ol>
<li>05 Jun 2008: CVE-2008-2358 for DCCP and CVE-2008-1673 for ASN.1 (NAT for CIFS and SNMP).</li>
<li>23 May 2008: CVE-2008-2136 memory leak in IPv6 over IPv4 tunnels, CVE-2007-6712 timer related bugs, CVE-2008-1615 ptrace on AMD64 architecture, and CVE-2008-2137 &#8220;Validate address ranges regardless of MAP_FIXED&#8221;.</li>
<li>07 May 2008: CVE-2008-1669 SMP race.</li>
<li>11 Apr 2008: CVE-2007-6694 PPC only, CVE-2008-0007 Add VM_DONTEXPAND to vm_flags in drivers that register a fault handler but do not bounds check the offset argument, CVE-2008-1294 prevent user escape from RLIMIT_CPU, and CVE-2008-1375 fix dnotify race.</li>
<li>10 Feb 2008: CVE-2008-0010 and CVE-2008-0600 Fix missing access check in vmsplice.</li>
<li>25 Jan 2008: Not a security issue, but silent data loss on IA64.</li>
<li>22 Jan 2008: CVE-2007-6151 ISDN memory overrun, CVE-2008-0001 something related to checking the access to a directory, CVE-2007-2878 FAT filesystem related, CVE-2007-4571 ALSA bug that allows user to read kernel memory.</li>
<li>17 Sep 2007: Fix minor DOS attack for slightly privileged users (e.g. members of the dialout group).</li>
<li>18 Dec 2007: CVE-2007-6063 overflows in ISDN subsystem, CVE-2007-6206 core dumping over an existing file can get the wrong ownership (<a href="http://etbe.coker.com.au/2007/01/05/core-files/">should be possible to use kernel.core_pattern to work around this [1]</a>), CVE-2007-5966 timer issue, CVE-2006-6058 Minix fs DOS attack via corrupted fs, and CVE-2007-6417 tmpfs memory leak.</li>
<li>29 Nov 2007: CVE-2007-3104 local kernel DOS attack (Oops), CVE-2007-4997 malicious frame on wireless interface crashes system, CVE-2007-5500 potential system hang, and CVE-2007-5904 CIFS overflows from server sending corrupt data.</li>
<li>02 Oct 2007: CVE-2007-4573 Xen 64bit with 32bit DomU, CVE-2006-5755 Xen, CVE-2007-4133 memory management DOS, and CVE-2007-5093 DOS when unplugging a webcam that is in use.</li>
<li>25 Sep 2007: CVE-2007-3731 ptrace causing Oops, CVE-2007-3739 memory management Oops, CVE-2007-3740 CIFS not honoring umask, CVE-2007-4573 ptrace of 32bit process on AMD64 bug, and CVE-2007-4849 JFFS2 (flash media) filesystem bug.</li>
<li>27 Aug 2007: CVE-2007-2172 IPv4 memory related issue (local DOS or compromise?), CVE-2007-2875 local user can read kernel memory if cpuset filesystem is mounted, CVE-2007-3105 buffer overflow in random number generator, CVE-2007-3843 CIFS, and CVE-2007-4308 AAC RAID.</li>
<li>11 Aug 2007: CVE-2007-1353 bluetooth, CVE-2007-3513 usblcd, CVE-2007-2525 PPPoE, CVE-2007-3642 H.323 connection tracking, CVE-2007-2172 IPv4 local exploit, CVE-2007-2453 slightly less random numbers, CVE-2007-2876 SCTP connection tracking, CVE-2007-3851 i965 batch buffer usage, and CVE-2007-3848 potential privilege escalation.</li>
</ol>
<ul>
<li>[1] <a href="http://etbe.coker.com.au/2007/01/05/core-files/">http://etbe.coker.com.au/2007/01/05/core-files/</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2008/06/27/kernel-security-vs-uptime/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>ISP Redundancy and Virtualisation</title>
		<link>http://etbe.coker.com.au/2008/06/23/isp-redundancy-virtualisation/</link>
		<comments>http://etbe.coker.com.au/2008/06/23/isp-redundancy-virtualisation/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 12:25:01 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Virtualisation]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/?p=615</guid>
		<description><![CDATA[If you want a reliable network then you need to determine an appropriate level of redundancy.  When servers were small and there was no well accepted virtual machine technology there were always many points at which redundancy could be employed.
A common example is a large mail server.  You might have MX servers to [...]]]></description>
			<content:encoded><![CDATA[<p>If you want a reliable network then you need to determine an appropriate level of redundancy.  When servers were small and there was no well accepted virtual machine technology there were always many points at which redundancy could be employed.</p>
<p>A common example is a large mail server.  You might have MX servers to receive mail from the Internet, front-end servers to send mail to the Internet, database or LDAP servers (one server that accepts writes and redundant slave servers that allow clients to read data), and some back-end storage.  The back-end storage is generally going to lack redundancy to some degree (all the common options involve mail being stored in one location).  So the redundancy would start with the routers which direct traffic to redundant servers (typically a pair of routers in a failover configuration &#8211; <a href="http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol">I would use OpenBSD boxes running CARP if I were given a choice in how to implement this [1]</a>; in the past I&#8217;ve used Cisco devices).</p>
<p>The next obvious place for redundancy is the MX servers (it seems that most ISPs have machines with names such as mx01.example.net to receive mail from the Internet).  The way that MX records are used in the DNS means that there is no need for a router to direct traffic to a pair of servers, and even a pair of redundant routers is another point of failure, so it&#8217;s best to avoid them where possible.  A smaller ISP might have two MX machines that are used both for sending outbound mail from their users (which needs to go through a load-balancing router) and for receiving inbound mail.  A larger ISP will have two or more machines dedicated to receiving mail and two or more machines dedicated to sending mail (when you scan for viruses on both sent and received mail it can take a lot of compute power).</p>
<p>Now the database or LDAP servers used for storing user account data are another possible place for redundancy.  While some database and LDAP servers support multi-master operation, a more common configuration is to have a single master and multiple slaves which are read-only.  This means that you want to have more slaves than are really required so that you can lose one without impacting the service.</p>
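<p>No router is needed because a sending mail server tries the MX hosts itself, in order of preference, and moves on to the next host when one is unreachable (RFC 5321 also asks senders to choose randomly among hosts of equal preference, which spreads the load).  A simplified Python sketch of that client-side behaviour; the preference values and backup-mx.example.net are hypothetical:</p>

```python
import random

# Hypothetical MX records: (preference, host); lower preference is tried first.
MX_RECORDS = [
    (10, "mx01.example.net"),
    (10, "mx02.example.net"),
    (20, "backup-mx.example.net"),
]


def mx_order(records, rng=random):
    """Order MX hosts by preference, shuffling hosts that share a preference."""
    groups = {}
    for preference, host in records:
        groups.setdefault(preference, []).append(host)
    ordered = []
    for preference in sorted(groups):
        hosts = groups[preference][:]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


def deliver(records, is_up, rng=random):
    """Return the first reachable MX host, or None if all are down."""
    for host in mx_order(records, rng):
        if is_up(host):
            return host
    return None


print(deliver(MX_RECORDS, lambda host: host != "mx01.example.net"))
```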
<p>There are several ways of losing a server.  The most obvious is a hardware failure.  While server-class machines will have redundant PSUs, RAID, ECC RAM, and a general high quality of hardware design and manufacture, they still have hardware problems from time to time.  Then there are a variety of software-related ways of losing a server, most of which stem from operator error and bugs in software.  Of course the problem with operator errors and software bugs is that they can easily take out all redundant machines.  If an operator mistakenly decides that a certain command needs to be run on all machines they will often run it on all machines before realising that it causes things to go horribly wrong.  A software bug will usually be triggered by the same thing on all machines (e.g. I&#8217;ve had bad data written to a master LDAP server cause all slaves to crash and had a mail loop between two big ISPs take out all front-end mail servers).</p>
<p>Now if you have a mail server running on a virtual platform such that the MX servers, the mail store, and the database servers all run on the same hardware then redundancy is very unlikely to alleviate hardware problems.  It&#8217;s difficult to imagine a situation where a hardware failure takes out one DomU while leaving others running.</p>
<p>It seems to me that if you are running on a single virtual server (one physical host) there is no benefit in having redundancy to protect against hardware failure.  However, there is benefit in having an infrastructure which supports redundancy.  For example if you are going to install new software on one of the servers there is a possibility that the software will fail.  Doing upgrades and then having to roll them back is one of the least pleasant parts of sys-admin work: not only is it difficult but it&#8217;s also unreliable (new software writes different data to shared files and you have to hope that the old version can cope with them).</p>
<p>To implement this you need a Dom0 that can direct traffic to multiple redundant servers, even for services which would normally have only a single server.  Then when you need to upgrade (be it the application or the OS) you can configure a server on the designated secondary address, get it running, and then disable traffic to the primary server.  If there are any problems you can direct traffic back to the primary server (which can be done much more quickly than downgrading software).  Also, if configured correctly, you could have the secondary server be accessible from certain IP addresses only.  So you could test the new version of the software using employees as test users while customers use the old version.</p>
<p>One advantage of a virtual machine environment for load balancing is that you can have as many virtual Ethernet devices as you desire and you can configure them in software (without changing cables in the server room).  A limitation on the use of load-balancing routers is that traffic needs to go through the router in both directions.  This is easy for the path from the Internet to the server room and the path from the server room to the customer network.  But when going between servers in the server room it&#8217;s a problem (which is not insurmountable, merely painful and expensive).  Of course there will be a cost in CPU time for all the extra routing.  If instead of having a single virtual Ethernet device for all redundant nodes you have a virtual Ethernet device for every type of server and use the Dom0 as a router, you will end up doubling the CPU requirements for networking without even considering the potential overhead of the load balancing router functionality.</p>
<p>Finally, virtual machines offer a significant benefit for the reliability of services: the ability to perform snapshot backups.  If you have sufficient disk space and IO capacity you could have a snapshot of your server taken every day and store several old snapshots.  Of course doing this effectively would require some minor changes to the configuration of machines to avoid unnecessary writes, such as not compressing old log files and using a ram disk for /tmp and any other filesystem with transient data.  When you have snapshots you can run filesystem analysis tools on them to detect any silent corruption that may be occurring, with the potential benefit of discovering corruption before it gets severe (but I have yet to see a confirmed report of this saving anyone).  Of course similar snapshot facilities are available on almost every SAN and on many NAS devices, but there are many sites that don&#8217;t have the budget to use such equipment.</p>
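<p>A hypothetical Python model of the traffic-direction logic described above: customers go to whichever server is live, employees in a designated test network always reach the secondary server, and switching back is a single assignment rather than a software downgrade.  The class, the addresses and the 10.0.2.0/24 employee network are all illustrative assumptions; how the Dom0 actually implements the decision is not covered here.</p>

```python
import ipaddress


class TrafficDirector:
    """Hypothetical model of a Dom0 choosing a back-end DomU per connection."""

    def __init__(self, primary, secondary, tester_network):
        self.primary = primary
        self.secondary = secondary
        self.testers = ipaddress.ip_network(tester_network)
        self.live = primary  # server that customers are currently sent to

    def backend_for(self, client_ip):
        # Employees testing the new version always reach the secondary.
        if ipaddress.ip_address(client_ip) in self.testers:
            return self.secondary
        return self.live

    def cut_over(self):
        """Send customer traffic to the upgraded secondary server."""
        self.live = self.secondary

    def roll_back(self):
        """Return customer traffic to the primary server."""
        self.live = self.primary


director = TrafficDirector("10.0.1.5", "10.0.1.6", "10.0.2.0/24")
print(director.backend_for("203.0.113.7"), director.backend_for("10.0.2.9"))
```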
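<p>Keeping &#8220;several old snapshots&#8221; implies a retention rule.  A minimal Python sketch of one such rule, keeping the newest N daily snapshots; the value of N and the date-based naming are assumptions, and actually creating or deleting snapshots is left to whatever tools the site uses.</p>

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates, keep):
    """Return the dates of snapshots to remove so only the newest `keep` remain."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    newest_first = sorted(snapshot_dates, reverse=True)
    return sorted(newest_first[keep:])


daily = [date(2008, 6, 1) + timedelta(days=i) for i in range(10)]
print(snapshots_to_delete(daily, keep=7))
```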
<ul>
<li>[1] <a href="http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol">http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2008/06/23/isp-redundancy-virtualisation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>ECC RAM is more useful than RAID</title>
		<link>http://etbe.coker.com.au/2008/06/13/ecc-ram-vs-raid/</link>
		<comments>http://etbe.coker.com.au/2008/06/13/ecc-ram-vs-raid/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 14:37:34 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Best Posts]]></category>
		<category><![CDATA[Most Popular]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/?p=608</guid>
		<description><![CDATA[A common myth in the computer industry seems to be that ECC (Error Correcting Code &#8211;  a Hamming Code [0]) RAM is only a server feature.
The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks [...]]]></description>
			<content:encoded><![CDATA[<p>A common myth in the computer industry seems to be that ECC (Error Correcting Code &#8211; <a href="http://en.wikipedia.org/wiki/Hamming_code">a Hamming Code [0]</a>) RAM is only a server feature.</p>
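<p>To show what a Hamming code does, here is a Python sketch of the classic Hamming(7,4) code, which stores 4 data bits in 7 bits and can locate and correct any single flipped bit.  It is an illustration only: ECC DIMMs typically use a wider SECDED code (64 data bits plus 8 check bits) that also detects double-bit errors, whereas plain Hamming(7,4) would miscorrect them.</p>

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (an int from 0 to 15) as a list of 7 bits.
    List index 0 holds codeword position 1; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = [(nibble >> shift) & 1 for shift in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(bits):
    """Correct at most one flipped bit; return (nibble, error_position).
    error_position is 0 when the syndrome shows no error."""
    bits = list(bits)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all 1 bits
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    d1, d2, d3, d4 = bits[2], bits[4], bits[5], bits[6]
    return (d1 << 3) | (d2 << 2) | (d3 << 1) | d4, syndrome


codeword = hamming74_encode(0b1011)
corrupted = codeword[:]
corrupted[4] ^= 1  # simulate a single-bit memory error at position 5
print(hamming74_decode(corrupted))
```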
<p>The difference between a server and a desktop machine (in terms of utility) is that a server performs tasks for many people while a desktop machine only performs tasks for one person.  Therefore when purchasing a desktop machine you can decide how much you are willing to spend for the safety and continuity of your work.  For a server it&#8217;s more difficult as everyone has a different idea of how reliable a server should be in terms of uptime and in terms of data security.  When running a server for a business there is the additional issue of customer confidence.  If a server goes down occasionally customers start wondering what else might be wrong and considering whether they should trust their credit card details to the online ordering system.</p>
<p>So servers obviously need a higher degree of reliability &#8211; and it&#8217;s easy to justify spending the money.</p>
<p>Desktop machines also need reliability, more so than most people expect.  In a business when a desktop machine crashes it wastes employee time.  If a crash wastes an hour (which is not unlikely given that previously saved work may need to be re-checked) then it can easily cost the business $100 (the value of the other work that the employee might have done).  Two such crashes per week could cost the business as much as $8000 per year (40 working weeks at $200 per week).  The price difference between a typical desktop machine and a low-end workstation (or deskside server) is considerably less than that (<a href="http://etbe.coker.com.au/2007/08/01/ecc-ram-in-a-cheap-machine/">when I investigated the prices almost a year ago desktop machines with server features ranged in price from $800 to $2400 [1]</a>).</p>
<p>Some machines in a home environment need significant reliability.  For example when students are completing high school their assignments have a lot of time invested in them.  Losing an assignment due to a computer problem shortly before it&#8217;s due in could impact their ability to get a place in the university course that they most desire!  Then there is also data which is irreplaceable.  One example I heard of was a woman whose computer had a factory pre-load of Windows; during a storm the machine rebooted and reinstalled itself to the factory defaults &#8211; wiping several years of baby photos&#8230;  In both cases better backups would mostly solve the problem.</p>
<p>For business use the common scenario is to have file servers storing all data and have very little data stored on the PC (ideally have no data on the PC).  In this case a disk error would not lose any data (unless the swap space was corrupted and something important was paged out when the disk failed).  For home use the backup requirements are quite small.  If a student is working on an important assignment then they can back it up to removable media whenever they reach a milestone.  Probably the best protection against disk errors destroying assignments would be a bulk purchase of USB flash storage sticks.</p>
<p>Disk errors are usually easy to detect.  Most errors take the form of data which cannot be read back, and when that happens the OS will give the user an error message explaining what happened.  Then if you have good backups you revert to them and hope that you didn&#8217;t lose too much work in the meantime (you also hope that your backups are actually readable &#8211; but that&#8217;s another issue).  The less common errors are lost writes &#8211; where the OS writes data to disk but the disk doesn&#8217;t store it.  This is a little more difficult to discover as the drive will return bad data (maybe an old version of the file data or maybe data from a different file) and claim it to be good.</p>
<p>The general idea nowadays is that a filesystem should check the consistency of the data it returns.  Two new filesystems, <a href="http://en.wikipedia.org/wiki/ZFS">ZFS from Sun [2]</a> and <a href="http://en.wikipedia.org/wiki/Btrfs">BTRFS from Oracle [3]</a>, implement checksums of data stored on disk.  ZFS is apparently production-ready while BTRFS is apparently not nearly ready.  I expect that from now on whenever anyone designs a filesystem for anything but the smallest machines (e.g. PDAs and phones) they will include data integrity mechanisms in the design.</p>
<p>I believe that once such features become commonly used the need for RAID on low-end systems will dramatically decrease.  A combination of good backups and knowing when your live data is corrupted will often be a good substitute for preserving the integrity of the live data.  Not that RAID will necessarily protect your data &#8211; with most RAID configurations if a hard disk returns bad data and claims it to be good (the case of lost writes) then the system will not read data from other disks for checksum validation and the bad data will be accepted.</p>
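<p>The difference between a typical RAID-1 read and a checksum-verified read can be sketched in Python.  This is a hypothetical model, not how any particular RAID driver or filesystem is written: each disk is a dict of blocks, and SHA-256 from <b>hashlib</b> stands in for whatever checksum the filesystem stores.</p>

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def mirror_read_plain(mirrors, block):
    """Typical RAID-1 read: trust the first disk, with no checksum check."""
    return mirrors[0][block]


def mirror_read_checked(mirrors, block, expected):
    """Try each copy until one matches the checksum stored by the filesystem."""
    for disk in mirrors:
        data = disk[block]
        if checksum(data) == expected:
            return data
    raise OSError(f"every copy of block {block} failed its checksum")


good = b"assignment, final draft"
stale = b"assignment, first draft"  # a lost write left old data on disk 0
mirrors = [{7: stale}, {7: good}]
print(mirror_read_plain(mirrors, 7))
print(mirror_read_checked(mirrors, 7, checksum(good)))
```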
<p>It&#8217;s easy to compute checksums of important files and verify them later.  One simple way of doing so is to compress the files: every file compression program that I&#8217;ve seen has some degree of error detection.</p>
<p>Now the real problem with RAM which lacks ECC is that it can lose data without the user knowing.  There is no possibility of software checks because any software which checks for data integrity could itself be misled by memory errors.  I once had a machine which experienced filesystem corruption on occasion; eventually I discovered that it had a memory error (memtest86+ reported a problem).  I will never know whether some data was corrupted on disk because of this.  Sifting through a large amount of stored data for some files which may have been corrupted due to memory errors is almost impossible, especially when there was a period of weeks of unreliable operation of the machine in question.</p>
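<p>As an example of the compression approach, gzip stores a CRC-32 of the uncompressed data, which is what <b>gzip -t</b> checks.  A minimal Python sketch using the standard <b>gzip</b> module; the function name is hypothetical:</p>

```python
import gzip
import zlib


def gzip_verifies(blob):
    """Return True if a gzip blob decompresses cleanly and passes its CRC-32 check."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        return False


archive = gzip.compress(b"important data\n" * 100)
damaged = bytearray(archive)
damaged[-8] ^= 0xFF  # corrupt the first byte of the stored CRC-32
print(gzip_verifies(archive), gzip_verifies(bytes(damaged)))
```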
<p>Checking the integrity of file data by using the verify option of a file compression utility, fsck on a filesystem that stores checksums on data, or any of the other methods is not difficult.</p>
<p>I have a lot of important data on machines that don&#8217;t have ECC.  One reason is that machines which have ECC cost more and have other trade-offs (more expensive parts, more noise, more electricity use, and the small supply makes it difficult to get good deals).  Another is that there appear to be no laptops which support ECC (I use a laptop for most of my work).  On the other hand RAID is very cheap and simple to implement, just buy a second hard disk and install software RAID &#8211; I think that all modern OSs support RAID as a standard installation option.  So in spite of the fact that RAID does less good than a combination of ECC RAM and good backups (which are necessary even if you have RAID), it&#8217;s going to remain more popular in high-end desktop systems for a long time.</p>
<p>The next development that seems interesting is that a large portion of the PC market consists of machines designed without space for more than one hard disk.  Such compact machines (known as Small Form Factor or SFF) could easily be designed to support ECC RAM.  Hopefully the PC companies will compensate for removing one reliability feature (the option of RAID) by adding another (ECC RAM).</p>
<ul>
<li>[0] <a href="http://en.wikipedia.org/wiki/Hamming_code">http://en.wikipedia.org/wiki/Hamming_code</a></li>
<li>[1] <a href="http://etbe.coker.com.au/2007/08/01/ecc-ram-in-a-cheap-machine/">http://etbe.coker.com.au/2007/08/01/ecc-ram-in-a-cheap-machine/</a></li>
<li>[2] <a href="http://en.wikipedia.org/wiki/ZFS">http://en.wikipedia.org/wiki/ZFS</a></li>
<li>[3] <a href="http://en.wikipedia.org/wiki/Btrfs">http://en.wikipedia.org/wiki/Btrfs</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2008/06/13/ecc-ram-vs-raid/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Controlling a STONITH and Upgrading a Cluster</title>
		<link>http://etbe.coker.com.au/2007/08/20/controlling-a-stonith-and-upgrading-a-cluster/</link>
		<comments>http://etbe.coker.com.au/2007/08/20/controlling-a-stonith-and-upgrading-a-cluster/#comments</comments>
		<pubDate>Sun, 19 Aug 2007 21:00:45 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Best Posts]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/2007/08/20/controlling-a-stonith-and-upgrading-a-cluster/</guid>
		<description><![CDATA[One situation that you will occasionally encounter when running a Heartbeat cluster is a need to prevent a STONITH of a node.  As documented in my previous post about testing STONITH the ability to STONITH nodes is very important in an operating cluster.  However when the sys-admin is performing maintenance on the system [...]]]></description>
			<content:encoded><![CDATA[<p>One situation that you will occasionally encounter when running a <a href="http://www.linux-ha.org/">Heartbeat cluster</a> is the need to prevent a STONITH (Shoot The Other Node In The Head) of a node.  As documented in <a href="http://etbe.coker.com.au/2007/07/17/testing-stonith/">my previous post about testing STONITH</a>, the ability to STONITH nodes is very important in an operating cluster.  However, when the sys-admin is performing maintenance on the system or programmers are working on a development or test system, it can be rather annoying.</p>
<p>One example of where STONITH is undesired is when upgrading packages of software related to the cluster services.  If, at the moment the <b>status</b> operation is run during a package upgrade, the data files and programs related to the <a href="http://etbe.coker.com.au/2007/06/08/heartbeat-service-scripts/">OCF script</a> are not synchronised (EG you have two programs that interact and upgrading one requires upgrading the other), then an error may occur which may trigger a STONITH.  Another possibility is that when using small systems for testing or development (EG running a cluster under Xen with minimal RAM assigned to each node), a package upgrade may cause the system to thrash, which might then cause the status scripts to time out (a problem I encounter when upgrading my Xen test instances that have 64M of RAM).</p>
<p>If a STONITH occurs during a package upgrade then you are likely to have consistency problems with the OS due to <a href="http://etbe.coker.com.au/2007/07/02/committing-data-to-disk/">RPM and DPKG not correctly calling fsync()</a>.  This can cause the OCF scripts to always fail when running the <b>status</b> command, which can put the cluster nodes in question into an infinite loop of being STONITHed.  Incidentally, the best way to test for this (given that a STONITH sometimes causes log data to be lost) is to boot the node in question without Heartbeat running and then run the OCF status commands manually (<a href="http://etbe.coker.com.au/2007/08/03/starting-a-heartbeat-resource-without-heartbeat/">I previously documented three ways of doing this</a>).</p>
<p>Of course the ideal (and recommended) way of solving this problem is to migrate all services away from a node using the <b>crm_resource</b> program.  But in a test or development situation you may forget to migrate all services, or simply forget to run the migration before the package upgrade starts.  In that case the best thing is to have a way of removing the ability to call STONITH.  For my testing I use Xen and have the nodes <a href="http://etbe.coker.com.au/2007/06/24/xen-and-heartbeat/">ssh to the Dom0 to call STONITH</a>, so all I have to do to remove the STONITH ability is to stop the ssh daemon on the Dom0.  For a more serious test network (EG using IPMI or an equivalent technology to perform a hardware STONITH as well as ssh for OS level STONITH on a private network), a viable option might be to shut down the switch port used for such operations &#8211; shutting down switch ports is not a nice thing to do, but it&#8217;s a reasonable hack that allows you to continue work on a development environment without hassle.</p>
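<p>The migration step can be scripted so that it is harder to forget.  The following is a minimal sketch, not a tested tool: migrate_away and unmigrate are hypothetical helper names, it uses the <b>crm_resource -M -r ID</b> and <b>crm_resource -U -r ID</b> commands, and it assumes that every resource ID passed to it is currently running on the node about to be upgraded (as I understand the Heartbeat 2.x behaviour, <b>-M</b> without a destination host moves a resource away from wherever it is running at the time).</p>

```shell
#!/bin/sh
# Sketch: move resources off a node before a package upgrade, then remove
# the temporary constraints afterwards.
# Assumption: each resource ID given is currently running on the node being
# upgraded, because "crm_resource -M -r ID" without a destination host moves
# the resource away from wherever it runs now.
# CRM_RESOURCE is a hypothetical override (e.g. "echo") for a dry run.
CRM_RESOURCE=${CRM_RESOURCE:-crm_resource}

# migrate_away ID... : add a temporary constraint moving each resource away
migrate_away() {
  for id in "$@"; do
    $CRM_RESOURCE -M -r "$id" || return 1
  done
}

# unmigrate ID... : remove the temporary constraints created by -M
unmigrate() {
  for id in "$@"; do
    $CRM_RESOURCE -U -r "$id" || return 1
  done
}
```

<p>For example, with resources named ip1 and web1 (made-up IDs), run <b>migrate_away ip1 web1</b> before the upgrade and <b>unmigrate ip1 web1</b> after it; setting CRM_RESOURCE=echo prints the commands instead of running them.</p>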
<p>When choosing your method of STONITH, it&#8217;s probably worth considering what the possibilities are for temporarily disabling it &#8211; preferably without having to walk to the server room.</p>
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2007/08/20/controlling-a-stonith-and-upgrading-a-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Starting a Heartbeat Resource Without Heartbeat</title>
		<link>http://etbe.coker.com.au/2007/08/03/starting-a-heartbeat-resource-without-heartbeat/</link>
		<comments>http://etbe.coker.com.au/2007/08/03/starting-a-heartbeat-resource-without-heartbeat/#comments</comments>
		<pubDate>Thu, 02 Aug 2007 21:00:40 +0000</pubDate>
		<dc:creator>etbe</dc:creator>
				<category><![CDATA[Ha]]></category>
		<category><![CDATA[Best Posts]]></category>

		<guid isPermaLink="false">http://etbe.coker.com.au/2007/08/03/starting-a-heartbeat-resource-without-heartbeat/</guid>
		<description><![CDATA[The command crm_resource allows you to do basic editing of resources in the Heartbeat configuration database.  But sometimes you need to do different things and the tool xmlstarlet is a good option.
The below script can be used for testing Heartbeat OCF resource scripts.  It uses the Heartbeat management program cibadmin to get the [...]]]></description>
			<content:encoded><![CDATA[<p>The command <b>crm_resource</b> allows you to do basic editing of resources in the Heartbeat configuration database.  But sometimes you need to do things that it doesn&#8217;t support, and for those the tool <a href="http://xmlstar.sourceforge.net/">xmlstarlet</a> is a good option.</p>
<p>The script below can be used for testing Heartbeat OCF resource scripts.  It uses the Heartbeat management program <b>cibadmin</b> to get the XML configuration data and then uses xmlstarlet to process it.  The <b>sel</b> option for xmlstarlet selects some data from an XML file; the <b>-t</b> option starts a template and <b>-m</b> gives the XPath expression that the template matches.  Here that expression starts at <b>/resources/primitive</b>, picks the primitive with the requested id, and descends to its nvpair elements.  The <b>&#45;-value-of</b> expression prints values built from the attributes of each matched element.  The script concatenates the <b>name</b> and <b>value</b> attributes and exports them as environment variables (see my <a href="http://etbe.coker.com.au/2007/06/09/configuring-a-heartbeat-service/">post about Configuring a Heartbeat Service</a> for an explanation of the use of the variables).  The <b>TYPE</b> variable is the name of the script under the <b>/usr/lib/ocf/resource.d/heartbeat</b> directory.</p>
<p>In recent versions of Heartbeat (2.1.x) the <b>OCF_ROOT</b> environment variable must be set before an OCF script is called.  Setting it on older versions of Heartbeat doesn&#8217;t do any harm, so I set it unconditionally in this script (which should work for all 2.x.x versions of Heartbeat).</p>
<p>The first parameter for the script is the <b>id</b> of the service to be operated on and the second parameter is the operation to perform (<b>start</b>, <b>stop</b>, and <b>status</b> are the only interesting values).  The script will echo the exit code to the screen (0 means success, 7 means that the service is not running or the operation failed, and any other number means a serious error that will trigger a STONITH if Heartbeat gets it).</p>
<p>#!/bin/sh<br />
# Turn each nvpair of resource $1 into export OCF_RESKEY_name=value and TYPE=type lines, then run them<br />
$(cibadmin -Q -o resources| xmlstarlet sel -t -m \<br />
 &#34;/resources/primitive [@id=&#39;$1&#39;]/instance_attributes/attributes/nvpair&#34; \<br />
 &#45;-value-of &#34;concat(&#39;export OCF_RESKEY_&#39;,@name,&#39;=&#39;,@value,&#39;&amp;#010;&#39;,&#39;TYPE=&#39;,../../../@type,&#39;&amp;#010;&#39;)&#34;)<br />
# Run operation $2 on the OCF script of that type and print its exit code<br />
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/&#34;$TYPE&#34; &#34;$2&#34;<br />
echo $?</p>
<p>Below is another version of the same script that instead uses <b>crm_resource</b> to get the XML data.  The output of crm_resource has a couple of lines of non-XML data at the start (removed by the grep) and also only gives the XML tree related to the primitive in question (so the <b>/resources</b> part is removed from the xmlstarlet command-line).</p>
<p>#!/bin/sh<br />
# crm_resource -x starts with a couple of non-XML lines, which the grep removes<br />
$(crm_resource -r &#34;$1&#34; -x | grep -v &#39;^[a-z]&#39; | xmlstarlet sel -t -m \<br />
&nbsp; &#34;/primitive/instance_attributes/attributes/nvpair&#34; &#45;-value-of \<br />
&nbsp; &#34;concat(&#39;export OCF_RESKEY_&#39;,@name,&#39;=&#39;,@value,&#39;&amp;#010;&#39;,&#39;TYPE=&#39;,../../../@type,&#39;&amp;#010;&#39;)&#34;)<br />
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/&#34;$TYPE&#34; &#34;$2&#34;<br />
echo $?</p>
<p>The problem with both of those scripts is that they rely on Heartbeat being operational.  Performing any operation other than a status check while Heartbeat is running is risky: if Heartbeat starts a service at the same time as you start it via such a script, the results will probably not be what you want.  One situation where it is safe to run this is when a service fails to start.  After it has failed repeatedly Heartbeat may stop trying to restart it (depending on the configuration), in which case it will be safe to try to start it.  Also you can put in temporary constraints to stop the resource from running by repeatedly running <b>crm_resource -M -r ID</b> until all nodes have been prohibited from running it (make sure you run <b>crm_resource -U -r ID</b> afterwards to remove the temporary constraints).</p>
<p>The following script does the same thing but directly reads the XML file for the Heartbeat configuration.  This is designed to be used when Heartbeat is not running.  For example you could copy the XML file from a running cluster to a test machine and then test your OCF resource scripts.</p>
<p>#!/bin/sh<br />
# Read the CIB straight from disk, so Heartbeat does not need to be running<br />
$(cat /var/lib/heartbeat/crm/cib.xml| xmlstarlet sel -t -m \<br />
&nbsp; &#34;/cib/configuration/resources/primitive [@id=&#39;$1&#39;]/instance_attributes/attributes/nvpair&#34; \<br />
&nbsp; &#45;-value-of \<br />
&nbsp; &#34;concat(&#39;export OCF_RESKEY_&#39;,@name,&#39;=&#39;,@value,&#39;&amp;#010;&#39;,&#39;TYPE=&#39;,../../../@type,&#39;&amp;#010;&#39;)&#34;)<br />
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/&#34;$TYPE&#34; &#34;$2&#34;<br />
echo $?</p>
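<p>All three scripts print the raw exit code of the OCF script.  The codes described earlier are the standard OCF return codes, so a small shell function can turn the number into a name.  This is a sketch rather than part of Heartbeat: ocf_rc_name is a hypothetical helper, and it only knows codes 0 to 7, reporting anything else as unknown.</p>

```shell
#!/bin/sh
# Sketch: translate an OCF resource agent exit code into its standard name.
# Only codes 0-7 are covered; anything else is reported as unknown.
# ocf_rc_name is a hypothetical helper name.
ocf_rc_name() {
  case "$1" in
    0) echo OCF_SUCCESS ;;
    1) echo OCF_ERR_GENERIC ;;
    2) echo OCF_ERR_ARGS ;;
    3) echo OCF_ERR_UNIMPLEMENTED ;;
    4) echo OCF_ERR_PERM ;;
    5) echo OCF_ERR_INSTALLED ;;
    6) echo OCF_ERR_CONFIGURED ;;
    7) echo OCF_NOT_RUNNING ;;
    *) echo "unknown ($1)" ;;
  esac
}
```

<p>To use it, define the function near the top of any of the scripts and replace the final <b>echo $?</b> with <b>rc=$?; echo "$rc $(ocf_rc_name $rc)"</b>.</p>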
]]></content:encoded>
			<wfw:commentRss>http://etbe.coker.com.au/2007/08/03/starting-a-heartbeat-resource-without-heartbeat/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
