<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Software vs Hardware RAID</title>
	<atom:link href="http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/feed/" rel="self" type="application/rss+xml" />
	<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/</link>
	<description>Linux, politics, and other interesting things</description>
	<pubDate>Thu, 28 Aug 2008 10:46:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1</generator>
		<item>
		<title>By: Rebuilding a software RAID &#124; Zen of Linux</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-12791</link>
		<dc:creator>Rebuilding a software RAID &#124; Zen of Linux</dc:creator>
		<pubDate>Wed, 02 Apr 2008 20:37:01 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-12791</guid>
		<description>[...] Software vs Hardware RAID. Related Posts: Create a software-RAID-1 on a Linux system [...]</description>
		<content:encoded><![CDATA[<p>[...] Software vs Hardware RAID. Related Posts: Create a software-RAID-1 on a Linux system [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Farnsworth</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11250</link>
		<dc:creator>Simon Farnsworth</dc:creator>
		<pubDate>Mon, 03 Dec 2007 10:42:12 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11250</guid>
		<description>Olaf,

There is a difference in terms of the severity of data loss after a cache failure and a power loss, but the basic problem is still the same.

The RAID is allowed to lose data on power loss. It is also allowed to ignore parity during reads, even if it's still checking for consistency - thus reads do *not* get verified during the background check (this is beyond the guarantees made by a RAID system, which are all about protecting your data from a drive failure, not a system failure).

The guarantee is that the only time you can read corrupt data is immediately after a power failure; you are supposed to replay journals, or otherwise validate your data, to ensure that the corruption hasn't damaged your data. If parity is damaged, *and* the background check does not reach the damaged parity before a drive failure (assuming the drive with the damaged parity block is not the drive that fails), you read correct data (and thus don't correct it from the journal), but later see it corrupted when the data block is rebuilt from parity.</description>
		<content:encoded><![CDATA[<p>Olaf,</p>
<p>There is a difference in terms of the severity of data loss after a cache failure and a power loss, but the basic problem is still the same.</p>
<p>The RAID is allowed to lose data on power loss. It is also allowed to ignore parity during reads, even if it&#8217;s still checking for consistency - thus reads do *not* get verified during the background check (this is beyond the guarantees made by a RAID system, which are all about protecting your data from a drive failure, not a system failure).</p>
<p>The guarantee is that the only time you can read corrupt data is immediately after a power failure; you are supposed to replay journals, or otherwise validate your data, to ensure that the corruption hasn&#8217;t damaged your data. If parity is damaged, *and* the background check does not reach the damaged parity before a drive failure (assuming the drive with the damaged parity block is not the drive that fails), you read correct data (and thus don&#8217;t correct it from the journal), but later see it corrupted when the data block is rebuilt from parity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olaf van der Spek</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11092</link>
		<dc:creator>Olaf van der Spek</dc:creator>
		<pubDate>Tue, 27 Nov 2007 17:51:12 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11092</guid>
		<description>Isn't there a huge difference between losing your write-back cache (NVRAM failure) and a normal power loss? In the first case, you lose data that the software thinks has already been committed. In the second case, you merely corrupt data that is about to be overwritten.

&#62; you want to do the parity check in the background while the machine is back in service in many cases

In that case, you have to verify every read you do until the background check is complete.
But isn't this supposed to be caught by replaying journals?

&#62; within the guarantees of the RAID system.

I thought you just said a RAID was allowed to lose data on power loss?</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t there a huge difference between losing your write-back cache (NVRAM failure) and a normal power loss? In the first case, you lose data that the software thinks has already been committed. In the second case, you merely corrupt data that is about to be overwritten.</p>
<p>&gt; you want to do the parity check in the background while the machine is back in service in many cases</p>
<p>In that case, you have to verify every read you do until the background check is complete.<br />
But isn&#8217;t this supposed to be caught by replaying journals?</p>
<p>&gt; within the guarantees of the RAID system.</p>
<p>I thought you just said a RAID was allowed to lose data on power loss?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WildKid</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11072</link>
		<dc:creator>WildKid</dc:creator>
		<pubDate>Tue, 27 Nov 2007 11:19:14 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11072</guid>
		<description>Really good and really interesting post. I (and maybe other readers :)) look forward to new useful posts from you!
Good luck and success in blogging!</description>
		<content:encoded><![CDATA[<p>Really good and really interesting post. I (and maybe other readers :)) look forward to new useful posts from you!<br />
Good luck and success in blogging!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Farnsworth</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11071</link>
		<dc:creator>Simon Farnsworth</dc:creator>
		<pubDate>Tue, 27 Nov 2007 10:47:04 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-11071</guid>
		<description>Olaf,

When you've had an unexpected power failure (whether we're talking loss of system power without NVRAM cache, or NVRAM battery failure), the RAID is permitted to corrupt data; a *good* RAID implementation guarantees that it will only corrupt areas that have been written to in the last n seconds, for some (documented) value of n. For example, if my disks guarantee that a write has completed when the drive reports a write command as completed (and not merely that the write has been cached), the RAID may guarantee that only the last 8MB of writes are at risk, assuming that the data has not been synced to disk (e.g. by fsync).

Higher levels of software build on these guarantees to avoid data loss, and do things like journalling writes (or BSD-style soft updates), to ensure that once the filesystem claims something's on disk, it's safe. In turn, application software like databases or mail servers ensures that it stays within these guarantees, and can do things like replaying journals to ensure that the data is correct, or at least consistent with claims made to the outside world (e.g. a mail server doesn't give a 250 OK response to incoming mail until it's confident that the mail is safely stored - fsync is often used on UNIX-likes).

Parity corruption just means that one disk failed to write all its blocks before power loss, and in this case it happened to be the disk writing parity, not the disk writing data. The trouble with insisting on a complete parity check before you bring the machine back up is that it's slow; you want to do the parity check in the background while the machine is back in service in many cases (e.g. an outgoing SMTP server, where the RAID stores the outgoing queue), to reduce downtime to a minimum. If you don't catch the faulty parity before a data drive fails, you risk losing data that you thought was safe, within the guarantees of the RAID system.</description>
		<content:encoded><![CDATA[<p>Olaf,</p>
<p>When you&#8217;ve had an unexpected power failure (whether we&#8217;re talking loss of system power without NVRAM cache, or NVRAM battery failure), the RAID is permitted to corrupt data; a *good* RAID implementation guarantees that it will only corrupt areas that have been written to in the last n seconds, for some (documented) value of n. For example, if my disks guarantee that a write has completed when the drive reports a write command as completed (and not merely that the write has been cached), the RAID may guarantee that only the last 8MB of writes are at risk, assuming that the data has not been synced to disk (e.g. by fsync).</p>
<p>Higher levels of software build on these guarantees to avoid data loss, and do things like journalling writes (or BSD-style soft updates), to ensure that once the filesystem claims something&#8217;s on disk, it&#8217;s safe. In turn, application software like databases or mail servers ensures that it stays within these guarantees, and can do things like replaying journals to ensure that the data is correct, or at least consistent with claims made to the outside world (e.g. a mail server doesn&#8217;t give a 250 OK response to incoming mail until it&#8217;s confident that the mail is safely stored - fsync is often used on UNIX-likes).</p>
<p>Parity corruption just means that one disk failed to write all its blocks before power loss, and in this case it happened to be the disk writing parity, not the disk writing data. The trouble with insisting on a complete parity check before you bring the machine back up is that it&#8217;s slow; you want to do the parity check in the background while the machine is back in service in many cases (e.g. an outgoing SMTP server, where the RAID stores the outgoing queue), to reduce downtime to a minimum. If you don&#8217;t catch the faulty parity before a data drive fails, you risk losing data that you thought was safe, within the guarantees of the RAID system.</p>
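<p>The "don't acknowledge until it's safely stored" rule mentioned above can be sketched roughly as follows. This is only an illustration: the function names, the queue directory layout and the returned string are hypothetical, and a real mail server would also need to sync the containing directory and handle errors more carefully.</p>

```python
import os


def store_durably(path, payload):
    """Write payload to path and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) != 0:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push the file's data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)


def accept_message(queue_dir, name, payload):
    """Hypothetical handler: acknowledge only after the message is synced."""
    store_durably(os.path.join(queue_dir, name), payload)
    return "OK"
```

<p>If the machine loses power before <code>os.fsync</code> returns, the sender never saw an acknowledgement and will retry, so nothing that was claimed to the outside world is lost.</p>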
]]></content:encoded>
	</item>
	<item>
		<title>By: Olaf van der Spek</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10791</link>
		<dc:creator>Olaf van der Spek</dc:creator>
		<pubDate>Wed, 21 Nov 2007 16:01:40 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10791</guid>
		<description>I agree with that text.

&#62; The trouble with the situation where the parity is corrupt but not the data is that my data check after power loss shows that I’m A-OK.

What exactly does that check do? After power loss you should do a complete parity check.
And how did the parity get corrupt (while the data remained intact)?</description>
		<content:encoded><![CDATA[<p>I agree with that text.</p>
<p>&gt; The trouble with the situation where the parity is corrupt but not the data is that my data check after power loss shows that I’m A-OK.</p>
<p>What exactly does that check do? After power loss you should do a complete parity check.<br />
And how did the parity get corrupt (while the data remained intact)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Farnsworth</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10790</link>
		<dc:creator>Simon Farnsworth</dc:creator>
		<pubDate>Wed, 21 Nov 2007 15:13:53 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10790</guid>
		<description>Olaf,

A RAID system should provide a certain minimum set of guarantees; an important guarantee is that the only time data may be lost is when power is lost unexpectedly. In particular, for models without NVRAM, any power loss may induce data loss. For models with NVRAM, data loss may occur if the NVRAM module loses power for more than a specified time (usually 30 days). If you go outside these conditions for any reason, you are advised to check your data, but once you have checked your data, no further corruption should occur.

The trouble with the situation where the parity is corrupt but not the data is that my data check after power loss shows that I'm A-OK. Some time later (possibly long enough that I've forgotten ever losing power), I have a disc failure while online; I hot-swap the drive, which *should* maintain my data (thanks to the data loss guarantee the RAID provides). However, the silently damaged parity means that at *this* stage, when I'm working within the limits of the guarantees RAID provides, I lose data. This is unacceptable behaviour, as the guarantee is broken; note that it *would* be acceptable to corrupt data when the power goes, so long as a data check then shows that there's been trouble.</description>
		<content:encoded><![CDATA[<p>Olaf,</p>
<p>A RAID system should provide a certain minimum set of guarantees; an important guarantee is that the only time data may be lost is when power is lost unexpectedly. In particular, for models without NVRAM, any power loss may induce data loss. For models with NVRAM, data loss may occur if the NVRAM module loses power for more than a specified time (usually 30 days). If you go outside these conditions for any reason, you are advised to check your data, but once you have checked your data, no further corruption should occur.</p>
<p>The trouble with the situation where the parity is corrupt but not the data is that my data check after power loss shows that I&#8217;m A-OK. Some time later (possibly long enough that I&#8217;ve forgotten ever losing power), I have a disc failure while online; I hot-swap the drive, which *should* maintain my data (thanks to the data loss guarantee the RAID provides). However, the silently damaged parity means that at *this* stage, when I&#8217;m working within the limits of the guarantees RAID provides, I lose data. This is unacceptable behaviour, as the guarantee is broken; note that it *would* be acceptable to corrupt data when the power goes, so long as a data check then shows that there&#8217;s been trouble.</p>
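<p>To make the failure mode concrete, here is a minimal sketch of a single RAID-5 stripe with XOR parity. The three-disk layout, the block contents and the helper name <code>xor_blocks</code> are assumptions made up for illustration; this is not the code of any particular RAID implementation.</p>

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks plus one parity block.
data = [b"AAAA", b"BBBB"]
good_parity = xor_blocks(data)

# Power fails mid-write: the parity block is damaged, the data is intact.
stale_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]

# A data-only check reads the data blocks and never consults parity,
# so everything looks fine at this point.
data_check_passes = data == [b"AAAA", b"BBBB"]

# Later, disk 0 fails and is rebuilt from the surviving block and parity.
rebuilt_ok = xor_blocks([data[1], good_parity])
rebuilt_bad = xor_blocks([data[1], stale_parity])
```

<p>With the good parity the rebuilt block equals the original <code>data[0]</code>; with the stale parity the rebuild silently produces a different first byte, even though the earlier data check passed.</p>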
]]></content:encoded>
	</item>
	<item>
		<title>By: RAID and Bus Bandwidth &#124; etbe</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10761</link>
		<dc:creator>RAID and Bus Bandwidth &#124; etbe</dc:creator>
		<pubDate>Tue, 20 Nov 2007 20:01:26 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10761</guid>
		<description>[...] through Less WorkAntti-Juhani Kaijanaho on Conditions of Sending EmailOlaf van der Spek on Software vs Hardware RAIDetbe on Software vs Hardware RAIDniq on Conditions of Sending Emailalvaro on Conditions of Sending [...]</description>
		<content:encoded><![CDATA[<p>[...] through Less WorkAntti-Juhani Kaijanaho on Conditions of Sending EmailOlaf van der Spek on Software vs Hardware RAIDetbe on Software vs Hardware RAIDniq on Conditions of Sending Emailalvaro on Conditions of Sending [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Olaf van der Spek</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10749</link>
		<dc:creator>Olaf van der Spek</dc:creator>
		<pubDate>Tue, 20 Nov 2007 11:22:49 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10749</guid>
		<description>&#62; When considering the mathematical issues we consider only a single stripe of a RAID-5 which has one block of data for parity and N-1 blocks of data.

Fair enough, but I don't see how that relates to my comments.</description>
		<content:encoded><![CDATA[<p>&gt; When considering the mathematical issues we consider only a single stripe of a RAID-5 which has one block of data for parity and N-1 blocks of data.</p>
<p>Fair enough, but I don&#8217;t see how that relates to my comments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: etbe</title>
		<link>http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10748</link>
		<dc:creator>etbe</dc:creator>
		<pubDate>Tue, 20 Nov 2007 10:51:13 +0000</pubDate>
		<guid isPermaLink="false">http://etbe.coker.com.au/2007/11/16/software-vs-hardware-raid/#comment-10748</guid>
		<description>http://etbe.coker.com.au/2007/11/21/raid-and-bus-bandwidth/
cmot: Thanks for the suggestion, the above URL will have a response to your comment and post shortly.

Simon: Scrubbing is good, but is typically only run from a cron job weekly (or some other infrequent interval).  Of course if you do an entire RAID rebuild operation after every power failure (as Linux software RAID is prone to do) then you get some consistency at the cost of performance.

Ole: Auction sites have HP machines with hardware RAID at quite reasonable prices.  HP supports them for a minimum of 5 years (within which time they guarantee that they will provide replacement hardware to read the disks).  Also it's not impossible to recreate an unknown RAID format.  Some time ago I attended a lecture on how to determine the RAID format when you get a set of disks from an unknown machine.  You have to recognise some patterns in the data; a large file with a known format is good for this.  Then, as there isn't much variation in RAID formats, you just work out the stripe size and the order of the disks and it's simple to write a program to dump all the data to a single block device or file.

Olaf: When considering the mathematical issues we consider only a single stripe of a RAID-5 which has one block of data for parity and N-1 blocks of data.  The fact that each successive stripe rotates the order by one is not relevant when considering single-stripe issues.  Another feature lacking in Linux software RAID is the ability to read all disks and compare the result.  Of course, when you can't actually do anything useful once you know the data is bad, the deficiency isn't so bad.  It's a pity that you can't have a 3 disk mirror and take the majority vote or have RAID-6 read from all disks (with the ability to correct a single block error).</description>
		<content:encoded><![CDATA[<p><a href="http://etbe.coker.com.au/2007/11/21/raid-and-bus-bandwidth/" rel="nofollow">http://etbe.coker.com.au/2007/11/21/raid-and-bus-bandwidth/</a><br />
cmot: Thanks for the suggestion, the above URL will have a response to your comment and post shortly.</p>
<p>Simon: Scrubbing is good, but is typically only run from a cron job weekly (or some other infrequent interval).  Of course if you do an entire RAID rebuild operation after every power failure (as Linux software RAID is prone to do) then you get some consistency at the cost of performance.</p>
<p>Ole: Auction sites have HP machines with hardware RAID at quite reasonable prices.  HP supports them for a minimum of 5 years (within which time they guarantee that they will provide replacement hardware to read the disks).  Also it&#8217;s not impossible to recreate an unknown RAID format.  Some time ago I attended a lecture on how to determine the RAID format when you get a set of disks from an unknown machine.  You have to recognise some patterns in the data; a large file with a known format is good for this.  Then, as there isn&#8217;t much variation in RAID formats, you just work out the stripe size and the order of the disks and it&#8217;s simple to write a program to dump all the data to a single block device or file.</p>
<p>Olaf: When considering the mathematical issues we consider only a single stripe of a RAID-5 which has one block of data for parity and N-1 blocks of data.  The fact that each successive stripe rotates the order by one is not relevant when considering single-stripe issues.  Another feature lacking in Linux software RAID is the ability to read all disks and compare the result.  Of course, when you can&#8217;t actually do anything useful once you know the data is bad, the deficiency isn&#8217;t so bad.  It&#8217;s a pity that you can&#8217;t have a 3 disk mirror and take the majority vote or have RAID-6 read from all disks (with the ability to correct a single block error).</p>
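<p>The majority vote idea for a 3 disk mirror could look something like the sketch below. It is purely hypothetical (the function name and the return shape are invented here) and only shows the comparison logic, not how a real RAID layer would read or repair blocks.</p>

```python
def majority_read(copies):
    """Return (good_block, bad_indexes) for exactly three mirror copies.

    Raises ValueError when all three copies disagree, because no majority
    exists and the correct contents cannot be determined.
    """
    a, b, c = copies
    if a == b or a == c:
        good = a
    elif b == c:
        good = b
    else:
        raise ValueError("all three copies differ; no majority")
    # Indexes of the mirrors whose copy should be rewritten from the winner.
    bad_indexes = [i for i, block in enumerate(copies) if block != good]
    return good, bad_indexes
```

<p>For example, <code>majority_read([b"x", b"y", b"x"])</code> picks <code>b"x"</code> and reports mirror 1 as the copy to rewrite, which is exactly the kind of correction a two-way mirror cannot make.</p>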
]]></content:encoded>
	</item>
</channel>
</rss>
