RAID etc

On a closed mailing list someone wrote:
2 X 120gb ide drives installed as slaves on each ide channels. … Presto. A 230’ish GB storage NAS for all my junk.

I’m not going to write a long technical response on a closed list so I’ll blog about it instead.

Firstly I wonder whether by “junk” the poster means stuff that is not important and which won’t be missed if it goes away.

If P is the probability of a drive not dying in a given time period (as a number between 0 being certain death and 1 an immortal drive) then the probability of both drives surviving is P^2 for the configuration in question, so the probability of serious data loss is 1 - P^2.

If P has a value of 0.5 over a period of 7 years (approximately what I’m seeing in production for IDE drives) then your probability of not losing data over that period is 0.25, i.e. there’s a 75% chance that at least one of the drives will die and data will be lost.
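As a rough sketch of that arithmetic, the following Python treats P as the chance that a single drive survives the period and assumes the drives fail independently (real drives from the same batch in the same case may not). The function name is made up for illustration.

```python
def survival_probability(p_drive: float, n_drives: int) -> float:
    """Chance that every one of n independent drives survives the period.

    With data spread across all drives and no redundancy, losing any
    single drive counts as data loss, so all of them must survive.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    return p_drive ** n_drives


if __name__ == "__main__":
    p = 0.5  # assumed 7-year survival rate for one IDE drive
    ok = survival_probability(p, 2)
    print(f"no data loss: {ok:.2f}, data loss: {1 - ok:.2f}")
    # prints: no data loss: 0.25, data loss: 0.75
```

Adding more drives to the same kind of setup only makes things worse: with four such drives the chance of getting through the period unscathed drops to 0.5^4, about 6%.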

If the data in question really isn’t that important then this might be OK. About half the data on my file server consists of ISO images of Linux distributions and other things which aren’t of particularly great value as I can download them again at any time. Of course it would be a major PITA if a client had a problem with an old distribution and I had to wait for a 3GB download to finish before fixing it. This factor alone makes it worth my effort to use RAID and backups for such relatively unimportant data. 300GB IDE and S-ATA disks aren’t that expensive nowadays. If buying a pair of bigger disks saves you one data loss incident and your time has any value greater than $10 per hour then you are probably going to win by buying disks for RAID-1.
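To put a number on what RAID-1 buys you, here is a small Python sketch comparing two drives with no redundancy against the same two drives mirrored. It assumes independent failures and, pessimistically, that a dead drive in the mirror is never replaced during the period. In practice you would swap the failed disk and resync, so the real figure for RAID-1 should be better than this. The function names are my own for illustration.

```python
def spanned_loss_probability(p_drive: float, n_drives: int = 2) -> float:
    """Data is lost if ANY drive dies (no redundancy)."""
    return 1.0 - p_drive ** n_drives


def mirrored_loss_probability(p_drive: float, n_copies: int = 2) -> float:
    """Data is lost only if EVERY copy dies (RAID-1, no drive replacement)."""
    return (1.0 - p_drive) ** n_copies


if __name__ == "__main__":
    p = 0.5  # assumed chance that one drive survives the period
    print(f"no redundancy: {spanned_loss_probability(p):.2f}")
    print(f"RAID-1:        {mirrored_loss_probability(p):.2f}")
    # prints:
    # no redundancy: 0.75
    # RAID-1:        0.25
```

You pay for that with half the usable capacity, which is why buying a pair of bigger disks is the way to go.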

As another approach, LVM apparently has built-in functionality equivalent to RAID-1. One thing I have idly considered is using ATA over Ethernet with LVM or GFS to build some old P3 machines into a storage solution.

P3 machines use 38W of power each (with one disk, maybe as much as 70W with 4 disks but I haven’t checked) and should have the potential to perform well if they each have 4 IDE disks installed. That way a large number of small disks could combine to give a decent capacity with data mirroring. Among other things, having more spindles means that seeks can be spread across more disks, which reduces the time spent waiting on seeks under heavy load. If you do work that involves large numbers of seeks then this could deliver significant performance benefits. If I had more spare time I would do some research on this. It would probably make for a good paper at a Linux conference.
