Linux, politics, and other interesting things
I have measured the bus bottlenecks of a couple of desktop machines running IDE disks with my ZCAV  benchmark (part of the Bonnie++ suite). The results show that two typical desktop machines had significant bottlenecks when running two disks for contiguous read operations . Here is one of the graphs which shows that when two disks were active (on different IDE cables) the aggregate throughput was just under 80MB/s on a P4 1.5GHz while the disks were capable of delivering up to 120MB/s:
On a system such as the above P4 using software RAID will give a performance hit when compared to a hardware RAID device which is capable of driving both disks at full speed. I did not benchmark the relative speeds of read and write operations (writing is often slightly slower), but if for the sake of discussion we assume that read and write give the same performance then software RAID would only give 2/3 the performance of a theoretical perfect hardware RAID-1 implementation for large contiguous writes.
On a RAID-5 array the bandwidth for large contiguous writes is the data size multiplied by N/(N-1) (where N is the number of disks), and on a RAID-6 array it is N/(N-2). For the case of a four disk RAID-6 array that would give the same overhead as writing to a RAID-1 and for the case of a minimal RAID-5 array it would be 50% more writes. So from the perspective of “I need X bandwidth, can my hardware deliver it” if I needed 40MB/s of bandwidth for contiguous writes then a 3 disk RAID-5 might work but a RAID-1 definitely would hit a bottleneck.
Given that large contiguous writes to a RAID-1 is a corner case and that minimal sized RAID-5 and RAID-6 arrays are rare in most cases there should not be a significant overhead. As the number of seeks increases the actual amount of data transferred gets quite small. A few years ago I was running some mail servers which had a very intense IO load, four U320 SCSI disks in a hardware RAID-5 array was a system bottleneck – yet the IO was only 600KB/s of reads and 3MB/s of writes. In that case seeks were the bottleneck and write-back caching (which is another problem area for Linux software RAID) was necessary for good performance.
For the example of my P4 system, it is quite obvious that with a four disk software RAID array consisting of disks that are reasonably new (anything slightly newer than the machine) there would be some bottlenecks.
Another problem with Linux software RAID is that traditionally it has had to check the consistency of the entire RAID array in the case of an unexpected power failure. Such checks are the best way to get all disks in a RAID array fully utilised (Linux software RAID does not support reading from all disks in a mirror and checking that they are consistent for regular reads), so of course the issue of a bus bottleneck becomes an issue.
Of course the solution to these problems is to use a server for server tasks and then you will not run out of bus bandwidth so easily. In the days before PCI-X and PCIe there were people running Linux software RAID-0 across multiple 3ware hardware controllers to get better bandwidth. A good server will have multiple PCI buses so getting an aggregate throughput greater than PCI bus bandwidth is possible. Reports of 400MB/s transfer rates using two 64bit PCI buses (each limited to ~266MB/s) were not uncommon. Of course then you run into the same problem, but instead of being limited to the performance of IDE controllers on the motherboard in a desktop system (as in my test machine) you would be limited to the number of PCI buses and the speed of each bus.
If you were to install enough disks to even come close to the performance limits of PCIe then I expect that you would find that the CPU utilisation for the XOR operations is something that you want to off-load. But then on such a system you would probably want the other benefits of hardware RAID (dynamic growth, having one RAID that has a number of LUNs exported to different machines, redundant RAID controllers in the same RAID box, etc).
I think that probably 12 disks is about the practical limit of Linux software RAID due to these issues and the RAID check speed. But it should be noted that the vast majority of RAID installations have significantly less than 12 disks.
One thing that cmot mentioned was a RAID controller that runs on the system bus and takes data from other devices on that bus. Does anyone know of such a device?