I have just been running some ZCAV tests on some new, supposedly 1TB, disks (10^12 bytes is about 931*2^30 bytes, so it’s about 931G according to almost everyone in the computer industry who doesn’t work for a hard disk vendor).
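For the record the conversion is 10^12 / 2^30 = 1,000,000,000,000 / 1,073,741,824, which is about 931.3, hence the 931G figure.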
I’ve added a new graph with the results to my ZCAV results page.
One interesting thing that I discovered is that the faster disks can deliver contiguous data at more than 110MB/s; previously the best I’d seen from a single disk was about 90MB/s. When I first wrote ZCAV the best disks I had to test with all had a maximum speed of about 10MB/s, so KB/s was a reasonable unit. Now I plan to change the units to MB/s to make the graphs easier to read. Of course it’s not that difficult to munge the data before graphing it (see the sketch below), but I think that it will give a better result for most users if I just change the units.
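As an illustration of the sort of munging I mean (this is not part of ZCAV, and it assumes a simple two-column “offset speed” layout with the speed in KB/s; real output files with comment lines or extra columns would need a little more work), a trivial filter would look something like:

  #include <stdio.h>

  /* toy filter: read "offset speed" pairs with the speed in KB/s and
   * write them back out with the speed converted to MB/s */
  int main(void)
  {
      double offset, kps;
      while (scanf("%lf %lf", &offset, &kps) == 2)
          printf("%g %.2f\n", offset, kps / 1024.0);
      return 0;
  }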
The next interesting thing I discovered is that GNUplot defaults to using exponential notation at the value of 1,000,000 (or 1e+06). I’m sure that I could override that, but the resulting labels would still be difficult for users to read. So I guess it’s time to change the units to GB.
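For what it’s worth, something like the following line in the gnuplot command file should do the override (for whichever axis carries the disk offsets, the X axis in my graphs):

  set format x "%.0f"    # plain decimal labels instead of 1e+06 style

But even with that the labels would be six and seven digit numbers of MB, so changing the units is still the better fix.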
I idly considered using the hard drive manufacturers’ definition of GB, so that a 1TB disk would actually display as having 1000GB (the Wikipedia page for Gibibyte has the different definitions). But of course having decimal prefixes on one axis of a graph and binary prefixes on the other would be a horror. Also the block and chunk sizes used have to be multiples of a reasonably large power of two (at least 2^14) to get reasonable performance from the OS.
The next implication of this is that it’s a bad idea to have a default block size that is not a power of two. The previous block sizes were 100M and 200M (for the 1.0x and 1.9x branches respectively). Expressing these as 0.0977G and 0.1953G respectively would not be user-friendly, so I’m currently planning on 0.25G as the block size for both branches.
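To spell out the arithmetic: 0.25G is 2^28 bytes (268,435,456), which is both a power of two and an exact quarter of a gig, while 100M is 100*2^20 bytes, a multiple of 2^20 but not a power of two and not a round fraction of a gig.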
While changing the format it makes sense to change as many things as possible at once, to reduce the number of incompatible file formats that are out there. The next thing I’m considering is the precision. In the past the speed in KB/s was reported as an integer. Obviously an integer for the speed in MB/s is not going to work well for some of the slower devices that are still in use (e.g. a 4* CD-ROM drive maxes out at 600KB/s). Of course the accuracy of all this is limited by the accuracy of the system clock. The gettimeofday() system call returns the time in micro-seconds, and I expect that most systems don’t approach micro-second accuracy. It’s not worth reporting with a precision that is greater than the accuracy, so there’s no point in making the precision of the speed any greater than the precision of the time.
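As a rough sketch of how the timing side of this works (this is not the actual ZCAV code; the 256M figure and the busy loop are just stand-ins for a real block read), the speed calculation is basically:

  #include <stdio.h>
  #include <string.h>
  #include <sys/time.h>

  int main(void)
  {
      const double megabytes = 256.0;   /* pretend we just read a 256M block */
      struct timeval start, end;
      char buf[1024];

      gettimeofday(&start, NULL);
      for (long i = 0; i < 1000000; i++)    /* stand-in for the real read */
          memset(buf, (int)(i & 0xff), sizeof(buf));
      gettimeofday(&end, NULL);

      /* gettimeofday() only resolves micro-seconds, so anything much past
       * two decimal places of MB/s is false precision */
      double secs = (end.tv_sec - start.tv_sec)
                  + (end.tv_usec - start.tv_usec) / 1000000.0;
      printf("%.2f MB/s\n", megabytes / secs);
      return 0;
  }

With only micro-second resolution from the timer (and less real accuracy than that in practice) there is nothing to be gained from printing the speed with more than a couple of decimal places.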
Things were easier with the Bonnie++ program when I just reduced the precision as needed to fit in an 80 column display. ;)
Finally I ran my tests on my new Dell T105 system. While I didn’t get time to do as many tests as I wanted before putting the machine in production, I did get to do a quick test of two disks running at full speed. Previously when testing desktop systems I had not found one that could extract full performance from two disks of the same age as the machine simultaneously. While the Dell T105 is a server-class system, it is a rather low-end server and I had anticipated that it would lack performance in this regard. I was pleased to note that I could run both 1TB disks at full speed at the same time. I didn’t get a chance to test three or four disks though (maybe during scheduled down-time in the future).