New ZCAV Development

I have just been running ZCAV tests on some new, supposedly 1TB, disks (10^12 bytes is about 931*2^30, so is about 931G according to almost everyone in the computer industry who doesn’t work for a hard disk vendor).
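
As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are mine and purely illustrative; the only inputs are the vendor's decimal terabyte (10^12 bytes) and the binary gigabyte (2^30 bytes).

```python
# Capacity of a "1TB" disk as the vendor counts it (decimal prefixes).
vendor_bytes = 10**12

# One binary gigabyte (2^30 bytes), the unit most software reports.
GIB = 2**30

# The same capacity expressed in binary gigabytes.
binary_gigabytes = vendor_bytes / GIB
print(f"{binary_gigabytes:.2f}")  # about 931.32
```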

I’ve added a new graph to my ZCAV results page [1] with the results.

One interesting thing I discovered is that the faster disks can deliver contiguous data at more than 110MB/s; previously the best I’d seen from a single disk was about 90MB/s. When I first wrote ZCAV the best disks I had to test with all had a maximum speed of about 10MB/s, so KB/s was a reasonable unit. Now I plan to change the units to MB/s to make the graphs easier to read. Of course it’s not that difficult to munge the data before graphing it, but I think that it will give a better result for most users if I just change the units.

The next interesting thing I discovered is that GNUplot defaults to exponential notation at the value of 1,000,000 (or 1e+06). I’m sure that I could override that, but it would still make the graphs difficult for users to read. So I guess it’s time to change the units to GB.

I idly considered using the hard drive manufacturer’s definition of GB so that a 1TB disk would actually display as having 1000GB (the Wikipedia page for Gibibyte has the different definitions [2]). But of course having decimal prefixes on one axis of a graph and binary prefixes on the other would be a horror. Also the block and chunk sizes used have to be multiples of a reasonably large power of two (at least 2^14) to get reasonable performance from the OS.

The next implication of this is that it’s a bad idea to have a default block size that is not a power of two. The previous block sizes were 100M and 200M (for 1.0x and 1.9x branches respectively). Expressing these as 0.0976G and 0.1953G respectively would not be user-friendly. So I’m currently planning on 0.25G as the block size for both branches.
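
The block size arithmetic above can be sketched in a few lines of Python. This is not ZCAV code; the helper names are hypothetical and the sizes are taken to be binary megabytes, as in the rest of this post.

```python
MIB = 2**20  # bytes in one binary megabyte
GIB = 2**30  # bytes in one binary gigabyte


def block_size_in_gib(size_mib):
    """Convert a block size given in binary megabytes to binary gigabytes."""
    return size_mib * MIB / GIB


def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


# Old defaults (100M, 200M) and the proposed 0.25G (256M).
for size_mib in (100, 200, 256):
    size_bytes = size_mib * MIB
    print(size_mib, block_size_in_gib(size_mib), is_power_of_two(size_bytes))
```

Only the 256M case is a power of two, and it is also the only one with a short, exact value in G.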

While changing the format it makes sense to change as many things as possible at once, to reduce the number of incompatible file formats that are out there. The next thing I’m considering is the precision. In the past the speed in K/s was an integer. Obviously an integer for the speed in M/s is not going to work well for some of the slower devices that are still in use (e.g. a 4x CD-ROM drive maxes out at 600KB/s). Of course the accuracy of this is determined by the accuracy of the system clock. The gettimeofday() system call returns the time in micro-seconds, but I expect that most systems don’t approach micro-second accuracy, and it’s not worth reporting the time with a precision that is greater than its accuracy. Likewise there’s no point in making the precision of the speed any greater than the precision of the time.
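
To make the precision argument concrete, here is a rough Python sketch (again, not ZCAV code). The function names, the 2.5 second reading, and the 1ms clock accuracy are all assumptions chosen for illustration; I have not measured the real accuracy of any system clock here.

```python
import math

MIB = 2**20  # bytes in one binary megabyte


def speed_mib_per_s(bytes_read, elapsed_s):
    """Transfer speed in MiB/s for a block read in elapsed_s seconds."""
    return bytes_read / MIB / elapsed_s


def meaningful_digits(elapsed_s, clock_accuracy_s):
    """Count the significant digits of elapsed_s that exceed the clock accuracy."""
    if elapsed_s <= clock_accuracy_s:
        return 1
    return math.floor(math.log10(elapsed_s / clock_accuracy_s)) + 1


def format_speed(bytes_read, elapsed_s, clock_accuracy_s):
    """Format the speed with no more digits than the timing justifies."""
    digits = meaningful_digits(elapsed_s, clock_accuracy_s)
    return f"{speed_mib_per_s(bytes_read, elapsed_s):.{digits}g}"


# Hypothetical reading: a 256MiB block read in 2.5 seconds,
# on a system whose clock is assumed accurate to about 1ms.
print(format_speed(256 * MIB, 2.5, 0.001))
```

With those assumed numbers the time has four meaningful digits, so the speed is printed with four as well. The `g` format falls back to exponential notation when the digit count is smaller than the number of integer digits in the speed, so a real implementation would want to handle that case separately.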

Things were easier with the Bonnie++ program when I just reduced the precision as needed to fit in an 80 column display. ;)

Finally I ran my tests on my new Dell T105 system. While I didn’t get time to do as many tests as I wanted before putting the machine in production, I did get to do a quick test of two disks running at full speed. Previously when testing desktop systems I had not found one which, when run with two disks of the same age as the machine, could extract full performance from both disks simultaneously. While the Dell T105 is a server-class system, it is a rather low-end server and I had anticipated that it would lack performance in this regard. I was pleased to note that I could run both 1TB disks at full speed at the same time. I didn’t get a chance to test three or four disks though (maybe during scheduled down-time in the future).

7 comments to New ZCAV Development

  • Jonas

    “10^12 bytes is about 931*2^30 so is about 931G according to almost everyone in the computer industry who doesn’t work for a hard disk vendor”.

    s/doesn’t work for a hard disk vendor/isn’t a scientist/

    The misuse of the metric prefixes is frankly sickening, and the willingness to blame an innocent bystander who happens to be using the correct definition is rather sad.

    If you want to butcher science for no good reason, at least have the decency to use the binary prefixes.

  • etbe

    Jonas: The hard disk vendors aren’t using the terms the way they do out of a desire to be accurate to scientific definitions, they are doing so to claim that their products are bigger than they really are.

    BTW Do you have a computer with 1.07G of RAM? ;)

  • Jonas

    I know they aren’t (not at this point anyway – they might have been decades ago), but that doesn’t make them any more wrong. What people don’t seem to realise, is that if everyone would stop misusing the metric prefixes, there wouldn’t even be any confusion for the hard disk vendors to exploit!

    Nope, my computer has exactly 2GiB of RAM.

    See, it all works out once you use the correct (IEEE-1541,ISO/IEC-80000) units.

  • etbe

    Jonas: It’s nice that they have decided to make a standard that differs from everything that everyone in the industry (apart from the hard disk vendors) has done. But having used terms such as KB for almost 30 years I don’t feel obliged to immediately adopt any new standard. Incidentally when I started using computers the vast majority of computer users had never seen a hard disk (such things were insanely expensive, the size of washing machines, and could not be connected to a computer that a home user could afford).

    Also computers with more than 1GB of RAM were on the market long before the term GiB was invented.

    The above URL has the details on IEEE 1541.

    PS As long as the computer industry is centered around a country that can’t even convert to metric measurements I don’t expect attempts to change computer terms to be successful.

  • Ben

    The “K” prefix has meant 1000 since long before computers were around, and deciding to use the same prefix for a different value, while convenient at the time, has made things a mess now.

    All vendors should use the correct prefix, as hard drive vendors do (I recognize it’s in their interest to do so). What do Microsoft and Apple have to lose by using the correct terms? (Linux has begun to switch.)

    On a related note, I think it’s unfortunate that the binary prefixes sound silly when spoken, but at least their abbreviated forms look fine.

  • Nick

    The “industry” you’re talking about, that wouldn’t be the industry which has Gigabit network cards (one billion bits per second) and Megaflops (a million floating point operations per second) and Megapixel displays (one million pixels) would it? The industry which also uses MHz (million times per second) and any amount of other SI units with their proper, decimal prefixes?

    Feel free to dig back as far as you like, you’ll keep finding that only one group were consistently abusing prefixes and it was the group doing binary addressing. ie chip designers and RAM manufacturers. They had a justification, powers of two being what they are, and having such an influence on electronic design it really just wasn’t practical to make 8000 bit RAM chips rather than 8192 bit ones, and the difference initially seems slight. But that justification doesn’t mean they somehow get to dictate how the prefixes are used by everybody else, in their industry or any other. The SI prefixes themselves were standardised by international committee in 1960 with most having already been in widespread use prior to that.

    A great deal of misinformation has been spread about the sizing of hard disks, with people even claiming (wrongly) that the manufacturers include their own error-correction sectors and other parts of the disk unavailable for user storage in their capacity numbers. We should avoid using the wrong prefixes and thereby adding to the confusion. Disk storage does not come naturally in powers of two, and so we shouldn’t try to force it into a framework that doesn’t fit.

  • etbe

    Nick: Actually I’ve given in and use MiB and GiB in the recent releases of Bonnie++.

    Also in terms of hard disk space, if you want to store an image of a machine that has “1G of RAM” then you need 1GiB of disk space, not 1,000,000,000 bytes.

    The hard disk sector size is 512 bytes on all drives that are affordable (apparently you can get slightly larger sectors on some SAS/SCSI disks – not sure of the details of that). The block size for all common filesystems seems to be either 1K or 4K, so a file whose size is a power of 10 will usually have a partial block at the end (10^9 bytes, for example, is not a multiple of either 1K or 4K).