Noise in Computer Rooms

Some people think that you can recognise a good restaurant by the presence of obscure dishes on the menu or by high prices. The reality is that there are two quick ways of identifying a good restaurant: one is the Michelin Guide [1] (or a comparable guide, if such a thing exists), the other is how quiet the restaurant is.

By a quiet restaurant I certainly don’t mean a restaurant with no customers (which may become very noisy once customers arrive). I mean a restaurant which, when full, will still be reasonably quiet. Making a restaurant quiet is not in itself a sufficient criterion for being a good restaurant – but it’s something that is usually done after the other criteria (such as hiring good staff and preparing a good menu) are met.

The first thing to do to make a room quiet is to install good carpet. Floorboards are easy to clean and the ratio of investment to lifetime is very good (particularly for hardwood), but they reflect sound, and the movement of chairs and feet makes noise. A thick carpet with a good underlay is necessary to absorb sound. Booths are also good for containing sound if the walls extend above head height. Decorations on the walls such as curtains and thick wallpaper also absorb sound. A quiet environment allows people to talk at a normal volume, which improves the dining experience.

It seems to me that the same benefits apply to server rooms and offices, with the benefit being more efficient work. I found it exciting when I first had my desk in a server room (surrounded by tens of millions of pounds worth of computer gear). But as I got older I found it less interesting to work in that type of environment just as I found it less interesting to have dinner in a noisy bar – and for the same reasons.

For a server room there is no escaping the fact that it will be noisy. But if the noise can be minimised then it will allow better communication between the people who are there and less distraction, which should result in higher quality work – which matters if you want good uptime! One thing I have observed is that physically larger servers tend to make less noise per unit of volume and compute power. For example a 2RU server with four CPUs seems to always make less noise than two 1RU servers that each have two CPUs. I believe that this is because a fan with a larger diameter can operate at a lower rotational speed, which results in less bearing noise, and larger fans also produce less turbulence. It’s obvious that using fewer servers via virtualisation has the potential to reduce noise (both directly through fans and disks and indirectly through the cooling system for the server room [2]). A less obvious way of reducing noise is to swap two 1RU servers for one 2RU server – in my experience, for machines in a similar price band, a 2RU server often has compute power (in terms of RAM and disk capacity) comparable to three or four 1RU servers.

Reducing noise both directly and indirectly requires increasing disk IO capacity (in terms of the number of random IOs per second) without increasing the number of spindles (disks). I just read an interesting Sun blog covering some concepts related to using Solid State Disks (SSDs) with ZFS for best performance [3]. It seems that using such techniques is one way of significantly increasing the IO capacity per server (and thus allowing more virtual servers on one physical machine) – it’s a pity that we currently don’t have access to ZFS or a similar filesystem for Linux servers (ZFS has license issues and the GPL alternatives are all in a beta state AFAIK). Another possibility that seems to have some potential is the use of NetApp Filers [4] for the main storage of virtual machines. A NetApp Filer gives a better ratio of IO requests per second to the number of spindles used than most storage array products, due to the way they use NVRAM caching and their advanced filesystem features (which incidentally also give some good options for backups and for detecting and correcting errors). So a set of 2RU servers with the maximum amount of RAM installed, using a NetApp Filer (or two if you want redundancy) for the storage with the greatest performance requirements, should give the greatest density of virtual machines.

Blade servers also have potential to reduce noise in the server room. The most significant way that they do this is by reducing the number of power supplies: instead of having one PSU per server (or two if you want redundancy) you might have three or five PSUs for a blade enclosure that has 8 or more blades. HP blade enclosures support shutting down some PSUs when the blades are idling and don’t need much power (I don’t know whether blade enclosures from other vendors do this – I expect that some do).

A bigger problem however is the noise in offices where people work. It seems that the main cause of this is the cheap cubicles that are used in most offices (and almost all computer companies). More expensive cubicles that are at almost head-height (for someone who is standing) and which have a cloth surface to absorb sound significantly improve the office environment, and separate offices are better still. One thing I would like to see is more use of shared desktop computers; it’s not difficult to set up a desktop machine with multiple video cards, so with appropriate software support (which is really difficult) you could have one desktop machine for two or even four users, which would save electricity and reduce noise.

Better quality carpet on the floors would also be a good thing. While office carpet wears out fast, adding some underlay would not increase the long-term cost (the underlay can remain as the top layer gets replaced).

Better windows in offices are also necessary to provide a quiet working environment. The use of double-glazed windows with reflective plastic film significantly decreases the amount of heating and cooling that is required in the office. This permits a lower speed of air flow for heating and cooling, which means less noise. An office in a central city area will also have a noise problem from outside the building; again, double (or even triple) glazed windows help a lot.

Some people seem to believe that an operations room should have no obstacles (one ops room where I once worked had all desks facing a set of large screens that displayed network statistics, and the desks were like school desks with no dividers). I think that even for an ops room some effort should be made to reduce the ambient noise. If the room is generally reasonably quiet then it should be easy to shout the news of an outage so that everyone can hear it.

Let’s assume for the sake of discussion that a quieter working environment can increase productivity by 5% (I think this is a conservative assumption). For an office full of skilled people who are doing computer work the average salary may be about $70,000, and it’s widely regarded that to factor in the management costs etc you should double the salary – so the average cost of an employee would be about $140,000. If there are 50 people in the office then the work of those employees has a cost of $7,000,000 per annum. A 5% increase in that would be worth $350,000 per annum – you could buy a lot of windows for that!

Efficiency of Cooling Servers

One thing I had wondered was why home air-conditioning systems are more efficient than air-conditioning systems for server rooms. I received some advice on this matter from the manager of a small server room (which houses about 30 racks of very powerful and power hungry servers).

The first issue is terminology: the efficiency of a “chiller” is defined as the number of Watts of heat energy removed divided by the number of Watts of electricity consumed by the chiller. For example, with a 200% efficient cooling plant a 100W light bulb counts as a 150W load: 100W to power it and 50W for the cooling plant to remove its heat.

For domestic cooling I believe that 300% is fairly common for modern “split systems” (that’s the specification for the air-conditioner on my house, and the other air-conditioners on display had similar ratings). For high-density server rooms with free air cooling I have been told that a typical efficiency range is between 80% and 110%! So it’s possible to use MORE electricity on cooling than on running the servers!
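To make that arithmetic concrete, here is a small C program that computes the total power draw for a given load at a given chiller efficiency – the efficiency figures are just the ones quoted above, not measurements:

#include <stdio.h>

/* Total electricity (in Watts) consumed for a given heat load when the
 * cooling plant has the given efficiency (heat removed divided by
 * electricity used by the cooling plant). */
static double total_power(double load_watts, double efficiency)
{
    return load_watts + load_watts / efficiency;
}

int main(void)
{
    const double load = 100.0; /* the 100W light bulb from the example above */

    printf("200%% efficient plant: %.0fW total\n", total_power(load, 2.0));
    printf("300%% domestic split system: %.0fW total\n", total_power(load, 3.0));
    printf("80%% server room cooling: %.0fW total\n", total_power(load, 0.8));
    return 0;
}

The last line is the interesting one: at 80% efficiency the 100W load needs 125W of cooling, so the cooling really does use more electricity than the equipment it cools.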

One difficulty in cooling a server room is that the air often can’t flow freely (unlike a big open space such as the lounge room of your house). Another is the range of temperatures and the density of heat production in some parts (a 2RU server can dissipate 1000W of heat in a small space). These factors can be minimised by extracting hot air at the top and/or rear of racks, forcing cold air in at the bottom and/or the front, and being very careful when planning where to place equipment. HP offers some services related to designing a server room to increase cooling efficiency; one of the services is using computational fluid dynamics to simulate the air-flow in the server-room [1]! CFD is difficult and expensive (the complete package from HP for a small server room costs more than some new cars), and I believe that the fact that it is necessary for the correct operation of some server rooms is an indication of the difficulty of the problem.

The most effective ways of cooling servers involve tight coupling of chillers and servers. This often means using chilled water or another liquid to extract the heat. Chilled water refrigeration systems largely remove the problem of being unable to extract the heat from the right places, but instead you have some inefficiency in pumping the water and the servers are fixed in place. I have not seen or heard of chilled water being used for 2RU servers (I’m not saying that it doesn’t get used or that it wouldn’t make sense – merely that I haven’t seen it). When installing smaller servers (2RU and below) there is often a desire to move them and attaching a chilled-water cooling system would make such a move more difficult and expensive. When a server weighs a ton or more then you aren’t going to move it in a hurry (big servers have to be mostly disassembled before the shell can be moved, and the shell might require the efforts of four men to move it). Another issue related to water cooling is the weight. Managing a moderate amount of water involves a lot of heavy pipes (a leak would be really bad) and the water itself can weigh a lot. A server room that is based around 20Kg servers might have some issues with the extra weight of water cooling (particularly the older rooms), but a server room designed for a single rack that weighs a ton can probably cope.

I have been told that the cooling systems for low density server rooms are typically as efficient as those used for houses, and may even be more efficient. I expect that the engineering trade-offs when designing an air-conditioner for home use favor a low purchase price. But someone who approves the purchase of an industrial cooling system will be more concerned about the overall cost of operations and will be prepared to spend some extra money up-front and recover it over the course of a few years. The fact that server rooms run 24*7 also gives more opportunity to recover the money spent on the purchase (my home air-conditioner runs for about 3 months a year, and for considerably less than 24 hours a day).

So it seems that the way to cool servers efficiently is to have low density server rooms (to the largest extent possible). One step towards this goal would be to have servers nearer the end users. For example having workgroup servers near the workgroup (instead of in the server room). Of course physical security of those servers would be more challenging – but if all the users have regular desktop PCs that can be easily 0wned then having the server for them in the same room probably doesn’t make things any worse. Modern tower servers are more powerful than rack mounted servers that were available a few years ago while also being very quiet. A typical rack-mounted server is not something you would want near your desk, but one of the quiet tower servers works quite well.

Variable Names

For a long time I have opposed single letter variable names. Often I see code which has a variable for a fixed purpose with a single letter name, e.g. “FILE *f;”. The problem with this is that unless you choose a letter such as ‘z’ which has a high scrabble score (and probably no relation to what your program is doing), it will occur in other variable names and in reserved words of the language in question. A significant part of the time spent coding involves reading code, so even for programmers working on a project a useful amount of time can be saved by using variable names that can easily be found by a search program. Often it’s also necessary to read source code just to understand what a system does – that is code reading without any writing.
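As a contrived illustration (this isn’t code from any particular project), consider the following function: the variable f is used on almost every line, but the letter f also occurs in fopen, fgetc, fclose and “if”, so a plain text search for f finds far more than the uses of the variable.

#include <stdio.h>

/* Count the newlines in a file.  Try searching this function for the
 * variable f with less or grep - most of the matches are not the variable. */
int count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;
    int lines = 0;

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n')
            lines++;
    }
    fclose(f);
    return lines;
}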

With most editors and file viewing tools, searching for a variable with a single character name in a function (or in a source file for a global variable) is going to be difficult. Automated searching is mostly useless; probably the best option is to have your editor highlight every instance and visually scan for the ones that are delimited by brackets, braces, parentheses, spaces, commas, or whatever else is not acceptable in a variable name in the language in question.

Of course if you have a syntax highlighting editor then it might parse enough of the language to avoid this. But the heavier editors are not always available. Often I edit code on the system where the crash occurs (it makes it easier to run a debugger). Installing one of the heavier editors is often not an option for such a task (the vim-full Debian/Lenny package for AMD64 has dependencies that involve 27M of package files to download and would take 100M of disk space to install – quite a lot to ask if you just want to edit a single source file). Incidentally I am interested in suggestions for the best combination of features and space in a vi clone (color syntax highlighting is a feature I desire).

But even if you have a fancy editor, there is still the issue of using tools such as less and grep to find uses of variables. Of course for some uses (such as a loop counter) there is little benefit in using grep.

Another issue to consider is the language. If you write in Perl then a search for \$i should work reasonably well.

One of the most common uses of single letter variable names is ‘i’ and ‘j’ for loop counters. In the early days of computing FORTRAN was the only compiled language suitable for scientific tasks, and it had no explicit way of declaring variables: if a variable name started with i, j, k, l, m, or n then it was known to be an integer. So i became the commonly used name for a loop counter (the first letter in that range, and the shortest possible integer variable name). That habit has been passed on through the years, so now many people who have never heard of FORTRAN use i as the name for a loop counter and j as the counter for the inner loop in nested loops. [I couldn’t find a good reference for FORTRAN history – I’ll update this post if someone can find one.]

But it seems to me that using idx, index, or even names such as item_count which relate to the purpose of the program would be more efficient overall. Searching for instances of i in a program is going to be difficult at the best of times, even before you consider multiple loops (either in separate functions or in the same function) that use the same variable name.

So if there is to be a policy for variable names for counters, I think that it makes most sense to have multiple letters in all variable names to allow for easy grepping, and to have counter names which describe what is being counted. Some effort to give different index names to different for/while loops would make sense too; having two different for loops with a counter named index is going to make things more difficult for someone who reads the code. Of course there are situations where two loops should share a variable: for example if one loop searches through an array to find a particular item and the next loop goes backward through the array to perform some operation on all preceding items, then it makes sense to use the same variable.
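As a minimal sketch of that policy (the names row_idx and col_idx are just examples – anything that describes what is being counted will do), compare the following with the same loops written using i and j:

#define NUM_ROWS 3
#define NUM_COLS 4

/* Sum a grid using counter names that say what is being counted.
 * "grep -nw row_idx" finds every use of the outer counter, while a
 * counter named i could not usefully be searched for because the
 * letter i also occurs in int, grid and idx. */
int sum_grid(const int grid[NUM_ROWS][NUM_COLS])
{
    int total = 0;

    for (int row_idx = 0; row_idx < NUM_ROWS; row_idx++) {
        for (int col_idx = 0; col_idx < NUM_COLS; col_idx++)
            total += grid[row_idx][col_idx];
    }
    return total;
}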

Label vs UUID vs Device

Someone asked on a mailing list about the issues involved in choosing between a label, a UUID, and a device name for /etc/fstab.

The first thing to consider is where the names come from. The UUID is assigned automatically by mkfs or mkswap, so you have to discover it after the filesystem or swap space has been made (or note it during the mkfs/mkswap process). For the ext2/3 filesystems the command “tune2fs -l DEVICE” will display the UUID and label (strangely mke2fs uses the term “label” while the output of tune2fs uses the term “volume name“). For a swap space I don’t know of any tool that can extract the UUID and name. On Debian (Etch and Unstable) the file command does not display the UUID for swap spaces or ext2/3 filesystems and does not display the label for ext2/3 filesystems. After I complete this blog post I will file a bug report.

Unless you are using a version of Debian in which this bug has been fixed, you won’t be able to easily determine the label and UUID of a filesystem or swap space. The inconvenience of determining the UUID and label is therefore a reason for not using them in /etc/fstab (keep in mind that sys-admin work sometimes needs to be done at 3AM).
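For the curious, the values can be dug out of the superblock directly. Below is a minimal sketch in C that does this for ext2/3 – it assumes the standard ext2 superblock layout (the superblock starts 1024 bytes into the device, with the magic number at byte 56, the UUID at byte 104, and the volume name at byte 120 of the superblock), it doesn’t handle swap spaces, and it is no substitute for proper support in file:

#include <stdio.h>
#include <string.h>

/* Minimal sketch: print the UUID and label of an ext2/3 filesystem by
 * reading the superblock directly.  Run it against a device node, e.g.
 * "./ext2id /dev/sda1" (needs read access to the device). */
int main(int argc, char **argv)
{
    unsigned char sb[1024];
    char label[17];
    FILE *dev;
    int byte_idx;

    if (argc != 2) {
        fprintf(stderr, "usage: %s device\n", argv[0]);
        return 1;
    }
    dev = fopen(argv[1], "rb");
    if (dev == NULL || fseek(dev, 1024, SEEK_SET) != 0 ||
        fread(sb, 1, sizeof(sb), dev) != sizeof(sb)) {
        perror(argv[1]);
        return 1;
    }
    if (sb[56] != 0x53 || sb[57] != 0xEF) { /* s_magic 0xEF53, little-endian */
        fprintf(stderr, "%s: not an ext2/3 filesystem\n", argv[1]);
        return 1;
    }
    printf("UUID: ");
    for (byte_idx = 0; byte_idx < 16; byte_idx++)
        printf("%02x%s", sb[104 + byte_idx],
               (byte_idx == 3 || byte_idx == 5 ||
                byte_idx == 7 || byte_idx == 9) ? "-" : "");
    memcpy(label, &sb[120], 16);
    label[16] = '\0';
    printf("\nLabel: %s\n", label);
    fclose(dev);
    return 0;
}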

One problem with mounting by UUID or label is that it doesn’t work well with snapshots and block device backups. If you have a live filesystem on /dev/sdc and an image from a backup on /dev/sdd then there is a lot of potential for excitement when mounting by UUID or label. Snapshots can be made by a volume manager (such as LVM), a SAN, or an iSCSI server.

Another problem is that if a file-based backup is made (with tar or cpio for example) then you lose the UUID and label. tune2fs allows setting the UUID, but that seems like a potential recipe for disaster. So if mounting by UUID you would potentially need to change /etc/fstab after doing a full filesystem restore from a file-based backup; this is not impossible but might not be what you desire. Setting the label is not difficult, but it may be inconvenient.

When using old-style IDE disks the device names were of the form /dev/hda for the first disk on the first controller (cable) and /dev/hdd for the second disk on the second controller. This was quite unambiguous; adding an extra disk was never going to change the naming.

With SCSI disks the naming issue has always been more complex, and which device gets the name /dev/sda was determined by the order in which the SCSI host adapters (HAs) were discovered. So if a SCSI HA which had no disks attached suddenly had a disk installed then the naming of all the other disks would change on the next boot! To make things more exciting, Fedora 9 is using the same naming scheme for IDE devices as for SCSI devices. I expect that other distributions will follow soon, and then even with IDE disks permanent names will not be available.

In this situation the use of UUIDs or labels is required for reliably referring to partitions. However a common trend is towards using LVM for all storage, in which case LVM manages labels and UUIDs internally (with some excitement if you do a block device backup of an LVM PV). LV names such as /dev/vg0/root are then persistent and there is no need to mount via UUID or label.

The most difficult problem then becomes the situation where a FC SAN has the ability to create snapshots and make them visible to the same machine. UUID or label based mounting won’t work unless you can change them when creating the snapshot (which is not impossible but is rather difficult when you use a Windows GUI to create snapshots on a FC SAN for use by Linux systems). I have had some interesting challenges with this in the past when using a FC based SAN with Linux blade servers, and I never devised a good solution.

When using iSCSI I expect that it would be possible to force an association between SCSI disk naming and names on the server, but I’ve never had time to test it out.

Update: I have submitted Debian bug #489865 with a suggested change to the magic database.

I have written /etc/magic entries for displaying the UUID and label on swap spaces and ext2/3 filesystems; they are included in the bug report.


Advertising a Scam

Below is a strange Google advert that appeared on my blog. It appeared when I did a search on my blog, and it also appears on my post about perpetual motion. It seems quite strange that they are advertising their product as a scam. It’s accurate, but I can’t imagine it helping sales.

Google advert for SCAM

I Just Joined SAGE

I’ve just joined SAGE AU – the System Administrators Guild of Australia [1].

I’ve known about SAGE for a long time; in 2006 I presented a paper at their conference [2] (here is the paper [3] – there are still some outstanding issues from that one, which I’ll have to revisit).

They have been doing good things for a long time, but I haven’t felt that there was enough benefit to make it worth spending money (there is a huge variety of free things related to Linux that I could do but don’t have time for). But now Matt Bottrell has been promoting Internode and SAGE, and SAGE members get a 15% discount [4]. As I’ve got one home connection through Internode and will soon get another, it seems like time to join SAGE.

Shelf-life of Hardware

Recently I’ve been having some problems with hardware dying. Having one item mysteriously fail is something that happens periodically, but having multiple items fail in a small amount of time is a concern.

One problem I’ve had is with CD-ROM drives. I keep a pile of known good CD-ROM drives because, as they have moving parts, they periodically break and I often buy second-hand PCs with broken drives. On each of the last two occasions when I needed a CD-ROM drive I had to try several drives before I found one that worked. It appears that over the course of about a year of sitting on a shelf four of my CD-ROM drives spontaneously died. I expect drives to die from mechanical wear if they are used a lot, and I also expect them to die over time as the system cooling fans suck air through them and dust gets caught. I don’t expect them to stop working when stored in a nice dry room. I wonder whether I would find more dead drives if I tested all my CD and DVD drives, or whether my practice of using the oldest drives for machines that I’m going to give away caused me to select the drives that were most likely to die.

Today I had a problem with hard drives. I needed to test a Xen configuration for a client so I took two 20G disks from my pile of spare disks (which were only added to the pile after being tested). Normally I wouldn’t use a RAID-1 configuration for a test machine unless I was actually testing the RAID functionality; it was only the possibility that the client might want to borrow the machine that made me do it. But it was fortunate, as one of the disks died a couple of hours later (just long enough to load all the data onto the machine). Yay! RAID saved me from losing my work!

Then I made a mistake that I wouldn’t make on a real server (I only got lazy because it was a test machine and there wasn’t much at risk). I had decided to instead make it a RAID-1 of 30G disks, and to save some inconvenience I transferred the LVM data from the degraded RAID on the old drive to a degraded RAID on a new disk. I was using a desktop machine that wasn’t designed for three hard disks, so it was easier to transfer the data in a way that never needed more than two disks in the machine at any time. Then the new disk died as soon as I had finished moving the LVM data. I could probably have recovered that from the LVM backup data, and even if that hadn’t worked I had only created a few LVs and they were contiguous, so I could have worked out where the data was.

Instead however I decided to cut my losses and reinstall it all. The ironic thing is that I had planned to make a backup of the data in question (so I would have copies of it on two disks in the RAID-1 and another separate disk), but I had a disk die before I got a chance to make a backup.

Having two disks out of the four I selected die today is quite a bad result. I’m sure that some people would suggest simply buying newer parts. But I’m not convinced that a disk manufactured in 2007 would survive being kept on a shelf for a year any better than a disk manufactured in 2001. In fact there is some evidence that the failure rates are highest when a disk is new.

Apart from stiction I wouldn’t expect drives to cease working from not being used; if anything I would expect drives to last longer if not used. But my rate of losing disks in running machines is minute. Does anyone know of any research into disks dying while on the shelf?

Smoke from the PSU

Yesterday I received two new machines from DOLA on-line auctions [1]. I decided to use the first to replace the hardware for my SE Linux Play Machine [2]. The previous machine I had used for that purpose was a white-box 1.1GHz Celeron and I replaced it with an 800MHz Pentium3 system (which uses only 35W when slightly active and only 28W when the hard disk spins down [3]).

The next step was to get the machine in question ready for its next purpose; I was planning to give it to a friend of a friend. A machine of those specs made by Compaq would be very useful to me, but as it’s a white-box I’ll just give it away. So I installed new RAM and a new hard drive in it (both of which had been used in another machine a few hours earlier and seemed to be OK) and turned it on. Nothing happened, and I was just checking that it was plugged in correctly when I noticed smoke coming from the PSU… It seems strange that the machine in question had run 24*7 for about 6 months and then suddenly started smoking after being moved to a different room and turned off overnight.

It is possible that the hard drive was broken and shorted out the PSU (the power cables going to the hard drive are thick enough that a short-circuit could damage the PSU). What I might do in the future is keep an old and otherwise useless machine on hand for testing hard drives, so that if something like that happens it won’t destroy a machine that is useful. Another possibility is that the dust in the PSU contained some metal fragments and that moving the machine to another room caused them to short something out, but there’s not much I can do about that when I get old machines. I might put an air filter in each room that I use for running computers 24*7 to stop such problems getting worse in future though.

I recently watched the TED lecture “5 dangerous things you should let your kids do” [4], so I’m going to offer the broken machine to some of my neighbors if they want to let their children take it apart.

Big and Cheap USB Flash Devices

It’s often the case with technology that serious changes occur at a particular point in the development of price and performance. A technology sees little use until it reaches a certain combination of low price and high performance, at which point everyone demands it.

I believe that USB flash devices are going to be used for many interesting things starting about now. The reason is that 2G flash devices are now on sale for under $100. To be more precise, 1G costs $45AU and 2G costs $85AU.

http://www.coker.com.au/hardware/usb.html

The above page on my web site has some background information on the performance of USB devices and the things that people are trying to do with them (including MS attempting to use them as cache).

One thing that has not been done much is to use USB for the main storage of a system. The OLPC machines have been designed to use only flash for storage as has the Familiar distribution for iPaQ PDAs (and probably several other Linux distributions of which I am not aware). But there are many other machines that could potentially use it. Firewall and router machines would work well. With 2G of storage you could even have a basic install of a workstation!

Some of the advantages of Flash for storage are that it uses small amounts of electricity, has no moving parts (can be dropped without damage), and has very low random access times. These are good things for firewalls and similar embedded devices.

An independent advantage of USB Flash is that it can be moved between machines with ease. Instead of moving a flash disk with your data files you can move a flash disk with your complete OS and applications!

The next thing I would like to do with USB devices is to install systems. Currently a CentOS or Red Hat Enterprise Linux install is just over 2G (I might be able to make a cut-down version that fits on a 2G flash device) and Fedora Core is over 3G. As Flash capacity goes up in powers of two I expect that soon the 4G flash devices will appear on the market and I will be able to do automated installs from Flash. This will be really convenient for my SE Linux hands-on training sessions as I like to have a quick way of re-installing a machine for when a student breaks it badly – I tell the students “play with things, experiment, break things now when no-one cares so that you can avoid breaking things at work”.

The final thing I would like to see is PCs shipped with the ability to boot from all manner of Flash devices (not just USB). I recently bought myself a new computer and it has a built-in capacity to read four different types of Flash modules for cameras etc. Unfortunately it was one of the few recent machines I’ve seen that won’t boot from USB Flash (the BIOS supported it but it didn’t work for unknown reasons). Hopefully the vendors will soon make machines that can boot from CF and other flash formats (the more format choices we have the better the prices will be).