New Storage Developments

Eweek has an article on a new 1TB Seagate drive. Most home users don’t have a need for 1TB of storage (the only way I’ve ever filled a 300G drive is by using it for multiple backups) and enterprise customers generally want small fast drives (with <100G drives still being sold for servers).

One interesting thing to note is that the new drive is described as being able to sustain 105MB/s (although I suspect that is only for the first few zones of the disk – see my ZCAV results page for graphs of the performance of some typical disks). The previous disks that I have seen have topped out at about 90MB/s. However even if the drive could deliver 105MB/s over its entire capacity it would take almost 3 hours to perform a backup (1,000,000MB at 105MB/s is about 9,500 seconds, or roughly 2.6 hours) – increases in IO speeds have not been keeping up with capacity increases. Another interesting thing is the potential for using the same technology in low power 2.5 inch disks. While a 1TB disk isn’t going to be much use to me, a 300G 2.5 inch disk that uses less power would be very useful – and it might be possible to perform a backup of such a disk in a reasonable amount of time!

The latest trend in PCs seems to be small form factor (SFF) and low power machines that don’t have space for two drives. If you want RAID-1 on your home machines for reliability then that isn’t very convenient. But two 2.5 inch disks can fit in less space than a single 3.5 inch disk and therefore all the SATA based SFF machines that are currently being shipped with a single 3.5 inch disk can be upgraded to a pair of 2.5 inch disks – this will be convenient for me when those machines start going cheap at auction next year!

Even for servers the trend seems to be towards 2.5 inch disks. I recently bought a 2U HP server that supports up to 8 hot-swap SFF disks; I wonder how the performance of 8 SFF disks would compare to that of 4 of the bigger and faster 3.5 inch disks.

The next thing we need is a method of backing up large volumes of data. The 650M data CD came out when a 150M disk was considered big. The 4.7G data DVD started to become popular when 45G disks were considered big. Now almost everyone has 300G disks and 1TB disks are becoming available, yet the best commonly available backup method is a double-sided DVD at 9.4G – it seems that the vast majority of home users who make backups are using cheap IDE disks to do so. Fortunately there are some new technology developments that may improve the situation. Call/Recall has developed technology that may permit multiple terabytes of storage on an optical disk. It’s yet to be seen whether their technology lives up to the claims made about it, but we have to hope. The current storage situation is getting unmanageable.

Committing Data to Disk

I’ve just watched the video of Stewart Smith’s LCA talk Eat My Data about writing applications to store data reliably and not lose it. The reason I watched it was not to learn about how to correctly write such programs, but so that I could recommend it to other people.

Recently I have had problems with a system (that I won’t name) which used fwrite() to write data to disk and then used fflush() to commit it! Below is a section from the fflush(3) man page:

NOTES
       Note that fflush() only flushes the user space buffers provided by  the
       C  library.   To  ensure that the data is physically stored on disk the
       kernel buffers must be flushed too, e.g. with sync(2) or fsync(2).

Does no-one read the man pages for library calls that they use?
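
As an illustration of what the man page means (a minimal sketch, not code from the system in question, and data.tmp is just a hypothetical file name), the data has to be flushed from the stdio buffer and then committed by the kernel before you can consider it safely written:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  FILE *fp = fopen("data.tmp", "w");
  if (fp == NULL)
    exit(1);
  if (fwrite("important data\n", 1, 15, fp) != 15)
    exit(1);
  if (fflush(fp) != 0)          /* flush the stdio (user space) buffer to the kernel */
    exit(1);
  if (fsync(fileno(fp)) != 0)   /* ask the kernel to commit the data to disk */
    exit(1);
  if (fclose(fp) != 0)
    exit(1);
  return 0;
}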

Then more recently I discovered (after losing some data) that neither dpkg nor rpm calls fsync() after writing package files to disk. The vast majority of Linux systems use either dpkg or rpm to manage their packages, so all those systems are vulnerable to data loss if the power fails, a cluster STONITH event occurs, or any other unexpected reboot happens shortly after a package is installed. This means that you can use the distribution defined interface for installing a package, be told that the package was successfully installed, have a crash or power failure, and then find that only some parts of the package were installed. So far I have agreement from Jeff Johnson that RPM 5 will use fsync(), no agreement from Debian people that this would be a good idea, and I have not yet reported it as a bug in SUSE and Red Hat (I’d rather get it fixed upstream first).

During his talk Stewart says sarcastically “everyone uses the same filesystem because it’s the one true way”. Unfortunately I’m getting this reaction from many people when reporting data consistency issues that arise on XFS. The fact that Ext3 by default will delay writes by up to 5 seconds for performance (which can be changed by a mount option) and that XFS defaults to delaying writes by up to 30 seconds means that some race conditions will be more likely to occur on XFS than in the default configuration of Ext3. This doesn’t mean that they won’t occur on Ext3, and it certainly doesn’t mean that you can rely on such programs working on Ext3.

Ext3 does however have the data=ordered mount option (which seems to be the default configuration on Debian and on Red Hat systems), which means that meta-data is committed to disk after the data blocks that it refers to. This means that the operation of writing to a temporary file and then renaming it should give the desired result. Of course it’s bad luck for dpkg and rpm users who chose data=writeback on Ext3 – they get better performance but significantly less reliability.
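
As a sketch of that technique (illustrative code only, not what dpkg or rpm actually do, and the helper name is invented), the usual sequence is to write the temporary file, fsync() it, and only then rename() it over the original:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: replace the file at "path" with new contents so that
 * after a crash we see either the complete old file or the complete new file. */
int replace_file(const char *path, const char *tmp, const char *buf, size_t len)
{
  int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
    close(fd);
    unlink(tmp);
    return -1;
  }
  if (close(fd) != 0) {
    unlink(tmp);
    return -1;
  }
  return rename(tmp, path);  /* atomically replaces the old file with the new one */
}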

Also we have to consider the range of filesystems that may be used. Debian supports the Linux and HURD kernels as main projects, and there are less well supported sub-projects for the NetBSD, FreeBSD, and OpenBSD kernels as well as Solaris. Each of these kernels has different implementations of the filesystems that they have in common, and some have native filesystems that are not supported on Linux at all. It is not reasonable to assume that all of these filesystems have the same caching algorithms as Ext3, or that none of them behave like XFS. The RPM tool is mainly known for being used on Red Hat distributions (Fedora and RHEL) and on SuSE – these distributions include support for Ext2/3, ReiserFS, and XFS as root filesystems. RPM is also used on BSD Unix and on other platforms that have different filesystems and different caching algorithms.

One objection that was made to using fsync() was the fact that cheap and nasty hard drives have volatile write-back caches (their contents disappear on power loss). Since reliable operation is impossible with such drives, why not just give up! Pity about the people with good hard drives that don’t do such foolishness – maybe they are expected to lose data as an expression of solidarity with people who buy the cheap and nasty hardware.

Package installation would be expected to be slower if all files are fsync()’d. One method of mitigating this is to write a large number of files (e.g. up to a maximum of 900) and then call fsync() on each of them in a loop. After the last file has been written the first file may have been entirely committed to disk, and calling fsync() on one file may result in other files being synchronised too. Another issue is that the only time package installation speed really matters is during an initial OS install. It should not be difficult to provide an option to not call fsync() for use during the OS install (where any error would result in aborting the install anyway).
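
Here is a minimal sketch of that batching approach (the function name and the way the descriptors are collected are invented for illustration):

#include <unistd.h>

/* Hypothetical helper: fds[] holds descriptors of files whose contents have
 * already been written.  By the time fsync() is called on the first one, much
 * of the data for the later ones may already have been written back. */
int sync_batch(int *fds, int count)
{
  int i, ret = 0;

  for (i = 0; i < count; i++) {
    if (fsync(fds[i]) != 0)
      ret = -1;
    if (close(fds[i]) != 0)
      ret = -1;
  }
  return ret;
}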

Update: If you are interested in disk performance then you might want to visit the Benchmark category of my blog, my Bonnie++ storage benchmark and my Postal mail server benchmark.

Update: This is the most popular post I’ve written so far. I would appreciate some comments about what you are interested in so I can write more posts that get such interest. Also please see the Future Posts page for any other general suggestions.

Strange SATA Disk Performance

Below is a GNUPlot graph of ZCAV output from a 250G SATA disk. The disk has something wrong with it (other disks of the same brand in machines of the same configuration give more regular graphs). The expected graph will start high on the Y scale (MB/s) and steadily drop as the reads go to shorter tracks near the spindle. The ZCAV results page has some examples of the expected results.

If you have any ideas as to what might cause this performance result then please make a comment on this blog post.

[graph of strange SATA performance]

Update: This problem is solved, I’ve written a new post explaining the answer. It’s a pity that an NDA delayed my publication of this information.

Advertising Free Software Projects

Today I just noticed the following advert on one of my web pages:
MINIX3 is a new reliable
free operating system. Smaller
than Linux. Try it It’s free!
www.minix3.org

This made me think about some of the potential ways of advertising free software projects. It seems that Google AdWords is not the only way of advertising free software, and probably isn’t the most effective.

I believe that the most effective method would be to ask people to advertise the project. I am generally positive towards the aims of the Minix project and am happy to give them some free advertising if asked (I’ve just given them a free advert above without being asked).

While Google adlinks and equivalent things in blog posts are easy to set up, effective advertising may require something more. A series of pictures (in different sizes and color schemes) for the link would help, and the project would ideally have a specific landing site for people who see the advert. Someone who sees an advert targeted at newbies will have different requirements of the web site than someone who already knows about it and typed the URL in from memory! For a big campaign you would probably want to have multiple landing sites for different adverts targeted at different people.

One example of how this can be used is my Bonnie++ page which gets about 6000 hits a month – many of which are from users of proprietary Unix. Linux users often have Bonnie++ as part of their distribution and don’t have much of a need to visit my web site so I expect that even though the vast majority of Bonnie++ users run Linux the proportion of site visitors might not be so strongly in favour of Linux. I would be happy to place an advert on that page to encourage proprietary Unix users to use the most free distribution of Linux if someone was to prepare the advert and give me some HTML code I can easily add to my site.

To some extent the web site for every free software project could be used to advertise some related projects (or projects that are liked by the people who run the site).

If you are a contributor to a free software project that you think I would like then feel free to prepare an advert and send it to me. If it fits with what I’m doing then I’ll give you some free advertising!

Blogs are also a good mechanism for advertising free software projects, but it seems that this is already being used a lot – having said that…

As a gratuitous plug: my favourite distribution of Linux is Debian (here are my Debian blog entries) and I run NSA Security Enhanced Linux (SE Linux) on all my machines (here are my SE Linux blog entries).

Xen and Heartbeat

Xen (a system for running multiple virtual Linux machines) has some obvious benefits for testing Heartbeat (the clustering system) – the cheapest new machine that is on sale in Australia can be used to simulate a four node cluster. I’m not sure whether there is any production use for a cluster running under Xen (I look forward to seeing some comments or other blog posts about this).

Most cluster operations run on a Xen virtual machine in the same way as they would on physically separate machines, and Xen even supports simulating a SAN or Fibre Channel shared storage device if you use the syntax phy:/dev/VG/LV,hdd,w! in the Xen disk configuration line (the exclamation mark means that the volume is writable even if another domain is also writing to it).
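
For example, the configuration of one simulated cluster node might contain something like the following (the node and volume names are hypothetical); the device exported with w! is the one that stands in for the shared SAN storage:

# fragment of a hypothetical /etc/xen/node1 configuration
name = "node1"
disk = [ 'phy:/dev/VG/node1-root,hda,w',
         'phy:/dev/VG/shared,hdd,w!' ]  # the w! device is writable from several DomUs at once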

The one missing feature is the ability to STONITH a failed node. This is quite critical as the design of Heartbeat is that a service on a node which is not communicating will not be started on another node until the failed node comes up after a reboot or the STONITH sub-system states that it has rebooted it or turned it off. This means that the failure of a node implies the permanent failure of all services on it until/unless the node can be STONITH’d.

To solve this problem I have written a quick Xen STONITH module. The first issue is how to communicate between the DomUs (Xen virtual machines) and the Dom0 (the physical host). It seemed that the best way to do this was to ssh to special accounts on the Dom0 and then use sudo to run a script that calls the Xen xm utility to actually restart the node. That way the Xen virtual machine gets only limited access to the Dom0, and the shell script could even be written to allow each VM to manage only a sub-set of the VMs on the host (so you could have multiple virtual clusters on the one physical host and prevent them from messing with each other through accident or malice).

xen ALL=NOPASSWD:/usr/local/sbin/xen-stonith

Above is the relevant section from my /etc/sudoers file. It allows user xen to execute the script /usr/local/sbin/xen-stonith as root to do the work.

One thing to note is that from each of the DomUs you must be able to ssh from root on the node to the specified account for the Xen STONITH service without using a password and without any unreasonable delay (i.e. put UseDNS no in /etc/ssh/sshd_config).

In the section below (which isn’t included in the feed) there are complete scripts for configuring this.

Continue reading Xen and Heartbeat

Who Benefits when Cheap Electricity is used?

In a comment on my previous blog post a question was asked as to who benefits when customers are able to use cheap electricity.

The answer is that the electricity company benefits the most! When variable sources such as wind and solar power are used, electricity would be cheaper at times of adequate or excess supply. Making it cheaper at those times encourages customers to run power hungry devices then, rather than when the supply is reduced. The larger the capacity of back-up power plants such as gas-fired plants, the larger the overall cost of the system (having extra peak-load capacity that is unused most of the time is a waste of money).

However the company that I once worked for on a project related to this was not an electricity company. I can’t name the company due to the confidentiality agreement but if the project ever goes into production I’ll blog about it.

Recently some of the major investment banks have been focussing on how climate change affects business. I think that with these developments there will be a lot of new investment in environment related technologies. If a few people started work on an embedded Linux box for scheduling power use they would probably have a good chance of getting some investment. The ideal feature list would include control and monitoring over the Internet, and the ability to schedule operations based on the power price (received from the grid via a technology similar to X10) and on local conditions (how charged the batteries are in your photo-voltaic system). It would control devices via standard X10 modules (which can switch the power to a device) as well as directly interfacing with machines that need to be turned on all the time.

I would be happy to offer more suggestions via private email to anyone who is interested in implementing this.

Backup for Wind Power

A question that people often ask about wind power (and was asked in the comments section of my previous post) is what can be done when the wind speed decreases in an area. There are several methods that can be used to address this problem.

The easiest option is to simply have wind farms spread out over a large area and interconnects that can spread the load. This greatly reduces the problems but is not a total solution.

The next step is to have a series of power plants that can quickly ramp up supply to meet the demand. One good option for this is gas-fired power plants; while they aren’t ideal for the environment they are cheap to build and can react quickly to changing demand. If a gas fired plant is only used when wind speeds are low it should on average be running at a small fraction of its peak capacity and use little fuel. Another good option is hydro-electric power, which can be turned on quickly, doesn’t produce any CO2 emissions, and is already used widely (about 10% of Australia’s electricity is provided by hydro-electric power).

The ideal solution is to have every user of grid power know when the electricity is cheap (when there is a surplus of wind power) and when it’s expensive (when gas or hydro power is being used). Then non-critical services can be run when electricity is cheap. For example you could put clothes in your washing machine and program it to start the wash when electricity becomes cheap; some time during the day there will be a cheap period and the washing will get done. Once consumers know when electricity is cheap (via X10 or similar technology) they can use that information to determine when to use electricity generated from photo-voltaic cells on their roof and when to use grid power. The same technology can be used for heating and cooling your home or office; turning off the A/C for an hour or so is only going to be a problem in the middle of summer or winter, and for most of the year any heating or cooling could be done with cheap electricity. These technologies are all being developed at the moment (I once briefly worked on a system that could be used as a precursor to managing home electricity use for times of cheap electricity).

Pointers in Bash Shell Scripts

# foo=bar
# name=foo
# echo ${!name}
bar

The above example shows how to make a bash environment variable reference the value of another.

# echo ${!f*}
foo fs_abc_opts fs_name

The above example shows how to get a list of all variables beginning with “f” (NB the wildcard can only go at the end).

A Lack of Understanding of Nuclear Issues

Ben Fowler writes about the issues related to nuclear power in Australia. He spends 8 paragraphs discussing the issues on the “Right” side of politics – of which 6 concern an Australian nuclear weapons capability – and then spends 3 out of 5 paragraphs related to the “Left” side explaining that he thinks that everyone who opposes nuclear power is a Luddite.

Ben didn’t bother providing any links or statistics to back up his claims, so I’ll assist him in analysing these issues by providing some facts that we can discuss.

In March Spain had wind power provide 27% of all electricity (making wind power the main source of power for the country). I blogged about this at the time. While Spain has an ongoing program of building new wind power stations the majority of wind turbines in Spain are quite old (the Model T of wind power) and not nearly as efficient as modern turbines that would be installed for Australian power plants.

The Danish government has announced plans to use wind power for 75% of their electricity. Denmark has a much smaller land area than Australia, which means that generating so much electricity from wind power is more technically challenging for them than it would be for us. A larger land area means that when one area has low wind speeds other areas can be used to provide power.

For home electricity generation wind turbines have not yet been proven to be practical. The blade tip speed is determined by the wind speed, and the rotational speed is the tip speed divided by the radius of the blades, so at the same tip speed a turbine with one tenth the blade radius will spin ten times as fast. This means that smaller turbines have higher rotational speeds, which causes more noise (bad for getting council approval). Also, to avoid turbulence a wind turbine will ideally be some distance above the ground (8 meters is good), which again gives problems when getting approval. The O’Connor Hush Turbine is supposed to solve the noise component of this problem. It will be interesting to see whether home based wind power becomes practical in future – if so I would like to get an O’Connor turbine on my house!

Home solar power has been proven to work well, in the form of both solar-electric and solar hot water (I know several people who have been happily using them for years). You don’t get cold showers when the sun isn’t shining; instead gas or electricity is used to heat the water (it’s a standard feature of a solar hot water system). Also your home electricity doesn’t go off when the sun stops shining; you have batteries that are charged during sunny times to run things overnight, and when they get flat you pay for power from the grid.

It is quite realistic to stick solar power systems on every roof in the country. The added cost to the process of building or purchasing a house is negligible and the benefits include having electricity when mains power is unavailable (NB water is used in generating electricity from coal or nuclear power plants so a bad drought will eventually become a time of limited mains power). Even the smallest home solar electric system will produce enough electricity to power a fridge and freezer 24*7 so it’s a very useful backup for essential power. The government is subsidising the installation of solar electric systems, so it seems that they expect everyone to get one eventually.

Dr. Ziggy Switkowski (the main advocate of nuclear power in Australia) says “the introduction of a carbon tax could make nuclear power the cheapest option by the 2020s”. In consecutive paragraphs Ben derides “carbon trading” and claims that nuclear power is “practical”. Unfortunately the main advocate of nuclear power in Australia does not believe that it is practical without a carbon tax. Ziggy also states that it would take at least 15 years to complete a nuclear power plant; unfortunately we don’t have the luxury of waiting 15 years before starting to try to reduce the damage that we are doing to the environment. The Stern report makes the economic consequences of inaction quite clear.

I am not a Luddite. I oppose nuclear power because of the risks of accidental release of radioactive material, the routine release of radioactive material as part of the uranium mining process, and the dangers related to long-term storage of nuclear waste (let’s not assume that Star Trek science can make it all non-radioactive). Nuclear power is not cost effective for Australia and will take so long to develop that it won’t save us from the serious economic damage predicted by the best scientific models as presented in the Stern report.

For large scale power generation wind power works now, can be easily implemented, and has no hidden costs or risks. There will never be a Chernobyl type accident with wind power; it is inherently a safe technology. For small scale power generation (something you can add to your home) solar power works well, is not expensive (when considering the price of a house, and especially when the government subsidy is counted), and has the potential to seriously reduce the amount of carbon dioxide produced.

Rackspace RHEL4 updates

A default RHEL4 install on a Rackspace (*) server contains a cron.d file named /etc/cron.d/rs_rhncheck that runs a job to check for Red Hat Network updates. In the default configuration this sends out a message every day indicating that up2date did nothing. To only get email when something interesting happens I changed the cron.d file to call the following script. The checksum of the output is compared against the checksum of the output that up2date produces when it does nothing (I guess I could have used diff against a reference file and tested its return code instead). So when up2date either installs a package or decides not to install a new kernel package (or another important package) I get an email.

#!/bin/sh
# Run up2date and only produce output (which cron will mail to me) when the
# result differs from the known output of a run that did nothing.
/usr/sbin/up2date --nox -u > ~/up2date.out
SUM=`md5sum ~/up2date.out | cut -f1 -d\ `
if [ "$SUM" != "65c57b05b24bd8f656a0aec0d6db725a" ] ; then
  cat ~/up2date.out
fi

(*) Rackspace support is really good. If I was paying I might look at other options but for my clients I am happy to recommend them without hesitation.