Creepy Social Networking

Cory Doctorow wrote an interesting article about social networking [1]. One of his points is “Imagine how creepy it would be to wander into a co-worker’s cubicle and discover the wall covered with tiny photos of everyone in the office, ranked by “friend” and “foe,” with the top eight friends elevated to a small shrine decorated with Post-It roses and hearts”. Another of his points concerns forced “friends”, where colleagues and casual acquaintances demand to be added to a friends list.

He speculates that social networking systems are a fad because once too many people you don’t really like have forced themselves into your friends list, you will feel compelled to join a different service.

I believe that the practice of ranking friends is simply a bad idea and wonder whether anyone who has completed high-school has ever used it seriously. If you publicly rank your friends then you will alienate some of them (particularly any who might have ranked you more highly than you ranked them). Everyone who has used social networking systems has discovered the pressure to avoid alienating people they don’t actually like. It seems obvious that alienating people who you do like is even more of a problem.

In my previous post about Better Social Networking [2] I suggested having multiple lists on your social networking server that are published to different people. That would allow segregating the lists as a way of dealing with some demands to be listed. People who are associated with work (colleagues, managers, and, in an example Cory used, the students at the school where a teacher works) would be on a work list. The work list would point to the work profiles of other people, which would match whatever the standards are for the industry in question (that still allows quite a range – the standards for sys-admins of ISPs differ significantly from those for primary school teachers). I’m sure that someone who works for an ISP in Amsterdam would understand why a work-based friend request made to someone who teaches primary school in a religiously conservative part of the world would be declined (although the same person might add them to a personal friends list).

Another way of alleviating such problems is to not require that listing be bi-directional. Current social networking systems involve one party making a friend request to another, which is listed as pending in the GUI for both parties. The options are to either leave it in that state (which is an annoyance) or reject it (which may cause offence). With a uni-directional listing one party would add the other and hope for the best. If they aren’t obsessive about such things they may not even notice that the other party didn’t reciprocate. It would also allow famous people to receive links from many people to their public profile without any expectation of reciprocation. Of course a distributed social networking system such as I suggest would inherently have uni-directional links, as there would be no central repository to force them all to be bi-directional.
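To make those two ideas more concrete, here is a rough sketch in C of what per-audience, uni-directional contact lists could look like as a data structure. The names and URLs are entirely hypothetical and this doesn’t describe any existing system, it just shows that each list is published to a different audience and that adding a link doesn’t require the other party to do anything.

#include <stdio.h>

#define MAX_CONTACTS 16

struct contact_list {
    const char *audience;               /* which group of people sees this list */
    const char *contacts[MAX_CONTACTS]; /* URLs of the profiles this list points to */
    int count;
};

static void add_contact(struct contact_list *list, const char *profile_url)
{
    /* Adding a link is purely one-way: nothing here requires the other
     * party to list us in return. */
    if (list->count < MAX_CONTACTS)
        list->contacts[list->count++] = profile_url;
}

int main(void)
{
    struct contact_list work = { "work", { 0 }, 0 };
    struct contact_list personal = { "personal", { 0 }, 0 };

    add_contact(&work, "https://example.com/colleague/work-profile");
    add_contact(&personal, "https://example.com/friend/personal-profile");

    printf("%s list: %d contact(s), %s list: %d contact(s)\n",
           work.audience, work.count, personal.audience, personal.count);
    return 0;
}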

Links November 2007

The web site www.CheatNeutral.com offers cheaters the possibility of paying single or monogamous people to offset their cheating. It’s an interesting spin on the carbon trading schemes that are on offer.

www.greenmaven.com – a Google search site for Green related information. www.greenerbuildings.com – information on designing buildings to be “Green”.

Binary adding machine using marbles and wood-work [1]. I’ve just been reading Accelerando by Charles Stross [2]; in that book he describes the Pentagon using Babbage machines to avoid the potential for electronic surveillance.

Alan Robertson has just started a blog [3]. He is a lead developer in the Linux-HA (Heartbeat) [4] project (which incidentally lists SGI as a friend due to the work that Anibal and I did [5]).

Here is an interesting article about light pollution [6]. It covers the issues of observing the stars, saving energy, and reducing crime through effective lighting.

Pentium-3 vs Pentium-4

I was recently giving away some old P3 and P4 machines and was surprised by the level of interest in the P4 machines. As you can see from my page on computer power use [1], the power use of a P4 system is significantly greater than that of a P3. The conventional wisdom is that the P4 takes 1.5 times as many clock cycles as a P3 to perform an instruction, and the old SPEC CPU2000 results [2] seem to indicate that a 1.5GHz P4 will be about 20% faster than a 1GHz P3, but as the P4 has significantly higher memory bandwidth the benefit may be significantly greater for memory intensive applications.

But as a general rule of thumb I would not expect a low-end P4 desktop system (e.g. 1.5GHz) to give much benefit over a high-end P3 desktop system (1GHz), and a 2GHz P4 server system probably won’t give any real benefit over a 1.4GHz P3 server system. So in terms of CPU performance a P4 doesn’t really offer much.

One significant limitation of many P3 systems (and most name-brand P3 desktop systems) is the fact that the Intel chipsets limited the system to 512M of RAM. This really causes problems when you want to run Xen or similar technologies. I have a few P4 1.5GHz systems that have three PC-133 DIMM sockets allowing up to 768M of RAM (it seems that PC-133 DIMMs only go up to 256M in size – at least the ones that cost less than the value of the machine). Another issue is USB 2.0 which seems to be supported on most of the early P4 systems but none of the P3 systems.

512M of RAM is plenty for light desktop use and small servers; my Thinkpad (my main machine) had only 768M of RAM until very recently, and it was only Xen that compelled me to upgrade. The extra power use of a P4 is significant: my 1.5GHz P4 desktop systems use significantly more power than a Celeron 2.4GHz (which is a much faster machine and supports more RAM, etc). Low-end P4 systems have little going for them except 50% more RAM (maybe – it depends on how many sockets are on the motherboard) and USB 2.0.

So it seems strange that people want to upgrade from a P3 system to a P4.

Air Filtering for Servers

Serious server rooms have large (and expensive) air-conditioning and filtering systems. Most “server rooms”, however, are not like that; often it’s just some space in a store-room, sometimes near printers (which are a source of air pollution [1]).

The servers that are stored in serious server rooms have air filters as a standard feature. For small server installations it’s often a desktop PC used as a server, which has no filters (and often lacks other server features such as ECC RAM). Recently Dell in Australia started selling low-end PowerEdge servers for $450 plus delivery (machines similar to the $800 Dell servers I previously blogged about [2]). Refurbished machines from HP and IBM can also often be purchased at auction for similar prices.

Even if you have a proper server with filters on all air inlets it’s still a benefit to have reasonably clean air in the server area. A few years ago I bought four Sunbeam HEPA [3] air filters for my home to alleviate allergy problems. I’ve had one running 24*7 in my main computer area for most of the time since then. As well as wanting to keep my machines free of dust I have the extra issue of machines that I buy at auction, which often arrive filled with dust – the process of cleaning them releases some dust into the air and it’s good to have a filter running to remove it.

Excessive dust in the air can prevent cooling fans from operating, damage hardware, and cause loss of data. Of course the really good servers have fan speed sensors that allow the CPU to be throttled or the machine to be halted in case of severe problems. But on desktop machines you often only have the temperature control mechanisms that are built in to the CPU, and sometimes a machine just starts having memory errors when the fan malfunctions and the machine gets hot.

As one of the biggest problems facing server rooms is heat dissipation I decided to measure my air filters and see how much electricity they use (it seems reasonable to assume that all their energy eventually gets converted to heat). I’ve now got a blog page about power use of items related to computers [4]. My air filters draw 114W on the highest speed and 13.9W on the lowest. Initially I was a little surprised at the figure for the high speed, but then I recalled that the energy required to move air is proportional to the speed cubed. I’ll just have to make sure I don’t leave an air filter on overnight in summer…
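As a back-of-the-envelope check of that cube relationship (this is only a sketch using the two measurements above, not a proper fan model), the ratio of airflow between the two settings should be roughly the cube root of the ratio of the power figures:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double high_watts = 114.0;  /* measured on the highest speed */
    double low_watts = 13.9;    /* measured on the lowest speed */

    double power_ratio = high_watts / low_watts; /* about 8.2 */
    double flow_ratio = cbrt(power_ratio);       /* about 2.0 if power ~ flow^3 */

    printf("power ratio %.1f, implied airflow ratio %.1f\n",
           power_ratio, flow_ratio);
    return 0;
}

So the highest setting apparently moves only about twice as much air as the lowest while drawing more than eight times the power, which is a good reason to leave the filter on a low setting most of the time.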

I’m now going to recommend such filters to some of my clients. Spending $400 on an air filter is nothing compared to the amount of money that a server failure costs (when you have expensive down-time and pay people like me to fix it).

Election 2007

I am a member of the Greens. The main reason for joining them is that they have principles. The Greens Charter [1] guides everything: policy must comply with the charter, and candidates agree to uphold the policies which have been ratified if they get elected. There are no “non-core promises”.

The policies of the Greens are all positive. The major parties have some mean-spirited policies that aim to help Australians by making things worse for people in other countries. Unfortunately for them the world is very inter-connected at the moment, so it’s difficult to harm other countries without harming yourself in the process.

As you might expect the Greens are very positive towards the environment. Anyone who expects to live for more than 30 years (or who has children) should be very concerned about this. Currently even the most cautious estimates of the scope of the climate change problem by reputable scientists suggest that there will be serious problems in the next few decades. If you are young or have children then you should vote for the Greens (that covers most people).

Many religious groups have determined that God wants them to help the environment, for example the Australian Anglican Church General Synod 2007 resolved that “the Anglican Communion’s 5th mark of mission to safeguard the integrity of creation and sustain and renew the life of the earth; and recognises that human activity contributing to Climate change is one of the most pressing ethical issues of our time”. Anglicans and believers in other religions that have similar ideas should vote for the Greens.

The Greens have policies that are very positive towards minority groups. If you are not a straight-white-Christian then this is a strong reason for voting for the Greens. If you are a straight-white-Christian but have compassion for others (as advocated in the Bible) then again voting for the Greens is the right thing to do.

The Greens policies are humane towards people in unfortunate situations, including drug addicts, the mentally ill, and the unemployed. When considering who to vote for keep in mind that at some future time you or a close friend or relative may fall into one of those categories.

Finally the Howard government’s industrial relations legislation is very bad for the majority of workers. If the Greens get the balance of power in the Senate then they will be able to get such laws significantly changed or removed.

The Greens election web site is here [2].

As an aside, I never liked Paul Keating (our previous Prime Minister) until I read his op-ed piece in The Age about John Howard [3].

Also here is an interesting article on preferences and why the major parties want people to misunderstand the Australian electoral system [4]. In summary, if you vote for the Greens as your first preference and they don’t win, then your second preference gets counted, and if that party doesn’t win then the third preference counts, and so on. So if you have the two major parties in the last and second-last positions then your second-last preference may still make an impact on the result! Putting the Greens in the #1 position does not “waste” your vote. Parties also get government funding based on the number of #1 votes, so voting for the Greens as your first preference effectively gives them $2 for their next campaign. Finally, when a party wins an election they immediately look at where the #1 votes went and often adjust their policies to please those voters. Vote #1 for the Greens and if Labor wins they will have more of an incentive to adopt policies that are similar to those of the Greens.
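For anyone who wants to see how the counting works, here is a simplified sketch of preferential counting for a single seat. The candidates and ballots are made up for illustration, and real counts have extra rules (ties, informal votes, and the Senate quota system), but it shows how a #1 vote for a minor party flows on to later preferences rather than being wasted:

#include <stdio.h>

#define CANDIDATES 3
#define BALLOTS 5

static const char *names[CANDIDATES] = { "Greens", "Labor", "Liberal" };

/* Each ballot lists candidate indexes in order of preference. */
static const int ballots[BALLOTS][CANDIDATES] = {
    { 0, 1, 2 },              /* a Greens #1 vote with Labor second */
    { 1, 0, 2 }, { 1, 0, 2 },
    { 2, 1, 0 }, { 2, 1, 0 },
};

int main(void)
{
    int eliminated[CANDIDATES] = { 0 };

    for (;;) {
        int tally[CANDIDATES] = { 0 };

        /* Each ballot counts for its highest-ranked candidate who is
         * still in the running. */
        for (int b = 0; b < BALLOTS; b++)
            for (int p = 0; p < CANDIDATES; p++)
                if (!eliminated[ballots[b][p]]) {
                    tally[ballots[b][p]]++;
                    break;
                }

        int leader = -1, loser = -1;
        for (int c = 0; c < CANDIDATES; c++) {
            if (eliminated[c])
                continue;
            if (leader < 0 || tally[c] > tally[leader])
                leader = c;
            if (loser < 0 || tally[c] < tally[loser])
                loser = c;
        }

        if (2 * tally[leader] > BALLOTS) {
            printf("%s wins with %d of %d votes\n",
                   names[leader], tally[leader], BALLOTS);
            return 0;
        }

        /* No majority yet: the last placed candidate is eliminated and
         * those ballots flow to their next preference. */
        printf("%s eliminated with %d first preference vote(s)\n",
               names[loser], tally[loser]);
        eliminated[loser] = 1;
    }
}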

Drugs and an Election

As mentioned in my previous post [1], the government is using our money to advertise its policies. I previously covered the “Internet as a threat to children” issue; the other big one is drugs.

The first significant message in the “Talking with your kids about drugs” document concerns the criminal penalties for drug use. That could largely be removed if the penalties were dramatically decreased and if hard drugs were administered to registered addicts for a nominal fee. Forcing addicts to commit crimes or work as prostitutes to pay for their addiction just doesn’t work, economically or socially.

The next message is that parents should be involved in their children’s lives and be trusted enough that their children will accept advice. A large part of this is related to the amount of time spent with children, which is determined to a large degree by the amount of time not spent working. As I mentioned in my previous post, it seems that the actions of the Howard government over the last 10 years have made things significantly worse in this regard by forcing more parents to spend large amounts of their time working (including situations where both parents work full-time) just to pay for a home.

One notable omission from the document was any mention of alcohol problems (apart from when mixed with illegal drugs). By many metrics alcohol causes more harm than all the illegal drugs combined. The US attempted to prohibit alcohol and failed dismally, and alcohol is legal in most countries. The government doesn’t want to adopt a harm-minimisation approach to drugs (which bears some similarities to the approach taken to alcohol) and therefore is stuck trying to claim that one somewhat addictive mind-altering substance is somehow inherently different to all other mind-altering substances.

The document provided no information at all on harm minimisation. The idea seems to be that it’s better for some children to die of overdoses than for other children to potentially be less scared of the consequences of drug use. The problem is that children are very bad at assessing personal risk so they simply won’t be scared off. It’s best to provide equipment for testing drug purity etc to reduce the harm.

The list of reasons for young people to try drugs starts with “availability and acceptability of the drug”. Parmesan cheese is more available and acceptable than any drug, but I’ve been avoiding it at every opportunity since I was very young. :-#

More serious reasons in the list include “rebellion”, “depression”, “as a way to relax or cope with stress, boredom or pain”, and “to feel OK, at least temporarily (self-medication)”. But predictably there was nothing in the document about why children might consider that their life sucks so badly that they want to rebel in that way, be depressed, stressed, bored, or in pain to a degree that drugs seem like the solution. It seems unreasonable to believe that the school system isn’t a significant part of the cause of teenage problems. Flag poles for flying the Australian Flag [2] seem to be John Howard’s idea of a solution to school problems. I believe that the solution to school problems (both in terms of education and protection of children) is smaller classes, better supervision, an active program to eradicate bullying, and better salaries for teachers. I’ve written about related issues in my school category. I believe that the best thing for individual families to do is to home-school their children from the age of 12 instead of sending them to a high-school (which is much more damaging than a primary school).

Another significant issue is drug pushers. They are usually drug users who try to finance their own drug use by selling to others. As most drug users get the same idea, there is a lot of competition for sales and therefore they try to encourage more people to use drugs to get more customers. In the Netherlands, Coffee Shops sell a range of the milder drugs at reasonable prices, which removes the street market and therefore the pushers. When drugs are produced commercially the quality is standardised, which dramatically reduces the risk of infection and overdoses. When drugs are sold by legal companies they are not sold to children (a typical pusher can be expected to sell to anyone).

The document has 3.5 pages of information about some of the illegal drugs (out of 23 pages total). Those pages include all the most severe symptoms, which most users don’t encounter. If someone reads that information and then talks to a drug user they will be laughed at. They also group drugs in an unreasonable way, for example listing “marijuana” and “hashish” as synonyms and listing problems that are caused indirectly. If alcohol were included in that list it would put beer and methylated spirits (ethanol mixed with methanol to supposedly make it undrinkable – rumoured to be consumed by homeless people) in the same section and imply that blindness and the other bad results of drinking “metho” apply to beer. It would also say that alcohol kills many people through drunk-driving and is responsible for rapes, wife-beating, and any other crime which can statistically be shown to be more likely to be committed by a drunk person.

A final issue with the document is that it advises parents to be “informed, up-front and honest” when talking to their children about drugs. Unfortunately the document itself doesn’t demonstrate such qualities. By omitting all information on harm minimisation it is not being up-front at all, and by not mentioning alcohol it’s simply being dishonest. Anyone who tries to convince someone else not to use illegal drugs can expect to hear “but you drink alcohol” as the immediate response; it’s necessary to have sensible answers to that issue if you want to be effective in discouraging drug use.

Internet and an Election

Before the election was called, the Howard government (being unethical in every way) started using public money to campaign. Part of this campaign was two documents sent out to every home (AFAIK), coupled with a media campaign: one was about children and drugs, the other about children and the Internet. I have to wonder how many people were fooled by this; the inclusion of a picture of John Howard on the first page after the index made the purpose pretty clear.

Before I analyse the content of the documents, one thing to keep in mind is the cost of the Iraq war, which the Sydney Morning Herald [1] estimates at $3,000,000,000. Note that this doesn’t include the life-long medical and psychiatric care that some veterans will require or the cost to the economy of such people not being productive tax-payers. It does note that 121 employees of the Australian military have been discharged on medical grounds. If each of those 121 people was disabled in a way that significantly impacted their life then the cost could be as much as $242,000,000 (about $2,000,000 per person). Let’s not assume that everyone who is currently serving and regarded as completely fit will be fine in 10 years’ time either.

The Prime Minister gives a personal introduction to each document. The “NetAlert” document states that $189,000,000 is being spent to protect children on the Internet. The “Talking with your kids about drugs” document states that “more than $1.4 billion” has been spent to combat illicit drug use; I’m not sure what time period that covers – it seems to be his entire 10 years as PM, so that would be $140,000,000 per annum.

Obviously declaring a war of aggression against Saddam Hussein is something that he considers to be more important than protecting Australian children.

Now for the content of the NetAlert document. It starts with $84,800,000 to provide access to “the best available internet filtering technology”. I have to wonder how much it would cost to develop new Internet filtering software from scratch, or what the opportunity cost might be of re-assigning resources from organisations such as CSIRO to the task. I expect that in either case the total would be a lot less than $84,800,000. Developing free software for the task would be a much better use of resources and would also allow cooperation with similar programs in other countries. In the unlikely event that $40,000,000 was required to develop such free software it probably wouldn’t be difficult to get a few other governments to kick in some money and make it $10,000,000 per country. Of course saving $70,000,000 doesn’t sound like much (a mere $3 per person in the country), but it is significantly more than the Free Trade Agreement with the US [2] was supposed to bring in (from memory it was promoted as improving the economic position of the country by $50,000,000, but the best available information is that it made our balance of trade worse).

$43,500,000 in additional funding for the police to combat online child sex exploitation is small but OK (it would be better if some of that $84,800,000 had been saved on filtering software, allowing greater expenditure in this area). But it should be noted that child rape in Aboriginal communities is used as a political issue while nothing constructive is done about the problem [3]. I believe that protecting children against rape is far more important than controlling illegal porn (which is where the majority of that $43,500,000 will go).

Another possible avenue of research would be more secure computer systems for Australian homes. It’s all very well to install some filtering software, but if the PC running the software in question is hacked then the filtering is immediately turned off. Attacks that involve turning web cameras on without the knowledge or consent of the owner are not uncommon either; it’s not unreasonable for a child to get out of bed and then do some homework on their computer before getting dressed – with an insecure computer and a web-cam connected this could be a bad thing… If the Australian government were to mandate that all government data be stored in open formats and that all communication with government computer systems can be done with free software (as a matter of principle you should not have to buy expensive software to communicate with your government) then the incidence of insecure desktop OSs would decrease. The document does recommend some basic OS security mechanisms, but if you use an OS that is insecure by design they are not going to work.

The core message of NetAlert is that children should be supervised when using the net. There is one significant problem with this: full supervision of children requires that at least one parent is not working full-time. In a previous post I compared median house prices and median incomes [4], with the conclusion that even with both parents working it’s difficult to afford a house. It would be difficult for a single income that is below the median to pay the rent on a decent house. So it seems that a significant portion (maybe the vast majority) of Australian families will be forced to do what the government would consider a compromise of the Internet safety of their children, by having both parents work full-time jobs to pay off a large mortgage (or in some cases merely to pay rent). I believe that making houses more affordable would improve the protection of children and provide other significant benefits for society. If you look at chart 1 and chart 2 of the House Prices research note from the Australian Parliamentary Library [5] you can see a significant increase in the difference between CPI increases and house price increases since 1996 (when John Howard became Prime Minister) and a significant increase in the ratio of house prices to average income. The charts only include data up to 2006, so they don’t show the effects of the Howard government’s “workplace reform” which lowers wages for many people who are below the median income. John Howard is responsible for houses being less affordable and for parents being forced to spend more time working and less time supervising their children; I find it offensive that he now tells people to spend more time supervising their children.

The document does have some good basic advice about how to supervise children and what level of supervision is necessary. The problem is that children from an early age know more about computers than the typical parent. I doubt the ability of typical Australian parents to implement some of the supervision suggestions on 12yo children. I have been considering writing my own documents on how to filter and monitor child Internet access, but the main problem is determining ways of doing so that can be implemented by someone who is not a computer expert.

In summary, the Howard government is responsible for reducing the supervision of children and therefore exacerbating the problems that it claims to solve. The NetAlert program provides advice that can’t be implemented by the majority of Australians and entirely skips some of the most sensible steps, such as using only secure OSs on machines that will be used on the Internet. I believe that Fedora and CentOS should be recommended by the government in this regard because of the combination of good security by default (of which SE Linux is only one component) and ease of use. Computers are cheap enough that most families with good Internet connections (the ones at risk in this regard) have multiple computers, so no extra cost would be incurred.

BoingBoing and Licenses

Today I was thrilled to see that Cory Doctorow (who among other things wrote one of my favourite Sci-fi novels [1]) copied one of my blog posts on to BoingBoing.net [2].

Then I reviewed the licence conditions (which had previously been contained in the About Page and are now a post on my documents blog [3]) and discovered that I had not permitted such use!

In the second part of this post (not included in the RSS feed) I have the old and new license conditions for my blog content. My plan is that my document blog [4] will have the current version of such documents while this blog will have every iteration along the way.

The new version of my license explicitly permits BoingBoing to do what they want with my content. I don’t have any objection to what Cory did, and I would have been rather unhappy if he had sent me an email saying “I wanted to feature your post on BoingBoing but sorry you miss out because of your license”. But copying first and checking the licence afterwards is not a procedure that works well in general.

Now I am wondering: how do I construct a license agreement that permits my content to be used by big popular sites that give my blog new readers and my ideas a wider audience, while denying the content to sploggers who just want to use my patterns of words for Google hits? How do I permit my content to be used by people who contribute as much to the community as Cory but deny it to talentless people who want to exploit my work while contributing nothing to the world? How can I ensure that people who want to reference my work can learn about the licence conditions (the About Page apparently doesn’t work)? These are serious questions and I invite suggestions as to how to solve them.

The fact that I have forgiven Cory for not abiding by my license and granted him permission to do the same thing again whenever he wishes is not the ideal solution. For authors to find people who copy their work and respond with forgiveness or DMCA take-down notices according to who does the copying and the reason for it is a losing game and a distraction from the work of creating useful content.

I understand the BoingBoing situation, they deliver summaries and copies of blog posts rapidly and frequently. Discovering conditions of use and asking for clarification from the authors (which may take days or weeks) would really affect the process. Also anyone who reads my blog would probably realise that I want to have such posts copied on sites such as BoingBoing.

RAID and Bus Bandwidth

As correctly pointed out by cmot [1] my previous post about software RAID [2] made no mention of bus bandwidth.

I have measured the bus bottlenecks of a couple of desktop machines running IDE disks with my ZCAV [3] benchmark (part of the Bonnie++ suite). The results show that two typical desktop machines had significant bottlenecks when running two disks for contiguous read operations [4]. Here is one of the graphs which shows that when two disks were active (on different IDE cables) the aggregate throughput was just under 80MB/s on a P4 1.5GHz while the disks were capable of delivering up to 120MB/s:
[Graph: ZCAV result for two 300G disks on a P4 1.5GHz]

On a system such as the above P4, using software RAID will give a performance hit compared to a hardware RAID device which is capable of driving both disks at full speed. I did not benchmark the relative speeds of read and write operations (writing is often slightly slower), but if for the sake of discussion we assume that read and write give the same performance then software RAID would only give 2/3 the performance of a theoretically perfect hardware RAID-1 implementation for large contiguous writes.

On a RAID-5 array the amount of data transferred for large contiguous writes is the data size multiplied by N/(N-1) (where N is the number of disks), and on a RAID-6 array it is N/(N-2). For a four disk RAID-6 array that gives the same overhead as writing to a RAID-1, and for a minimal (three disk) RAID-5 array it is 50% more writes. So from the perspective of “I need X bandwidth, can my hardware deliver it”, if I needed 40MB/s of bandwidth for contiguous writes then a 3 disk RAID-5 might work but a RAID-1 definitely would hit a bottleneck.
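To put rough numbers on the last two paragraphs, here is a small calculation using the ~80MB/s aggregate bus limit measured above and assuming (as discussed) that reads and writes perform the same and that the bus, not the individual disks, is the bottleneck:

#include <stdio.h>

/* Effective large contiguous write bandwidth when the shared bus is the
 * bottleneck: writing D bytes of data moves D * N / (N - P) bytes across
 * the bus, where N is the number of disks and P the number of disks worth
 * of redundancy (a two disk RAID-1 behaves like N=2, P=1). */
static double write_bandwidth(double bus_limit, int disks, int redundancy)
{
    double amplification = (double)disks / (disks - redundancy);
    return bus_limit / amplification;
}

int main(void)
{
    double bus_limit = 80.0; /* MB/s, roughly what the P4 above managed */

    printf("RAID-1, 2 disks: %.0f MB/s\n", write_bandwidth(bus_limit, 2, 1));
    printf("RAID-5, 3 disks: %.0f MB/s\n", write_bandwidth(bus_limit, 3, 1));
    printf("RAID-6, 4 disks: %.0f MB/s\n", write_bandwidth(bus_limit, 4, 2));

    /* A 40MB/s contiguous write load would fit on the 3 disk RAID-5
     * (about 53MB/s available) but saturate the bus on RAID-1 (40MB/s).
     * A hardware RAID-1 only needs the data once over the bus and is
     * limited by a single disk (about 60MB/s here), which is where the
     * 2/3 figure above comes from. */
    return 0;
}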

Given that large contiguous writes to a RAID-1 are a corner case and that minimal sized RAID-5 and RAID-6 arrays are rare, in most cases there should not be a significant overhead. As the number of seeks increases the actual amount of data transferred gets quite small. A few years ago I was running some mail servers which had a very intense IO load; four U320 SCSI disks in a hardware RAID-5 array were a system bottleneck – yet the IO was only 600KB/s of reads and 3MB/s of writes. In that case seeks were the bottleneck and write-back caching (which is another problem area for Linux software RAID) was necessary for good performance.

For the example of my P4 system, it is quite obvious that with a four disk software RAID array consisting of disks that are reasonably new (anything slightly newer than the machine) there would be some bottlenecks.

Another problem with Linux software RAID is that traditionally it has had to check the consistency of the entire RAID array after an unexpected power failure. Such checks are the best way to get all disks in a RAID array fully utilised (Linux software RAID does not support reading from all disks in a mirror and checking that they are consistent for regular reads), so of course the bus bottleneck becomes an issue.

Of course the solution to these problems is to use a server for server tasks and then you will not run out of bus bandwidth so easily. In the days before PCI-X and PCIe there were people running Linux software RAID-0 across multiple 3ware hardware controllers to get better bandwidth. A good server will have multiple PCI buses so getting an aggregate throughput greater than PCI bus bandwidth is possible. Reports of 400MB/s transfer rates using two 64bit PCI buses (each limited to ~266MB/s) were not uncommon. Of course then you run into the same problem, but instead of being limited to the performance of IDE controllers on the motherboard in a desktop system (as in my test machine) you would be limited to the number of PCI buses and the speed of each bus.

If you were to install enough disks to even come close to the performance limits of PCIe then I expect that you would find that the CPU utilisation for the XOR operations is something that you want to off-load. But then on such a system you would probably want the other benefits of hardware RAID (dynamic growth, having one RAID that has a number of LUNs exported to different machines, redundant RAID controllers in the same RAID box, etc).

I think that probably 12 disks is about the practical limit of Linux software RAID due to these issues and the RAID check speed. But it should be noted that the vast majority of RAID installations have significantly less than 12 disks.

One thing that cmot mentioned was a RAID controller that runs on the system bus and takes data from other devices on that bus. Does anyone know of such a device?

Perfect Code vs Quite Good Code

Some years ago I worked on a project where software reliability should have been a priority (managing data that was sometimes needed by the police, the fire brigade, and the ambulance service). Unfortunately the project had been tainted by a large consulting company that was a subsidiary of an accounting firm (I would never have expected accountants to know anything about programming and several large accounting firms have confirmed my expectations).

I was hired to help port the code from OS/2 1.2 to NT 4.0. The accounting firm had established a standard practice of never calling free() because “you might call free() on memory that was still being used”. This was a terribly bad idea at the best of times, and on a 16 bit OS with memory being allocated in 64K chunks the problems were quite obvious to everyone who had any programming experience. The most amusing example of this was a function that allocated some memory and returned a pointer but was being called as if it returned a boolean; one function had a few dozen lines of code similar to if(allocate_some_memory()). I created a second function which called the first, free’d any memory which had been allocated, and then returned a boolean.
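For anyone who wants to see what I mean, here is a minimal sketch of that kind of wrapper. The function names are hypothetical and the stand-in allocator is trivial – the real code was of course far messier:

#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for the original function: allocates something and returns a
 * pointer, or NULL on failure (the real one was much more involved). */
static void *allocate_some_memory(void)
{
    return malloc(1024);
}

/* Wrapper that keeps the "call it as if it returned a boolean" call sites
 * working without leaking: whatever was allocated is freed before the
 * boolean result is returned. */
static bool allocate_some_memory_succeeds(void)
{
    void *p = allocate_some_memory();
    if (p == NULL)
        return false;
    free(p);
    return true;
}

int main(void)
{
    return allocate_some_memory_succeeds() ? 0 : 1;
}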

Another serious problem with that project was the use of copy and paste coding. A section of code would perform a certain task and someone would need it elsewhere. Instead of making it a function and calling it from multiple places the code would be copied. Then one copy would be debugged or have new features added and the other copy wouldn’t. One classic example of this was a section of code that displayed an array of data points where each row would be in a colour that indicated its status. However setting a row to red would change the colour of all its columns, setting a row to blue would change all except the last, and changing it to green would change all but the second-last. The code in question had been copied and pasted to different sections with the colours hard-coded. Naturally I wrote a function to change the colour of a row and made it take the colour as a parameter; the program worked correctly and was smaller too. The next programmer who worked on that section of code would only need to make one change – instead of changing code in multiple places and maybe missing one.

Another example of the copy/paste coding was comparing time-stamps. Naturally using libc or OS routines for managing time stamps didn’t occur to them, so they had a structure with fields for the year, month, day, hours, minutes, and seconds that was different from every other such structure in common use, and they had to write their own code to compare them; for further excitement, some comparisons were only on the date and some were on date and time. Many of these date comparisons were buggy, and often there were two date comparisons in the same function which had different bugs. I created functions for comparing dates and the code suddenly became a lot easier to read, less buggy, and smaller.
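A minimal sketch of the kind of comparison functions I mean is below. The structure layout is hypothetical (the project used its own home-grown equivalent rather than struct tm), but the point is that one well-tested comparison replaces dozens of hand-rolled ones:

#include <stdio.h>

struct timestamp {
    int year, month, day;
    int hours, minutes, seconds;
};

/* Returns <0, 0 or >0 like strcmp(), comparing only the date part. */
static int compare_date(const struct timestamp *a, const struct timestamp *b)
{
    if (a->year != b->year) return a->year - b->year;
    if (a->month != b->month) return a->month - b->month;
    return a->day - b->day;
}

/* Compares date and time, reusing the date comparison. */
static int compare_datetime(const struct timestamp *a, const struct timestamp *b)
{
    int cmp = compare_date(a, b);
    if (cmp != 0) return cmp;
    if (a->hours != b->hours) return a->hours - b->hours;
    if (a->minutes != b->minutes) return a->minutes - b->minutes;
    return a->seconds - b->seconds;
}

int main(void)
{
    struct timestamp x = { 2007, 11, 24, 9, 30, 0 };
    struct timestamp y = { 2007, 11, 24, 18, 0, 0 };
    printf("same date: %d, x before y: %d\n",
           compare_date(&x, &y) == 0, compare_datetime(&x, &y) < 0);
    return 0;
}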

I have just read an interesting post by Theodore Ts’o on whether perfect code exists [1]. While I understand both Theodore’s and Bryan’s points of view in this discussion I think that a more relevant issue for most programmers is how to create islands of reasonably good code in the swamp that is a typical software development project.

While it was impossible for any one person to turn around a badly broken software development project such as the one I described, it is often possible to make some foundation code work well, which gives other programmers a place to start when improving the code quality. Having the worst of the memory leaks fixed meant that memory use could be analysed to find other bugs, and having good functions for comparing dates made the code more readable, so programmers could understand what they were looking at. I don’t claim that my code was perfect; even given the limitations of the data structures that I was using there was certainly scope for improvement. But my code was solid, clean, commented, and accepted by all members of the team (so they would continue writing code in the same way). It might even have resulted in saving someone’s life, as any system which provides data to the emergency services can potentially kill people if it malfunctions.

Projects based on free software tend not to be as badly run, but there are still some nasty over-grown systems based on free software that no-one seems able to debug. I believe that the plan of starting with some library code and making it reasonably good (great code may be impossible for many reasons) and then trying to expand the sections of good code is a reasonable approach to many broken systems.

Of course the ideal situation would be to re-write such broken systems from scratch, but as that is often impossible rewriting a section at a time often gives reasonable results.