Some years ago I worked on a project where software reliability should have been a priority (managing data that was sometimes needed by the police, the fire brigade, and the ambulance service). Unfortunately the project had been tainted by a large consulting company that was a subsidiary of an accounting firm (I would never have expected accountants to know anything about programming and several large accounting firms have confirmed my expectations).
I was hired to help port the code from OS/2 1.2 to NT 4.0. The accounting firm had established a standard practice of never calling free() because “you might call free() on memory that was still being used”. This was a terribly bad idea at the best of times, and on a 16-bit OS with memory being allocated in 64K chunks the problems were quite obvious to everyone who had any programming experience. The most amusing example of this was a function that allocated some memory and returned a pointer, but which was being called as if it returned a boolean. One function had a few dozen lines of code similar to if(allocate_some_memory()), each of which leaked the memory it allocated. I created a second function which called the first, freed any memory which had been allocated, and then returned a boolean.
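A minimal sketch of that kind of wrapper follows. It is not the original code: the body of allocate_some_memory() is a stand-in, and the wrapper name can_allocate_some_memory() is made up for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the original allocating function (hypothetical body):
 * returns a heap pointer on success, NULL on failure. */
void *allocate_some_memory(void)
{
    return malloc(64);
}

/* Hypothetical wrapper for callers that only want a yes/no answer.
 * It frees the memory itself, so using it in a condition does not leak. */
bool can_allocate_some_memory(void)
{
    void *p = allocate_some_memory();

    if (p == NULL)
        return false;
    free(p);
    return true;
}
```

Every if(allocate_some_memory()) can then become if(can_allocate_some_memory()) without touching the callers that really do want the pointer.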
Another serious problem with that project was the use of copy and paste coding. A section of code would perform a certain task and someone would need it elsewhere. Instead of making it a function and calling it from multiple places, the code would be copied. Then one copy would be debugged or have new features added and the other copy wouldn’t. One classic example of this was a section of code that displayed an array of data points where each row would be in a colour that indicated its status. However, setting a row to red would change the colour of all its columns, setting a row to blue would change all except the last, and changing it to green would change all but the second-last. The code in question had been copied and pasted to different sections with the colours hard-coded. Naturally I wrote a function to change the colour of a row and made it take the colour as a parameter. The program then worked correctly and was smaller too. The next programmer who worked on that section of code would only need to make one change, instead of changing code in multiple places and maybe missing one.
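The shape of such a function is roughly as below. This is only a sketch: the struct row layout, the ROW_COLUMNS count, and the colour names are assumptions made for illustration, not the data structures the project used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical colour codes and row layout, for illustration only. */
enum colour { COLOUR_DEFAULT, COLOUR_RED, COLOUR_GREEN, COLOUR_BLUE };

#define ROW_COLUMNS 6

struct row {
    enum colour cell_colour[ROW_COLUMNS];
};

/* Set every column of a row to the same colour.
 * The loop bound lives in exactly one place, so no colour can end up
 * skipping the last or second-last column. */
void set_row_colour(struct row *r, enum colour c)
{
    assert(r != NULL);
    for (size_t i = 0; i < ROW_COLUMNS; i++)
        r->cell_colour[i] = c;
}
```

With the colour passed as a parameter there is one loop to get right, rather than one copy per colour.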
Another example of the copy/paste coding was comparing time-stamps. Naturally using libc or OS routines for managing time-stamps didn’t occur to them, so they had a structure with fields for the year, month, day, hours, minutes, and seconds that was different from every other such structure in common use, and they had to write their own code to compare them. For further excitement, some comparisons were only on date and some were on date and time. Many of these date comparisons were buggy, and often there were two date comparisons in the same function which had different bugs. I created functions for comparing dates and the code suddenly became a lot easier to read, less buggy, and smaller.
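A pair of comparison functions along these lines might look like the following sketch. The struct timestamp layout and all the function names are assumptions (the original structure is not shown here); the return convention follows strcmp(): negative, zero, or positive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the field names are illustrative only. */
struct timestamp {
    int year, month, day;
    int hours, minutes, seconds;
};

/* Three-way integer comparison: -1, 0 or 1. */
static int cmp_int(int a, int b)
{
    return (a > b) - (a < b);
}

/* Compare on date only, ignoring the time of day. */
int compare_date(const struct timestamp *a, const struct timestamp *b)
{
    int r;

    assert(a != NULL && b != NULL);
    if ((r = cmp_int(a->year, b->year)) != 0)
        return r;
    if ((r = cmp_int(a->month, b->month)) != 0)
        return r;
    return cmp_int(a->day, b->day);
}

/* Compare on date first, then on time of day. */
int compare_date_time(const struct timestamp *a, const struct timestamp *b)
{
    int r = compare_date(a, b);

    if (r != 0)
        return r;
    if ((r = cmp_int(a->hours, b->hours)) != 0)
        return r;
    if ((r = cmp_int(a->minutes, b->minutes)) != 0)
        return r;
    return cmp_int(a->seconds, b->seconds);
}
```

Building the date-and-time comparison on top of the date-only one means the two can never disagree about which day comes first.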
I have just read an interesting post by Theodore Ts’o on whether perfect code exists [1]. While I understand both Theodore’s and Bryan’s points of view in this discussion I think that a more relevant issue for most programmers is how to create islands of reasonably good code in the swamp that is a typical software development project.
While it was impossible for any one person to turn around a badly broken software development project such as the one I describe, it is often possible to make some foundation code work well, which gives other programmers a place to start when improving the code quality. Having the worst of the memory leaks fixed meant that memory use could be analysed to find other bugs, and having good functions for comparing dates made the code more readable, so programmers could understand what they were looking at. I don’t claim that my code was perfect; even given the limitations of the data structures that I was using, there was certainly scope for improvement. But my code was solid, clean, commented, and accepted by all members of the team (so they would continue writing code in the same way). It might even have resulted in saving someone’s life, as any system which provides data to the emergency services can potentially kill people if it malfunctions.
Projects based on free software tend not to be as badly run, but there are still some nasty over-grown systems based on free software where no-one seems able to debug them. I believe that the plan of starting with some library code and making it reasonably good (great code may be impossible for many reasons) and then trying to expand the sections of good code is a reasonable approach to many broken systems.
Of course the ideal situation would be to re-write such broken systems from scratch, but as that is often impossible rewriting a section at a time often gives reasonable results.
“Projects based on free software tend not to be as badly run, but there are still some nasty over-grown systems based on free software where no-one seems able to debug them. (snip) Of course the ideal situation would be to re-write such broken systems from scratch, but as that is often impossible rewriting a section at a time often gives reasonable results.”
GCC?
Seo: I don’t think that GCC is nearly as badly run as you suggest, both from what I know of the development processes and the end product. GCC generally works and performs tasks that are significantly more difficult than any “Enterprise” software that I have seen or heard of.