
Bad Project Management

I have just read a rant by Sean Middleditch about bad project management [1]. He describes his post as “personal, rather angsty, and especially whiny” but I think it’s useful and informative. He makes some interesting technical points about PHP programming (I wasn’t aware that there were so many ways of easily getting things wrong and having difficulty getting them right). But of course none of this is limited to PHP; the web site WorseThanFailure.com has anecdotes about mistakes of similar calibre being implemented in every language imaginable.

Sean is apparently considering leaving the computer industry after numerous bad experiences of watching highly paid people mess up projects while he gets paid a lot less to try and fix the worst of the bugs and get the systems working in production. I understand what that’s like; I have occasionally idly contemplated leaving the industry after bad projects. However the fun of working on free software, combined with the amounts of money that I can earn in the computer industry, made me quickly abandon such ideas.

His stories in some ways resemble my experiences working as a contractor; most of my contracts have been profoundly weird for various reasons (I’ll use the WTF [2] category of this blog to document some of them). I had two theories as to why I ended up in so many strange contracts: one was that I was in some sort of Twilight Zone, and the other was that taking contracts based on the amount of money offered puts you at high risk of being employed by people who have no financial pressure to do things in a sensible manner.

My advice to anyone in such a situation is to try and find a contract position paying an unreasonable amount of money. Getting more than $80 an hour (the rate Sean cites as being paid to the idiots who cause problems) is going to be difficult, but getting $50 or $60 an hour is much easier to achieve and should be enough to alleviate the pain of working on doomed projects.


Perfect Code vs Quite Good Code

Some years ago I worked on a project where software reliability should have been a priority (managing data that was sometimes needed by the police, the fire brigade, and the ambulance service). Unfortunately the project had been tainted by a large consulting company that was a subsidiary of an accounting firm (I would never have expected accountants to know anything about programming and several large accounting firms have confirmed my expectations).

I was hired to help port the code from OS/2 1.2 to NT 4.0. The accounting firm had established a standard practice of never calling free() because “you might call free() on memory that was still being used”. This was a terribly bad idea at the best of times, and on a 16 bit OS with memory being allocated in 64K chunks the problems were quite obvious to everyone who had any programming experience. The most amusing example of this was a function that allocated some memory and returned a pointer, but which was being called as if it returned a boolean – one function had a few dozen lines of code similar to if(allocate_some_memory()). I created a second function which called the first, freed any memory which had been allocated, and then returned a boolean.
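A minimal C sketch of that fix, with the function names invented for illustration (the original names are long forgotten):

```c
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical stand-in for the original function: allocates a buffer
   and returns the pointer (or NULL on failure). */
static char *allocate_some_memory(void)
{
    return malloc(64);
}

/* Wrapper for the call sites that only wanted a success flag: free the
   allocation immediately and return a boolean instead of leaking it. */
static bool allocation_succeeded(void)
{
    char *p = allocate_some_memory();

    if (p == NULL)
        return false;
    free(p);
    return true;
}
```

The call sites could then use if(allocation_succeeded()) without leaking 64K of address space on every test.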

Another serious problem with that project was the use of copy and paste coding. A section of code would perform a certain task and someone would need it elsewhere. Instead of making it a function and calling it from multiple places the code would be copied. Then one copy would be debugged or have new features added and the other copy wouldn’t. One classic example of this was a section of code that displayed an array of data points where each row would be in a colour that indicated its status. However setting a row to red would change the colour of all its columns, setting a row to blue would change all except the last, and changing it to green would change all but the second-last. The code in question had been copied and pasted to different sections with the colours hard-coded. Naturally I wrote a function to change the colour of a row and made it take the colour as a parameter, the program worked correctly and was smaller too. The next programmer who worked on that section of code would only need to make one change – instead of changing code in multiple places and maybe missing one.
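The shape of that fix can be sketched as follows; the grid layout, names, and colour values are all invented for illustration:

```c
/* One shared routine that colours every column of a row, replacing
   several pasted copies that each missed a different column. */
enum row_colour { COLOUR_RED, COLOUR_GREEN, COLOUR_BLUE };

#define NUM_ROWS 10
#define NUM_COLS 8

static enum row_colour grid[NUM_ROWS][NUM_COLS];

static void set_row_colour(int row, enum row_colour c)
{
    /* Loop over every column, so no copy can forget the last one. */
    for (int col = 0; col < NUM_COLS; col++)
        grid[row][col] = c;
}
```

With the colour as a parameter there is exactly one loop to debug, rather than three pasted copies each with its own off-by-one error.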

Another example of the copy/paste coding was comparing time-stamps. Naturally using libc or OS routines for managing time stamps didn’t occur to them, so they had a structure with fields for the year, month, day, hours, minutes, and seconds (different from every other such structure in common use) and had to write their own code to compare them. For further excitement, some comparisons were only on the date and some were on date and time. Many of these date comparisons were buggy, and often there were two date comparisons in the same function which had different bugs. I created functions for comparing dates and the code suddenly became a lot easier to read, less buggy, and smaller.
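Something along these lines, with the structure and function names invented as stand-ins for the project’s home-grown ones:

```c
/* Hypothetical stand-in for the project's home-grown time stamp, which
   differed from every standard structure in common use. */
struct stamp {
    int year, month, day, hours, minutes, seconds;
};

/* Compare dates only: negative, zero, or positive, like strcmp(). */
static int compare_date(const struct stamp *a, const struct stamp *b)
{
    if (a->year != b->year)
        return a->year - b->year;
    if (a->month != b->month)
        return a->month - b->month;
    return a->day - b->day;
}

/* Compare date and time, reusing the date comparison. */
static int compare_datetime(const struct stamp *a, const struct stamp *b)
{
    int d = compare_date(a, b);

    if (d != 0)
        return d;
    if (a->hours != b->hours)
        return a->hours - b->hours;
    if (a->minutes != b->minutes)
        return a->minutes - b->minutes;
    return a->seconds - b->seconds;
}
```

Having both comparisons in one place meant that a date-only comparison and a date-and-time comparison in the same function could no longer have two different bugs.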

I have just read an interesting post by Theodore Ts’o on whether perfect code exists [1]. While I understand both Theodore’s and Bryan’s points of view in this discussion I think that a more relevant issue for most programmers is how to create islands of reasonably good code in the swamp that is a typical software development project.

While it was impossible for any one person to turn around a badly broken software development project such as the one I describe, it is often possible to make some foundation code work well which gives other programmers a place to start when improving the code quality. Having the worst of the memory leaks fixed meant that memory use could be analysed to find other bugs and having good functions for comparing dates made the code more readable and thus programmers could understand what they were looking at. I don’t claim that my code was perfect, even given the limitations of the data structures that I was using there was certainly scope for improvement. But my code was solid, clean, commented, and accepted by all members of the team (so they would continue writing code in the same way). It might even have resulted in saving someone’s life as any system which provides data to the emergency services can potentially kill people if it malfunctions.

Projects based on free software tend not to be as badly run, but there are still some nasty over-grown systems based on free software where no-one seems able to debug them. I believe that the plan of starting with some library code and making it reasonably good (great code may be impossible for many reasons) and then trying to expand the sections of good code is a reasonable approach to many broken systems.

Of course the ideal situation would be to re-write such broken systems from scratch, but as that is often impossible rewriting a section at a time often gives reasonable results.

WTF – Let’s write all the code twice

There is an interesting web site WorseThanFailure.com (with the slogan “Curious Perversions in Information Technology”) that documents amusingly failed projects. The name used to be TheDailyWTF.com but changed due to the idea that for some projects success (interpreted to mean limping along in production) is worse than failure (being scrapped and re-written). I’ve created a new category WTF [1] on my blog to document such projects, both ones that I have personally witnessed and ones that friends and colleagues have seen.

In the 90’s I spent some time working on an OS/2 GUI program to be the front-end for a mainframe-backed system used by call-center workers.

The first thing that they did was to develop a file naming scheme, they decided that all source files should be in the same directory (not unreasonable), but that 8.3 should be the limit of file name lengths in case there was a need to port the system to Windows 95. The fact that porting a program which did little other than display a GUI and talk to a S/390 server to a different OS was difficult (given the large amount of platform specific GUI code) was lost on them. Then they decided that the first two letters would be “FE” in case the source was merged with another bank project (of course the bank had many “Front End” systems talking to mainframes – so even this didn’t guarantee that they would be free of name clashes). Characters 3 and 4 were to represent the section of the program, and as the 3 character extension was one of “.h” or “.cpp” that left exactly 4 characters to name the file – we may as well have numbered the source files. Eventually one of the guys started making jokes about the file names by trying to pronounce them as words and that convinced a majority of the developers that the files should have long human-readable names. As the NMAKE expert (they insisted on not using unsupported software such as GNU make) it was my job to fix all the file names.

The bank had an internal users’ group for the C++ development environment, but the contractors were not invited to attend. You might think that it would make sense to have the most skillful C++ programmers attend the meetings and share their knowledge, but apparently attending such meetings was a perk of being a permanent employee.

What do you do when you have a C++ development project running behind schedule and a permanent employee who has had no prior exposure to C++ and wants to learn? One manager believed that assigning the junior programmer to the project was a solution to both problems – it was good for teaching the programmer C++ in a hurry but not so good for the deadline.

There was a lot of excitement related to the back-end development. The mainframe guys assured me that CICS could never handle bad data and it totally wasn’t their fault if my program sent bad data to them and the region crashed. The first CICS region crash occurred when the mainframe guys told me to make the unused space in text fields “empty” – to a C programmer this means filling them with the 0 character, but it killed the CICS region (apparently “empty” means filled with spaces to CICS programmers). A later region crash came when they told me to change the field length for the account name everywhere that it occurred – they told me that they wanted the change immediately – so I changed a macro definition, ran “make”, and sent a transaction with the changed size. Apparently “immediately” really meant “after we have spent an hour of software development changing magic numbers throughout our system”. I’m assuming that COBOL has some facility that roughly compares to C macros for defining field lengths, and that it was the programmers, not the language, that were at fault.
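The two interpretations of “empty” can be sketched in C; the field length and function names are invented for illustration:

```c
#include <string.h>

#define FIELD_LEN 20   /* hypothetical fixed field width */

/* What "empty" means to a C programmer: NUL bytes.  Sending a field
   padded this way is what crashed the CICS region. */
static void clear_field_c_style(char field[FIELD_LEN])
{
    memset(field, '\0', FIELD_LEN);
}

/* What "empty" meant to the CICS programmers: space padding for the
   unused part of a fixed-width text field. */
static void clear_field_cics_style(char field[FIELD_LEN])
{
    memset(field, ' ', FIELD_LEN);
}
```

Note also that defining FIELD_LEN in one macro is exactly what made my side of the account-name change a one-line edit and a “make” run.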

When the project became seriously behind schedule they hired some expert programmers from another country to help with the development. The experts decided that we needed to redesign the system (not that it had been designed in the first place) to use a “model view controller” architecture. When the real business logic is all on a S/390 and the only logic in the front end is to combine the contents of some screen-scrapes from a virtual 3270 terminal into a single GUI dialogue (with the first screen occasionally getting data from the IVR system) the benefits of MVC seem rather small. The new MVC version of the system was dubbed “phase 2” and there were a series of deadlines for changing over from “phase 1” which were missed. So the original team of developers continued on their project without any help from the “experts”. The last report I heard (some time after leaving the project) was that Phase 1 had gone live in production while Phase 2 was still being developed (at that time Phase 2 had a year of development and Phase 1 had about 15 months).

The lesson to be learned from this is that management should only hire people that they can control. If you hire a consulting company and then let them do whatever they want (as opposed to solving the problem that they were hired to solve) then at best they will do no good and they may just slow things down.

One of the more memorable incidents was when the project manager was talking to his 2IC, he said “I think we’re really breaking ground, [pause] I’m breaking wind and we’re breaking ground”. My desk was a few meters from where the wind (and ground) were being broken; I was unfortunately able to verify that wind was being broken, but was not as certain about the ground.

I did learn some useful things during this project. One was that sample code from library vendors should not be trusted, and ideally should be read from a print-out rather than pasted into the project source tree – the CTI library vendor had buffer overflows in their sample code. Another was the coin-tossing method of dispute resolution: if two good programmers have a disagreement about how to implement something, tossing a coin is a quick way of resolving the dispute. This only works with two good programmers (in which case either idea will work).

The project in question was messed up in many ways. I may write another post in the WTF category about it.