Some years ago I was working on a project that involved a database cluster of two fairly heavily loaded Sun E6500 servers. I believe that the overall price was several million pounds. It’s the type of expensive system where it would make sense to spend adequately to do things properly in every way.
The first interesting thing was the data center where it was running. The front door had a uniformed security guard and a sign threatening immediate dismissal for anyone who left the security door open. The back door was wide open for the benefit of the electricians who were working there. Presumably anyone who had wanted to steal some servers could have gone to the back door and asked the electricians for assistance in removing them.
The system was poorly tested. My colleagues thought that with big important servers you shouldn’t risk damage by rebooting them. My opinion has always been that rebooting a cluster should be part of standard testing, and that it’s especially important with clusters, which have more interesting boot sequences than standalone servers. But I lost the vote and there was no testing of rebooting.
Along the way there were a number of WTFs in that project. One was when the web developers decided to force all users to install the latest beta release of Internet Explorer, a decision that was only revoked when the IE install process broke MS-Office on the PC of a senior manager. Another was putting systems with a default Solaris installation live on the Internet with all default services running; there’s never a reason for a database server to be directly accessible over the Internet.
No Backups At All
But I think that the most significant failing was the decision not to make any backups. This wasn’t merely forgetting to make backups; when I raised the issue I received a negative reaction from almost everyone. As an aside, I find it particularly annoying when someone implies that I want backups because I am likely to stuff things up.
There are many ways of proving that there’s a general lack of competence in the computer industry. But I think that one of the best is the number of projects where the person who wants backups has their competence questioned instead of all the people who don’t want backups.
A decision to make no backups relies on one of two conditions: either the service has to be entirely unimportant, or you need to have no bugs in the OS or hardware defects that can corrupt data, no application bugs, and a team of sysadmins who never make mistakes. The former condition raises the question of why the service is being run at all, and the latter is impossible.
As I’m more persistent than most people I kept raising the issue via email and adding more people to the CC list until I got a positive reaction. Eventually I CC’d someone who responded with “What the fuck”, which I consider a reasonable response to a huge and expensive project with no backups. However, the managers on the CC list regarded the use of profanity in email as a much more serious problem. To the best of my knowledge there were never any backups of that system, but the policy on email was strongly enforced.
This is only a partial list of WTF incidents that assisted in my decision to leave the UK and migrate to the Netherlands.
Not Doing Much
About a year after leaving I returned to London for a holiday and had dinner with a former colleague. When I asked what he was working on he said “Not much”. It turned out that proximity to the nearest manager determined the amount of work that was assigned. As his desk was a long way from the nearest manager he had spent about 6 months getting paid to read Usenet. That wasn’t really a surprise given my observations of the company in question.