It’s common to hear a complaint of the form “I get paid to keep computers running, not to hack an OS” coming from someone who uses an Open Source OS such as Linux, BSD, or OpenSolaris.
It seems to me that part of the job of keeping computers running when using Open Source software IS to hack the source and fix bugs. This takes the place of praying, begging, and having your employer pay arbitrary extra amounts of money to the vendor when you have problems with proprietary software.
It’s well understood that a good system administrator will anticipate problems and implement solutions to them in advance. You don’t wait for a system to run out of disk space and then fix it; you install cron jobs to compress and remove old log files and have a monitoring system to tell you if disk space really runs low.
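As a rough illustration of that kind of preventive housekeeping, here is a minimal Python sketch of a job that could be run from cron. It is only a sketch under stated assumptions: the function names, the `.log` suffix, the 7 and 30 day limits, and the 90% threshold are all hypothetical choices of mine, and in practice a tool such as logrotate plus a real monitoring system would usually do this work.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def compress_and_prune(log_dir, compress_after_days=7, delete_after_days=30, now=None):
    """Gzip *.log files older than compress_after_days and delete *.gz files
    older than delete_after_days. Returns (compressed_names, removed_names).
    The suffixes and age limits are illustrative assumptions."""
    now = time.time() if now is None else now
    compressed, removed = [], []
    # sorted() takes a snapshot, so files created below are not revisited.
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        age_days = (now - mtime) / SECONDS_PER_DAY
        if path.suffix == ".gz":
            if age_days > delete_after_days:
                path.unlink()
                removed.append(path.name)
        elif path.suffix == ".log" and age_days > compress_after_days:
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original timestamp so later pruning sees the true age.
            os.utime(target, (mtime, mtime))
            path.unlink()
            compressed.append(path.name)
    return compressed, removed


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def low_on_space(path="/", threshold_percent=90.0):
    """True when usage has reached the (hypothetical) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

A cron job could call `compress_and_prune` on the log directory once a day, while `low_on_space` is the sort of check a monitoring system would poll so that a human hears about the problem before the disk is actually full.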
It seems to me that the approach that many companies take towards fixing software bugs goes against this ideal. They wait for software to fail in production and then file a bug report and hope that someone else will fix it.
When allocating time for various tasks it’s not uncommon to have fixed amounts of staff time devoted to different servers, departments, or projects. I believe that having a fixed amount of time devoted to finding and fixing bugs in Open Source software (both current versions and pre-release versions) would save money for a company in the long term. If 10% of the time of the most skilled programmers were assigned to finding and fixing bugs in the OS then the overall quality would improve. If a company depends on Debian then it would make sense to have this 10% time include testing out the production programs on Debian/Unstable; if it depends on Red Hat Enterprise Linux then it would make sense to test them out on Rawhide. This would increase the ability of the future releases of Debian or RHEL to support the applications in question, and might also discover some application bugs.
Also it’s very important to submit patches with bug reports. It’s not uncommon for a bug report to be critically important to a user but not overly important to the rest of the world. Such bugs can stay in a bug tracking system for a long time without getting fixed. But if there is a patch submitted that includes necessary documentation patches and a description of the tests that it has passed then it will probably be easier to include it than to debate whether it’s really needed.
If a project is only running for a matter of weeks or months (e.g. a consulting company that comes in, deploys a “solution” and then leaves) then there is probably no benefit in doing this. But if a company is going to be running servers for many years which will periodically be upgraded then it would be a real benefit to have bugs fixed in future versions.
As long as an administrator follows proper dev/test/UAT/Production release procedures I have no issue with this. It shows diligence, excellence in execution and a proven methodology to minimise risk to whatever server infrastructure they are maintaining. Anything less is just poor judgement.
As we all know, any change written whilst solving a problem needs to be fully regression tested with each upgrade prior to release.
> It seems to me that the approach that many companies take towards fixing software bugs goes against this ideal. They wait for software to fail in production and then file a bug report and hope that someone else will fix it.
I am part of a team that runs a reasonable number of servers. We try to test all of the functionality in a development area. For a reasonably large subset of the functionality, we simply lack the bandwidth and test cases to fully test the software as we deploy it. Our users are much more ingenious at breaking our configurations than we are. I always try googling for an answer before filing a bug report on open source software (not always on major vendor supported software). I rarely can look at the source code and find the bug quickly, primarily because I lack the basic knowledge of how the code is internally structured and what other problems have been addressed in similar areas. I would love to have the deep knowledge of the software I am using, but what resources I have to learn applications, I am primarily spending on the custom code our in-house developers write.
Maybe I am not your perfect vision of a sysadmin. Even in your example of logs filling up a filesystem, usually it takes something happening to one server (or getting close to happening), to cause us to look at the other servers and implement solutions. With many competing priorities we are primarily reactive. We all want to be proactive, but getting to where we can be is hard.
Warren: The problem of users breaking software that passes all your tests is part of the nature of the computer industry. The number of possible combinations of all variables is far too great to test them all.
No-one is the perfect vision of anything.
But I have observed cases where people complain about issues that they have the technical ability to fix. With Open Source software there is nothing but time preventing them from fixing the issues in question. I suggest that management assign a portion of the time of such people towards fixing such problems.