When writing a shell script you need to take some care to ensure that it won’t run amok. Extra care is needed for shell scripts that run as root, firstly because of the obvious potential for random destruction, and secondly because of the potential for interaction between accounts that can cause problems.
- One possible first step towards avoiding random destruction is to start your script with “#/bin/sh -e” instead of “#/bin/sh“, this means that the script will exit on an unexpected error, which is generally better than continuing merrily along to destroy vast swathes of data. Of course sometimes you will expect an error, in which case you can use “/usr/local/bin/command-might-fail || true” to make it not abort on a command that might fail.
- #!/bin/sh -e
rm -rf *
cd /tmp/whatever || exit 1
rm -rf *
Instead of using the “-e” switch to the shell you can put “|| exit 1” after a command that really should succeed. For example neither of the above scripts is likely to destroy your system, while the following script is very likely to destroy your system:
rm -rf *
- Also consider using absolute paths. “rm -rf /tmp/whatever/*” is as safe as the above option but also easier to read – avoiding confusion tends to improve the reliability of the system. Relative paths are most useful for humans doing typing, when a program is running there is no real down-side to using long absolute paths.
- Shell scripts that cross account boundaries are a potential cause of problems, for example if a script does “cd /home/user1” instead of “cd ~user1” then if someone in the sysadmin team moves the user’s home directory to /home2/user1 (which is not uncommon when disk space runs low) then things can happen that you don’t expect – and we really don’t want unexpected things happening as root! Most shells don’t support “cd ~$1“, but that doesn’t force you to use “cd /home/$1“, instead you can use some shell code such as the following:
HOME=`grep ^$1 /etc/passwd|head -1|cut -f6 -d:`
if [ "$HOME" = "" ]; then
echo "no home for $1"
I expect that someone can suggest a better way of doing that. My point is not to try and show the best way of solving the problem, merely to show that hard coding assumptions about paths is not necessary. You don’t need to solve a problem in the ideal way, any way that doesn’t have a significant probability of making a server unavailable and denying many people the ability to do their jobs will do. Also consider using different tools, zsh supports commands such as “cd ~$1“.
- When using a command such as find make sure that you appropriately limit the results, in the case of find that means using options such as -xdev, -type, and -maxdepth. If you mistakenly believe that permission mode 666 is appropriate for all files in a directory then it won’t do THAT much harm. But if your find command goes wrong and starts applying such permissions to directories and crosses filesystem boundaries then your users are going to be very unhappy.
- Finally when multiple scripts use the same data consider using a configuration file. If you feel compelled to do something grossly ugly such as writing a dozen expect scripts which use the root password then at least make it an entry in a configuration file so that it can be changed in one place. It seems that every time I get a job working on some systems that other people have maintained there is at least one database, LDAP directory, or Unix root account for which the password can’t be changed because no-one knows how many scripts have it hard-coded. It’s usually the most important server, database, or directory too.
Please note that nothing in this post is theoretical, it’s all from real observations of real systems that have been broken.
Also note that this is not an attempt at making an exhaustive list of ways that people may write horrible scripts, merely enough to demonstrate the general problem and encourage people to think about ways to solve the general problems. But please submit your best examples of how scripts have broken systems as comments.