Some Tips for Shell Code that Won’t Destroy Your OS

When writing a shell script you need to take some care to ensure that it won’t run amok. Extra care is needed for shell scripts that run as root, firstly because of the obvious potential for random destruction, and secondly because of the potential for interaction between accounts that can cause problems.

  • One possible first step towards avoiding random destruction is to start your script with “#!/bin/sh -e” instead of “#!/bin/sh”. This means that the script will exit on an unexpected error, which is generally better than continuing merrily along to destroy vast swathes of data. Of course sometimes you will expect an error, in which case you can use “/usr/local/bin/command-might-fail || true” to make the script not abort on a command that might fail.
  • Consider the following two scripts, neither of which is likely to destroy your system:
    #!/bin/sh -e
    cd /tmp/whatever
    rm -rf *

    #!/bin/sh
    cd /tmp/whatever || exit 1
    rm -rf *


    Instead of using the “-e” switch to the shell you can put “|| exit 1” after a command that really should succeed. The following script, however, is very likely to destroy your system, because if the cd fails the rm will run in whatever directory the script started in:
    #!/bin/sh
    cd /tmp/whatever
    rm -rf *
  • Also consider using absolute paths. “rm -rf /tmp/whatever/*” is as safe as the above option but also easier to read, and avoiding confusion tends to improve the reliability of the system. Relative paths are most useful for humans typing at a prompt; when a program is running there is no real down-side to using long absolute paths.
  • Shell scripts that cross account boundaries are a potential cause of problems. For example, if a script does “cd /home/user1” instead of “cd ~user1” and someone in the sysadmin team moves the user’s home directory to /home2/user1 (which is not uncommon when disk space runs low), then things can happen that you don’t expect – and we really don’t want unexpected things happening as root! Most shells don’t support “cd ~$1”, but that doesn’t force you to use “cd /home/$1”; instead you can use some shell code such as the following:

    #!/bin/sh
    HOME=`grep "^$1:" /etc/passwd|head -1|cut -f6 -d:`
    if [ "$HOME" = "" ]; then
      echo "no home for $1"
      exit 1
    fi
    cd ~


    I expect that someone can suggest a better way of doing that. My point is not to show the best way of solving the problem, merely to show that hard-coding assumptions about paths is not necessary. You don’t need to solve a problem in the ideal way; any way that doesn’t have a significant probability of making a server unavailable and denying many people the ability to do their jobs will do. Also consider using different tools: zsh supports commands such as “cd ~$1”.
  • When using a command such as find make sure that you appropriately limit the results, in the case of find that means using options such as -xdev, -type, and -maxdepth. If you mistakenly believe that permission mode 666 is appropriate for all files in a directory then it won’t do THAT much harm. But if your find command goes wrong and starts applying such permissions to directories and crosses filesystem boundaries then your users are going to be very unhappy.
  • Finally when multiple scripts use the same data consider using a configuration file. If you feel compelled to do something grossly ugly such as writing a dozen expect scripts which use the root password then at least make it an entry in a configuration file so that it can be changed in one place. It seems that every time I get a job working on some systems that other people have maintained there is at least one database, LDAP directory, or Unix root account for which the password can’t be changed because no-one knows how many scripts have it hard-coded. It’s usually the most important server, database, or directory too.
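
As a sketch of the find advice above, the following confines a permission fix to one directory level, one filesystem, and regular files only (the /tmp/findsafe paths are throw-away examples created just for the demonstration):

```shell
#!/bin/sh -e
# Demonstration sandbox: hypothetical paths created just for this sketch.
mkdir -p /tmp/findsafe/sub
touch /tmp/findsafe/top.txt /tmp/findsafe/sub/deep.txt
chmod 666 /tmp/findsafe/top.txt /tmp/findsafe/sub/deep.txt

# Limit the damage a find can do:
#   -xdev       never cross into another filesystem
#   -maxdepth 1 stop it recursing further than intended
#   -type f     only touch regular files, never directories
find /tmp/findsafe -xdev -maxdepth 1 -type f -exec chmod 644 {} +
```

With those options the file in the sub-directory keeps its old mode, which is exactly the containment you want if the command turns out to be a mistake.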

Please note that nothing in this post is theoretical, it’s all from real observations of real systems that have been broken.

Also note that this is not an attempt at making an exhaustive list of ways that people may write horrible scripts, merely enough to demonstrate the general problem and encourage people to think about ways to solve the general problems. But please submit your best examples of how scripts have broken systems as comments.

18 comments to Some Tips for Shell Code that Won’t Destroy Your OS

  • Michael Goetze

    grep ^$1 /etc/passwd

    should surely be

    /usr/bin/getent passwd $1

    After all, another sysadmin might come along and change the PAM configuration of the system to use, say, LDAP.

  • etbe

    Michael: Thanks for that. Now if only the getent tool (or something similar) would allow us to extract the home directory on a single line. It seems that there will be a huge number of shell scripts that need to know the home directories of users. While piping it through cut is not overly difficult it does add an extra possibility for things to go wrong.
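
    For example, a sketch of that pipeline (“root” is just an assumed example account that should exist on most systems):

```shell
#!/bin/sh
# Sketch of the getent-plus-cut approach discussed above.
user=root
home=`getent passwd "$user" | cut -f6 -d:`
if [ -z "$home" ]; then
  echo "no home for $user" >&2
  exit 1
fi
echo "home directory of $user is $home"
```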

  • As usual, awk to the rescue!

    #!/bin/sh
    test -z "$1" && exit 1
    #user:pass:uid:gid:gecos:homedir:shell
    awk -F: -vuser=$1 '{if ($6 == "") print user " has no home directory" }' /etc/passwd

    And to properly work in environments that use directory services like nis/ldap/kerberos/etc

    #!/bin/sh
    test -z "$1" && exit 1
    getent passwd $1 | awk -F: -vuser=$1 '{if ($6 == "") print user " has no home directory" }'

    And finally:
    #!/bin/sh
    test -z "$1" && exit 1
    getent passwd $1 | awk -F: '{if ($6 == "") print $1 " has no home directory" }'

    awk is magic

  • It’s possibly better to use

    #!/bin/sh

    set -e

    so that running the script directly like

    bash myscript.sh

    won’t override the -e setting.
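
    A sketch of that, with a deliberately missing directory to show the effect (the demo script path is arbitrary):

```shell
#!/bin/sh
# Put "set -e" in the body so that running the script as "sh script.sh"
# (which ignores options on the shebang line) is still protected.
cat > /tmp/demo-set-e.sh <<'EOF'
set -e
cd /tmp/this-directory-does-not-exist
rm -rf *
EOF
if sh /tmp/demo-set-e.sh 2>/dev/null; then
  echo "unexpected: the script kept going"
else
  echo "aborted at the failed cd; rm -rf never ran"
fi
```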

  • etbe

    http://blog.andrew.net.au/2009/11/20#dash_e_shebang_bad

    Andrew Pollock also makes the point about set -e but with another example of how it can fail. Andrew also mentions exit handlers.

    Exit handlers are good, but I really don’t expect anyone who puts “cd /tmp/whatever ; rm -rf *” in a shell script to be able to do that in the near future.

    I think it’s best to concentrate on refraining from destroying servers as a first priority. Writing the quality of shell code that Andrew advocates is a good thing to do later on.

    One of the advantages of blogging about such things is learning from experts such as those of you who have commented and Andrew.
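
    For reference, an exit handler can be as small as this sketch (the temporary file is a stand-in for any state that needs cleaning up):

```shell
#!/bin/sh -e
# Minimal exit-handler sketch: the temporary file is removed even if
# the script aborts part-way through.
tmpfile=`mktemp`
trap 'rm -f "$tmpfile"' EXIT
echo "working data" > "$tmpfile"
# ...real work would go here; any exit, normal or not, runs the trap
```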

  • etbe

    http://jmtd.net/log/sh/

    Jon Dowland suggests not using shell scripts at all. That sounds nice, but there are many people who can’t/won’t learn a scripting language. So really we are stuck with shell scripts so let’s try and not do it too badly.

    Then of course there are short scripts which are not particularly demanding. The vast majority of my shell scripts are less than 10 lines long including comments. For such scripts using Perl probably wouldn’t provide much of a benefit.

  • @etbe: Jon Dowland is a bit confused if he thinks shell scripts are the wrong tool for systems administration. I’d be willing to bet he hasn’t done much hard-core sysadmin work ever. The shell is the least common denominator on every ’nix including OS X. From HP-UX to Linux to ultra-embedded busybox distros on your NAS without perl or python, it works on them all. If you can program Bourne shell efficiently, you can manage POSIX hosts.

    “I reckon nine times out of ten it’s the right decision. But when you just have to use a shellscript, use set -e (as Andrew Pollock points out, this is safer than putting it on the hashbang line) and set -u too.”

    Just because he doesn’t like or know it particularly well doesn’t mean that using awk/sed/bash to solve your problem is wrong. Look at the speed comparisons between awk and perl to search text. If you can do it in awk, it is much faster than perl with a fraction of the memory footprint. His statements are amusing at best.

  • etbe

    Jeff: You make some good technical points.

    This discussion (both here and in Planet Debian and Planet Linux Australia) hasn’t taken the course that I had hoped for.

    While I agree that there is scope for a lot of discussion about finding the best ways of solving such problems, my focus here is on avoiding the worst ways of doing things. In particular bad ways that involve people phoning me early in the morning because their server isn’t working. ;)

    Let’s try not to assume that someone lacks knowledge because they like to do things a different way.

  • The advice about “set -e” or “|| exit 1” is valid only if you don’t do any parallel processing in your shell scripts. With parallel processing, it leads to undesired consequences.

    Consider the following example, where someone wants to run long-running-program in parallel with other two commands and then merge the results:

    long-running-program &
    lpid=$!
    command-that-can-fail
    another-command
    wait $lpid
    combine-output

    If the command-that-can-fail fails and causes the script to exit, you end up with the long-running-program still in the background, which may be undesirable (e.g., you can’t just fix the problem and restart the script).

    And, by “no parallel processing at all”, I also mean pipelines. E.g., if you don’t want to run “publish” if file.xml is invalid, and want to save reports about invalid XML just in case, the following works:

    xmllint --noout --valid file.xml >report
    publish file.xml

    Then, suppose that reports sometimes become too long, and you want to take only the first few lines. This doesn’t work (i.e., will publish invalid XML):

    xmllint --noout --valid file.xml | head >report
    publish file.xml

    Of course, bash has a good solution for both problems (by handling ERR or setting the pipefail option).

  • Jon

    Etbe, I originally tried to post my blog post as a comment here (via a phone) but had some openid problems so ended up blogging it when I got in :)

    Jeff, I’ve been working as a full-time sysadmin for 5 years and administer 200+ UNIX systems, if experience counts for something. I assure you I am well versed in awk/cut/sed/grep etc.

    You are placing far too much emphasis on speed of execution. This is of course important in some contexts (say embedded) but if the differential between awk/sed etc. and perl is big enough to be important (and I wouldn’t be confident even measuring the difference accurately on a modern system) you would almost certainly need to be working in C or similar anyway.

    In terms of least-common denominator, perl is universally available and crucially much more *consistent* across platforms. Even POSIX sh is not a low enough bar for cross-platform shell scripts (Solaris’s /bin/sh doesn’t support $(foo), for example).

    Alexander, good point about parallelism, which is increasingly important in modern systems. I’ve attempted parallel subshells / clever use of wait / juggling file descriptors and I really think this is a classic example of a problem space which shell is terrible for.

  • @Jon: I apologize for being incorrect about your experience. My team manages a few short of 2k servers but numbers don’t matter much. If you can successfully manage 200 systems well, you can manage 5000. posix is posix is posix.

    However, I still respectfully disagree. The shell is the lowest common denominator in ‘nix environments from supercomputers to embedded distros like emdebian (with no perl). It is all you need to manage posix.

  • etbe

    http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fminio.jogger.pl%2F2009%2F11%2F21%2Fpopularne-bledy-popelniane-podczas-pisania-skryptow-powloki%2F&sl=pl&tl=en

    http://minio.jogger.pl/2009/11/21/popularne-bledy-popelniane-podczas-pisania-skryptow-powloki/

    A Polish blogger has an interesting post that has the solutions to some common shell scripting mistakes. Above are the links for the blog post in question and for the Google translation into English.

  • In my opinion, the set -e option is just a pain and should NEVER be used in a decent shell script.

    I think the set -e option is just an excuse for not writing proper error handling routines and fail safe code. The || exit 1 statements seem like an ugly solution to me.

    #!/bin/sh
    cd /tmp/whatever || exit 1
    rm -rf *

    Could be better written something like:

    #!/bin/sh
    DIR=/tmp/whatever
    if [ -e "$DIR" ] && [ ! -z "$DIR" ]
    then
      rm -rf "$DIR"/*
    fi

    If you want to run dangerous commands like ‘rm’ you may want to spend the 2 minutes extra to write a ‘proper’ fail safe mechanism instead of using set -e or other simple solutions.

    But that is just me.

  • Sorry for creating a separate posting, but I want to state that I fully agree with the author, only not on all the solutions. ;)

  • etbe

    Louwrentius: The thing to keep in mind is that I didn’t write this post for the benefit of skillful people who are capable of coding in the style you advocate, or for people who need to write code that will later be maintained by such people.

    Putting “set -e” at the start of a script is easy, simple, and solves many problems. While it’s not ideal, it will save servers from being trashed on occasion and that’s what really matters.

    But your points are good and are useful for anyone who wants to take their scripting to a higher level.

  • OK, I misunderstood the point of view of your article. From your perspective, I agree with the set -e option.

    Also, I understand why people might argue that if you write anything in shell script that would be worth the effort to do it ‘right’, you might as well do it in a ‘proper language’ such as Python or Ruby for example.

    I wrote some bigger stuff in bash, but that’s because I’m just crazy. There is no other reason for it.

  • vk3jed

    Nice article about shell scripting. I didn’t consider -e myself, as I tend to use techniques similar to what Louwrentius advocates, sometimes going as far as ensuring the script has been started by the correct user, has all the data it needs to work, and in some cases, is started from the correct place in the filesystem.

    I use shell scripting quite a lot, because in my spare time I play around a lot with software that is already 90% shell scripts which manage a few binaries, and there’s no guarantee that the boxes have anything else available (though Perl is usually there and has sometimes been used by others). The software is also designed to run unattended for months on end, so the scripts have to be able to deal with common situations without intervention, and if they fail, fail gracefully.

    However, for quick and dirty scripts, the techniques outlined in the blog post are a good way to avoid too many tears.

  • etbe

    http://petereisentraut.blogspot.com/2010/11/pipefail.html

    Peter Eisentraut points out that in Bash you can run “set -o pipefail” to handle some failure cases for pipelined commands.
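
    A sketch of the difference (“false” stands in for a failing validator such as xmllint; it is run via “bash -c” explicitly since pipefail is a bash option):

```shell
#!/bin/sh
# Without pipefail the pipeline's status is that of the last command
# (head, which succeeds), so the failure is hidden; with pipefail the
# failing stage makes the whole pipeline fail.
bash -c 'false | head -n 1 >/dev/null; echo "without pipefail: status $?"
set -o pipefail
false | head -n 1 >/dev/null; echo "with pipefail: status $?"'
# prints "without pipefail: status 0" then "with pipefail: status 1"
```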