There has always been an ongoing debate about how to divide disk space among multiple partitions. I think that nowadays the best thing to do is to assign about 10G to the root filesystem on every desktop and server system, because 10G is a small fraction of the disk space available (even the smallest laptops all seem to have disks larger than 100G nowadays). Even if 10G turns out not to be enough, using separate filesystems for /var or /usr provides little benefit now that it’s easy to resize the root filesystem with LVM, and a separate /usr is known to be broken [1].
In a discussion on a private mailing list there was a suggestion that multiple filesystems should be used for security.
DoS Attacks
There are some minor security benefits in having multiple filesystems. If a critical program will fail when there is no free disk space, then allowing an unprivileged process to use up all the space on that filesystem is a minor security issue, so preventing unprivileged processes from writing to important filesystems is a benefit. But most failures of this type are merely DoS attacks, which usually aren’t a big deal: if you can control a local process there are usually lots of other ways of DoSing a system.
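As a rough illustration of the free-space concern, the following Python sketch reports which filesystems are running low on space. The function name, the 10% threshold, and the list of paths are assumptions chosen for the example, not anything prescribed above.

```python
import os
import shutil


def low_space_mounts(paths, min_free_fraction=0.10):
    """Return (path, free_fraction) for paths whose filesystem is below the threshold.

    The threshold is a hypothetical default picked for illustration.
    """
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)
        # Guard against pseudo-filesystems that report a total size of zero.
        free_fraction = usage.free / usage.total if usage.total else 0.0
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low


if __name__ == "__main__":
    # Example paths only; skip any that do not exist on this system.
    candidates = [p for p in ("/", "/var", "/home") if os.path.exists(p)]
    for path, fraction in low_space_mounts(candidates):
        print(f"{path}: only {fraction:.1%} free")
```

If /var and /home are on the same filesystem as /, all three paths report the same figure, which is exactly the situation where one unprivileged writer can fill the space a critical program needs.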
Links
Links have been the cause of many security issues in Unix over the years. Using different filesystems for different tasks can prevent the use of hard links in attacks aimed at exploiting race conditions. But even if you prevent hard links there are similar issues with symbolic links. SE Linux is one of many security improvements for Linux which allow restrictions on the creation of hard links. SE Linux also allows restricting the ability of processes to follow symbolic links, so a privileged process can be denied access to follow a sym-link that was created by an unprivileged process.
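The hard link restriction comes from the kernel itself: a hard link can only be created within a single filesystem, and an attempt to link across filesystems fails with EXDEV. A minimal Python sketch of that check follows; the function names are made up for the example.

```python
import errno
import os
import tempfile


def same_filesystem(path_a, path_b):
    """True if both paths live on the same device, so a hard link is possible in principle."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def try_hard_link(target, link_name):
    """Attempt a hard link; return False if it is refused for crossing filesystems."""
    try:
        os.link(target, link_name)
        return True
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            return False
        raise  # Permission errors and the like are a different matter.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        open(source, "w").close()
        print("same filesystem as /:", same_filesystem(source, "/"))
        print("link within one directory:", try_hard_link(source, os.path.join(tmp, "link")))
```

Note that this only covers hard links. A symbolic link can point anywhere, which is why separate filesystems do nothing for the sym-link variants of these attacks.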
NFS
The subtree_check option in /etc/exports causes the NFS server to verify that file access is in the correct subtree. So if you export only one subdirectory of a filesystem to a given server then hostile code on that server (or on a network device which impersonates that server) can’t access other subdirectories. This option is documented as having performance implications and working best for filesystems that are mostly read-only; for this reason it’s turned off by default in recent versions of the NFS utilities.
So if you want to NFS export /home then it’s probably a good idea to have /home on a separate filesystem to prevent attacks on the root filesystem. But most systems with significant use of /home (i.e. anything other than accounts used solely for “su -”) have a separate filesystem for /home anyway, so this shouldn’t be an issue.
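To see which exports rely on the default rather than on subtree_check, something like the following Python sketch could be run against the contents of /etc/exports. It is a simplified parser written for this example: it assumes one export per line in the common `path client(options)` form and ignores quoted paths, line continuations, and default-option syntax.

```python
import re

# Matches "client(opt1,opt2)" or a bare "client" with no option list.
_CLIENT_RE = re.compile(r"^([^()]*)(?:\(([^()]*)\))?$")


def parse_exports(text):
    """Return a list of (path, client, options) tuples from exports-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]
        for spec in clients:
            match = _CLIENT_RE.match(spec)
            if match is None:
                continue  # unusual syntax this sketch does not handle
            client = match.group(1) or "*"
            options = {opt for opt in (match.group(2) or "").split(",") if opt}
            entries.append((path, client, options))
    return entries


def missing_subtree_check(text):
    """Return (path, client) pairs that do not explicitly enable subtree_check."""
    return [(path, client)
            for path, client, options in parse_exports(text)
            if "subtree_check" not in options]


if __name__ == "__main__":
    # Hypothetical export lines; client1 is a placeholder host name.
    sample = """
    /home/alice client1(rw,subtree_check)
    /home/bob   client1(rw,no_subtree_check)  # explicit opt-out
    """
    print(missing_subtree_check(sample))
```

Any subdirectory export this reports is one where the client can potentially reach the rest of the underlying filesystem, which is the case where a separate filesystem matters.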
SE Linux
When mounting filesystems with SE Linux there is a “context=” mount option that allows specifying the context for all files on the filesystem. This can save a small amount of storage space for XATTRs and theoretically improve performance (although the difference is unlikely to show up on benchmarks for anything other than fsck). Generally the context mount option is only used for a filesystem that has a huge number of files with the same context, such as a mail spool that uses Maildir, Cyrus, or any of the other formats that involve one file per message. But again such data is generally stored on a separate filesystem for other reasons anyway.
I found one interesting corner case in regard to SE Linux systems mounting files from an NFS server. When an NFS server exports multiple subdirectories of a filesystem mounted on /foo then if one NFS client running SE Linux is to mount two subdirectories of /foo with different contexts then the second mount attempt will give the error “an incorrect mount option was specified”. This is because as of kernel 2.6.18 by default it’s not permitted to mount parts of the same filesystem with different mount options. The option “nosharecache” allows you to use different mount options, but does apparently permit some undesirable behavior in the case of hard links that cross between the subtrees. Thanks to Eric Paris for the tip about nosharecache.
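For concreteness, here is a Python sketch that builds the two mount commands involved. It only constructs the argument lists and does not run them. The server name, export paths, mount points, and the two context values are hypothetical examples, and the helper function is my own invention.

```python
def nfs_context_mount_cmd(export, mountpoint, context, share_cache=True):
    """Build an argv list for mounting an NFS export with an SE Linux context.

    Contexts containing commas would need extra quoting, so this sketch rejects them.
    """
    if "," in context:
        raise ValueError("contexts containing commas are not handled by this sketch")
    options = [f"context={context}"]
    if not share_cache:
        # Needed when another subtree of the same export is already mounted
        # with different options.
        options.append("nosharecache")
    return ["mount", "-t", "nfs", "-o", ",".join(options), export, mountpoint]


if __name__ == "__main__":
    # Placeholder names: nfs.example.com, /foo/..., /srv/www/...
    first = nfs_context_mount_cmd(
        "nfs.example.com:/foo/data", "/srv/www/data",
        "system_u:object_r:httpd_sys_content_t:s0")
    second = nfs_context_mount_cmd(
        "nfs.example.com:/foo/cgi-bin", "/srv/www/cgi-bin",
        "system_u:object_r:httpd_sys_script_exec_t:s0",
        share_cache=False)
    print(" ".join(first))
    print(" ".join(second))
```

Without nosharecache the second command is the one that would be expected to fail with the error quoted above.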
The best example I can think of for which you might want context mount options that differ among files that are used for the same purpose on an NFS mount is a web server which has data files and CGI-BIN scripts. So it seems that an SE Linux web server that mounts its data over NFS and is at risk of having hard links between the CGI-BIN directory and the data directory is a corner case in which multiple filesystems are required for security. This seems to be a very unlikely case.
Conclusion
Servers that are deployed in the real world are complex enough that there are always systems with some unusual corner cases demanding configuration choices that aren’t expected. There are some real corner cases for SE Linux where multiple filesystems are required for security or for a combination of security and best performance.
But I wouldn’t make a generic recommendation of using lots of filesystems for security. I think that the people who encounter the strange corner cases can usually work out that they need to do something different. So a small number of filesystems seems like a good general aim that doesn’t conflict with security.
I agree with most of what you wrote. One small detail concerning the first paragraph: here I am using separate /usr, with Debian sid. My system doesn’t *seem* to be broken. In fact, I believe this configuration has been supported for many years. I’ve read Lennart’s page on separate /usr and it doesn’t seem to say anything relevant about this; instead, it lists some non-bugs and some minor bugs that do not affect this system. Have I missed something fundamental?
The issue with a separate /usr is not that everything will break; if that were the case some Debian bug reports would be filed and eventually it would get fixed. The issue is that some things may call programs on /usr, and due to the ongoing changes and complexity of all the software, the amount of effort involved in keeping it all working is greater than the interest. Finally, some of these things are timing related, so if you don’t connect your device before /usr is mounted then it will work anyway. There are probably people who have laptops working for years until they finally reboot with devices connected and get a failure.
But the real issue isn’t the difficulty of getting a separate /usr filesystem working but the difficulty vs the benefit. One desktop system I run which has lots of GUI things and games installed has 5G for /usr. It has an 80G SATA disk of which 40G is unused. A separate /usr for that system really wouldn’t provide any benefit. I think that most systems are in a similar situation nowadays.
I share your recommendation of 10GB for the root filesystem; it’s enough for my systems. But I’m using the remaining space for /var and doing a bind-mount from /var/local/home to /home and everywhere else I need a writeable filesystem. My root filesystem is read-only. That’s the reason I have two separate filesystems.
Bind-mounts have the advantage or disadvantage that you can’t create hardlinks between them. It’s bad, because an mv /var/tmp/foo /home/me results in a copy while mv /var/tmp/foo /var/local/home/me doesn’t.