BTRFS and SE Linux

I’ve had problems with systems running SE Linux on BTRFS losing the XATTRs used for storing the SE Linux file labels after a power outage.

Here is the link to the patch that fixes this [1]. Thanks to Hans van Kranenburg and Holger Hoffstätte for the information about this patch which was already included in kernel 4.16.11. That was uploaded to Debian on the 27th of May and got into testing about the time that my message about this issue got to the SE Linux list (which was a couple of days before I sent it to the BTRFS developers).

The kernel from Debian/Stable still has the issue. So using a testing kernel might be a good option to deal with this problem at the moment.

Below is the information on reproducing this problem. It may be useful for people who want to reproduce similar problems. Also all sysadmins should know about “reboot -nffd”, if something really goes wrong with your kernel you may need to do that immediately to prevent corrupted data being written to your disks.

The command “reboot -nffd” (kernel reboot without flushing kernel buffers or writing status) when run on a BTRFS system with SE Linux will often result in /var/log/audit/audit.log being unlabeled. It also results in some systemd-journald files like /var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal being unlabeled but that is rarer. I think that the same
problem afflicts both systemd-journald and auditd but it’s a race condition that on my systems (both production and test) is more likely to affect auditd.

root@stretch:/# xattr -l /var/log/audit/audit.log 
0000   73 79 73 74 65 6D 5F 75 3A 6F 62 6A 65 63 74 5F    system_u:object_ 
0010   72 3A 61 75 64 69 74 64 5F 6C 6F 67 5F 74 3A 73    r:auditd_log_t:s 
0020   30 00                                              0.

SE Linux uses the xattr “security.selinux”, you can see what it’s doing with xattr(1) but generally using “ls -Z” is easiest.

If this issue just affected “reboot -nffd” then a solution might be to just not run that command. However this affects systems after a power outage.

I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security update for Debian/Stretch which is the latest supported release of Debian). I have also reproduced it in an identical manner with kernel 4.16.0-1-amd64 (the latest from Debian/Unstable). For testing I reproduced this with a 4G filesystem in a VM, but in production it has happened on BTRFS RAID-1 arrays, both SSD and HDD.

set -e 
COUNT=$(ps aux|grep [s]bin/auditd|wc -l) 
if [ "$COUNT" = "1" ]; then 
 echo "all good" 
 echo "failed" 
 exit 1 

Firstly the above is the script /usr/local/sbin/testit, I test for auditd running because it aborts if the context on it’s log file is wrong. When SE Linux is in enforcing mode an incorrect/missing label on the audit.log file causes auditd to abort.

root@stretch:~# ls -liZ /var/log/audit/audit.log 
37952 -rw-------. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun  1 
12:23 /var/log/audit/audit.log

Above is before I do the tests.

while ssh stretch /usr/local/sbin/testit ; do 
 ssh stretch "reboot -nffd" > /dev/null 2>&1 & 
 sleep 20 

Above is the shell code I run to do the tests. Note that the VM in question runs on SSD storage which is why it can consistently boot in less than 20 seconds.

Fri  1 Jun 12:26:13 UTC 2018 
all good 
Fri  1 Jun 12:26:33 UTC 2018 

Above is the output from the shell code in question. After the first reboot it fails. The probability of failure on my test system is greater than 50%.

root@stretch:~# ls -liZ /var/log/audit/audit.log  
37952 -rw-------. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun  1 12:26 /var/log/audit/audit.log

Now the result. Note that the Inode has not changed. I could understand a newly created file missing an xattr, but this is an existing file which shouldn’t have had it’s xattr changed. But somehow it gets corrupted.

The first possibility I considered was that SE Linux code might be at fault. I asked on the SE Linux mailing list (I haven’t been involved in SE Linux kernel code for about 15 years) and was informed that this isn’t likely at
all. There have been no problems like this reported with other filesystems.

SE Linux in Debian/Stretch

Debian/Stretch has been frozen. Before the freeze I got almost all the bugs in policy fixed, both bugs reported in the Debian BTS and bugs that I know about. This is going to be one of the best Debian releases for SE Linux ever.

Systemd with SE Linux is working nicely. The support isn’t as good as I would like, there is still work to be done for systemd-nspawn. But it’s close enough that anyone who needs to use it can use audit2allow to generate the extra rules needed. Systemd-nspawn is not used by default and it’s not something that a new Linux user is going to use, I think that expert users who are capable of using such features are capable of doing the extra work to get them going.

In terms of systemd-nspawn and some other rough edges, the issue is the difference between writing policy for a single system vs writing policy that works for everyone. If you write policy for your own system you can allow access for a corner case without a lot of effort. But if I wrote policy to allow access for every corner case then they might add up to a combination that can be exploited. I don’t recommend blindly adding the output of audit2allow to your local policy (be particularly wary of access to shadow_t and write access to etc_t, lib_t, etc). But OTOH if you have a system that’s running in enforcing mode that happens to have one daemon with more access than is ideal then all the other daemons will still be restricted.

As for previous releases I plan to keep releasing updates to policy packages in my own apt repository. I’m also considering releasing policy source to updates that can be applied on existing Stretch systems. So if you want to run the official Debian packages but need updates that came after Stretch then you can get them. Suggestions on how to distribute such policy source are welcome.

Please enjoy SE Linux on Stretch. It’s too late for most bug reports regarding Stretch as most of them won’t be sufficiently important to justify a Stretch update. The vast majority of SE Linux policy bugs are issues of denying wanted access not permitting unwanted access (so not a security issue) and can be easily fixed by local configuration, so it’s really difficult to make a case for an update to Stable. But feel free to send bug reports for Buster (Stretch+1).

Running a Shell in a Daemon Domain

allow unconfined_t logrotate_t:process transition;
allow logrotate_t { shell_exec_t bin_t }:file entrypoint;
allow logrotate_t unconfined_t:fd use;
allow logrotate_t unconfined_t:process sigchld;

I recently had a problem with SE Linux policy related to logrotate. To test it out I decided to run a shell in the domain logrotate_t to interactively perform some of the operations that logrotate performs when run from cron. I used the above policy to allow unconfined_t (the default domain for a sysadmin shell) to enter the daemon domain.

Then I used the command “runcon -r system_r -t logrotate_t bash” to run a shell in the domain logrotate_t. The utility runcon will attempt to run a program in any SE Linux context you specify, but to succeed the system has to be in permissive mode or you need policy to permit it. I could have written policy to allow the logrotate_t domain to be in the role unconfined_r but it was easier to just use runcon to change roles.

Then I had a shell in the logrotate_t command to test out the post-rotate scripts. It turned out that I didn’t really need to do this (I had misread the output of an earlier sesearch command). But this technique can be used for debugging other SE Linux related problems so it seemed worth blogging about.

SE Linux Play Machine Over Tor

I work on SE Linux to improve security for all computer users. I think that my work has gone reasonably well in that regard in terms of directly improving security of computers and helping developers find and fix certain types of security flaws in apps. But a large part of the security problems we have at the moment are related to subversion of Internet infrastructure. The Tor project is a significant step towards addressing such problems. So to achieve my goals in improving computer security I have to support the Tor project. So I decided to put my latest SE Linux Play Machine online as a Tor hidden service. There is no real need for it to be hidden (for the record it’s in my bedroom), but it’s a learning experience for me and for everyone who logs in.

A Play Machine is what I call a system with root as the guest account with only SE Linux to restrict access.

Running a Hidden Service

A Hidden Service in TOR is just a cryptographically protected address that forwards to a regular TCP port. It’s not difficult to setup and the Tor project has good documentation [1]. For Debian the file to edit is /etc/tor/torrc.

I added the following 3 lines to my torrc to create a hidden service for SSH. I forwarded port 80 for test purposes because web browsers are easier to configure for SOCKS proxying than ssh.

HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 22
HiddenServicePort 80

Generally when setting up a hidden service you want to avoid using an IP address that gives anything away. So it’s a good idea to run a hidden service on a virtual machine that is well isolated from any public network. My Play machine is hidden in that manner not for secrecy but to prevent it being used for attacking other systems.

SSH over Tor

Howtoforge has a good article on setting up SSH with Tor [2]. That has everything you need for setting up Tor for a regular ssh connection, but the tor-resolve program only works for connecting to services on the public Internet. By design the .onion addresses used by Hidden Services have no mapping to anything that reswemble IP addresses and tor-resolve breaks it. I believe that the fact that tor-resolve breaks thins in this situation is a bug, I have filed Debian bug report #776454 requesting that tor-resolve allow such things to just work [3].

Host *.onion
ProxyCommand connect -5 -S localhost:9050 %h %p

I use the above ssh configuration (which can go in ~/.ssh/config or /etc/ssh/ssh_config) to tell the ssh client how to deal with .onion addresses. I also had to install the connect-proxy package which provides the connect program.

ssh root@zp7zwyd5t3aju57m.onion
The authenticity of host ‘zp7zwyd5t3aju57m.onion ()
ECDSA key fingerprint is 3c:17:2f:7b:e2:f6:c0:c2:66:f5:c9:ab:4e:02:45:74.
Are you sure you want to continue connecting (yes/no)?

I now get the above message when I connect, the ssh developers have dealt with connecting via a proxy that doesn’t have an IP address.

Also see the general information page about my Play Machine, that information page has the root password [4].

Fixing Strange Directory Write Access

type=AVC msg=audit(1403622580.061:96): avc:  denied  { write } for  pid=1331 comm="mysqld_safe" name="/" dev="dm-0" ino=256 scontext=system_u:system_r:mysqld_safe_t:s0 tcontext=system_u:object_r:root_t:s0 tclass=dir
type=SYSCALL msg=audit(1403622580.061:96): arch=c000003e syscall=269 success=yes exit=0 a0=ffffffffffffff9c a1=7f5e09bfe798 a2=2 a3=2 items=0 ppid=1109 pid=1331 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mysqld_safe" exe="/bin/dash" subj=system_u:system_r:mysqld_safe_t:s0 key=(null)

For a long time (probably years) I’ve been seeing messages like the above in the log from auditd (/var/log/audit/audit.log) when starting mysqld. I haven’t fixed it because the amount of work exceeded the benefit, it’s just a couple of lines logged at every system boot. But today I decided to fix it.

The first step was to find out what was going on, I ran a test system in permissive mode and noticed that there were no attempts to create a file (that would have been easy to fix). Then I needed to discover which system call was triggering this. The syscall number is 269, the file linux/x86_64/syscallent.h in the strace source shows that 269 is the system call faccessat. faccessat(2) and access(2) are annoying cases, they do all the permission checks for access but don’t involve doing the operation so when a program uses those system calls but for some reason doesn’t perform the operation in question (in this case writing to the root directory) then we just get a log entry but nothing happening to examine.

A quick look at the shell script didn’t make the problem obvious, note that this is probably obvious to people who are more skilled at shell scripting than me – but it’s probably good for me to describe how to solve these problems every step of the way. So the next step was to use gdb. Here is the start of my gdb session:

# gdb /bin/sh
Reading symbols from /bin/dash…(no debugging symbols found)…done.
(gdb) b faccessat
Breakpoint 1 at 0x3960
(gdb) r -x /usr/bin/mysqld_safe
[lots skipped]
+ test -r /usr/my.cnf
Breakpoint 1, 0x00007ffff7b0c7e0 in faccessat ()
from /lib/x86_64-linux-gnu/

After running gdb on /bin/sh (which is a symlink to /bin/dash) I used the “b” command to set a breakpoint on the function faccessat (which is a library call from glibc that calls the system call sys_faccessat()). A breakpoint means that program execution will stop when the function is called. I run the shell script with “-x” as the first parameter to instruct the shell to show me the shell commands that are run so I can match shell commands to system calls. The above output shows the first call to faccessat() which isn’t interesting (it’s testing for read access).

I then ran the “c” command in gdb to continue execution and did so a few times until I found something interesting.

+ test -w / -o root = root
Breakpoint 1, 0x00007ffff7b0c7e0 in faccessat ()
from /lib/x86_64-linux-gnu/

Above is the interesting part of the gdb output. It shows that the offending shell command is “test -w /“.

I filed Debian bug #752593 [1] with a patch to fix this problem.

I also filed a wishlist bug against strace asking for an easier way to discover the name of a syscall [2].

SE Linux Things To Do

At the end of my talk on Monday about the status of SE Linux [1] I described some of the things that I want to do with SE Linux in Debian (and general SE Linux stuff). Here is a brief summary of some of them:

One thing I’ve wanted to do for years is to get X Access Controls working in Debian. This means that two X applications could have windows on the same desktop but be unable to communicate with each other by any of the X methods (this includes screen capture and clipboard). It seems that the Fedora people are moving to sandbox processes with Xephyr for X access (see Dan Walsh’s blog post about sandbox -X [2]). But XAce will take a lot of work and time is always an issue.

An ongoing problem with SE Linux (and most security systems) is the difficulty in running applications with minimum privilege. One example of this is utility programs which can be run by multiple programs, if a utility is usually run by a process that is privileged then we probably won’t notice that it requires excess privileges until it’s run in a different context. This is a particular problem when trying to restrict programs that may be run as part of a user session. A common example is programs that open files read-write when they only need to read them, if the program then aborts when it can’t open the file in question then we will have a problem when it’s run from a context that doesn’t grant it write access. To deal with such latent problems I am considering ways of analysing the operation of systems to try and determine which programs request more access than they really need.

During my talk I discussed the possibility of using a shared object to log file open/read/write to find such latent problems. A member of the audience suggested static code analysis which seems useful for some languages but doesn’t seem likely to cover all necessary languages. Of course the benefit of static code analysis is that it will catch operations that the program doesn’t perform in a test environment – error handling is one particularly important corner case in this regard.

My SE Linux Status Report – LCA 2013

This morning I gave a status report on SE Linux. The talk initially didn’t go too well, I wasn’t in the right mental state for it and I moved through the material too fast. Fortunately Casey Schaufler asked some really good questions which helped me to get back on track. The end result seemed reasonably good. Here’s a summary of the things I discussed:

Transaction hooks for RPM to support SE Linux operations. This supports signing packages to indicate their security status and preventing packages from overwriting other packages or executing scripts in the wrong context. There is also work to incorporate some of the features of that into “dpkg” for Debian.

Some changes to libraries to allow faster booting. Systems with sysvinit and a HDD won’t be affected but with systemd and SSD it makes a real difference. Mostly Red Hat’s work.

Filename transition rules to allow the initial context to be assigned based on file name were created in 2011 but are not starting to get used.

When systemd is used for starting/stopping daemons some hacks such as run_init can be avoided. Fedora is making the best progress in this regard due to only supporting systemd while the support for other init systems will limit what we can do for Debian. This improves security by stopping terminal buffer insertion attacks while also improving reliability by giving the daemon the same inherited settings each time it’s executed.

Labelled NFS has been accepted as part of the NFSv4.2 specification. This is a big deal as labelled NFS work has been going for many years without hitting such a milestone in the past.

ZFS and BTRFS support but we still need to consider management issues for such snapshot based filesystems. Filesystem snapshots have the potential to interact badly with relabelling if we don’t develop code and sysadmin practices to deal with it properly.

The most significant upstream focus of SE Linux development over the last year is SE Android. I hope that will result in more work on the X Access Controls for use on the desktop.

During question time I also gave a 3 minute “lightning talk” description of SE Linux.

New SE Linux Policy for Wheezy

I’ve just uploaded a new SE Linux policy for Debian/Wheezy. It now works correctly with systemd and Chromium, two significant features that I wanted for Wheezy. Now it turns out that we have until the end of the month for Wheezy updates, so I may get another version of the policy uploaded before then. If so it will only be for relatively minor changes, I think that most SE Linux users would be reasonably happy with policy the way it is. Anything that doesn’t work now can probably be solved by local configuration changes.


The current version of KDE in Debian is 4.8.4, it seems that large parts of the KDE environment depend on execmem access, this includes kwin and plasma-desktop. Basically there is no possibility of having a KDE desktop environment without those programs and therefore KDE depends on execmem access.

Debugging this is difficult as the important programs SEGV when denied execmem access and the KDE crash handler really gets in the way of debugging it – running /usr/bin/plasma-desktop results in the process forking a child and detaching from the gdb session.

The most clear example of an execmem issue in KDE is from the program /usr/lib/kde4/libexec/kwin_opengl_test which gives the following error:
LLVM ERROR: Allocation failed when allocating new memory in the JIT
Can’t allocate RWX Memory: Permission denied

To make this work you run the command “setsebool -P allow_execmem 1” which gives many domains the ability to create writable-executable memory regions.

I raised this issue for discussion on the SE Linux mailing list and Hinnerk van Bruinehsen wrote an informative message in response summarising the situation [1]. It seems that it’s possible to compile some of the programs in question to not use the JIT and therefore not require such access and there is a build option in Gentoo to allow it. But it’s impractically difficult for me to fork KDE in Debian so the only option is to recommend that people enable the allow_execmem boolean for Debian desktop systems running SE Linux.

Debian SE Linux Status June 2012

It’s almost the Wheezy freeze time and I’ve been working frantically to get things working properly.

Policy Status

At the moment I’m preparing an upload of the policy which will support KDE (and probably most desktop environment) logins and many little fixes related to server operations (particularly MTAs). I would like to get another version done before Wheezy is released, but if Wheezy releases with version 2.20110726-6 of the policy that will be OK. It will work well enough for most things that users will be able to use local changes for the things that don’t work.

One significant lack with the current policy is that systemd won’t work. I’ve included most of the policy changes needed, but haven’t done any of the testing and tweaking that is necessary to make it work properly.

I would like to see policy support for systemd in a Wheezy update if I don’t get it done in time for the first release. If I don’t get it done in time for the release and if the release team don’t accept it for an update then I’ll put it in my own repository so anyone who needs it can get it.

/run Labelling

One significant change for Wheezy is to use a tmpfs mounted on /run instead of /var/run. This means that lots of daemon start scripts create subdirectories of /run at boot time which need to have SE Linux labels applied for correct operation. The way things work is that usually the daemon will write to the directory immediately after the init script has created it, so I can’t just have my own script recursively relabel all of /run.

Some packages that need to be patched are x11-common #677831, clamav-daemon #677686, sasl2-bin #677685, dkim-filter #677684, and cups #677580. I am sure that there are others.

[ -x /sbin/restorecon ] && /sbin/restorecon -R $DIR

Generally if you are writing an init script and creating a directory under /run then you need to have some shell code like the above immediately after it’s created. Also the same applies for directories under /tmp and any other significant directories that are created at boot time.


Currently there are some potential problems with the upgrade process, I’m working on them at the moment. Ideally an “apt-get dist-upgrade” would cleanly upgrade everything. But at the moment it seems likely that the upgrade might initially go wrong and then work on the second try. There are some complications such as the selinux-policy-default package owning a config file which is used by mcstransd (which is part of the policycoreutils package), when the config file format changes you get order dependencies for the upgrade.

Kernel Support

My aim when developing a new SE Linux release for Debian is that the policy should work as much as possible with the user-space from the previous release. So if you upgrade from Squeeze to Wheezy you should be able to start the process by upgrading the SE Linux policy (which drags in the utilities and lots of libraries). This means that if you have a server running you don’t have to put it out of action for the entire upgrade, you can get the policy going and then get other things going. I haven’t tested this yet but I don’t expect any problems (apart from all the dependencies).

Also the policy should work with the kernel from the previous release. So if you have a virtual server where it’s not convenient to upgrade the kernel then that shouldn’t stop you from upgrading the user-space and the SE Linux policy. I’ve tested this and found one bug, the sepolgen-ifgen utility that you need to run before audit2allow -R won’t work if the kernel is older than the utilities #677730. I don’t know if it will be possible to get this fixed. Anyway it’s not that important, you can always copy the audit log to another system running the same policy to run audit2allow, it’s not convenient but not THAT difficult either.

The End Result

I think that the result of using SE Linux in Wheezy will be quite good for the people who get the upgrade done and who modify a few init scripts that don’t get the necessary changes in time. I anticipate that someone who doesn’t know much about SE Linux will be able to get a basic workstation or small server installation done in considerably less than an hour if they read the documentation and someone who knows what they are doing will get it done in a matter of minutes (plus download and install time which can be significant on old hardware).

At the moment I’m in the process of upgrading all of my systems to Unstable (currently Testing has versions of some SE Linux packages that are too broken). While doing this I will keep discovering bugs and fix as many of them as possible. But it seems that I’ve already fixed most things that affect common users.

Also BTRFS works well. Not that supporting a new filesystem is a big deal (all that’s needed is XATTR support), but having all the nice new features on one system is a good thing. Now I just need to get systemd working.

SE Linux Status in Debian 2012-03

I have just finished updating the user-space SE Linux code in Debian/Unstable to the version released on 2012-02-16. There were some changes to the build system from upstream which combined with the new Debian multi-arch support involved a fair bit of work for me. While I was at it I converted more of them to the new Quilt format to make it easier to send patches upstream. In the past I have been a bit slack about sending patches upstream, my aim for the next upstream release of user-space is to have at least half of my patches included upstream – this will make things easier for everyone.

Recently Mika Pflüger and Laurent Bigonville have started work on Debian SE Linux, they have done some good work converting the refpolicy source (which is used to build selinux-policy-default) to Quilt. Now it will be a lot easier to send policy patches upstream and porting them to newer versions of the upstream refpolicy.

Now the next significant thing that I want to do is to get systemd working correctly with SE Linux. But first I have to get it working correctly wit cryptsetup.