Archives

Categories

Finding Corrupt Files that cause a Kernel Error

There is a BTRFS bug in kernel 3.13 which is triggered by Kmail and causes Kmail index files to become seriously corrupt. Another bug in BTRFS causes a kernel GPF when an application tries to read such a file, that results in a SEGV being sent to the application. After that the kernel ceases to operate correctly for any files on that filesystem and no command other than “reboot -nf” (hard reset without flushing write-back caches) can be relied on to work correctly. The second bug should be fixed in Linux 3.14, I’m not sure about the first one.

In the mean time I have several systems running Kmail on BTRFS which have this problem.

(strace tar cf – . |cat > /dev/null) 2>&1|tail

To discover which file is corrupt I run the above command after a reboot. Below is a sample of the typical output of that command which shows that the file named “.trash.index” is corrupt. After discovering the file name I run “reboot -nf” and then delete the file (the file can be deleted on a clean system but not after a kernel GPF). Of recent times I’ve been doing this about once every 5 days, so on average each Kmail/BTRFS system has been getting disk corruption every two weeks. Fortunately every time the corruption has been on an index file so I don’t need to restore from backups.

newfstatat(4, ".trash.index", {st_mode=S_IFREG|0600, st_size=33, …}, AT_SYMLINK_NOFOLLOW) = 0
openat(4, ".trash.index", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=33, …}) = 0
read(5,  <unfinished …>
+++ killed by SIGSEGV +++

Comments are closed.