Firstly, a disclaimer: I’m not an expert on this and I’m not trying to instruct anyone who is aiming to become an expert. The aim of this blog post is to help someone who has a single kernel issue they want to debug as part of doing something that’s mostly not kernel coding. I welcome comments about the second step to kernel debugging for the benefit of people who need more than this (which might include me next week). Suggestions for people who can’t use a kvm/qemu debugger would also be good.
Below is a command to run qemu with a GDB server. It should be run from the Linux kernel source directory. You can add other qemu options for a block device and virtual networking if necessary, but the bug I encountered gave an oops from the initrd so I didn’t need to go further. The “nokaslr” parameter turns off kernel address space randomisation, which deliberately makes debugging tasks harder (from a certain perspective debugging a kernel and compromising a kernel are fairly similar). Loading the bzImage is fine; gdb can map that to the different file it looks at later on.
qemu-system-x86_64 -kernel arch/x86/boot/bzImage -initrd ../initrd-$KERN_VER -curses -m 2000 -append "root=/dev/vda ro nokaslr" -gdb tcp::1200
The command to run GDB is “gdb vmlinux”. At the GDB prompt you can run the command “target remote localhost:1200” to connect to the GDB server on port 1200. Note that there is nothing special about port 1200; it was given in an example I saw and is as good as any other port. It is important that you run GDB against the “vmlinux” file in the main directory, not any of the several stripped and packaged files. GDB can’t handle a bzImage file but that’s OK, as it ends up much the same in RAM.
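The two steps above can be combined into one command. This is a minimal sketch, assuming gdb is installed and that you are in the kernel source directory; the variable names are only for illustration. It relies on gdb’s “-ex” option, which runs a command at startup.

```shell
# Port must match the "-gdb tcp::PORT" option given to qemu (1200 in this post).
GDB_PORT=1200

# The gdb command that connects to qemu's GDB server.
GDB_CONNECT="target remote localhost:${GDB_PORT}"

# Only start gdb if it is installed and an unstripped vmlinux is present.
if command -v gdb >/dev/null 2>&1 && [ -f vmlinux ]; then
    # -ex runs the given gdb command as soon as gdb has loaded vmlinux.
    gdb vmlinux -ex "$GDB_CONNECT"
else
    echo "need gdb and ./vmlinux (run from the kernel source directory)" >&2
fi
```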
When the “target remote” command is processed the kernel will be suspended by the debugger. If you are looking for a bug early in the boot you may need to be quick about this. Using “qemu-system-x86_64” instead of “kvm” slows things down and can help in that regard. The bug I was hunting happened 1.6 seconds after kernel load with KVM and 7.8 seconds after kernel load with qemu. I am not aware of all the implications of the kvm vs qemu decision on debugging. If your bug is a race condition then trying both would be a good strategy.
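A possible way to avoid racing the boot is qemu’s “-S” option, which is not part of the command given earlier. It tells qemu not to start the CPU until it is told to continue, so you can connect with “target remote”, set breakpoints and then use “c” in gdb. The sketch below is the same qemu command with “-S” appended; the QEMU_DEBUG_OPTS variable is only for illustration. If a breakpoint set this early doesn’t trigger, gdb’s “hbreak” (hardware breakpoint) command may be worth trying in place of “b”, but treat that as a suggestion rather than a guaranteed fix.

```shell
# -gdb tcp::1200 opens the GDB server; -S keeps the CPU stopped at startup
# until the debugger (or the qemu monitor) continues it.
QEMU_DEBUG_OPTS="-gdb tcp::1200 -S"

if [ -f arch/x86/boot/bzImage ]; then
    # $QEMU_DEBUG_OPTS is deliberately unquoted so it splits into separate options.
    qemu-system-x86_64 -kernel arch/x86/boot/bzImage \
        -initrd "../initrd-$KERN_VER" -curses -m 2000 \
        -append "root=/dev/vda ro nokaslr" $QEMU_DEBUG_OPTS
else
    echo "no bzImage found (run from the kernel source directory after building)" >&2
fi
```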
After the “target remote” command you can debug the kernel just like any other program.
If you put a breakpoint on print_modules(), that will catch the printing of an Oops, which can be handy.
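As a sketch of how that could look in practice, the script below writes a small gdb command file and passes it to gdb with “-x”. The file name oops.gdb is made up for this example, and the port assumes the qemu command shown earlier. When the breakpoint is hit, “bt” prints the call chain that led to the Oops being printed.

```shell
# Write the gdb commands to a file (oops.gdb is a hypothetical name).
cat > oops.gdb <<'EOF'
target remote localhost:1200
break print_modules
continue
bt
EOF

# Only start gdb if it is installed and an unstripped vmlinux is present.
if command -v gdb >/dev/null 2>&1 && [ -f vmlinux ]; then
    # -x runs the commands in the file, then leaves you at the gdb prompt.
    gdb vmlinux -x oops.gdb
else
    echo "need gdb and ./vmlinux (run from the kernel source directory)" >&2
fi
```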
Update: Address Space Randomisation
(gdb) b setxattr
Breakpoint 1 at 0xffffffff81332bf0: file fs/xattr.c, line 546.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81332bf0
If you get an error like the above while trying to set a breakpoint then it’s probably Address Space Randomisation (known as “KASLR”). Put the parameter “nokaslr” on the kernel command line to stop this. Note that KASLR is a REALLY good thing to have in a normal system as it makes attacks on kernel security a lot harder, so only disable it for debugging purposes.
Update: Breakpoints Not Applied
If your gdb session is using a vmlinux that doesn’t match the kernel booted in the VM then things will appear to work and breakpoints can be set, but the running kernel will never break at the place you expect (presumably it would break on some other random kernel code that happens to be at the address the requested function has in the other kernel).
So if breakpoints mysteriously don’t work, double check that you have a matching vmlinux for the kernel being debugged.
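One rough way to check for a match is to compare the “Linux version …” banner embedded in the vmlinux file with the “Linux version …” line that the booted kernel prints at the start of its boot messages. This is a sketch, assuming the “strings” utility from binutils is installed; kernel_banner is a helper name invented for this example. A matching banner is a good sign but not proof, as a rebuild with the same version string could still differ.

```shell
# Print the first "Linux version ..." string embedded in a kernel image file.
kernel_banner() {
    strings "$1" | grep -m1 '^Linux version '
}

# Compare this output by eye with the first line of the guest's boot messages.
if [ -f vmlinux ]; then
    kernel_banner vmlinux
fi
```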