RISC-V and Qemu

RISC-V is the latest RISC architecture that’s become popular. It is the 5th RISC architecture from the University of California Berkeley. It seems to be a competitor to ARM due to not having license fees or restrictions on alterations to the architecture (something you have to pay extra for when using ARM). RISC-V seems the most popular architecture to implement in FPGA.

When I first tried to run RISC-V under QEMU it didn’t work, which was probably due to running Debian/Unstable on my QEMU/KVM system and there being QEMU bugs in Unstable at the time. I have just tried it again and got it working.

The Debian Wiki page about RISC-V is pretty good [1]. The instructions there got it going for me. One thing I wasted some time on before reading that page was trying to get a netinst CD image, which is what I usually do for setting up a VM. Apparently there isn’t RISC-V hardware that boots from a CD/DVD so there isn’t a Debian netinst CD image. But debootstrap can install directly from the Debian web server (something I’ve never wanted to do in the past) and that gave me a successful installation.

Here are the commands I used to set up the base image:

apt-get install debootstrap qemu-user-static binfmt-support debian-ports-archive-keyring

debootstrap --arch=riscv64 --keyring /usr/share/keyrings/debian-ports-archive-keyring.gpg --include=debian-ports-archive-keyring unstable /mnt/tmp http://deb.debian.org/debian-ports

I first tried running RISC-V Qemu on Buster, but even ls didn’t work properly and the installation failed.

chroot /mnt/tmp bin/bash
# ls -ld .
/usr/bin/ls: cannot access '.': Function not implemented

When I ran it on Unstable, ls worked but strace didn’t work in a chroot. That was enough functionality to complete the installation.

chroot /mnt/tmp bin/bash
# strace ls -l
/usr/bin/strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Function not implemented
/usr/bin/strace: ptrace(PTRACE_TRACEME, ...): Function not implemented
/usr/bin/strace: PTRACE_SETOPTIONS: Function not implemented
/usr/bin/strace: detach: waitpid(1602629): No child processes
/usr/bin/strace: Process 1602629 detached

When running the VM the operation was noticeably slower than the emulation of PPC64 and S/390x, which both ran at an apparently normal speed. Even when running on a server with an equivalent speed CPU, an ssh login was obviously slower due to the CPU time taken for encryption: an ssh connection from a system on the same LAN took 6 seconds to connect. I presume that because RISC-V is a newer architecture there hasn’t been as much effort made on optimising the Qemu emulation and that a future version of Qemu will be faster. But I don’t think that Debian/Bullseye will give good Qemu performance for RISC-V, as more changes are probably needed than can happen before the freeze. Maybe a version of Qemu with better RISC-V performance can be uploaded to backports some time after Bullseye is released.

Here’s the Qemu command I use to run RISC-V emulation:

qemu-system-riscv64 -machine virt -device virtio-blk-device,drive=hd0 -drive file=/vmstore/riscv,format=raw,id=hd0 -device virtio-blk-device,drive=hd1 -drive file=/vmswap/riscv,format=raw,id=hd1 -m 1024 -kernel /boot/riscv/vmlinux-5.10.0-1-riscv64 -initrd /boot/riscv/initrd.img-5.10.0-1-riscv64 -nographic -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro" -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -device virtio-net-device,netdev=net0,mac=02:02:00:00:01:03 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper
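
The single-line command is hard to read, so here is a sketch of the same invocation as a bash array with one option group per line. The names build_riscv_qemu_args and QEMU_ARGS are only illustrative. The paths, MAC address, and kernel version are copied from the command above. The one thing to watch is that the kernel command line must reach QEMU as a single argument to -append.

```shell
#!/bin/bash
# Sketch: the QEMU command from above, one option group per line.
# build_riscv_qemu_args and QEMU_ARGS are illustrative names only.
build_riscv_qemu_args() {
    QEMU_ARGS=(
        -machine virt
        # root filesystem and swap as virtio block devices
        -device virtio-blk-device,drive=hd0
        -drive file=/vmstore/riscv,format=raw,id=hd0
        -device virtio-blk-device,drive=hd1
        -drive file=/vmswap/riscv,format=raw,id=hd1
        -m 1024
        # kernel and initrd are loaded from the host filesystem
        -kernel /boot/riscv/vmlinux-5.10.0-1-riscv64
        -initrd /boot/riscv/initrd.img-5.10.0-1-riscv64
        -nographic
        # the whole kernel command line is ONE argument
        -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro"
        # entropy source for the guest
        -object rng-random,filename=/dev/urandom,id=rng0
        -device virtio-rng-device,rng=rng0
        # bridged networking via the QEMU bridge helper
        -device virtio-net-device,netdev=net0,mac=02:02:00:00:01:03
        -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper
    )
}
```

To start the VM, run: build_riscv_qemu_args && qemu-system-riscv64 "${QEMU_ARGS[@]}"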

Currently the program /usr/sbin/sefcontext_compile from the selinux-utils package needs execmem access on RISC-V while it doesn’t on any other architecture I have tested. I don’t know why, and support for debugging such things seems to be in the early stages of development; for example the execstack program doesn’t work on RISC-V now.

RISC-V emulation in Unstable seems adequate for people who are serious about RISC-V development. But if you want to just try a different architecture then PPC64 and S/390 will work better.


Weather and Boinc

I just wrote a Perl script to look at the Australian Bureau of Meteorology pages to find the current temperature in an area and then adjust BOINC settings accordingly. The Perl script (in this post after the break, which shouldn’t be in the RSS feed) takes the URL of a Bureau of Meteorology observation point as ARGV[0] and parses that to find the current (within the last hour) temperature. Then successive command line arguments are of the form “24:100” and “30:50” which indicate that at below 24C 100% of CPU cores should be used and below 30C 50% of CPU cores should be used. In warm weather having a couple of workstations in a room running BOINC (or any other CPU intensive task) will increase the temperature and also make excessive noise from cooling fans.
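
The threshold arguments can be illustrated with a small shell sketch. This is not the Perl script from the post, only the selection logic written in shell. The function name pick_cpu_percent and the DEFAULT_PERCENT fallback (used when the temperature is at or above every threshold) are assumptions of mine, as the post doesn’t say what happens in that case.

```shell
#!/bin/bash
# Sketch of the threshold logic only (the real script is Perl).
# DEFAULT_PERCENT is a hypothetical fallback, not taken from the post.
DEFAULT_PERCENT=${DEFAULT_PERCENT:-0}

# pick_cpu_percent TEMP LIMIT:PERCENT [LIMIT:PERCENT ...]
# Prints PERCENT for the first LIMIT that TEMP is below.
pick_cpu_percent() {
    local temp=$1 pair limit percent
    shift
    for pair in "$@"; do
        limit=${pair%%:*}
        percent=${pair##*:}
        # awk does the comparison so decimal readings like 23.5 work
        if awk -v t="$temp" -v l="$limit" 'BEGIN { exit !(t + 0 < l + 0) }'; then
            echo "$percent"
            return 0
        fi
    done
    echo "$DEFAULT_PERCENT"
}
```

For example, pick_cpu_percent 27 24:100 30:50 prints 50, because 27 is not below 24 but is below 30.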

To change the number of CPU cores used the script changes /etc/boinc-client/global_prefs_override.xml and then tells BOINC to reload that config file. This code is a little ugly (it doesn’t properly parse XML, it just replaces a line of text) and could fail on a valid configuration file that wasn’t produced by the current BOINC code.
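
A minimal shell sketch of that line replacement follows. It assumes the override file already contains a max_ncpus_pct element, and the function name set_max_ncpus_pct is hypothetical. Like the approach described above, it edits text rather than parsing XML, so it has the same fragility.

```shell
#!/bin/bash
# Sketch: replace the max_ncpus_pct line without parsing the XML.
# Assumes the element is on a single line, as BOINC writes it.
set_max_ncpus_pct() {
    local file=$1 pct=$2
    sed -i -E \
        "s|<max_ncpus_pct>[^<]*</max_ncpus_pct>|<max_ncpus_pct>${pct}.000000</max_ncpus_pct>|" \
        "$file"
}
```

After changing the file, running "boinccmd --read_global_prefs_override" tells the BOINC client to reload it.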

The parsing of the BoM page is a little ugly too, it relies on the HTML code in the BoM page – they could make a page that looks identical which breaks the parsing or even a page that contains the same data that looks different. It would be nice if the BoM published some APIs for getting the weather. One thing that would be good is TXT records in the DNS. DNS supports caching with specified lifetime and is designed for high throughput in aggregate. If you had a million IOT devices polling the current temperature and forecasts every minute via DNS the people running the servers wouldn’t even notice the load, while a million devices polling a web based API would be a significant load. As an aside I recommend playing nice and only running such a script every 30 minutes, the BoM page seems to be updated on the half hour so I have my cron jobs running at 5 and 35 minutes past the hour.

If this code works for you then that’s great. If it merely acts as an inspiration for developing your own code then that’s great too! BOINC users outside Australia could replace the code for getting meteorological data (or even interface to a digital thermometer). Australians who use other CPU intensive batch jobs could take the BoM parsing code and replace the BOINC related code. If you write scripts inspired by this please blog about it and comment here with a link to your blog post.


MPV vs Mplayer

After writing my post about VDPAU in Debian [1] I received two great comments from anonymous people. One pointed out that I should be using VA-API (also known as VAAPI) on my Intel based Thinkpad and gave a reference to an Arch Linux Wiki page. As usual the Arch Linux Wiki is awesome and I learnt a lot of great stuff there. I also found the Debian Wiki page on Hardware Video Acceleration [2] which has some good information (unfortunately I had already found all that out through more difficult methods first, I should read the Debian Wiki more often).

It seems that mplayer doesn’t support VAAPI. The other comment suggested that I try the mpv fork of Mplayer which does support VAAPI but that feature is disabled by default in Debian.

I did a number of tests on playing different videos on my laptop running Debian/Buster with Intel video and my workstation running Debian/Unstable with ATI video. The first thing I noticed is that mpv was unable to use VAAPI on my laptop and that VDPAU won’t decode VP9 videos on my workstation and most 4K videos from YouTube seem to be VP9. So in most cases hardware decoding isn’t going to help me.

The Wikipedia page about Unified Video Decoder [3] shows that only VCN (Video Core Next) supports VP9 decoding while my R7-260x video card [4] has version 4.2 of the Unified Video Decoder which doesn’t support VP9, H.265, or JPEG. Basically I need a new high-end video card to get VP9 decoding and that’s not something I’m interested in buying now (I only recently bought this video card to do 4K at 60Hz).

The next thing I noticed is that for my combination of hardware and software at least mpv tends to take about 2/3 the CPU time to play videos that mplayer does on every video I tested. So it seems that using mpv will save me 1/3 of the power and heat from playing videos on my laptop and save me 1/3 of the CPU power on my workstation in the worst case while sometimes saving me significantly more than that.
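
A comparison like this can be repeated with a small wrapper around the bash time keyword. This is a sketch of one way to measure it, not necessarily how the numbers above were obtained. The function name cpu_time is hypothetical.

```shell
#!/bin/bash
# Sketch: print "USER SYSTEM" CPU seconds used by a command.
# The command's own output is discarded so only the timing is printed.
cpu_time() {
    local TIMEFORMAT='%3U %3S'
    { time "$@" >/dev/null 2>&1; } 2>&1
}
```

Running "cpu_time mpv video.mkv" and then "cpu_time mplayer video.mkv" (video.mkv being a placeholder file name) gives two pairs of numbers that can be compared directly.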

Conclusion

To summarise quite a bit of time experimenting with video playing and testing things: I shouldn’t think too much about hardware decoding until VP9 hardware is available (years for me). But mpv provides some real benefits right now on the same hardware, I’m not sure why.


Testing VDPAU in Debian

VDPAU is the Video Decode and Presentation API for Unix [1]. I noticed an error with mplayer “Failed to open VDPAU backend libvdpau_i965.so: cannot open shared object file: No such file or directory”. Googling that turned up Debian Bug #869815 [2] which suggested installing the packages vdpau-va-driver and libvdpau-va-gl1 and setting the environment variable “VDPAU_DRIVER=va_gl” to enable VDPAU.

The command vdpauinfo from the vdpauinfo package shows the VDPAU capabilities, and it showed that VDPAU was working with va_gl.

When mplayer was getting the error about a missing i965 driver it took 35.822s of user time and 1.929s of system time to play Self Control by Laura Branigan [3] (a good music video to watch several times while testing IMHO) on my Thinkpad Carbon X1 Gen1 with Intel video and an i7-3667U CPU. When I set “VDPAU_DRIVER=va_gl” mplayer took 50.875s of user time and 4.207s of system time but didn’t have the error.

It’s possible that other applications on my Thinkpad might benefit from VDPAU with the va_gl driver, but it seems unlikely that any will benefit to such a degree that it makes up for mplayer taking more time. It’s also possible that the Self Control video I tested with was a worst case scenario, but even so taking almost 50% more CPU time made it unlikely that other videos would get a benefit.
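
The “almost 50%” figure comes from adding user and system time for each run and comparing the totals. Here is that arithmetic as a small sketch. The function name cpu_increase_percent is only for illustration.

```shell
#!/bin/bash
# Sketch: percentage increase in total (user + system) CPU time.
# Arguments: USER1 SYS1 USER2 SYS2, all in seconds.
cpu_increase_percent() {
    awk -v u1="$1" -v s1="$2" -v u2="$3" -v s2="$4" 'BEGIN {
        base = u1 + s1      # run without va_gl
        new  = u2 + s2      # run with VDPAU_DRIVER=va_gl
        printf "%.1f\n", (new - base) * 100 / base
    }'
}
```

With the numbers above, cpu_increase_percent 35.822 1.929 50.875 4.207 prints 45.9, as the totals are 37.751s and 55.082s.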

For this sort of video (640×480 resolution) it’s not a problem, 38 seconds of CPU time to play a 5 minute video isn’t a real problem (although it would be nice to use less battery). For a 1600*900 resolution video (the resolution of the laptop screen) it took 131 seconds of user time to play a 433 second video. That’s still not going to be a problem when playing on mains power but will suck a bit when on battery. Most Thinkpads have Intel video and some have NVidia as well (which has issues from having 2 video cards and from having poor Linux driver support). So it seems that the only option for battery efficient video playing on the go right now is to use a tablet.

On the upside, screen resolution is not increasing at a comparable rate to Moore’s law so eventually CPUs will get powerful enough to do all this without using much electricity.


KDE Icons Disappearing in Debian/Unstable

One of my workstations is running Debian/Unstable with KDE and SDDM on an AMD Radeon R7 260X video card. Recently it stopped displaying things correctly after a reboot, all the icons failed to display as well as many of the Qt controls. When I ran a KDE application from the command line I got the error “QSGTextureAtlas: texture atlas allocation failed, code=501”. Googling that error gave a blog post about a very similar issue in 2017 [1]. From that blog post I learned that I could stop the problem by setting MESA_EXTENSION_OVERRIDE="-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888" in the environment. In a quick test I found that the environment variable setting worked, making the KDE apps display correctly and not report an error about a texture atlas.
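
A quick test of this kind can be done for a single program without changing the login environment. The wrapper below is a sketch, and its name run_with_mesa_override is hypothetical.

```shell
#!/bin/bash
# Sketch: run one command with the Mesa extension override set,
# leaving the rest of the session untouched.
run_with_mesa_override() {
    env MESA_EXTENSION_OVERRIDE="-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888" "$@"
}
```

For example, "run_with_mesa_override dolphin" starts one KDE application with the override so you can see whether the icons come back.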

I created a file ~/.config/plasma-workspace/env/bgra.sh with the following contents:

export MESA_EXTENSION_OVERRIDE="-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888"

Then after the next login things worked as desired!

Now the issue is, where is the bug? GL, X, and the internals of KDE are things I don’t track much. I welcome suggestions from readers of my blog as to what the culprit might be and where to file a Debian bug – or a URL to a Debian bug report if someone has already filed one.

Update

When I run the game warzone2100 with this setting it crashes with the below output. So this Mesa extension override isn’t always a good thing, just solves one corner case of a bug.

$ warzone2100 
/usr/bin/gdb: warning: Couldn't determine a path for the index cache directory.
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
No frame at level 0x7ffc3392ab50.
Saved dump file to '/home/etbe/.local/share/warzone2100-3.3.0//logs/warzone2100.gdmp-VuGo2s'
If you create a bugreport regarding this crash, please include this file.
Segmentation fault (core dumped)

Update 2

Carsten provided the REAL solution to this, run “apt remove libqt5quick5-gles” which will automatically install “libqt5quick5” which makes things work. Another workstation I run that tracks Testing had libqt5quick5 installed which was why it didn’t have the problem.

The system in question had most of KDE removed due to package dependency issues when tracking Unstable and when I reinstalled it I guess the wrong one was installed.

Dell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult: Google for the machine name and BIOS, download a shell script encoded firmware image and GPG signature, then run the script on the system in question.
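
The verify-then-run step can be sketched as follows. The function name verify_then_run is hypothetical, the file names are whatever Dell provides for your machine, and it assumes the Dell signing key has already been imported into your GPG keyring.

```shell
#!/bin/bash
# Sketch: only run the firmware script if its detached signature checks out.
# Arguments: IMAGE (the downloaded shell script) and SIG (its GPG signature).
verify_then_run() {
    local image=$1 sig=$2
    if [ ! -r "$image" ] || [ ! -r "$sig" ]; then
        echo "missing firmware image or signature" >&2
        return 1
    fi
    if ! gpg --verify "$sig" "$image"; then
        echo "signature check failed, not running $image" >&2
        return 1
    fi
    bash "$image"
}
```

A passing check only shows that the image matches the key you imported. It says nothing about whether that key really belongs to Dell.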

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system and the reboot then does a critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply; a hard reboot signals to the system that the new BIOS wasn’t good and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and its reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.

More About the PowerEdge R710

I’ve got the R710 (mentioned in my previous post [1]) online. When testing the R710 at home I noticed that sometimes the VGA monitor I was using would start flickering when in some parts of the BIOS setup, it seemed that the horizontal sync wasn’t working properly. It didn’t seem to be a big deal at the time. When I deployed it the KVM display that I had planned to use with it mostly didn’t display anything. When the display was working the KVM keyboard wouldn’t work (and would prevent a regular USB keyboard from working if they were both connected at the same time). The VGA output of the R710 also wouldn’t work with my VGA->HDMI device so I couldn’t get it working with my portable monitor.

Fortunately the Dell front panel has a display and tiny buttons that allow configuring the IDRAC IP address, so I was able to get IDRAC going. One thing Dell really should do is allow the down button to change 0 to 9 when entering numbers, that would make it easier to enter 8.8.8.8 for the DNS server. Another thing Dell should do is make the default gateway have a default value according to the IP address and netmask of the server.

When I got IDRAC going it was easy to set up a serial console, boot from a rescue USB device, create a new initrd with the driver for the MegaRAID controller, and then reboot into the server image.

When I transferred the SSDs from the old server to the newer Dell server the problem I had was that the Dell drive caddies had no holes in suitable places for attaching SSDs. I ended up just pushing the SSDs in so they are hanging in mid air attached only by the SATA/SAS connectors. Plugging them in took the space from the above drive, so instead of having 2*3.5″ disks I have 1*2.5″ SSD and need the extra space to get my hand in. The R710 is designed for 6*3.5″ disks and I’m going to have trouble if I ever want to have more than 3*2.5″ SSDs. Fortunately I don’t think I’ll need more SSDs.

After booting the system I started getting alerts about a “fault” in one SSD, with no detail on what the fault might be. My guess is that the SSD in question is M.2 and it’s in a M.2 to regular SATA adaptor which might have some problems. The data seems fine though, a BTRFS scrub found no checksum errors. I guess I’ll have to buy a replacement SSD soon.

I configured the system to use the “nosmt” kernel command line option to disable hyper-threading (which won’t provide much performance benefit but which makes certain types of security attacks much easier). I’ve configured BOINC to run on 6/8 CPU cores and surprisingly that didn’t cause the fans to be louder than when the system was idle. It seems that a system that is designed for 6 SAS disks doesn’t need a lot of cooling when run with SSDs.
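
To confirm that “nosmt” took effect, the kernel exposes the SMT state under /sys/devices/system/cpu/smt on recent kernels. The function below is a sketch with a hypothetical name. The optional directory argument exists only so it can be pointed somewhere else for testing.

```shell
#!/bin/bash
# Sketch: report the kernel's SMT (hyper-threading) state.
# Optional argument: directory holding the "control" and "active" files.
smt_status() {
    local dir=${1:-/sys/devices/system/cpu/smt}
    if [ -r "$dir/control" ] && [ -r "$dir/active" ]; then
        echo "control=$(cat "$dir/control") active=$(cat "$dir/active")"
    else
        echo "SMT control not exposed by this kernel"
    fi
}
```

On a system booted with “nosmt” I would expect this to print control=off active=0.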

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.


systemd-nspawn and Private Networking

Currently there are two things I want to do with my PC at the same time, one is watching streaming services like ABC iView (which won’t run from non-Australian IP addresses) and another is torrenting over a VPN. I had considered doing something ugly with iptables to try and get routing done on a per-UID basis but that seemed too difficult. At the time I wasn’t aware of the ip rule add uidrange [1] option. So setting up a private networking namespace with a systemd-nspawn container seemed like a good idea.
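
For comparison, here is a rough sketch of what the uidrange approach might look like. I haven’t used it for this setup, and the UID, routing table number, gateway address, and device name in the usage example are all hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: send one user's traffic through a separate routing table.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# route_uid_via_vpn UID TABLE GATEWAY DEVICE
route_uid_via_vpn() {
    local uid=$1 table=$2 gateway=$3 dev=$4
    # traffic from processes owned by UID consults TABLE
    run ip rule add uidrange "$uid-$uid" lookup "$table"
    # TABLE has a default route through the VPN
    run ip route add default via "$gateway" dev "$dev" table "$table"
}
```

For example, “DRY_RUN=1 route_uid_via_vpn 1001 100 10.8.0.1 tun0” shows the two ip commands that would be run as root.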

Chroot Setup

For the chroot (which I use as a slang term for a copy of a Linux installation in a subdirectory) I used a btrfs subvol that’s a snapshot of the root subvol. The idea is that when I upgrade the root system I can just recreate the chroot with a new snapshot.

To get this working I created files in the root subvol which are used for the container.

I created a script like the following named /usr/local/sbin/container-sshd to launch the container. It sets up the networking and executes sshd. The systemd-nspawn program is designed to launch init but that’s not required, I prefer to just launch sshd so there’s only one running process in a container that’s not being actively used.

#!/bin/bash

# restorecon commands only needed for SE Linux
/sbin/restorecon -R /dev
/bin/mount none -t tmpfs /run
/bin/mkdir -p /run/sshd
/sbin/restorecon -R /run /tmp
/sbin/ifconfig host0 10.3.0.2 netmask 255.255.0.0
# the gateway must be inside the 10.3.0.0/16 subnet configured on host0
/sbin/route add default gw 10.3.0.1
exec /usr/sbin/sshd -D -f /etc/ssh/sshd_torrent_config

How to Launch It

To set up the container I used a command like “/usr/bin/systemd-nspawn -D /subvols/torrent -M torrent --bind=/home -n /usr/local/sbin/container-sshd”.

First I had tried the --network-ipvlan option which creates a new IP address on the same MAC address. That gave me an interface iv-br0 on the container that I could use normally (br0 being the bridge used in my workstation as its primary network interface). The IP address I assigned to that was in the same subnet as br0, but for some reason that’s unknown to me (maybe an interaction between bridging and network namespaces) I couldn’t access it from the host, I could only access it from other hosts on the network. I then tried the --network-macvlan option (to create a new MAC address for virtual networking), but that had the same problem with accessing the IP address from the local host outside the container as well as problems with MAC redirection to the primary MAC of the host (again maybe an interaction with bridging).

Then I tried just the “-n” option which gave it a private network interface. That created an interface named ve-torrent on the host side and one named host0 in the container. Using ifconfig and route to configure the interface in the container before launching sshd is easy. I haven’t yet determined a good way of configuring the host side of the private network interface automatically.
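
Done by hand, the host side comes down to a few commands once the container is running. The sketch below is not an automatic solution, only the manual steps wrapped in a hypothetical function. The address 10.3.0.1/16 in the usage example is an assumption, chosen to share a subnet with the 10.3.0.2 address given to host0. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch only: manual host-side setup for the ve-* interface.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# configure_host_side INTERFACE ADDRESS/PREFIX
configure_host_side() {
    local iface=$1 cidr=$2
    run ip addr add "$cidr" dev "$iface"
    run ip link set "$iface" up
    # let the host forward packets for the container
    run sysctl -w net.ipv4.ip_forward=1
}
```

For example, “configure_host_side ve-torrent 10.3.0.1/16” run as root after systemd-nspawn has created the interface. NAT or routing for the container’s subnet still has to be set up separately.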

I had to use a bind for /home because /home is a subvol and therefore doesn’t get included in the container by default.

How it Works

Now when it’s running I can just “ssh -X” to the container and then run graphical programs that use the VPN while at the same time running graphical programs on the main host that don’t use the VPN.

Things To Do

Find out why --network-ipvlan and --network-macvlan don’t work with communication from the same host.

Find out why --network-macvlan gives errors about MAC redirection when pinging.

Determine a good way of setting up the host side after the systemd-nspawn program has run.

Find out if there are better ways of solving this problem, this way works but might not be ideal. Comments welcome.


4K Monitors

A couple of years ago a relative who uses a Linux workstation I support bought a 4K (4096*2160 resolution) monitor. That meant that I had to get 4K working, which was 2 years of pain for me and probably not enough benefit for them to justify it. Recently I had the opportunity to buy some 4K monitors at a low enough price that it didn’t make sense to refuse so I got to experience it myself.

The Need for 4K

I’m getting older and my vision is decreasing as expected. I recently got new glasses and got a pair of reading glasses as a reduced ability to change focus is common as you get older. Unfortunately I made a mistake when requesting the focus distance for the reading glasses and they work well for phones, tablets, and books but not for laptops and desktop computers. Now I have the option of either spending a moderate amount of money to buy a new pair of reading glasses or just dealing with the fact that laptop/desktop use isn’t going to be as good until the next time I need new glasses (sometime in 2021).

I like having lots of terminal windows on my desktop. For common tasks I might need a few terminals open at a time and if I get interrupted in a task I like to leave the terminal windows for it open so I can easily go back to it. Having more 80*25 terminal windows on screen increases my productivity. My previous monitor was 2560*1440 which for years had allowed me to have a 4*4 array of non-overlapping terminal windows as well as another 8 or 9 overlapping ones if I needed more. 16 terminals allows me to ssh to lots of systems and edit lots of files in vi. Earlier this year I had found it difficult to read the font size that previously worked well for me so I had to use a larger font that meant that only 3*3 terminals would fit on my screen. Going from 16 non-overlapping windows and an optional 8 overlapping to 9 non-overlapping and an optional 6 overlapping is a significant difference. I could get a second monitor, and I won’t rule out doing so at some future time. But it’s not ideal.

When I got a 4K monitor working properly I found that I could go back to a smaller font that allowed 16 non overlapping windows. So I got a real benefit from a 4K monitor!

Video Hardware

Version 1.0 of HDMI released in 2002 only supports 1920*1080 (FullHD) resolution. Version 1.3 released in 2006 supported 2560*1440. Most of my collection of PCIe video cards have a maximum resolution of 1920*1080 in HDMI, so it seems that they only support HDMI 1.2 or earlier. When investigating this I wondered what version of PCIe they were using, the command “dmidecode |grep PCI” gives that information, seems that at least one PCIe video card supports PCIe 2 (released in 2007) but not HDMI 1.3 (released in 2006).
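
The resolution limits follow from the pixel clock each HDMI version allows. As far as I know HDMI 1.0 to 1.2 top out at a 165MHz TMDS clock and HDMI 1.3 raised that to 340MHz. The sketch below computes only the rate of visible pixels. Real video timings add blanking intervals, so the clock actually needed is somewhat higher than these numbers. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: visible pixels per second in MHz (blanking intervals ignored).
# Arguments: WIDTH HEIGHT REFRESH_HZ
active_pixel_rate_mhz() {
    awk -v w="$1" -v h="$2" -v hz="$3" \
        'BEGIN { printf "%.1f\n", w * h * hz / 1000000 }'
}
```

At 60Hz, 1920 by 1080 gives 124.4 which fits under 165MHz, while 2560 by 1440 gives 221.2 which needs the higher HDMI 1.3 limit.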

Many video cards in my collection support 2560*1440 with DVI but only 1920*1080 with HDMI. As 4K monitors don’t support DVI input that meant that when initially using a 4K monitor I was running in 1920*1080 instead of 2560*1440 with my old monitor.

I found that one of my old video cards supported 4K resolution, it has an NVidia GT630 chipset (here’s the page with specifications for that chipset [1]). It seems that because I have a video card with 2G of RAM I have the “Kepler” variant which supports 4K resolution. I got the video card in question because it uses PCIe*8 and I had a workstation that only had PCIe*8 slots and I didn’t feel like cutting a card down to size (which is apparently possible but not recommended), it is also fanless (quiet) which is handy if you don’t need a lot of GPU power.

A couple of months ago I checked the cheap video cards at my favourite computer store (MSY) and all the cheap ones didn’t support 4K resolution. Now it seems that all the video cards they sell could support 4K, by “could” I mean that a Google search of the chipset says that it’s possible but of course some surrounding chips could fail to support it.

The GT630 card is great for text, but the combination of it with an i5-2500 CPU (rating 6353 according to cpubenchmark.net [3]) doesn’t allow playing Netflix full-screen, and playing 1920*1080 videos scaled to full-screen sometimes gets mplayer messages about the CPU being too slow. I don’t know how much of this is due to the CPU and how much is due to the graphics hardware.

When trying the same system with an ATI Radeon R7 260X/360 graphics card (16* PCIe and draws enough power to need a separate connection to the PSU) the Netflix playback appears better but mplayer seems no better.

I guess I need a new PC to play 1920*1080 video scaled to full-screen on a 4K monitor. No idea what hardware will be needed to play actual 4K video. Comments offering suggestions in this regard will be appreciated.

Software Configuration

For GNOME apps (which you will probably run even if like me you use KDE for your desktop) you need to run commands like the following to scale menus etc:

gsettings set org.gnome.settings-daemon.plugins.xsettings overrides "[{'Gdk/WindowScalingFactor', <2>}]"
gsettings set org.gnome.desktop.interface scaling-factor 2

For KDE run the System Settings app, go to Display and Monitor, then go to Displays and Scale Display to scale things.

The Arch Linux Wiki page on HiDPI [2] is good for information on how to make apps work with high DPI (or regular screens for people with poor vision).

Conclusion

4K displays are still rather painful, both in hardware and software configuration. For serious computer use it’s worth the hassle, but it doesn’t seem to be good for general use yet. 2560*1440 is pretty good and works with much more hardware and requires hardly any software configuration.


KMail Crashing and LIBGL

One problem I’ve had recently on two systems with NVidia video cards is KMail crashing (SEGV) while reading mail. Sometimes it goes for months without having problems, and then it gets into a state where reading a few messages (or sometimes reading one particular message) causes a crash. The crash happens somewhere in the Mesa library stack.

In an attempt to investigate this I tried running KMail via ssh (as that precludes a lot of the GL stuff), but that crashed in a different way (I filed an upstream bug report [1]).

I have discovered a workaround for this issue, I set the environment variable LIBGL_ALWAYS_SOFTWARE=1 and then things work. At this stage I can’t be sure exactly where the problems are. As it’s certain KMail operations that trigger it I think that’s evidence of problems originating in KMail, but the end result when it happens often includes a kernel error log so there’s probably a problem in the Nouveau driver. I spent quite a lot of time investigating this, including recompiling most of the library stack with debugging mode and didn’t get much of a positive result. Hopefully putting it out there will help the next person who has such issues.

Here is a list of environment variables that can be set to debug LIBGL issues (strangely I couldn’t find documentation on this when Googling it). If you are stuck with a problem related to LIBGL you can try setting each of these to “1” in turn and see if it makes a difference. That can either be for the purpose of debugging a problem or creating a workaround that allows you to run the programs you need to run. I don’t know why GL is required to read email.

LIBGL_DIAGNOSTIC
LIBGL_ALWAYS_INDIRECT
LIBGL_ALWAYS_SOFTWARE
LIBGL_DRI3_DISABLE
LIBGL_NO_DRAWARRAYS
LIBGL_DEBUG
LIBGL_DRIVERS_PATH
LIBGL_DRIVERS_DIR
LIBGL_SHOW_FPS
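
Trying each variable in turn can be scripted. The sketch below runs a command once per variable with that variable set to 1 and prints the exit status of each run. The function name try_libgl_vars is hypothetical. I have left out LIBGL_DRIVERS_PATH and LIBGL_DRIVERS_DIR because they take a directory rather than a flag.

```shell
#!/bin/bash
# Sketch: run a command once per LIBGL variable, each set to 1,
# and report the exit status (139 usually means a SEGV).
try_libgl_vars() {
    local var status
    for var in LIBGL_DIAGNOSTIC LIBGL_ALWAYS_INDIRECT LIBGL_ALWAYS_SOFTWARE \
               LIBGL_DRI3_DISABLE LIBGL_NO_DRAWARRAYS LIBGL_DEBUG \
               LIBGL_SHOW_FPS; do
        status=0
        env "$var=1" "$@" >/dev/null 2>&1 || status=$?
        echo "$var=1: exit $status"
    done
}
```

For example, “try_libgl_vars kmail” starts KMail seven times, so for an interactive program you need to reproduce the crash (or quit) in each run before the next one starts.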