Archives

Categories

ZFS 2.0.0 Released

Version 2.0 of ZFS has been released, it’s now known as OpenZFS and has a unified release for Linux and BSD which is nice.

One new feature is persistent L2ARC (which means that when you use SSD or NVMe to cache hard drives that cache will remain after a reboot) is an obvious feature that was needed for a long time.

The Zstd compression invented by Facebook is the best way of compressing things nowadays, it’s generally faster while giving better compression than all other compression algorithms and it’s nice to have that in the filesystem.

The PAM module for encrypted home directories is interesting, I haven’t had a need for directory level encryption as block device encryption has worked well for my needs. But there are good use cases for directory encryption.

I just did a quick test of OpenZFS on a VM, the one thing it doesn’t do is let me remove a block device accidentally added to a zpool. If you have a zpool with a RAID-Z array and mistakenly add a new disk as a separate device instead of replacing a part of the RAID-Z then you can’t remove it. There are patches to allow such removal but they didn’t appear to get in to the OpenZFS 2.0.0 release.

For my use ZFS on Linux 0.8.5 has been working fairly well and apart from the inability to remove a mistakenly added device I haven’t had any problems with it. So I have no immediate plans to upgrade existing servers. But if I was about to install a new ZFS server I’d definitely use the latest version.

People keep asking about license issues. I’m pretty sure that Oracle lawyers have known about sites like zfsonlinux.org for years, the fact that they have decided not to do anything about it seems like clear evidence that it’s OK for me to keep using that code.

KDE Icons Disappearing in Debian/Unstable

One of my workstations is running Debian/Unstable with KDE and SDDM on an AMD Radeon R7 260X video card. Recently it stopped displaying things correctly after a reboot, all the icons failed to display as well as many of the Qt controls. When I ran a KDE application from the command line I got the error “QSGTextureAtlas: texture atlas allocation failed, code=501“. Googling that error gave a blog post about a very similar issue in 2017 [1]. From that blog post I learned that I could stop the problem by setting MESA_EXTENSION_OVERRIDE=”-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888″ in the environment. In a quick test I found that the environment variable setting worked, making the KDE apps display correctly and not report an error about a texture atlas.

I created a file ~/.config/plasma-workspace/env/bgra.sh with the following contents:

export MESA_EXTENSION_OVERRIDE="-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888"

Then after the next login things worked as desired!

Now the issue is, where is the bug? GL, X, and the internals of KDE are things I don’t track much. I welcome suggestions from readers of my blog as to what the culprit might be and where to file a Debian bug – or a URL to a Debian bug report if someone has already filed one.

Update

When I run the game warzone2100 with this setting it crashes with the below output. So this Mesa extension override isn’t always a good thing, just solves one corner case of a bug.

$ warzone2100 
/usr/bin/gdb: warning: Couldn't determine a path for the index cache directory.
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
No frame at level 0x7ffc3392ab50.
Saved dump file to '/home/etbe/.local/share/warzone2100-3.3.0//logs/warzone2100.gdmp-VuGo2s'
If you create a bugreport regarding this crash, please include this file.
Segmentation fault (core dumped)

Update 2

Carsten provided the REAL solution to this, run “apt remove libqt5quick5-gles” which will automatically install “libqt5quick5” which makes things work. Another workstation I run that tracks Testing had libqt5quick5 installed which was why it didn’t have the problem.

The system in question had most of KDE removed due to package dependency issues when tracking Unstable and when I reinstalled it I guess the wrong one was installed.

Links November 2020

KDE has a long term problem of excessive CPU time used by the screen locker [1]. Part of it is due to software GL emulation, and part of it is due to the screen locker doing things like flashing the cursor when nothing else is happening. One of my systems has an NVidia card and enabling GL would cause it to crash. So now I have kscreenlocker using 30% of a CPU core even when the screen is powered down.

Informative NYT article about the latest security features for iPhones [2]. Android needs new features like this!

Russ Allbery wrote an interesting review of the book Hand to Mouth by Linda Tirado [3], it’s about poverty in the US and related things. Linda first became Internet famous for her essay “Why I Make Terrible Decisions or Poverty Thoughts” which is very insightful and well written, this is the latest iteration of that essay [4].

This YouTube video by Ruby Payne gives great insights to class based attitudes towards time and money [5].

News Week has an interesting article about chicken sashimi, apparently you can safely eat raw chicken if it’s prepared well [6].

Vanity Fair has an informative article about how Qanon and Trumpism have infected the Catholic Church [7]. Some of Mel Gibson’s mental illness is affecting a significant portion of the Catholic Church in the US and some parts in the rest of the world.

Noema has an interesting article on toxic Internet culture, Japan’s 2chan, 4chan, 8chan/8kun, and the conspiracy theories they spawned [8].

Benjamin Corey is an ex-Fundie who wrote an amusing analysis of the Biblical statements about the anti-Christ [9].

NYMag has an interesting article The Final Gasp of Donald Trump’s Presidency [10].

Mother Jones has an informative article about the fact that Jim Watkins (the main person behind QAnon) has a history of hosting child porn on sites he runs [11], but we all knew QAnon was never about protecting kids.

Eand has an insightful article America’s Problem is That White People Want It to Be a Failed State [12].

Video Decoding

I’ve had a saga of getting 4K monitors to work well. My latest issue has been video playing, the dreaded mplayer error about the system being too slow. My previous post about 4K was about using DisplayPort to get more than 30Hz scan rate at 4K [1]. I now have a nice 60Hz scan rate which makes WW2 documentaries display nicely among other things.

But when running a 4K monitor on a 3.3GHz i5-2500 quad-core CPU I can’t get a FullHD video to display properly. Part of the process of decoding the video and scaling it to 4K resolution is too slow, so action scenes in movies lag. When running a 2560*1440 monitor on a 2.4GHz E5-2440 hex-core CPU with the mplayer option “-lavdopts threads=3” everything is great (but it fails if mplayer is run with no parameters). In doing tests with apparent performance it seemed that the E5-2440 CPU gains more from the threaded mplayer code than the i5-2500, maybe the E5-2440 is more designed for server use (it’s in a Dell PowerEdge T320 while the i5-2500 is in a random white-box system) or maybe it’s just because it’s newer. I haven’t tested whether the i5-2500 system could perform adequately at 2560*1440 resolution.

The E5-2440 system has an ATI HD 6570 video card which is old, slow, and only does PCIe 2.1 which gives 5GT/s or 8GB/s. The i5-2500 system has a newer ATI video card that is capable of PCIe 3.0, but “lspci -vv” as root says “LnkCap: Port #0, Speed 8GT/s, Width x16” and “LnkSta: Speed 5GT/s (downgraded), Width x16 (ok)”. So for reasons unknown to me the system with a faster PCIe 3.0 video card is being downgraded to PCIe 2.1 speed. A quick check of the web site for my local computer store shows that all ATI video cards costing less than $300 have PCI3 3.0 interfaces and the sole ATI card with PCIe 4.0 (which gives double the PCIe speed if the motherboard supports it) costs almost $500. I’m not inclined to spend $500 on a new video card and then a greater amount of money on a motherboard supporting PCIe 4.0 and CPU and RAM to go in it.

According to my calculations 3840*2160 resolution at 24bpp (probably 32bpp data transfers) at 30 frames/sec means 3840*2160*4*30/1024/1024=950MB/s. PCIe 2.1 can do 8GB/s so that probably isn’t a significant problem.

I’d been planning on buying a new video card for the E5-2440 system, but due to some combination of having a better CPU and lower screen resolution it is working well for video playing so I can save my money.

As an aside the T320 is a server class system that had been running for years in a corporate DC. When I replaced the high speed SAS disks with SSDs SATA disks it became quiet enough for a home workstation. It works very well at that task but the BIOS is quite determined to keep the motherboard video running due to the remote console support. So swapping monitors around was more pain than I felt like going through, I just got it working and left it. I ordered a special GPU power cable but found that the older video card that doesn’t need an extra power cable performs adequately before the cable arrived.

Here is a table comparing the systems.

2560*1440 works well 3840*2160 goes slow
System Dell PowerEdge T320 White Box PC from rubbish
CPU 2.4GHz E5-2440 3.3GHz i5-2500
Video Card ATI Radeon HD 6570 ATI Radeon R7 260X
PCIe Speed PCIe 2.1 – 8GB/s PCIe 3.0 downgraded to PCIe 2.1 – 8GB/s

Conclusion

The ATI Radeon HD 6570 video card is one that I had previously tested and found inadequate for 4K support, I can’t remember if it didn’t work at that resolution or didn’t support more than 30Hz scan rate. If the 2560*1440 monitor dies then it wouldn’t make sense to buy anything less than a 4K monitor to replace it which means that I’d need to get a new video card to match. But for the moment 2560*1440 is working well enough so I won’t upgrade it any time soon. I’ve already got the special power cable (specified as being for a Dell PowerEdge R610 for anyone else who wants to buy one) so it will be easy to install a powerful video card in a hurry.

First Attempt at Gnocchi-Statsd

I’ve been investigating the options for tracking system statistics to diagnose performance problems. The idea is to track all sorts of data about the system (network use, disk IO, CPU, etc) and look for correlations at times of performance problems. DataDog is pretty good for this but expensive, it’s apparently based on or inspired by the Etsy Statsd. It’s claimed that the gnocchi-statsd is the best implementation of the protoco used by the Etsy Statsd, so I decided to install that.

I use Debian/Buster for this as that’s what I’m using for the hardware that runs KVM VMs. Here is what I did:

# it depends on a local MySQL database
apt -y install mariadb-server mariadb-client
# install the basic packages for gnocchi
apt -y install gnocchi-common python3-gnocchiclient gnocchi-statsd uuid

In the Debconf prompts I told it to “setup a database” and not to manage keystone_authtoken with debconf (because I’m not doing a full OpenStack installation).

This gave a non-working configuration as it didn’t configure the MySQL database for the [indexer] section and the sqlite database that was configured didn’t work for unknown reasons. I filed Debian bug #971996 about this [1]. To get this working you need to edit /etc/gnocchi/gnocchi.conf and change the url line in the [indexer] section to something like the following (where the password is taken from the [database] section).

url = mysql+pymysql://gnocchi-common:PASS@localhost:3306/gnocchidb

To get the statsd interface going you have to install the gnocchi-statsd package and edit /etc/gnocchi/gnocchi.conf to put a UUID in the resource_id field (the Debian package uuid is good for this). I filed Debian bug #972092 requesting that the UUID be set by default on install [2].

Here’s an official page about how to operate Gnocchi [3]. The main thing I got from this was that the following commands need to be run from the command-line (I ran them as root in a VM for test purposes but would do so with minimum privs for a real deployment).

gnocchi-api
gnocchi-metricd

To communicate with Gnocchi you need the gnocchi-api program running, which uses the uwsgi program to provide the web interface by default. It seems that this was written for a version of uwsgi different than the one in Buster. I filed Debian bug #972087 with a patch to make it work with uwsgi [4]. Note that I didn’t get to the stage of an end to end test, I just got it to basically run without error.

After getting “gnocchi-api” running (in a terminal not as a daemon as Debian doesn’t seem to have a service file for it), I ran the client program “gnocchi” and then gave it the “status” command which failed (presumably due to the metrics daemon not running), but at least indicated that the client and the API could communicate.

Then I ran the “gnocchi-metricd” and got the following error:

2020-10-12 14:59:30,491 [9037] ERROR    gnocchi.cli.metricd: Unexpected error during processing job
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 87, in run
    self._run_job()
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 248, in _run_job
    self.coord.update_capabilities(self.GROUP_ID, self.store.statistics)
  File "/usr/lib/python3/dist-packages/tooz/coordination.py", line 592, in update_capabilities
    raise tooz.NotImplemented
tooz.NotImplemented

At this stage I’ve had enough of gnocchi. I’ll give the Etsy Statsd a go next.

Update

Thomas has responded to this post [5]. At this stage I’m not really interested in giving Gnocchi another go. There’s still the issue of the indexer database which should be different from the main database somehow and sqlite (the config file default) doesn’t work.

I expect that if I was to persist with Gnocchi I would encounter more poorly described error messages from the code which either don’t have Google hits when I search for them or have Google hits to unanswered questions from 5+ years ago.

The Gnocchi systemd config files are in different packages to the programs, this confused me and I thought that there weren’t any systemd service files. I had expected that installing a package with a daemon binary would also get the systemd unit file to match.

The cluster features of Gnocchi are probably really good if you need that sort of thing. But if you have a small instance (EG a single VM server) then it’s not needed. Also one of the original design ideas of the Etsy Statsd was that UDP was used because data could just be dropped if there was a problem. I think for many situations the same concept could apply to the entire stats service.

If the other statsd programs don’t do what I need then I may give Gnocchi another go.

Bandwidth for Video Conferencing

For the Linux Users of Victoria (LUV) I’ve run video conferences on Jitsi and BBB (see my previous post about BBB vs Jitsi [1]). One issue with video conferences is the bandwidth requirements.

The place I’m hosting my video conference server has a NBN link with allegedly 40Mb/s transmission speed and 100Mb/s reception speed. My tests show that it can transmit at about 37Mb/s and receive at speeds significantly higher than that but also quite a bit lower than 100Mb/s (around 60 or 70Mb/s). For a video conference server you have a small number of sources of video and audio and a larger number of targets as usually most people will have their microphones muted and video cameras turned off. This means that the transmission speed is the bottleneck. In every test the reception speed was well below half the transmission speed, so the tests confirmed my expectation that transmission was the only bottleneck, but the reception speed was higher than I had expected.

When we tested bandwidth use the maximum upload speed we saw was about 4MB/s (32Mb/s) with 8+ video cameras and maybe 20 people seeing some of the video (with a bit of lag). We used 3.5MB/s (28Mb/s) when we only had 6 cameras which seemed to be the maximum for good performance.

In another test run we had 4 people all sending video and the transmission speed was about 260KB/s.

I don’t know how BBB manages the small versions of video streams. It might reduce the bandwidth when the display window is smaller.

I don’t know the resolutions of the cameras. When you start sending video in BBB you are prompted for the “quality” with “medium” being default. I don’t know how different camera hardware and different choices about “quality” affect bandwidth.

These tests showed that for the cameras we had available a small group of people video chatting a 100/40 NBN link (the fastest Internet link in Australia that’s not really expensive) a small group of people can be all sending video or a medium size group of people can watch video streams from a small group.

For meetings of the typical size of LUV meetings we won’t have a bandwidth problem.

There is one common case that I haven’t yet tested, where there is a single video stream that many people are watching. If 4 people are all sending video with 260KB/s transmission bandwidth then 1 person could probably send video to 4 for 65KB/s. Doing some simple calculations on those numbers implies that we could have 1 person sending video to 240 people without running out of bandwidth. I really doubt that would work, but further testing is needed.

Qemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough which requires KVM to run as root and gives the same permissions on the host as on the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are running KVM/Qemu as non-root and not having a SUID file in the VM imply a SUID file in the host.

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests, latency is based on the worst-case not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests which I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the less metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual, I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything and it doesn’t appear in /proc/mounts, I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes give near enough performance that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also more well known, ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to it’s internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

Links September 2020

MD5 cracker, find plain text that matches MD5 hash [1].

Debian Quick Image Baker – Debian VM images for various architectures [2].

Insightful article on Scientific American about how dental and orthodontic problems are caused by our modern lifestyle [3].

Cory Doctorow wrote an insightful article for Locus Magazine about Intellectual Property [4]. He makes the case that we are losing our rights due to changes in the legal system that benefits corporations.

Anarcat has an informative blog post about how to stop people using your Mailman installation to attack others [5]. Since version 1:2.1.27-1 the Debian package of Mailman has created a SUBSCRIBE_FORM_SECRET by default on install, but any installation upgraded from an older version of the Debian package won’t have it.

Burning Lithium Ion Batteries

I had an old Nexus 4 phone that was expanding and decided to test some of the theories about battery combustion.

The first claim that often gets made is that if the plastic seal on the outside of the battery is broken then the battery will catch fire. I tested this by cutting the battery with a craft knife. With every cut the battery sparked a bit and then when I levered up layers of the battery (it seems to be multiple flat layers of copper and black stuff inside the battery) there were more sparks. The battery warmed up, it’s plausible that in a confined environment that could get hot enough to set something on fire. But when the battery was resting on a brick in my backyard that wasn’t going to happen.

The next claim is that a Li-Ion battery fire will be increased with water. The first thing to note is that Li-Ion batteries don’t contain Lithium metal (the Lithium high power non-rechargeable batteries do). Lithium metal will seriously go off it exposed to water. But lots of other Lithium compounds will also react vigorously with water (like Lithium oxide for example). After cutting through most of the center of the battery I dripped some water in it. The water boiled vigorously and the corners of the battery (which were furthest away from the area I cut) felt warmer than they did before adding water. It seems that significant amounts of energy are released when water reacts with whatever is inside the Li-Ion battery. The reaction was probably giving off hydrogen gas but didn’t appear to generate enough heat to ignite hydrogen (which is when things would really get exciting). Presumably if a battery was cut in the presence of water while in an enclosed space that traps hydrogen then the sparks generated by the battery reacting with air could ignite hydrogen generated from the water and give an exciting result.

It seems that a CO2 fire extinguisher would be best for a phone/tablet/laptop fire as that removes oxygen and cools it down. If that isn’t available then a significant quantity of water will do the job, water won’t stop the reaction (it can prolong it), but it can keep the reaction to below 100C which means it won’t burn a hole in the floor and the range of toxic chemicals released will be reduced.

The rumour that a phone fire on a plane could do a “China syndrome” type thing and melt through the Aluminium body of the plane seems utterly bogus. I gave it a good try and was unable to get a battery to burn through it’s plastic and metal foil case. A spare battery for a laptop in checked luggage could be a major problem for a plane if it ignited. But a battery in the passenger area seems unlikely to be a big problem if plenty of water is dumped on it to prevent the plastic case from burning and polluting the air.

I was not able to get a result that was even worthy of a photograph. I may do further tests with laptop batteries.

Dell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult, Google for the machine name and BIOS, download a shell script encoded firmware image and GPG signature, then run the script on the system in question.

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system and the reboot then does a critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply, a hard reboot signals to the system that the new BIOS wasn’t good and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and it’s reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.