SMART and SSDs

The Hetzner server that hosts my blog among other things has 2*256G SSDs for the root filesystem. The smartctl and smartd programs report both SSDs as in FAILING_NOW state for the Wear_Leveling_Count attribute. I don’t have a lot of faith in SMART. I run it because it would be stupid not to consider data about possible drive problems, but don’t feel obliged to immediately replace disks with SMART errors when not convenient (if I ordered a new server and got those results I would demand replacement before going live).

Doing any sort of SMART scan will cause a service outage. Replacing devices means 2 outages, 1 for each device.

I noticed the SMART errors 2 weeks ago, so I guess that the SMART claims that both of the drives are likely to fail within 24 hours have been disproved. The system is running BTRFS so I know there aren’t any unseen data corruption issues and it uses BTRFS RAID-1 so if one disk has an unreadable sector that won’t cause data loss.

Currently Hetzner Server Bidding has ridiculous offerings for systems with SSD storage. Search for a server with 16G of RAM and SSD storage and the minimum prices are only €2 cheaper than a new server with 64G of RAM and 2*512G NVMe. In the past Server Bidding has had servers with specs not much below the newest systems going for rates well below the costs of the newer systems. The current Hetzner server is under a contract from Server Bidding which is significantly cheaper than the current Server Bidding offerings, so financially it wouldn’t be a good plan to replace the server now.

Monitoring

I have just released a new version of etbe-mon [1] which has a new monitor for SMART data (smartctl). It also has a change to the sslcert check to search all IPv6 and IPv4 addresses for each hostname, makes the freespace check look at the filesystem mountpoint, and makes the smtpswaks check use the latest swaks command-line options.

For the new smartctl check there is an option to treat “Marginal” alert status from smartctl as errors and there is an option to name attributes that will be treated as marginal even if smartctl thinks they are significant. So now I have my monitoring system checking the SMART data on the servers of mine which have real hard drives (not VMs) and aren’t using RAID hardware that obscures such things. Also it’s not alerting me about the Wear_Leveling_Count on that particular Hetzner server.
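As an illustration, a minimal check along those lines might parse “smartctl -A” output and downgrade named attributes from errors to warnings. This is my own sketch, not code from etbe-mon; the function name and the sample output layout (smartctl’s standard attribute table) are assumptions:

```python
# Sample of the attribute table printed by "smartctl -A" (abbreviated).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count     0x0013   001   001   005    Pre-fail  Always   FAILING_NOW 4357
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
"""

def check_attributes(smart_output, treat_as_marginal=()):
    """Return (errors, warnings): failing attributes, with any attribute
    named in treat_as_marginal downgraded to a warning."""
    errors, warnings = [], []
    for line in smart_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        name, when_failed = fields[1], fields[8]
        if when_failed == '-':          # attribute is not failing
            continue
        if name in treat_as_marginal:
            warnings.append(name)
        else:
            errors.append(name)
    return errors, warnings

errs, warns = check_attributes(SAMPLE, treat_as_marginal=('Wear_Leveling_Count',))
```

With Wear_Leveling_Count named as marginal, the FAILING_NOW state on that attribute produces a warning rather than an alert, which is the behaviour I want for that Hetzner server.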

Testing VDPAU in Debian

VDPAU is the Video Decode and Presentation API for Unix [1]. I noticed an error with mplayer “Failed to open VDPAU backend libvdpau_i965.so: cannot open shared object file: No such file or directory“. Googling that error turned up Debian Bug #869815 [2], which suggested installing the packages vdpau-va-driver and libvdpau-va-gl1 and setting the environment variable “VDPAU_DRIVER=va_gl” to enable VDPAU.

The vdpauinfo command from the vdpauinfo package shows the VDPAU capabilities, and it showed that VDPAU was working with va_gl.

When mplayer was getting the error about the missing i965 driver it took 35.822s of user time and 1.929s of system time to play Self Control by Laura Branigan [3] (a good music video to watch several times while testing IMHO) on my Thinkpad Carbon X1 Gen1 with Intel video and an i7-3667U CPU. When I set “VDPAU_DRIVER=va_gl” mplayer took 50.875s of user time and 4.207s of system time but didn’t have the error.

It’s possible that other applications on my Thinkpad might benefit from VDPAU with the va_gl driver, but it seems unlikely that any will benefit to such a degree that it makes up for mplayer taking more time. It’s also possible that the Self Control video I tested with was a worst case scenario, but even so taking almost 50% more CPU time makes it unlikely that other videos would get a benefit.

For this sort of video (640×480 resolution) it’s not a problem, 38 seconds of CPU time to play a 5 minute video isn’t a real problem (although it would be nice to use less battery). For a 1600*900 resolution video (the resolution of the laptop screen) it took 131 seconds of user time to play a 433 second video. That’s still not going to be a problem when playing on mains power but will suck a bit when on battery. Most Thinkpads have Intel video and some have NVidia as well (which has issues from having 2 video cards and from having poor Linux driver support). So it seems that the only option for battery efficient video playing on the go right now is to use a tablet.
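The CPU load implied by those timings (my own arithmetic: CPU seconds divided by video length gives the fraction of one core kept busy):

```python
# Fraction of one CPU core busy while playing, from the timings above.
low_res = 38 / 300    # 640x480: ~38s of CPU time for a 5 minute video
high_res = 131 / 433  # 1600x900: 131s of user time for a 433s video
print(f'640x480:  {low_res:.0%} of a core')
print(f'1600x900: {high_res:.0%} of a core')
```

So the screen-resolution video keeps roughly 30% of one core busy, which is consistent with it being fine on mains power but noticeable on battery.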

On the upside, screen resolution is not increasing at a comparable rate to Moore’s law so eventually CPUs will get powerful enough to do all this without using much electricity.

Thinkpad Storage Problem

For a while I’ve had a problem with my Thinkpad X1 Carbon Gen 1 [1] where storage pauses for 30 seconds. It’s become more common recently, and unfortunately everything seems to depend on storage (ideally a web browser playing a video from the Internet could keep doing so with no disk access, but that’s not what happens).

echo 3 > /sys/block/sda/device/timeout

I’ve put the above in /etc/rc.local, which should make it take only 3 seconds to recover from storage problems instead of 30. The problem hasn’t recurred since that change so I don’t know if it works as desired. The Thinkpad is running BTRFS and no data is reported as being lost, so it’s not a problem to use it at this time.

The bottom of this post has an example of the errors I’m getting. A friend said that his searches for such errors shows people claiming that it’s a “SATA controller or cable” problem, which for a SATA device bolted directly to the motherboard means it’s likely to be a motherboard problem (IE more expensive to fix than the $289 I paid for the laptop almost 3 years ago).

A Lenovo forum discussion says that the X1 Carbon Gen1 uses an unusual connector that’s not M.2 SATA and not mSATA [2]. This means that a replacement is going to be expensive, $100US for a replacement from eBay when a M.2 SATA or NVMe device would cost about $50AU from my local store. It also means that I can’t use a replacement for anything else. If my laptop took a regular NVMe I’d buy one and test it out, if it didn’t solve the problem a spare NVMe device is always handy to have. But I’m not interested in spending $100US for a part that may turn out to be useless.

I bought the laptop expecting that there was nothing I could fix inside it. While it is theoretically possible to upgrade the CPU etc in a laptop, most people find that the effort and expense make it better to just replace the entire laptop. With the ultra light category of laptops the RAM is soldered to the motherboard and there are even fewer options for replacing things. So while it’s annoying that Lenovo didn’t use a standard interface for this device (they did so for other models in the range) it’s not a situation that I hadn’t expected.

Now the question is what to do next. The Thinkpad has a SD socket and micro SD cards are getting large capacities nowadays. If it won’t boot from a SD card I could boot from USB and then run from SD. So even if the internal SATA device fails entirely it should still be useful. I wonder if I could find a Thinkpad with a similar problem going cheap as I’m pretty sure that a Windows user couldn’t make any storage device usable the way I can with Linux.

I’ve looked at prices on auction sites but haven’t seen a Thinkpad X1 Carbon going for anywhere near the price I got this one. The nearest I’ve seen is around $350 for a laptop with significant damage.

While I am a little unhappy at Lenovo’s choice of interface, I’ve definitely got a lot more than $289 of value out of this laptop. So I’m not really complaining.

[315041.837612] ata1.00: status: { DRDY }
[315041.837613] ata1.00: failed command: WRITE FPDMA QUEUED
[315041.837616] ata1.00: cmd 61/20:48:28:1e:3e/00:00:00:00:00/40 tag 9 ncq dma 16384 out
                         res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
[315041.837617] ata1.00: status: { DRDY }
[315041.837618] ata1.00: failed command: READ FPDMA QUEUED
[315041.837621] ata1.00: cmd 60/08:50:e0:26:84/00:00:00:00:00/40 tag 10 ncq dma 4096 in
                         res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
[315041.837622] ata1.00: status: { DRDY }
[315041.837625] ata1: hard resetting link
[315042.151781] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[315042.163368] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[315042.163370] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[315042.163372] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[315042.183332] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[315042.183334] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[315042.183336] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[315042.193332] ata1.00: configured for UDMA/133
[315042.193789] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[315042.193791] sd 0:0:0:0: [sda] tag#10 Sense Key : Illegal Request [current]
[315042.193793] sd 0:0:0:0: [sda] tag#10 Add. Sense: Unaligned write command
[315042.193795] sd 0:0:0:0: [sda] tag#10 CDB: Read(10) 28 00 00 84 26 e0 00 00 08 00
[315042.193797] print_req_error: I/O error, dev sda, sector 8660704
[315042.193810] ata1: EH complete

Phone Plan Comparison 2020

It’s been over 6 years since my last post comparing 3G data plans in Australia.

I’m currently using Aldi Mobile [1]. They have a $15 package which gives 3G of data and unlimited calls and SMS in Australia for 30 days, and a $25 package that gives 20G of data and unlimited international calls to 15 unspecified countries (presumably the US and much of the EU). It’s on the Telstra network so has good access all around Australia. I’m on the grandfathered $20 per 30 days plan which gives me more than 3G of data (not sure how much but I need more than 3G) while costing less than the current offer of $25.

Telco        Cost          Data   Network    Calling
Aldi         $15/30 days   3G     Telstra    Unlimited National
Aldi         $25/30 days   20G    Telstra    Unlimited International
Dodo [2]     $5/month      none   Optus      Unlimited National + Unlimited International SMS
Moose [3]    $8.80/month   1G     Optus      300min national calls + Unlimited National SMS
AmaySIM [4]  $10/month     2G     Optus      Unlimited National
Kogan [5]    $135/year     100G   Vodafone   Unlimited National
Kogan        $180/year     180G   Vodafone   Unlimited National
Dodo         $20/month     12G    Optus      Unlimited National + Unlimited International SMS + 100mins International talk
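For a rough comparison, here is the cost per gigabyte implied by the table above (my own arithmetic from the listed prices; call inclusions and the 30-day versus calendar-month difference are ignored):

```python
# Price and data allowance per plan period, copied from the table above.
plans = {
    'Aldi $15/30 days': (15.00, 3),
    'Aldi $25/30 days': (25.00, 20),
    'Moose': (8.80, 1),
    'AmaySIM': (10.00, 2),
    'Kogan 100G/year': (135.00, 100),
    'Kogan 180G/year': (180.00, 180),
    'Dodo 12G': (20.00, 12),
}
per_gb = {name: round(cost / gb, 2) for name, (cost, gb) in plans.items()}
for name in sorted(per_gb, key=per_gb.get):  # cheapest per GB first
    print(f'{name:18} ${per_gb[name]:.2f}/GB')
```

The yearly Kogan plans are the cheapest per GB, but they are on Vodafone, which matters if you need coverage outside the cities.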

In the above table I put Aldi at the top because Telstra has the best network. If you want to use a mobile phone anywhere in Australia then Telstra is the best choice. If you only care about city areas then the other networks are usually good enough. The rest of the services are in order of price, cheapest first. For every service I only mentioned the offerings that are price competitive.

I didn’t list the offerings that are larger than these. There are offerings of 300G/year or more, but most people don’t need them. Every service in this table has more expensive offerings with more data.

Please inform me in the comments of any services I missed.

Electromagnetic Space Launch

The G-Force Wikipedia page says that humans can survive 20G horizontally “eyes in” for up to 10 seconds and 10G for 1 minute.

An acceleration of 14G for 10 seconds (well below the level that’s unsafe) gives a speed of mach 4 and an acceleration distance of just under 7km. Launching a 100 metric ton spacecraft that way would take about 94GJ of energy per launch, with a peak power around 19GW at the end of the launch path, plus some extra for the weight of the part that contains magnets which would be retrieved by parachute. That peak power would have to come from local storage (EG flywheels or capacitors), but the storage could be charged over a couple of hours from a 14MW feed, which is a small fraction of the power used by a train or tram network, and brown-outs of the transit network are something that they deal with, so such a launch could be powered by diverting power from a transit network. The Rocky Mountains in the US peak at 4.4KM above sea level, so a magnetic launch that starts 2.6KM below sea level and extends the height of the Rocky Mountains would do.
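For reference, here is the arithmetic worked through (my own derivation, taking g = 9.8m/s², a 100 tonne craft, and ignoring drag and the mass of the magnet sled):

```latex
% Launch kinematics, assuming a = 14g for t = 10s with m = 10^5 kg.
\begin{align*}
a &= 14 \times 9.8\,\mathrm{m/s^2} \approx 137\,\mathrm{m/s^2} \\
v &= a t \approx 1372\,\mathrm{m/s} \approx \text{mach } 4 \\
d &= \tfrac{1}{2} a t^2 \approx 6.9\,\mathrm{km} \\
E &= \tfrac{1}{2} m v^2 = \tfrac{1}{2} \times 10^5 \times 1372^2 \approx 94\,\mathrm{GJ} \\
P_{\mathrm{peak}} &= m a v \approx 10^5 \times 137 \times 1372
  \approx 1.9 \times 10^{10}\,\mathrm{W} \approx 19\,\mathrm{GW} \\
h &= \frac{v^2}{2g} \approx 96\,\mathrm{km}
\end{align*}
```

The peak power is in the gigawatt range, so the launch has to be fed from storage rather than directly from a grid connection; the total energy per launch is modest enough that charging the storage slowly is practical.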

A speed of mach 4 vertically would get a height of 96Km if we disregard drag, that’s almost 1/4 of the orbital altitude of the ISS. This seems like a more practical way to launch humans into space than a space elevator.

The Mass Driver page on Wikipedia documents some of the past research on launching satellites that way, with shorter launch hardware and significantly higher G forces.

Links December 2020

Business Insider has an informative article about the way that Google users can get locked out with no apparent reason and no recourse [1]. Something to share with clients when they consider putting everything in “the cloud”.

Vice has an interesting article about people jailbreaking used Teslas after Tesla has stolen software licenses that were bought with the car [2].

The Atlantic has an interesting article titled This Article Won’t Change Your Mind [3]. It’s one of many on the topic of echo chambers but has some interesting points that others don’t seem to cover, such as regarding the benefits of groups when not everyone agrees.

Inequality.org has lots of useful information about global inequality [4].

Jeffrey Goldberg has an insightful interview with Barack Obama for the Atlantic about the future course of American politics and a retrospective on his term in office [5].

A Game Designer’s Analysis Of QAnon is an insightful Medium article comparing QAnon to an augmented reality game [6]. This is one of the best analyses of QAnon operations that I’ve seen.

Decrypting Rita is one of the most interesting web comics I’ve read [7]. It makes good use of side scrolling and different layers to tell multiple stories at once.

PC Mag has an article about the new features in Chrome 87 to reduce CPU use [8]. On my laptop I have 1/3 of all CPU time being used when it is idle, the majority of which is from Chrome. As the CPU has 2 cores this means the equivalent of 1 core running about 66% of the time just for background tabs. I have over 100 tabs open which I admit is a lot. But it means that the active tabs (as opposed to the plain HTML or PDF ones) are averaging more than 1% CPU time on an i7 which seems obviously unreasonable. So Chrome 87 doesn’t seem to live up to Google’s claims.

The movie Bad President starring Stormy Daniels as herself is out [9]. Poe’s Law is passe.

Interesting summary of Parler, seems that it was designed by the Russians [10].

Wired has an interesting article about Indistinguishability Obfuscation, how to encrypt the operation of a program [11].

Joerg Jaspert wrote an interesting blog post about the difficulties of packaging Rust and Go for Debian [12]. I think that the problem is that many modern languages aren’t designed well for library updates. This isn’t just a problem for Debian, it’s a problem for any long term support of software that doesn’t involve transferring a complete archive of everything, and it’s a problem for any disconnected development (remote sites and sites dealing with serious security issues). Having an automatic system for downloading libraries is fine. But there should be an easy way of getting the same source via an archive format (zip will do as any archive can be converted to any other easily enough) and with version numbers.

ZFS 2.0.0 Released

Version 2.0 of ZFS has been released. It’s now known as OpenZFS and has a unified release for Linux and BSD, which is nice.

One new feature is persistent L2ARC (which means that when you use SSD or NVMe to cache hard drives, that cache will remain after a reboot), an obvious feature that was needed for a long time.

The Zstd compression invented by Facebook is one of the best ways of compressing things nowadays; it generally gives a better combination of speed and compression ratio than the other algorithms, and it’s nice to have it in the filesystem.

The PAM module for encrypted home directories is interesting, I haven’t had a need for directory level encryption as block device encryption has worked well for my needs. But there are good use cases for directory encryption.

I just did a quick test of OpenZFS on a VM, the one thing it doesn’t do is let me remove a block device accidentally added to a zpool. If you have a zpool with a RAID-Z array and mistakenly add a new disk as a separate device instead of replacing a part of the RAID-Z then you can’t remove it. There are patches to allow such removal but they didn’t appear to get in to the OpenZFS 2.0.0 release.

For my use ZFS on Linux 0.8.5 has been working fairly well and apart from the inability to remove a mistakenly added device I haven’t had any problems with it. So I have no immediate plans to upgrade existing servers. But if I was about to install a new ZFS server I’d definitely use the latest version.

People keep asking about license issues. I’m pretty sure that Oracle lawyers have known about sites like zfsonlinux.org for years, the fact that they have decided not to do anything about it seems like clear evidence that it’s OK for me to keep using that code.

KDE Icons Disappearing in Debian/Unstable

One of my workstations is running Debian/Unstable with KDE and SDDM on an AMD Radeon R7 260X video card. Recently it stopped displaying things correctly after a reboot, all the icons failed to display as well as many of the Qt controls. When I ran a KDE application from the command line I got the error “QSGTextureAtlas: texture atlas allocation failed, code=501“. Googling that error gave a blog post about a very similar issue in 2017 [1]. From that blog post I learned that I could stop the problem by setting MESA_EXTENSION_OVERRIDE=”-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888″ in the environment. In a quick test I found that the environment variable setting worked, making the KDE apps display correctly and not report an error about a texture atlas.

I created a file ~/.config/plasma-workspace/env/bgra.sh with the following contents:

export MESA_EXTENSION_OVERRIDE="-GL_EXT_bgra -GL_EXT_texture_format_BGRA8888"

Then after the next login things worked as desired!

Now the issue is, where is the bug? GL, X, and the internals of KDE are things I don’t track much. I welcome suggestions from readers of my blog as to what the culprit might be and where to file a Debian bug – or a URL to a Debian bug report if someone has already filed one.

Update

When I run the game warzone2100 with this setting it crashes with the below output. So this Mesa extension override isn’t always a good thing, just solves one corner case of a bug.

$ warzone2100 
/usr/bin/gdb: warning: Couldn't determine a path for the index cache directory.
27      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
No frame at level 0x7ffc3392ab50.
Saved dump file to '/home/etbe/.local/share/warzone2100-3.3.0//logs/warzone2100.gdmp-VuGo2s'
If you create a bugreport regarding this crash, please include this file.
Segmentation fault (core dumped)

Update 2

Carsten provided the REAL solution to this, run “apt remove libqt5quick5-gles” which will automatically install “libqt5quick5” which makes things work. Another workstation I run that tracks Testing had libqt5quick5 installed which was why it didn’t have the problem.

The system in question had most of KDE removed due to package dependency issues when tracking Unstable and when I reinstalled it I guess the wrong one was installed.

Links November 2020

KDE has a long term problem of excessive CPU time used by the screen locker [1]. Part of it is due to software GL emulation, and part of it is due to the screen locker doing things like flashing the cursor when nothing else is happening. One of my systems has an NVidia card and enabling GL would cause it to crash. So now I have kscreenlocker using 30% of a CPU core even when the screen is powered down.

Informative NYT article about the latest security features for iPhones [2]. Android needs new features like this!

Russ Allbery wrote an interesting review of the book Hand to Mouth by Linda Tirado [3]; it’s about poverty in the US and related things. Linda first became Internet famous for her essay “Why I Make Terrible Decisions or Poverty Thoughts”, which is very insightful and well written; this is the latest iteration of that essay [4].

This YouTube video by Ruby Payne gives great insights to class based attitudes towards time and money [5].

Newsweek has an interesting article about chicken sashimi; apparently you can safely eat raw chicken if it’s prepared well [6].

Vanity Fair has an informative article about how Qanon and Trumpism have infected the Catholic Church [7]. Some of Mel Gibson’s mental illness is affecting a significant portion of the Catholic Church in the US and some parts in the rest of the world.

Noema has an interesting article on toxic Internet culture, Japan’s 2chan, 4chan, 8chan/8kun, and the conspiracy theories they spawned [8].

Benjamin Corey is an ex-Fundie who wrote an amusing analysis of the Biblical statements about the anti-Christ [9].

NYMag has an interesting article The Final Gasp of Donald Trump’s Presidency [10].

Mother Jones has an informative article about the fact that Jim Watkins (the main person behind QAnon) has a history of hosting child porn on sites he runs [11], but we all knew QAnon was never about protecting kids.

Eand has an insightful article America’s Problem is That White People Want It to Be a Failed State [12].

Video Decoding

I’ve had a saga of getting 4K monitors to work well. My latest issue has been video playing: the dreaded mplayer error about the system being too slow. My previous post about 4K was about using DisplayPort to get more than a 30Hz scan rate at 4K [1]. I now have a nice 60Hz scan rate which makes WW2 documentaries display nicely among other things.

But when running a 4K monitor on a 3.3GHz i5-2500 quad-core CPU I can’t get a FullHD video to display properly. Part of the process of decoding the video and scaling it to 4K resolution is too slow, so action scenes in movies lag. When running a 2560*1440 monitor on a 2.4GHz E5-2440 hex-core CPU with the mplayer option “-lavdopts threads=3” everything is great (but it fails if mplayer is run with no parameters). In doing tests with apparent performance it seemed that the E5-2440 CPU gains more from the threaded mplayer code than the i5-2500, maybe the E5-2440 is more designed for server use (it’s in a Dell PowerEdge T320 while the i5-2500 is in a random white-box system) or maybe it’s just because it’s newer. I haven’t tested whether the i5-2500 system could perform adequately at 2560*1440 resolution.

The E5-2440 system has an ATI HD 6570 video card which is old, slow, and only does PCIe 2.1, which gives 5GT/s per lane or 8GB/s for an x16 slot. The i5-2500 system has a newer ATI video card that is capable of PCIe 3.0, but “lspci -vv” as root says “LnkCap: Port #0, Speed 8GT/s, Width x16” and “LnkSta: Speed 5GT/s (downgraded), Width x16 (ok)”. So for reasons unknown to me the system with the faster PCIe 3.0 video card is being downgraded to PCIe 2.1 speed. A quick check of the web site for my local computer store shows that all ATI video cards costing less than $300 have PCIe 3.0 interfaces and the sole ATI card with PCIe 4.0 (which gives double the PCIe speed if the motherboard supports it) costs almost $500. I’m not inclined to spend $500 on a new video card and then a greater amount of money on a motherboard supporting PCIe 4.0 and CPU and RAM to go in it.

According to my calculations 3840*2160 resolution at 24bpp (probably 32bpp data transfers) at 30 frames/sec means 3840*2160*4*30/1024/1024=950MB/s. PCIe 2.1 can do 8GB/s so that probably isn’t a significant problem.
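The same calculation as code (assuming 32bpp transfers, i.e. 4 bytes per pixel, as in the text):

```python
# Bandwidth needed to push uncompressed 4K frames at 30fps over the bus.
width, height, bytes_per_pixel, fps = 3840, 2160, 4, 30
mb_per_sec = width * height * bytes_per_pixel * fps / 1024 / 1024
print(f'{mb_per_sec:.0f} MB/s')  # ~950MB/s, well within PCIe 2.1 x16's ~8GB/s
```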

I’d been planning on buying a new video card for the E5-2440 system, but due to some combination of having a better CPU and lower screen resolution it is working well for video playing so I can save my money.

As an aside, the T320 is a server class system that had been running for years in a corporate DC. When I replaced the high speed SAS disks with SATA SSDs it became quiet enough for a home workstation. It works very well at that task but the BIOS is quite determined to keep the motherboard video running due to the remote console support. So swapping monitors around was more pain than I felt like going through; I just got it working and left it. I ordered a special GPU power cable but found that the older video card that doesn’t need an extra power cable performed adequately before the cable arrived.

Here is a table comparing the systems.

             2560*1440 works well    3840*2160 goes slow
System       Dell PowerEdge T320     White Box PC from rubbish
CPU          2.4GHz E5-2440          3.3GHz i5-2500
Video Card   ATI Radeon HD 6570      ATI Radeon R7 260X
PCIe Speed   PCIe 2.1 – 8GB/s        PCIe 3.0 downgraded to PCIe 2.1 – 8GB/s

Conclusion

The ATI Radeon HD 6570 video card is one that I had previously tested and found inadequate for 4K support, I can’t remember if it didn’t work at that resolution or didn’t support more than 30Hz scan rate. If the 2560*1440 monitor dies then it wouldn’t make sense to buy anything less than a 4K monitor to replace it which means that I’d need to get a new video card to match. But for the moment 2560*1440 is working well enough so I won’t upgrade it any time soon. I’ve already got the special power cable (specified as being for a Dell PowerEdge R610 for anyone else who wants to buy one) so it will be easy to install a powerful video card in a hurry.