etbe - Russell Coker

Storage Trends 2021

The Viability of Small Disks

Less than a year ago I wrote a blog post about storage trends [1]. My main point in that post was that disks smaller than 2TB weren’t viable then and 2TB disks wouldn’t be economically viable in the near future.

Now MSY has 2TB disks for $72 and 2TB SSD for $245, saving $173 if you get a hard drive (compared to saving $240 10 months ago). Given the difference in performance and noise 2TB hard drives won’t be worth using for most applications nowadays.

NVMe vs SSD

Last year NVMe prices were comparable to SSD prices, and I was hoping that trend would continue and SATA SSDs would go away. Now for sizes of 1TB and smaller NVMe and SSD prices are very similar, but for 2TB the NVMe prices are twice those of SSD – presumably partly due to poor demand for 2TB NVMe. There are also no NVMe devices larger than 2TB on sale at MSY (a store which caters to home stuff not special server equipment) but SSDs go up to 8TB.

It seems that NVMe is only really suitable for workstation storage and for cache etc on a server. So SATA SSDs will be around for a while.

Small Servers

There is a range of low end servers which support a limited number of disks. Dell has 2 disk servers and 4 disk servers. If one of those had 8TB SSDs you could have 8TB of RAID-1 or 24TB of RAID-Z storage in a low end server. That covers the vast majority of servers (small business or workgroup servers tend to have less than 8TB of storage).
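The capacity figures above can be checked with a small sketch. It assumes a 2 disk mirror and a 4 disk single-parity RAID-Z, and ignores filesystem overhead; the function names are made up for illustration.

```python
def raid1_usable_tb(disk_tb):
    """A mirror stores the same data on every disk, so usable space is one disk."""
    return disk_tb


def raidz_usable_tb(disks, disk_tb, parity=1):
    """RAID-Z gives up `parity` disks' worth of space to redundancy."""
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_tb


# 2 disk server with 8TB SSDs in RAID-1
print(raid1_usable_tb(8))
# 4 disk server with 8TB SSDs in single-parity RAID-Z
print(raidz_usable_tb(4, 8))
```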

Larger Servers

Anandtech has an article on Seagate’s roadmap to 120TB disks [2]. They currently sell 20TB disks using HAMR technology.

Currently the biggest disks that MSY sells are 10TB for $395, which was also the biggest disk they were selling last year. Last year MSY only sold SSDs up to 2TB in size (larger ones were available from other companies at much higher prices), now they sell 8TB SSDs for $949 (4* capacity increase in less than a year). Seagate is planning 30TB disks for 2023, if SSDs continue to increase in capacity by 4* per year we could have 128TB SSDs in 2023. If you needed a server with 100TB of storage then having 2 or 3 SSDs in a RAID array would be much easier to manage and faster than 4*30TB disks in an array.
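The projection above is simple compounding. Here is a sketch of the arithmetic, assuming the 4* per year growth actually continues (which is speculation) and counting only data disks with no redundancy; the function names are hypothetical.

```python
import math


def projected_tb(current_tb, growth_per_year, years):
    """Capacity after compounding the yearly growth factor."""
    return current_tb * growth_per_year ** years


def disks_needed(total_tb, disk_tb):
    """Minimum number of disks to hold total_tb, before any redundancy."""
    return math.ceil(total_tb / disk_tb)


ssd_2023 = projected_tb(8, 4, 2)   # 8TB SSDs in 2021, two years of 4* growth
print(ssd_2023)
print(disks_needed(100, 30))       # 30TB hard drives for a 100TB server
print(disks_needed(100, ssd_2023)) # projected SSDs for the same server
```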

When you have a server with many disks you can expect to have more disk failures due to vibration. One time I built a server with 18 disks and took disks from 2 smaller servers that had 4 and 5 disks. The 9 disks which had been working reliably for years started having problems within weeks of running in the bigger server. This is one of the many reasons for paying extra for SSD storage.

Seagate is apparently planning 50TB disks for 2026 and 100TB disks for 2030. If that’s the best they can do then SSD vendors should be able to sell larger products sooner at prices that are competitive. Matching hard drive prices is not required, getting to less than 4* the price should be enough for most customers.

The Anandtech article is worth reading. It mentions some interesting features that Seagate are developing, such as having 2 actuators (which they call Mach.2) so the drive can access 2 different tracks at the same time. That can double the performance of a disk, but that doesn’t change things much when SSDs are more than 100* faster. Presumably the Mach.2 disks will be SAS and incredibly expensive while providing significantly less performance than affordable SATA SSDs.

Computer Cases

In my last post I speculated on the appearance of smaller cases designed to not have DVD drives or 3.5″ hard drives. Such cases still haven’t appeared apart from special purpose machines like the NUC that were available last year.

It would be nice if we could get a new industry standard for smaller power supplies. Currently power supplies are expected to be almost 5 inches wide (due to the expectation of a 5.25″ DVD drive mounted horizontally). We need some industry standards for smaller PCs that aren’t like the NUC. The NUC is very nice, but most people who build their own PC need more space than that. I still think that planning on USB DVD drives is the right way to go. I’ve got 4 PCs in my home that are regularly used and CDs and DVDs are used so rarely that sharing a single DVD drive among all 4 wouldn’t be a problem.

Conclusion

I’m tempted to get a couple of 4TB SSDs, which cost $487 each, for my home server. It currently has 2*500G SSDs and 3*4TB disks. I would have to remove some unused files but that’s probably not too hard to do as I have lots of old backups etc on there. Another possibility is to use 2*4TB SSDs for most stuff and 2*4TB disks for backups.

I’m recommending that all my clients only use SSDs for their storage. I only have one client with enough storage that disks are the only option (100TB of storage) but they moved all the functions of that server to AWS and use S3 for the storage. Now I don’t have any clients doing anything with storage that can’t be done in a better way on SSD for a price difference that’s easy for them to afford.

Affordable SSD also makes RAID-1 in workstations more viable. 2 disks in a PC is noisy if you have an office full of them and produces enough waste heat to be a reliability issue (most people don’t cool their offices adequately on weekends). 2 SSDs in a PC is no problem at all. As 500G SSDs are available for $73 it’s not a significant cost to install 2 of them in every PC in the office (more cost for my time than hardware). I generally won’t recommend that hard drives be replaced with SSDs in systems that are working well. But if a machine runs out of space then replacing it with SSDs in a RAID-1 is a good choice.

Moore’s law might cover SSDs, but it definitely doesn’t cover hard drives. Hard drives have fallen way behind developments of most other parts of computers over the last 30 years, hopefully they will go away soon.

Censoring Images

A client asked me to develop a system for “censoring” images from an automatic camera. The situation is that we have a camera taking regular photos from a fixed location which includes part of someone else’s property. So my client made a JPEG with some black rectangles in the sections that need to be covered. The first thing I needed to do was convert the JPEG to a PNG with transparency for the sections that aren’t to be covered.

To convert it I loaded the JPEG in the GIMP and went to the Layer->Transparency->Add Alpha Channel menu to enable the Alpha channel. Then I selected the “Bucket Fill tool” and used “Mode Erase” and “Fill by Composite” and then clicked on the background (the part of the JPEG that was white) to make it transparent. Then I exported it to PNG.

If anyone knows of an easy way to convert the file then please let me know. It would be nice if there was a command-line program I could run to convert a specified color (default white) to transparent. I say this because I can imagine my client going through a dozen iterations of an overlay file that doesn’t quite fit.

To censor the image I ran the “composite” command from imagemagick. The command I used was “composite -gravity center overlay.png in.jpg out.jpg”. If anyone knows a better way of doing this then please let me know.
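The two steps (turning the white background transparent, then laying the overlay on the photo) can be sketched in plain Python on lists of pixel tuples. This is only an illustration of the logic, not what the GIMP or imagemagick do internally. The threshold of 240 is an assumption to cope with JPEG artifacts making “white” slightly off-white, and a real script would load and save the pixels with an image library.

```python
WHITE_THRESHOLD = 240  # assumption: treat anything this bright as background


def white_to_transparent(pixels, threshold=WHITE_THRESHOLD):
    """Convert rows of (r, g, b) to (r, g, b, a) with near-white made transparent."""
    return [
        [
            (r, g, b, 0) if min(r, g, b) >= threshold else (r, g, b, 255)
            for (r, g, b) in row
        ]
        for row in pixels
    ]


def composite(overlay, photo):
    """Lay an RGBA overlay on an RGB photo of the same size.

    Opaque overlay pixels replace the photo, transparent ones leave it alone.
    """
    return [
        [o[:3] if o[3] == 255 else p for o, p in zip(overlay_row, photo_row)]
        for overlay_row, photo_row in zip(overlay, photo)
    ]


# One row, two pixels: white background and a black rectangle pixel
mask = [[(255, 255, 255), (0, 0, 0)]]
photo = [[(10, 20, 30), (40, 50, 60)]]
print(composite(white_to_transparent(mask), photo))
```

Doing the color conversion in a script would mean a new overlay from the client needs no manual editing step.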

The platform I’m using is an ARM926EJ-S rev 5 (v5l) which takes 8 minutes of CPU time to convert a single JPEG at full DSLR resolution (4 megapixel). It also required enabling swap on a SD card to avoid running out of RAM and running “systemctl disable tmp.mount” to stop using tmpfs for /tmp as the system only has 256M of RAM.

Links February 2021

Elastic Search gets a new license to deal with AWS not paying them [1]. Of course AWS will fork the products in question. We need some anti-trust action against Amazon.

Big Think has an interesting article about what appears to be ritualistic behaviour in chimpanzees [2]. The next issue is that if they are developing a stone-age culture does that mean we should treat them differently from other less developed animals?

Last Week in AWS has an informative article about Parler’s new serverless architecture [3]. They explain why it’s not easy to move away from a cloud platform even for a service that’s designed to not be dependent on it. The moral of the story is that running a service so horrible that none of the major cloud providers will touch it doesn’t scale.

Patheos has an insightful article about people who spread the most easily disproved lies for their religion [4]. A lot of political commentary nowadays is like that.

Indi Samarajiva wrote an insightful article comparing terrorism in Sri Lanka with the right-wing terrorism in the US [5]. The conclusion is that it’s only just starting in the US.

Bellingcat has an interesting article about the FSB attempt to murder Russian presidential candidate Alexey Navalny [6].

Russ Allbery wrote an interesting review of Anti-Social, a book about the work of an anti-social behavior officer in the UK [7]. The book (and Russ’s review) has some good insights into how crime can be reduced. Of course a large part of that is allowing people who want to use drugs to do so in an affordable way.

Informative post from Electrical Engineering Materials about the difference between KVA and KW [8]. KVA is bigger than KW, sometimes a lot bigger.

Ars Technica has an interesting but not surprising article about a “supply chain” attack on software development [9]. It exploits the way npm and similar tools resolve dependencies to make them download hostile code. There is no possibility of automatic downloads being OK for security unless they are from known good sites that don’t allow random people to upload. Any sort of system that allows automatic download from sites like the Node or Python repositories, Github, etc is ripe for abuse. I think the correct solution is to have dependencies installed manually or automatically from a distribution like Debian, Ubuntu, Fedora, etc where there have been checks on the source of the source.

Devon Price wrote an insightful Medium article “Laziness Does Not Exist” about the psychological factors which can lead to poor results that many people interpret as “laziness” [10]. Everyone who supervises other people’s work should read this.

Links January 2021

Krebs on Security has an informative article about web notifications and how they are being used for spamming and promoting malware [1]. He also includes links for how to permanently disable them. If nothing else clicking “no” on each new site that wants to send notifications is annoying.

Michael Stapelberg wrote an insightful post about inefficiencies in the Debian development processes [2]. While I agree with most of his assessment of Debian issues I am not going to decrease my involvement in Debian. Of the issues he mentions the 2 that seem to have the best effort to reward ratio are improvements to mailing list archives (to ideally make it practical to post to lists without subscribing and read responses in the archives) and the issues of forgetting all the complexities of the development process which can be alleviated by better Wiki pages. In my Debian work I’ve contributed more to the Wiki in recent times but not nearly as much as I should.

Jacobin has an insightful article “Ending Poverty in the United States Would Actually Be Pretty Easy” [3].

Mark Brown wrote an interesting blog post about the Rust programming language [4]. He links to a couple of longer blog posts about it. Rust has some great features and I’ve been meaning to learn it.

Scientific American has an informative article about research on the spread of fake news and memes [5]. Something to consider when using social media.

Bruce Schneier wrote an insightful blog post on whether there should be limits on persuasive technology [6].

Jonathan Dowland wrote an interesting blog post about git rebasing and lab books [7]. I think it’s an interesting thought experiment to compare the process of developing code worthy of being committed to a master branch of a VCS to the process of developing a Ph.D thesis.

CBS has a disturbing article about the effect of Covid19 on people’s lungs [8]. Apparently it usually does more lung damage than long-term smoking and even 70%+ of people who don’t have symptoms of the disease get significant lung damage. People who live in heavily affected countries like the US now have to worry that they might have had the disease and got lung damage without knowing it.

Russ Allbery wrote an interesting review of the book “Because Internet” about modern linguistics [9]. The topic is interesting and I might read that book at some future time (I have many good books I want to read).

Jonathan Carter wrote an interesting blog post about CentOS Streams and why using a totally free OS like Debian is going to be a better option for most users [10].

Linus has slammed Intel for using ECC support as a way of segmenting the market between server and desktop to maximise profits [11]. It would be nice if a company made a line of Ryzen systems with ECC RAM support, but most manufacturers seem to be in on the market segmentation scam.

Russ Allbery wrote an interesting review of the book “Can’t Even” about millennials as the burnout generation and the blame that the corporate culture deserves for this [12].

PSI and Cgroup2

In the comments on my post about Load Average Monitoring [1] an anonymous person recommended that I investigate PSI. As an aside, why do I get so many great comments anonymously? Don’t people want to get credit for having good ideas and learning about new technology before others?

PSI is the Pressure Stall Information subsystem for Linux that is included in kernels 4.20 and above. If you want to use it in Debian then you need a kernel from Testing or Unstable (Buster has kernel 4.19). The place to start reading about PSI is the main Facebook page about it, as it was originally developed at Facebook [2].

I am a little confused by the actual numbers I get out of PSI. For the load average I can often see where the numbers come from (e.g. have 2 processes each taking 100% of a core and the load average will be about 2), but it’s difficult to work out where the PSI numbers come from. For my own use I decided to treat them as unscaled numbers that just indicate problems: a higher number is worse, and I don’t worry too much about what the number really means.

With the cgroup2 interface which is supported by the version of systemd in Testing (and which has been included in Debian backports for Buster) you get PSI files for each cgroup. I’ve just uploaded version 1.3.5-2 of etbemon (package mon) to Debian/Unstable which displays the cgroups with PSI numbers greater than 0.5% when the load average test fails.

System CPU Pressure: avg10=0.87 avg60=0.99 avg300=1.00 total=20556310510
/system.slice avg10=0.86 avg60=0.92 avg300=0.97 total=18238772699
/system.slice/system-tor.slice avg10=0.85 avg60=0.69 avg300=0.60 total=11996599996
/system.slice/system-tor.slice/tor@default.service avg10=0.83 avg60=0.69 avg300=0.59 total=5358485146

System IO Pressure: avg10=18.30 avg60=35.85 avg300=42.85 total=310383148314
 full avg10=13.95 avg60=27.72 avg300=33.60 total=216001337513
/system.slice avg10=2.78 avg60=3.86 avg300=5.74 total=51574347007
/system.slice full avg10=1.87 avg60=2.87 avg300=4.36 total=35513103577
/system.slice/mariadb.service avg10=1.33 avg60=3.07 avg300=3.68 total=2559016514
/system.slice/mariadb.service full avg10=1.29 avg60=3.01 avg300=3.61 total=2508485595
/system.slice/matrix-synapse.service avg10=2.74 avg60=3.92 avg300=4.95 total=20466738903
/system.slice/matrix-synapse.service full avg10=2.74 avg60=3.92 avg300=4.95 total=20435187166

Above is an extract from the output of the loadaverage check. It shows that tor is a major user of CPU time (the VM runs a Tor relay node and has close to 100% of one core devoted to that task). It also shows that Mariadb and Matrix are the main users of disk IO. When I installed Matrix the Debian package told me that using SQLite would give lower performance than MySQL, but that didn’t seem like a big deal as the server only has a few users. Maybe I should move Matrix to the Mariadb instance to improve overall system performance.
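For anyone wanting to do something similar, here is a minimal sketch of reading PSI data. It is not the etbemon code. It assumes the kernel’s own file format, where each line starts with “some” or “full” (as in /proc/pressure/cpu and the per-cgroup cpu.pressure and io.pressure files), and it assumes that any avg value above 0.5 counts as worth reporting.

```python
def parse_psi(text):
    """Parse the contents of a PSI file into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field looks like avg10=0.87 or total=20556310510
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def over_threshold(psi, threshold=0.5):
    """True if any of the avg10/avg60/avg300 values exceeds the threshold."""
    return any(
        value > threshold
        for values in psi.values()
        for key, value in values.items()
        if key.startswith("avg")
    )


sample = (
    "some avg10=18.30 avg60=35.85 avg300=42.85 total=310383148314\n"
    "full avg10=13.95 avg60=27.72 avg300=33.60 total=216001337513\n"
)
psi = parse_psi(sample)
print(psi["full"]["avg60"], over_threshold(psi))
```

To use it on a live system the text would come from reading a file such as /proc/pressure/io, or the io.pressure file in a cgroup directory.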

So far I have not written any code to display the memory PSI files. I don’t have a lack of RAM on systems I run at the moment and don’t have a good test case for this. I welcome patches from people who have the ability to test this and get some benefit from it.

We are probably about 6 months away from a new release of Debian and this is probably the last thing I need to do to make etbemon ready for that.

RISC-V and Qemu

RISC-V is the latest RISC architecture that’s become popular. It is the 5th RISC architecture from the University of California Berkeley. It seems to be a competitor to ARM due to not having license fees or restrictions on alterations to the architecture (something you have to pay extra for when using ARM). RISC-V seems the most popular architecture to implement in FPGA.

When I first tried to run RISC-V under QEMU it didn’t work, which was probably due to running Debian/Unstable on my QEMU/KVM system and there being QEMU bugs in Unstable at the time. I have just tried it again and got it working.

The Debian Wiki page about RISC-V is pretty good [1]. The instructions there got it going for me. One thing I wasted some time on before reading that page was trying to get a netinst CD image, which is what I usually do for setting up a VM. Apparently there isn’t RISC-V hardware that boots from a CD/DVD so there isn’t a Debian netinst CD image. But debootstrap can install directly from the Debian web server (something I’ve never wanted to do in the past) and that gave me a successful installation.

Here are the commands I used to set up the base image:

apt-get install debootstrap qemu-user-static binfmt-support debian-ports-archive-keyring

debootstrap --arch=riscv64 --keyring /usr/share/keyrings/debian-ports-archive-keyring.gpg --include=debian-ports-archive-keyring unstable /mnt/tmp http://deb.debian.org/debian-ports

I first tried running RISC-V Qemu on Buster, but even ls didn’t work properly and the installation failed.

chroot /mnt/tmp bin/bash
# ls -ld .
/usr/bin/ls: cannot access '.': Function not implemented

When I ran it on Unstable ls worked but strace didn’t work in a chroot. This gave enough functionality to complete the installation.

chroot /mnt/tmp bin/bash
# strace ls -l
/usr/bin/strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Function not implemented
/usr/bin/strace: ptrace(PTRACE_TRACEME, ...): Function not implemented
/usr/bin/strace: PTRACE_SETOPTIONS: Function not implemented
/usr/bin/strace: detach: waitpid(1602629): No child processes
/usr/bin/strace: Process 1602629 detached

When running the VM the operation was noticeably slower than the emulation of PPC64 and S/390x which both ran at an apparently normal speed. When running on a server with equivalent speed CPU a ssh login was obviously slower due to the CPU time taken for encryption, a ssh connection from a system on the same LAN took 6 seconds to connect. I presume that because RISC-V is a newer architecture there hasn’t been as much effort made on optimising the Qemu emulation and that a future version of Qemu will be faster. But I don’t think that Debian/Bullseye will give good Qemu performance for RISC-V, probably more changes are needed than can happen before the freeze. Maybe a version of Qemu with better RISC-V performance can be uploaded to backports some time after Bullseye is released.

Here’s the Qemu command I use to run RISC-V emulation:

qemu-system-riscv64 -machine virt -device virtio-blk-device,drive=hd0 -drive file=/vmstore/riscv,format=raw,id=hd0 -device virtio-blk-device,drive=hd1 -drive file=/vmswap/riscv,format=raw,id=hd1 -m 1024 -kernel /boot/riscv/vmlinux-5.10.0-1-riscv64 -initrd /boot/riscv/initrd.img-5.10.0-1-riscv64 -nographic -append "net.ifnames=0 noresume security=selinux root=/dev/vda ro" -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -device virtio-net-device,netdev=net0,mac=02:02:00:00:01:03 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper
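The kernel parameters after -append have to reach Qemu as a single argument. If you start the VM from a script instead of a shell, one way to avoid quoting mistakes is to build the argument list directly. This is a sketch with only a subset of the options above, and the variable names are made up.

```python
import shlex

# The whole kernel command line is one string, so it stays one argument
kernel_args = "net.ifnames=0 noresume security=selinux root=/dev/vda ro"

argv = [
    "qemu-system-riscv64",
    "-machine", "virt",
    "-m", "1024",
    "-kernel", "/boot/riscv/vmlinux-5.10.0-1-riscv64",
    "-initrd", "/boot/riscv/initrd.img-5.10.0-1-riscv64",
    "-nographic",
    "-append", kernel_args,
]

# Show the equivalent shell command with quoting added where it is needed
print(shlex.join(argv))
```

The list could then be passed to subprocess.run(argv) without going through a shell.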

Currently the program /usr/sbin/sefcontext_compile from the selinux-utils package needs execmem access on RISC-V while it doesn’t on any other architecture I have tested. I don’t know why and support for debugging such things seems to be in early stages of development, for example the execstack program doesn’t work on RISC-V now.

RISC-V emulation in Unstable seems adequate for people who are serious about RISC-V development. But if you want to just try a different architecture then PPC64 and S/390 will work better.

[1] https://wiki.debian.org/RISC-V

Monopoly the Game

The Smithsonian Mag has an informative article about the history of the game Monopoly [1]. The main point about Monopoly teaching about the problems of inequality is one I was already aware of, but there are some aspects of the history that I learned from the article.

Here’s an article about using a modified version of Monopoly to teach Sociology [2].

Maria Paino and Jeffrey Chin wrote an interesting paper about using Monopoly with revised rules to teach Sociology [3]. They publish the rules which are interesting and seem good for a class.

I think it would be good to have some new games which can teach about class differences. Maybe have an “Escape From Poverty” game where you have choices that include drug dealing to try and improve your situation or a cooperative game where people try to create a small business. While Monopoly can be instructive it’s based on the economic circumstances of the past. The vast majority of rich people aren’t rich from land ownership.

Planet Linux Australia

Linux Australia have decided to cease running the Planet installation on planet.linux.org.au. I believe that blogging is still useful and a web page with a feed of Australian Linux blogs is a useful service. So I have started running a new Planet Linux Australia on https://planet.luv.asn.au/. There has been discussion about getting some sort of redirection from the old Linux Australia page, but they don’t seem able to do that.

If you have a blog that has a reasonable portion of Linux and FOSS content and is based in or connected to Australia then email me on russell at coker.com.au to get it added.

When I started running this I took the old list of feeds from planet.linux.org.au, deleted all blogs that didn’t have posts for 5 years and all blogs that were broken and had no recent posts. I emailed people who had recently broken blogs so they could fix them. It seems that many people who run personal blogs aren’t bothered by a bit of downtime.

As an aside I would be happy to set up the monitoring system I use to monitor any personal web site of a Linux person and notify them by Jabber or email of an outage. I could set it to not alert for a specified period (10 mins, 1 hour, whatever you like) so it doesn’t alert needlessly on routine sysadmin work and I could have it check SSL certificate validity as well as the basic page header.

Weather and Boinc

I just wrote a Perl script to look at the Australian Bureau of Meteorology pages to find the current temperature in an area and then adjust BOINC settings accordingly. The Perl script (in this post after the break, which shouldn’t be in the RSS feed) takes the URL of a Bureau of Meteorology observation point as ARGV[0] and parses that to find the current (within the last hour) temperature. Then successive command line arguments are of the form “24:100” and “30:50” which indicate that at below 24C 100% of CPU cores should be used and below 30C 50% of CPU cores should be used. In warm weather having a couple of workstations in a room running BOINC (or any other CPU intensive task) will increase the temperature and also make excessive noise from cooling fans.
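The rule format described above can be sketched as follows. This is not the Perl script, just an illustration of the logic in Python. The function names are made up, and the fallback of 0% when the temperature is at or above every limit is my assumption for the sketch.

```python
def parse_rules(args):
    """Turn ["24:100", "30:50"] into [(24.0, 100), (30.0, 50)] sorted by limit."""
    rules = []
    for arg in args:
        limit, percent = arg.split(":", 1)
        rules.append((float(limit), int(percent)))
    return sorted(rules)


def cpu_percent(temperature, rules, fallback=0):
    """Return the percentage for the lowest limit that is above the temperature."""
    for limit, percent in rules:
        if temperature < limit:
            return percent
    # Assumption: hotter than every limit means no cores for BOINC
    return fallback


rules = parse_rules(["24:100", "30:50"])
for temp in (20.0, 26.5, 35.0):
    print(temp, cpu_percent(temp, rules))
```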

To change the number of CPU cores used the script changes /etc/boinc-client/global_prefs_override.xml and then tells BOINC to reload that config file. This code is a little ugly (it doesn’t properly parse XML, it just replaces a line of text) and could fail on a valid configuration file that wasn’t produced by the current BOINC code.
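A sketch of the same line-replacement approach in Python is below. It assumes the setting being changed is the max_ncpus_pct element and that the value is written with 6 decimal places, neither of which is stated above, and like the approach described it does not parse the XML properly.

```python
import re


def set_max_cpus(config_text, percent):
    """Replace the value of the max_ncpus_pct element, leaving the rest alone."""
    pattern = r"<max_ncpus_pct>[^<]*</max_ncpus_pct>"
    replacement = f"<max_ncpus_pct>{percent:.6f}</max_ncpus_pct>"
    new_text, count = re.subn(pattern, replacement, config_text)
    if count == 0:
        # Don't silently write back a file that was not changed
        raise ValueError("no max_ncpus_pct element found")
    return new_text


sample = (
    "<global_preferences>\n"
    "   <max_ncpus_pct>100.000000</max_ncpus_pct>\n"
    "</global_preferences>\n"
)
print(set_max_cpus(sample, 50))
```

After writing the file back BOINC has to be told to reload it. I believe one way to do that is “boinccmd --read_global_prefs_override”, but check this against your BOINC version.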

The parsing of the BoM page is a little ugly too. It relies on the HTML code in the BoM page – they could make a page that looks identical which breaks the parsing or even a page that contains the same data that looks different. It would be nice if the BoM published some APIs for getting the weather. One thing that would be good is TXT records in the DNS. DNS supports caching with specified lifetime and is designed for high throughput in aggregate. If you had a million IoT devices polling the current temperature and forecasts every minute via DNS the people running the servers wouldn’t even notice the load, while a million devices polling a web based API would be a significant load. As an aside I recommend playing nice and only running such a script every 30 minutes, the BoM page seems to be updated on the half hour so I have my cron jobs running at 5 and 35 minutes past the hour.

If this code works for you then that’s great. If it merely acts as an inspiration for developing your own code then that’s great too! BOINC users outside Australia could replace the code for getting meteorological data (or even interface to a digital thermometer). Australians who use other CPU intensive batch jobs could take the BoM parsing code and replace the BOINC related code. If you write scripts inspired by this please blog about it and comment here with a link to your blog post.

MPV vs Mplayer

After writing my post about VDPAU in Debian [1] I received two great comments from anonymous people. One pointed out that I should be using VA-API (also known as VAAPI) on my Intel based Thinkpad and gave a reference to an Arch Linux Wiki page, as usual Arch Linux Wiki is awesome and I learnt a lot of great stuff there. I also found the Debian Wiki page on Hardware Video Acceleration [2] which has some good information (unfortunately I had already found all that out through more difficult methods first, I should read the Debian Wiki more often).

It seems that mplayer doesn’t support VAAPI. The other comment suggested that I try the mpv fork of Mplayer which does support VAAPI but that feature is disabled by default in Debian.

I did a number of tests on playing different videos on my laptop running Debian/Buster with Intel video and my workstation running Debian/Unstable with ATI video. The first thing I noticed is that mpv was unable to use VAAPI on my laptop and that VDPAU won’t decode VP9 videos on my workstation and most 4K videos from YouTube seem to be VP9. So in most cases hardware decoding isn’t going to help me.

The Wikipedia page about Unified Video Decoder [3] shows that only VCN (Video Core Next) supports VP9 decoding while my R7-260x video card [4] has version 4.2 of the Unified Video Decoder which doesn’t support VP9, H.265, or JPEG. Basically I need a new high-end video card to get VP9 decoding and that’s not something I’m interested in buying now (I only recently bought this video card to do 4K at 60Hz).

The next thing I noticed is that for my combination of hardware and software at least mpv tends to take about 2/3 the CPU time to play videos that mplayer does on every video I tested. So it seems that using mpv will save me 1/3 of the power and heat from playing videos on my laptop and save me 1/3 of the CPU power on my workstation in the worst case while sometimes saving me significantly more than that.

Conclusion

To summarise quite a bit of time experimenting with video playing and testing things: I shouldn’t think too much about hardware decoding until VP9 hardware is available (years for me). But mpv provides some real benefits right now on the same hardware, I’m not sure why.
