Dell PowerEdge T320 and Linux

I recently bought a couple of PowerEdge T320 servers, so now I need to learn about setting them up. They are a little newer than the R710 I recently set up (which had iDRAC version 6); they have iDRAC version 7.

RAM Speed

One system has an E5-2440 CPU with 2*16G DDR3 DIMMs and a Memtest86+ speed of 13,043MB/s, the other is essentially identical but with an E5-2430 CPU and 4*16G DDR3 DIMMs and a Memtest86+ speed of 8,270MB/s. I had expected that more DIMMs would mean better RAM performance but this isn’t what happened. I first upgraded the BIOS; as I expected it didn’t make a difference, but it’s a good thing to try first.

On the E5-2430 I tried removing a DIMM after it was pointed out on Facebook that the CPU has 3 memory channels (here’s a link to a great site with information on that CPU and many others [1]). When I did that I was prompted to disable advanced ECC (which treats pairs of DIMMs as a single unit for ECC, allowing correction of errors of more than 1 bit) and I had to move the 3 remaining DIMMs to different slots. That improved the performance to 13,497MB/s. I then put the spare DIMM into the E5-2440 system and the performance increased to 13,793MB/s. When I installed 4 DIMMs in the E5-2440 system the performance remained at 13,793MB/s, and the E5-2430 went down to 12,643MB/s.
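When juggling DIMMs between channels it helps to see which slots the firmware thinks are populated without opening the case. The following is a minimal sketch, assuming the usual "Memory Device" blocks with "Size:" and "Locator:" fields that dmidecode prints; the function name dimm_summary is made up for this example and slot names vary by vendor.

```shell
# Summarise DIMM slot population from "dmidecode -t memory" output on stdin.
# Assumes each "Memory Device" block lists "Size:" before "Locator:".
dimm_summary() {
    awk '
        /^Memory Device$/ { in_dev = 1; size = ""; next }
        in_dev && /^[ \t]*Size:/ { sub(/^[ \t]*Size:[ \t]*/, ""); size = $0 }
        in_dev && /^[ \t]*Locator:/ {
            sub(/^[ \t]*Locator:[ \t]*/, "")
            if (size != "No Module Installed") { used++; print $0 ": " size }
            else empty++
            in_dev = 0
        }
        END { printf "populated=%d empty=%d\n", used, empty }
    '
}
# Usage (as root): dmidecode -t memory | dimm_summary
```

On a CPU with 3 memory channels, a populated count that is a multiple of 3 is the first thing to look for.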

This is a good result for me, I now have the most RAM and fastest RAM configuration in the system with the fastest CPU. I’ll sell the other one to someone who doesn’t need so much RAM or performance (it will be really good for a small office mail server and NAS).

Firmware Update

BIOS

The first issue is updating the BIOS, unfortunately the first link I found to the Dell web site didn’t have a link to download the Linux installer. It offered a Windows binary, an EFI program, and a DOS binary. I’m not about to install Windows if there is any other option and EFI is somewhat annoying, so that leaves DOS. The first Google result for installing FreeDOS advised using “unetbootin”, that didn’t work at all for me (created a USB image that the Dell BIOS didn’t recognise as bootable) and even if it did it wouldn’t have been a good solution.

I went to the FreeDOS download page [2] and got the “Lite USB” zip file. That contained “FD12LITE.img” which I could just dd to a USB stick. I then used fdisk to create a second 32MB partition, used mkfs.fat to format it, and then copied the BIOS image file to it. I booted the USB stick and then ran the BIOS update program from drive D:. After the BIOS update this became the first system I’ve seen get a totally green result from “spectre-meltdown-checker”!
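The steps above can be sketched as a function that only prints the commands, so they can be reviewed before anything touches a disk. This is a rough sketch under several assumptions: sfdisk stands in for the interactive fdisk session, the new partition appears as partition 2 of the stick, /mnt is free to use, and the function name freedos_stick_plan is made up for this example.

```shell
# Print (not run) the commands for building a FreeDOS BIOS-update stick.
# $1 = whole-disk device of the USB stick, $2 = BIOS update file.
# Assumptions: the image leaves free space after its own partition, the new
# partition shows up as ${dev}2, and sfdisk replaces interactive fdisk.
freedos_stick_plan() {
    dev=$1
    bios=$2
    cat <<EOF
dd if=FD12LITE.img of=$dev bs=1M conv=fsync
echo ',32MiB,6' | sfdisk --append $dev
mkfs.fat ${dev}2
mount ${dev}2 /mnt
cp $bios /mnt/
umount /mnt
EOF
}
# Usage: freedos_stick_plan /dev/sdX bios-update.exe
```

Double-check the device name before running any of the printed commands, as dd to the wrong disk is not recoverable.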

I found the link to the Linux installer for the new Dell BIOS afterwards, but it was still good to play with FreeDOS.

PERC Driver

I probably didn’t really need to update the PERC (PowerEdge RAID Controller) firmware as I’m just going to run it in JBOD mode. But it was easy to do, a simple bash shell script to update it.

Here are the perccli commands needed to access disks, it’s all hot-plug so you can insert disks and do all this without a reboot:

# show overview
perccli show
# show controller 0 details
perccli /c0 show all
# show controller 0 info with less detail
perccli /c0 show
# clear all "foreign" RAID members
perccli /c0 /fall delete
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

The “perccli /c0 show” command gives the following summary of disk (“PD” in perccli terminology) information amongst other information. The EID is the enclosure, Slt is the “slot” (i.e. the bay you plug the disk into) and the DID is the disk identifier (not sure what happens if you have multiple enclosures). The allocation of device names (sda, sdb, etc) will be in order of EID:Slt or DID at boot time, and any drives added at run time will get the next letters available.

----------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                     Sp 
----------------------------------------------------------------------------------
32:0      0 Onln   0  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:1      1 Onln   1  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:3      3 Onln   2   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
32:4      4 Onln   3   3.637 TB SATA HDD N   N  512B WDC WD40EURX-64WRWY0      U  
32:5      5 Onln   5 278.875 GB SAS  HDD Y   N  512B ST300MM0026               U  
32:6      6 Onln   6 558.375 GB SAS  HDD N   N  512B AL13SXL600N               U  
32:7      7 Onln   4   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
----------------------------------------------------------------------------------
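For scripting it can be handy to reduce that table to the fields that matter. A minimal sketch, assuming the column layout shown above (the function names pd_list and pd_not_online are made up for this example):

```shell
# List "EID:Slt DID State" for every physical disk in "perccli /c0 show" output.
# Only rows whose first column looks like enclosure:slot are kept.
pd_list() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ { print $1, $2, $3 }'
}

# Report only the disks that are not in the Onln (online) state.
pd_not_online() {
    pd_list | awk '$3 != "Onln"'
}
# Usage: perccli /c0 show | pd_not_online
```

The DID column from pd_list is the number that smartctl needs for its megaraid device type.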

The PERC controller is a MegaRAID with possibly some minor changes, there are reports of Linux MegaRAID management utilities working on it for similar functionality to perccli. The version of MegaRAID utilities I tried didn’t work on my PERC hardware. The smartctl utility works on those disks if you tell it you have a MegaRAID controller (so obviously there’s enough similarity that some MegaRAID utilities will work). Here are example smartctl commands for the first and last disks on my system. Note that the disk device node doesn’t matter as all device nodes associated with the PERC/MegaRAID are equal for smartctl.

# get model number etc on DID 0 (Samsung SSD)
smartctl -d megaraid,0 -i /dev/sda
# get all the basic information on DID 0
smartctl -d megaraid,0 -a /dev/sda
# get model number etc on DID 7 (Seagate 4TB disk)
smartctl -d megaraid,7 -i /dev/sda
# exactly the same output as the previous command
smartctl -d megaraid,7 -i /dev/sdc
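To check every disk in one go, the same smartctl invocation can be put in a loop over the DIDs. A minimal sketch, where the function name perc_smart_health and the SMARTCTL override variable are made up for this example (-H is the standard smartctl health check option):

```shell
# Run a SMART health check on each PERC disk ID (DID) given as an argument.
# SMARTCTL can be set to another command for testing. /dev/sda is arbitrary
# because any device node on the controller gives the same result.
perc_smart_health() {
    for did in "$@"; do
        echo "== DID $did =="
        "${SMARTCTL:-smartctl}" -d "megaraid,$did" -H /dev/sda
    done
}
# Usage: perc_smart_health 0 1 3 4 5 6 7
```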

I have uploaded etbemon version 1.3.5-6 to Debian which has support for monitoring smartctl status of MegaRAID devices and NVMe devices.

IDRAC

To update IDRAC on Linux there’s a bash script with the firmware in the same file (binary stuff at the end of a shell script). To make things a little more exciting the script insists that rpm be available (running “apt install rpm” fixes that for a Debian system). It also creates and runs other shell scripts which start with “#!/bin/sh” but depend on bash syntax. So I had to make /bin/sh a symlink to /bin/bash. You know you need this if you see errors like “typeset: not found” and “[: -eq: unexpected operator” and then the system reboots. Dell people, please test your scripts on dash (the Debian /bin/sh) or just specify #!/bin/bash.
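Before running the updater it is worth checking what /bin/sh really is. A minimal sketch (the function name sh_is_bash is made up for this example):

```shell
# Succeed if the given path (default /bin/sh) resolves to a file named "bash".
sh_is_bash() {
    target=$(readlink -f "${1:-/bin/sh}") || return 2
    [ "$(basename "$target")" = "bash" ]
}
# Usage: sh_is_bash || echo "/bin/sh is not bash, the updater scripts may fail"
```

Remember to point /bin/sh back at dash afterwards if you want the Debian default.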

If the IDRAC update works it will take about 8 minutes.

Lifecycle Controller

The Lifecycle Controller is apparently for installing OS and firmware updates. I use Linux tools to update Linux and I generally don’t plan to update the firmware after deployment (although I could do so from Linux if needed). So it doesn’t seem to offer anything useful to me.

Setting Up IDRAC

For extra excitement I decided to try to set up IDRAC from the Linux command-line. To install the RAC setup tool you run “apt install srvadmin-idracadm7 libargtable2-0” (because srvadmin-idracadm7 doesn’t have the right dependencies).

# srvadmin-idracadm7 is missing a dependency
apt install srvadmin-idracadm7 libargtable2-0
# set the IP address, netmask, and gateway for IDRAC
idracadm7 setniccfg -s 192.168.0.2 255.255.255.0 192.168.0.1
# put my name on the front panel LCD
idracadm7 set System.LCD.UserDefinedString "Russell Coker"

Conclusion

This is a very nice deskside workstation/server. It’s extremely quiet with hardly any fan noise and the case is strong enough to contain the noise of hard drives. When running with 3* 3.5″ SATA disks and 2*10k 2.5″ SAS disks on a wooden floor it wasn’t annoyingly loud. Without the SAS disks it was as quiet as you can expect any PC to be, definitely not the volume you expect from a serious server! I bought the T320 systems loaded with SAS disks which made them quite loud. I immediately put the disks on eBay and installed SATA SSDs and hard drives, which gives me more performance and more space than the SAS disks with less cost and almost no noise.

8*3.5″ drive bays gives room for expansion. I currently have 2*SATA SSDs and 3*SATA disks, the SSDs are for the root filesystem (including /home) and the disks are for a separate filesystem for large files.

Netflix and IPv6

It seems that Netflix has an ongoing issue of not working well with IPv6, apparently they have some sort of region checking code that doesn’t correctly identify IPv6 prefixes. To fix this I wrote the following script to make a small zone file with only A records for Netflix and no AAAA records. The $OUT.header file just has the SOA record for my fake netflix.com domain.

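#!/bin/bash

OUT=/etc/bind/data/netflix.com
HEAD=$OUT.header

# start from the header, then append only the A records that dig returns
cp "$HEAD" "$OUT"
# the grep keeps lines ending in a digit (an IPv4 address), dropping CNAMEs and comments
dig -t a www.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/www IN/p" | grep '[0-9]$' >> "$OUT"
dig -t a android.prod.cloud.netflix.com @8.8.8.8 | sed -n -e "s/^.*IN/android.prod.cloud IN/p" | grep '[0-9]$' >> "$OUT"
/usr/sbin/rndc reload > /dev/null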
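For reference, here is a sketch of what the header file might contain. Every value in it is a placeholder (ns.example.net, the timers, the serial), the NS line is an assumption on the basis that BIND normally wants one at the zone apex, and the function name write_netflix_header is made up for this example.

```shell
# Write a minimal header for the override zone to the file named in $1.
# All names and timer values below are placeholders, not the real ones.
write_netflix_header() {
    cat > "$1" <<'EOF'
$TTL 300
@ IN SOA ns.example.net. hostmaster.example.net. (
        1       ; serial
        3600    ; refresh
        600     ; retry
        86400   ; expire
        300 )   ; negative-cache TTL
@ IN NS ns.example.net.
EOF
}
# Usage: write_netflix_header /etc/bind/data/netflix.com.header
```

A short TTL is chosen here on the assumption that the script is run regularly from cron so the A records stay current.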

Update

I updated this post to add a line for android.prod.cloud.netflix.com which is the address used by Android devices.

Internode NBN with Arris CM8200 on Debian

I’ve recently signed up for Internode NBN while using the Arris CM8200 device supplied by Optus (previously used for a regular phone service). I took the configuration mostly from Dean’s great blog post on the topic [1]. One thing I changed was the /etc/network/interfaces configuration, I used the following:

# VLAN ID 2 for Internode's NBN HFC.
auto eth1.2
iface eth1.2 inet manual
  vlan-raw-device eth1

auto nbn
iface nbn inet ppp
    pre-up /bin/ip link set eth1.2 up
    provider nbn

There is no need to have a section for eth1 when you have a section for eth1.2.

IPv6

IPv6 for only one system

With a line in /etc/ppp/options containing only “ipv6 ,” you get an IPv6 address automatically for the ppp0 interface after starting pppd.

IPv6 for your LAN

Internode has documented how to configure the WIDE DHCPv6 client to get an IPv6 “prefix” (subnet) [2]. Just install the wide-dhcpv6-client package and put your interface names in a copy of the Internode example config and that works. That gets you a /64 assigned to your local Ethernet. Here’s an example of /etc/wide-dhcpv6/dhcp6c.conf:

interface ppp0 {
    send ia-pd 0;
    script "/etc/wide-dhcpv6/dhcp6c-script";
};

id-assoc pd {
    prefix-interface br0 {
        sla-id 0;
        sla-len 8;
    };
};

For providing addresses to other systems on your LAN they recommend radvd version 1.1 or greater, Debian/Bullseye will ship with version 2.18. Here is an example /etc/radvd.conf that will work with it. It seems that you have to manually (or with a script) set the value to use in place of “xxxx:xxxx:xxxx:xxxx” from the value that is assigned to eth0 (or whichever interface you are using) by the wide-dhcpv6-client.

interface eth0 { 
        AdvSendAdvert on;
        MinRtrAdvInterval 3; 
        MaxRtrAdvInterval 10;
        prefix xxxx:xxxx:xxxx:xxxx::/64 { 
                AdvOnLink on; 
                AdvAutonomous on; 
                AdvRouterAddr on; 
        };
};
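Filling in the prefix can be scripted. The following is a minimal sketch, assuming the delegated /64 shows up as a kernel route on the LAN interface; the function names lan_prefix and render_radvd are made up for this example, and the config it prints just mirrors the example above.

```shell
# Pick the first non link-local /64 from "ip -6 route show dev IFACE" output
# on stdin.
lan_prefix() {
    awk '$1 ~ /\/64$/ && $1 !~ /^fe80:/ { print $1; exit }'
}

# Print a radvd.conf for interface $1 advertising prefix $2.
render_radvd() {
    cat <<EOF
interface $1 {
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 10;
        prefix $2 {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr on;
        };
};
EOF
}
# Usage:
#   prefix=$(ip -6 route show dev eth0 | lan_prefix)
#   [ -n "$prefix" ] && render_radvd eth0 "$prefix" > /etc/radvd.conf
```

If the interface also has a ULA or other static /64 the first match may not be the delegated prefix, so check the result before restarting radvd.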

Either the configuration of the wide dhcp client or radvd removes the default route from ppp0, so you need to run a command like “ip -6 route add default dev ppp0” to put it back. Probably having “ipv6 ,” is the wrong thing to do when using wide-dhcpv6-client and radvd.

On a client machine with bridging I needed to have “net.ipv6.conf.br0.accept_ra=2” in /etc/sysctl.conf to allow it to accept router advertisement messages on the interface (in this case eth0); for machines without bridging I didn’t need that.

Firewalling

The default model for firewalling nowadays seems to be using NAT and only configuring specific ports to be forwarded to machines on the LAN. With IPv6 on the LAN every system can directly communicate with the rest of the world which may be a bad thing. The following lines in a firewall script will drop all inbound packets that aren’t in response to packets that are sent out. This will give an equivalent result to the NAT firewall people are used to and you can always add more rules to allow specific ports in.

ip6tables -A FORWARD -i ppp+ -m state --state ESTABLISHED,RELATED -j ACCEPT
ip6tables -A FORWARD -i ppp+ -j DROP
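As an example of allowing a specific port in, here is a minimal sketch that opens one TCP port to one LAN host. The function name allow_inbound_tcp, the IP6TABLES override variable, and the address in the usage line are made up for this example.

```shell
# Accept inbound TCP to one LAN host and port.
# $1 = destination IPv6 address, $2 = port.
# -I puts the rule at the top of FORWARD, so it is checked before the DROP.
allow_inbound_tcp() {
    "${IP6TABLES:-ip6tables}" -I FORWARD -i ppp+ -p tcp -d "$1" --dport "$2" -j ACCEPT
}
# Usage: allow_inbound_tcp 2001:db8:1:2::10 22
```

Using -I rather than -A matters here because a rule appended after the DROP would never be reached.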

Some Ideas About Storage Reliability

Hard Drive Brands

When people ask for advice about what storage to use they often get answers like “use brand X, it works well for me and brand Y had a heap of returns a few years ago”. I’m not convinced there is any difference between the small number of manufacturers that are still in business.

One problem we face with reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix and even a fault that causes unreliable operation usually won’t spread its damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk).

Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can’t test the new storage technology for even 3 years before releasing it so their ability to prove the reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don’t have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn’t mean the next series of disks from the same manufacturer will have problems.

The issues with SSDs are similar to the issues with hard drives but a little different. I’m not sure how much of the improvements in SSDs recently have been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don’t expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in, presumably a “SSHD” or “Hybrid Drive” gets all the potential failures of hard drives and SSDs.

One theoretical issue with SSDs is that filesystems have been (in theory at least) designed to cope with hard drive failure modes not SSD failure modes. The problem with that theory is that most filesystems don’t cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn’t production ready last time I tested it).

Some people are concerned that their filesystem won’t support “wear levelling” for SSD use. When a flash storage device is exposed to the OS via a block interface like SATA there isn’t much possibility of wear levelling. If flash storage exposes that level of hardware detail to the OS then you need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device.

Another common concern about SSD is that it will wear out from too many writes. Lots of people are using SSD for the ZIL (ZFS Intent Log) on the ZFS filesystem, that means that SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a problem with SSDs wearing out I expect that ZFS users would be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD [1] (conclusion – it won’t). Apart from the nameless brand SSD I mentioned previously all of my SSDs in question are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1), I have yet to determine if this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about “Wear_Leveling_Count” (old age).
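To keep an eye on wear attributes like that one, the normalised value can be pulled out of the smartctl attribute table. A minimal sketch, assuming the standard “smartctl -A” column layout (ID, name, flag, value, ...); the function name smart_attr_value is made up for this example, and attribute names differ between SSD vendors.

```shell
# Print the normalised VALUE column for a named attribute from "smartctl -A"
# output on stdin. $1 = attribute name, e.g. Wear_Leveling_Count.
smart_attr_value() {
    awk -v name="$1" '$2 == name { print $4 }'
}
# Usage: smartctl -A /dev/sda | smart_attr_value Wear_Leveling_Count
```

The normalised value typically starts high and counts down as the device wears, so a falling number is the thing to watch.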

Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs, they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays.

My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren’t issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

NVMe

I think that NVMe isn’t very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

Future Technology

The latest thing for high end servers is Optane Persistent memory [2] also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket that gives performance somewhere between NVMe and RAM and capacity similar to NVMe. One of the ways of using this is “Memory Mode” where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level), this could make multiple terabytes of “RAM” not ridiculously expensive. Another way of using it is “App Direct Mode” where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is “Mixed Memory Mode” which has some DCPMM in “Memory Mode” and some in “App Direct Mode”.

This gives plenty of opportunity to need your backups, and to make things extra exciting “App Direct Mode” has RAID-0 but no other form of RAID.

Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware and new software needed to talk to new hardware interfaces will have bugs and sometimes those bugs will lose data.

Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good, this is a very common failure mode.
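On BTRFS the per-device error counters show whether this has happened. A minimal sketch that filters “btrfs device stats” output down to the non-zero counters (the function name btrfs_errors is made up for this example):

```shell
# Print only the non-zero error counters from "btrfs device stats" output
# on stdin, one "counter=value" per line.
btrfs_errors() {
    awk '$2 + 0 > 0 { print $1 "=" $2 }'
}
# Usage: btrfs device stats /mountpoint | btrfs_errors
```

Running “btrfs scrub start -B /mountpoint” first forces every block to be read and checked, so the counters reflect the whole filesystem rather than only the data that happened to be read recently.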

Backups are a good thing.

Wifi Performance on Linux

Wifi usually just works. In the past I haven’t had to worry much about performance as for home use things have always been bearable and at work it’s never been my job so I just file a bug report with the relevant people when things go wrong. But a few years ago I had some problems.

For my home network I got a free Wifi AP which wasn’t performing well.

My AP supported 802.11 modes b/g or g/n (b, g, and n are slow, medium, and fast speeds). I initially had the AP running in b/g mode because I had an 802.11b USB wifi device that I used. When I replaced that with one that did 802.11g I tried changing the AP to g/n mode but performance was even worse on my laptop (although quite good on phones) so I switched back.

For phones it appeared to work well giving 54Mb/s while on my laptop (a second hand Thinkpad X1 Carbon) it was giving 11Mb/s at best and often much less than that. The best demonstration of problems was to start transferring a large file while pinging a system on the LAN the AP was connected to. Usually it would give ping times of 1s or more, sometimes 5s+ ping times. While this was happening the “Invalid misc” count increased rapidly, often by more than 100 per second.
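The rate at which that counter increases can be measured by sampling iwconfig twice. A minimal sketch, assuming the usual “Invalid misc:N” field in iwconfig output (the function names invalid_misc and invalid_misc_delta are made up for this example):

```shell
# Extract the "Invalid misc" counter from iwconfig output on stdin.
invalid_misc() {
    sed -n 's/.*Invalid misc:\([0-9][0-9]*\).*/\1/p'
}

# Print how much the counter rose over $2 seconds (default 10) on interface $1.
invalid_misc_delta() {
    before=$(iwconfig "$1" | invalid_misc)
    sleep "${2:-10}"
    after=$(iwconfig "$1" | invalid_misc)
    echo $((after - before))
}
# Usage: invalid_misc_delta wlan0 10
```

Running it during a large file transfer, once per candidate channel, gives a number to compare instead of an impression.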

The results of Google searches suggest that “Invalid misc” is due to interference and recommend changing the channel. My AP had been on channel 1 which had performed poorly, channels 2-8 were ok, and channel 9 seemed reasonably good. As an aside trying all channels manually is not a good idea, it takes a lot of time and gives little useful data. After changing to channel 9 it still only gave about 500KB/s when transferring large files with ping times of about 100ms, but that’s a big improvement. I tried running “iwlist scanning” to scan the Wifi network for other APs, that showed that channel 1 was used a lot but didn’t make it clear what I should do other than that.

The next thing I tried was the Wifi Analyser app on Android [1] (which doesn’t work on my latest phone, I don’t know if it’s still being actively maintained, it will definitely work on older phones). That has a nice graph mode that shows which channels are used and how the frequencies spread and interfere with other channels. One thing I hadn’t realised before I looked at the graphs is that 802.11n uses 4 channels and interferes past that. If you have two 802.11n devices you don’t have much space left out of the 14 channels available. To make more space I configured the Wifi AP in my ADSL modem to 802.11b/g mode and assigned it a channel away from the others making 4 channels available with no interference.
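Without a graphical analyser, a rough version of the same picture can be computed from scan output. The sketch below counts, for each 2.4GHz channel, how many visible APs sit within 4 channels of it, on the assumption of 20MHz wide signals (channels 5 apart, such as 1 and 6, are treated as not overlapping). The function name channel_overlap is made up for this example, and it ignores signal strength and wider 802.11n channels.

```shell
# Read "iwlist scanning" output on stdin and, for each 2.4GHz channel 1-13,
# print how many visible APs are within 4 channels of it. Lower is better.
channel_overlap() {
    grep -o 'Channel:[0-9][0-9]*' | cut -d: -f2 | awk '
        { seen[$1]++ }
        END {
            for (c = 1; c <= 13; c++) {
                n = 0
                for (ch in seen) {
                    k = ch + 0              # force numeric comparison
                    d = k - c; if (d < 0) d = -d
                    if (k <= 14 && d <= 4) n += seen[ch]
                }
                printf "channel %d: %d\n", c, n
            }
        }'
}
# Usage (as root for a full scan): iwlist wlan0 scanning | channel_overlap
```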

After that iwconfig reported between 60 and 120Mb/s and I got consistent transfer rates over 1.5MB/s while ping times remained below 100ms.

The 5GHz frequency range is less congested. But at the time I didn’t feel like buying 5GHz equipment.

Since that time I have signed up with an ISP that had a good deal on a Wifi AP that had 5GHz. Now I have all my devices configured to use 5GHz or 2.4GHz depending on which they think is best. So there are fewer devices on 2.4GHz and the AP is configured for “20MHz channel width” in the 2.4GHz range (which means 802.11b/g).

Conclusion

802.11n seems to be a bad idea unless you run the only AP in an area. In a suburban area you will have 3 other houses broadcasting in your area and 802.11n is bad for everyone. The worst case scenario would be one person using 802.11n and interfering with everyone else’s 802.11g and then having everyone else turn on 802.11n to try and make things faster.

5GHz is less congested as most people run old hardware. It also has a shorter range which has the upside of getting less interference from other people. I’m considering installing 5GHz APs at both ends of my house and configuring all my new devices to not use 2.4GHz.

Wifi spectrum analysis software is much better than manual testing of channels or trying to deduce things from the output of “iwlist scanning”.

USB Cables and Cameras

This page has summaries of some USB limits [1]. USB 2.0 has the longest cable segment limit of 5m (1.x, 3.x, and USB-C are all shorter), so USB 2.0 is what you want for long runs. The USB limit for daisy chained devices is 7 (including host and device), so that means a maximum of 5 hubs or a total distance between PC and device of 30m. There are lots of other ways of getting longer distances, the cheapest seems to be putting an old PC at the far end of an Ethernet cable.
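The arithmetic behind that figure, as a small sketch (the function name usb_max_length is made up for this example):

```shell
# Maximum end-to-end USB cable length in metres.
# $1 = tier limit including host and device, $2 = maximum length of one segment.
usb_max_length() {
    hubs=$(( $1 - 2 ))          # everything between host and device is a hub
    segments=$(( hubs + 1 ))    # one cable into each hub plus one to the device
    echo $(( segments * $2 ))
}
# Usage: usb_max_length 7 5
```

With a tier limit of 7 and 5 metre segments that is 5 hubs and 6 cables.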

Some (many? most?) laptops have USB for the interface to the built in camera, and these are sold from scrapped laptops. You could probably set up a home monitoring system in a typical home by having a centrally located PC with USB hubs fanning out to the corners. But old Android phones on a Wifi network seem like an easier option if you can prevent the phones from crashing all the time.

HP ML110 Gen9

I’ve just bought an HP ML110 Gen9 as a personal workstation, here are my notes about it and documentation on running Debian on it.

Why a Server?

I bought this because the ML350p Gen8 turned out to be too noisy for my taste [1]. I’ve just been editing my page about Memtest86+ RAM speeds [2], over the course of 10 years (high end laptop in 2001 to low end server in 2011) RAM speed increased by a factor of 100. RAM speed has been increasing at a lower rate than CPU speed and is becoming an increasing bottleneck on system performance. So while I could get a faster white-box system the cost of a second-hand server isn’t that great and I’m getting a system that’s 100* faster than what was adequate for most tasks in 2001.

HP makes some nice workstation class machines with ECC RAM (think server without remote management, hot-swap disks, or redundant PSU but with sound hardware). But they are significantly more expensive on the second hand market than servers.

This server cost me $650 and came with 2*480G “DC” grade SSDs (Intel but with HPE stickers). I hope that more than half of the purchase price will be recovered from selling the SSDs (I will use NVMe). Also 64G of non-ECC RAM costs $370 from my local store. As I want lots of RAM for testing software on VMs it will probably turn out that the server cost me less than the cost of new RAM once I’ve sold the SSDs!

Monitoring

wget -O /usr/local/hpePublicKey2048_key1.pub https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub
echo "# HP monitoring" >> /etc/apt/sources.list
echo "deb [signed-by=/usr/local/hpePublicKey2048_key1.pub] http://downloads.linux.hpe.com/SDR/downloads/MCP/Debian/ stretch/current-gen9 non-free" >> /etc/apt/sources.list

The above commands will make the management utilities installable on Debian/Buster. If using Bullseye (Testing at the moment) then you need to have Buster repositories in APT for dependencies, HP doesn’t seem to have packaged all their utilities for Buster.

wget -r -np -A Contents-amd64.bz2 http://downloads.linux.hpe.com/SDR/repo/mcp/debian/dists

To find out which repositories had the programs I need I ran the above recursive wget and then uncompressed them for grep -R (as an aside it would be nice if bzgrep supported -R). I installed the hp-health package which has hpasmcli for viewing and setting many configuration options and hplog for viewing event log data and thermal data (among a few other things). I’ve added a new monitor to etbemon hp-temp.monitor to monitor HP server temperatures, I haven’t made a configuration option to change the thresholds for what is considered “normal” because I don’t expect server class systems to be routinely running above the warning temperature. For the linux-temp.monitor script I added a command-line option for the percentage of the “high” temperature that is an error condition as well as an option for the number of CPU cores that need to be over-temperature, having one core permanently over the “high” temperature due to a web browser seems standard for white-box workstations nowadays.
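The search step can also be done without uncompressing anything to disk. A minimal sketch, assuming bzcat is installed and that wget saved the files under a directory named after the host (the function name contents_grep is made up for this example):

```shell
# Search every Contents-amd64.bz2 under directory $2 (default .) for the
# pattern $1, prefixing each hit with the file it came from.
contents_grep() {
    find "${2:-.}" -name 'Contents-amd64.bz2' | while IFS= read -r f; do
        bzcat "$f" | grep -- "$1" | while IFS= read -r line; do
            printf '%s: %s\n' "$f" "$line"
        done
    done
}
# Usage: contents_grep hpasmcli downloads.linux.hpe.com
```

The file name in each hit shows which distribution and component directory carries the package.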

The hp-health package depends on “libc6-i686 | lib32gcc1” even though none of the programs it contains use lib32gcc1. Depending on lib32gcc1 instead of “lib32gcc1 | lib32gcc-s1” means that installing hp-health requires removing mesa-opencl-icd which probably means that BOINC can’t use the GPU among other things. I solved this by editing /var/lib/dpkg/status and changing the package dependencies to what I desired. Note that this is not something for a novice to do, make a backup and make sure you know what you are doing!

Issues

The “HPE Dynamic Smart Array B140i” is a software RAID device. While it’s convenient for some users that software RAID gets supported in the UEFI boot process, generally software RAID is a bad idea. Also my system has hot-swap drive caddies but the controller doesn’t support hot-swap. So the first thing to do was to configure the array controller to run in AHCI mode and give up on using hot-swap drive caddies for hot-swap. I tested all the documented ways of scanning for new devices and nothing other than a reboot made the kernel recognise a new SATA disk.

According to specs provided by Dell and HP the ML110 Gen9 makes less noise than the PowerEdge T320, according to my own observations the reverse is the case. I don’t know if this is because of Dell being more conservative in their specs than HP or because of how dBA is measured vs my own personal annoyance thresholds for sounds. As the system makes more noise than I’m comfortable with I plan to build a rubber enclosure for the rear of the system to reduce noise, that will be the subject of another post. For Australian readers Bunnings has some good deals on rubber floor mats that can be used to reduce server noise.

The server doesn’t have sound hardware, while one could argue that servers don’t need sound there are some server uses for sound hardware such as using line input as a source of entropy. Also for a manufacturer it might be a benefit to use the same motherboard for workstations and servers. Fortunately a friend gave me a nice set of Logitech USB speakers a few years ago that I hadn’t previously had a cause to use, so that will solve the problem for me (I don’t need line-in on a workstation).

UEFI and Memtest

I decided to try UEFI boot for something new (in the past I’d only used UEFI boot for a server that only had large disks). In the past I’ve booted all my own systems with BIOS boot because I’m familiar with it and they all have SSDs for booting which are less than 2TB in size (until recently 2TB SSDs weren’t affordable for my personal use). The Debian UEFI wiki page is worth reading [3]. The Debian Wiki page about ProLiant servers [4] is worth reading too.

Memtest86+ doesn’t support EFI booting (just goes to a black screen) even though Debian/Buster puts in a GRUB entry for it (Debian bug #695246 was filed for this in 2012). Also on my ML110 Memtest86+ doesn’t report the RAM speed (a known issue on Memtest86+). Comments on the net say that Memtest86+ hasn’t been maintained for a long time and Memtest86 (the non-free version) has been updated more recently. So far I haven’t seen a system with ECC RAM have a memory problem that could be detected by Memtest86+, the memory problems I’ve seen on ECC systems have been things that prevent booting (RAM not being recognised correctly), that are detected by the BIOS as ECC errors before booting, or that are reported by the kernel as ECC errors at run time (happened years ago and I can’t remember the details).

Overall I’m not a fan of EFI with the way it currently works in Debian. It seems to add some of the GRUB functionality into the BIOS and then use that to load GRUB. It seems that EFI can do everything you need and it would be better to just have a single boot loader not two of them chained.

Power Supply

There are a range of PSUs for the ML110, the one I have has the smallest available PSU (350W) and doesn’t have a PCIe power cable (the one used for video cards). Here is the HP document which shows the cabling for the various ML110 Gen9 PSUs [5], I have the 350W PSU. One thing I’ve considered is whether I could make an adaptor from the drive bay power to the PCIe connector. A quick web search indicates that 4 SAS disks when active can take up to 75W more power than a system with no disks. If that’s the case then the 2 spare drive bay connectors which can each handle 4 disks should be able to supply 150W. As a 6 pin PCIe power cable (GPU power cable) is rated at 75W that should be fine in theory (here’s a page with the pinouts for PCIe power connectors [6]). My video card is a Radeon R7 260X which apparently takes about 113W all up so should be taking less than 75W from the PCIe power cable.
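The power budget can be written out as arithmetic. This sketch assumes the PCIe x16 slot itself supplies at most 75W, so anything above that has to come from the cable; the function name cable_watts_needed is made up for this example.

```shell
# Minimum power in watts a card must draw from its PCIe power cable.
# $1 = total card power, $2 = power available from the slot (default 75).
cable_watts_needed() {
    need=$(( $1 - ${2:-75} ))
    [ "$need" -lt 0 ] && need=0
    echo "$need"
}
# Usage: cable_watts_needed 113
```

For a 113W card that leaves 38W for the cable, comfortably under both the 75W rating of a 6 pin connector and the estimated 150W from the spare drive bay connectors.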

All I really want is YouTube, Netflix, and text editing at 4K resolution. So I don’t need much in terms of 3D power. KDE uses some of the advanced features of modern video cards, but it doesn’t compare to 3D gaming. According to the Wikipedia page for Radeon RX 500 series [7] the RX560 supports DisplayPort 1.4 and HDMI 2.0 (both of which do 4K@60Hz) and has a TDP of 75W. So an RX560 video card seems like a good option that will work in any system that doesn’t have a spare PCIe power cable. I’ve just ordered one of those for $246 so hopefully that will arrive in a week or so.

PCI Fan

The ML110 Gen9 has an “optional” PCIe “fan and baffle” to cool PCIe cards (part number 784580-B21). Extra cooling of PCIe cards is a good thing, but $400 list price (and about $50 eBay price) for the fan and baffle is unpleasant. When I boot the system with a PCIe dual-ethernet card and two PCIe NVMe cards it gives a BIOS warning on boot, and when I add a video card it refuses to boot without the extra fan. It’s nice that the system makes sure it doesn’t get into a thermal overload situation, but it would be nicer if they just shipped all necessary fans with it instead of trying to get more money out of customers. I just bought a PCI fan and baffle kit for $60.

Conclusion

In spite of the unexpected expense of a new video card and PCI fan the overall cost of this system is still low, particularly when considering that I’ll find another use for the video card which needs an extra power connector.

It is disappointing that HP didn’t supply a more capable PSU and fit all the fans to all models. The expectation of a server is that you can just do server stuff, not have to buy extra bits before you can do server stuff. If you want to install Tesla GPUs or something then it’s expected that you might need to do something unusual with a server, but the basic stuff should just work. A single processor tower server should be designed to function as a deskside workstation and be able to handle an average video card.

Generally it’s a nice computer, I look forward to getting the next deliveries of parts so I can make it work properly.

Minikube and Debian

I just started looking at the Kubernetes documentation and interactive tutorial [1], which incidentally is really good. Everyone who is developing a complex system should look at this to get some ideas for online training. Here are some notes on setting it up on Debian.

Add Kubernetes Apt Repository

deb https://apt.kubernetes.io/ kubernetes-xenial main

First add the above to your apt sources configuration (/etc/apt/sources.list or some file under /etc/apt/sources.list.d) for the kubectl package. Ubuntu Xenial is near enough to Debian/Buster and Debian/Unstable that it should work well for both of them. Then install the GPG key “6A030B21BA07F4FB” for use by apt:

gpg --recv-key 6A030B21BA07F4FB
gpg --list-sigs 6A030B21BA07F4FB
gpg --export 6A030B21BA07F4FB | apt-key add -

The Google key in question is not signed.

Install Packages for the Tutorial

The online training is based on “minikube” which uses libvirt to set up a KVM virtual machine that runs the Kubernetes cluster. To get this running you need to have a system that is capable of running KVM (i.e. the BIOS is set to allow hardware virtualisation). It MIGHT work on QEMU software emulation without KVM support (technically it’s possible but it would be slow and require some code to handle that), I didn’t test whether it does. Run the following command to install libvirt, kvm, and dnsmasq (which minikube requires) and kubectl on Debian/Buster:

apt install libvirt-clients libvirt-daemon-system qemu-kvm dnsmasq kubectl

For Debian/Unstable run the following command:

apt install libvirt-clients libvirt-daemon-system qemu-system-x86 dnsmasq kubectl
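To check whether the system is capable of running KVM, something like the following sketch may help. It counts the logical CPUs that advertise the vmx (Intel) or svm (AMD) flag and checks that /dev/kvm exists. The function name cpu_virt_count is made up for this example, and a CPU flag being present doesn’t guarantee that virtualisation is enabled in the BIOS, which is why /dev/kvm is checked as well.

```shell
#!/bin/sh
# cpu_virt_count is a hypothetical helper: it prints the number of logical
# CPUs advertising hardware virtualisation (vmx = Intel VT-x, svm = AMD-V).
# The optional argument allows trying it on a saved copy of /proc/cpuinfo.
cpu_virt_count() {
    grep -E -c '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}" 2>/dev/null || true
}

n=$(cpu_virt_count)
if [ "${n:-0}" -gt 0 ] && [ -e /dev/kvm ]; then
    echo "KVM looks usable ($n logical CPUs with vmx/svm)"
else
    echo "no vmx/svm CPU flag or no /dev/kvm, check the BIOS settings"
fi
```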

To run libvirt as non-root without needing a password for everything you need to add the user in question to the libvirt group. I recommend running things as non-root whenever possible. In this case entering a password for everything will probably be more pain than you want. The Debian Wiki page about KVM [2] is worth reading.
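Here is a small sketch for checking the group membership before running minikube. The helper name in_group is made up for this example. It only reports what to do, as adding a user to a group needs root.

```shell
#!/bin/sh
# in_group is a hypothetical helper: succeeds if user $1 is a member of group $2.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

user=$(id -un)
if in_group "$user" libvirt; then
    echo "$user is already in the libvirt group"
else
    # adduser is the usual Debian tool for this and has to be run as root
    echo "run as root: adduser $user libvirt"
fi
```

Note that a new group membership only takes effect the next time the user logs in.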

Install Minikube Test Environment

Here is the documentation for installing Minikube [3]. Basically just download a single executable from the net, put it in your $PATH, and run it. Best to use non-root for that. Also you need at least 3G of temporary storage space in the home directory of the user that runs it.
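The storage requirement can be checked before starting with a sketch like the following. The helper name free_gib is made up for this example and the 3G threshold is just the figure mentioned above.

```shell
#!/bin/sh
# free_gib is a hypothetical helper: it prints the free space in whole GiB
# on the filesystem that holds directory $1.
free_gib() {
    # df -P gives one line per filesystem, column 4 is available 1K blocks
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

need=3
target=${HOME:-/}
have=$(free_gib "$target")
if [ "${have:-0}" -ge "$need" ]; then
    echo "OK: ${have}G free under $target"
else
    echo "only ${have:-0}G free under $target, minikube wants at least ${need}G"
fi
```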

After installing minikube run “minikube start” which will download container image data and start it up. Then you can run commands like the following to see what it has done.

# get overview of virsh commands
virsh help
# list domains
virsh --connect qemu:///system list
# list block devices a domain uses
virsh --connect qemu:///system domblklist minikube
# show stats on block device usage
virsh --connect qemu:///system domblkstat minikube hda
# list virtual networks
virsh --connect qemu:///system net-list
# list dhcp leases on a virtual network
virsh --connect qemu:///system net-dhcp-leases minikube-net
# list network filters
virsh --connect qemu:///system nwfilter-list
# list real network interfaces
virsh --connect qemu:///system iface-list
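To save typing “--connect qemu:///system” every time, libvirt clients such as virsh read the default connection URI from the LIBVIRT_DEFAULT_URI environment variable. A minimal sketch, which only calls virsh if it is installed:

```shell
#!/bin/sh
# Make qemu:///system the default libvirt connection for this shell session
export LIBVIRT_DEFAULT_URI=qemu:///system

# With that set, the short forms should be equivalent to the commands above
if command -v virsh >/dev/null 2>&1; then
    virsh list || echo "could not connect to libvirt"
    virsh net-list || echo "could not connect to libvirt"
else
    echo "virsh is not installed"
fi
```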

Echo Chambers vs Epistemic Bubbles

C Thi Nguyen wrote an interesting article about the difficulty of escaping from Echo Chambers and also mentions Epistemic Bubbles [1].

An Echo Chamber is a group of people who reinforce the same ideas and who often preemptively strike against opposing ideas (for example the right wing denigrating “mainstream media” to prevent their followers from straying from their approved message). An Epistemic Bubble is a group of people who just happen to not have contact with certain different ideas.

When reading that article I wondered about what bubbles I and the people I associate with may be in. One obvious issue is that I have little communication with people who don’t write in English and also little communication with people who are poor. So people who are poor and who can’t write in English (which means significant portions of the population of India and Africa) are out of communication range for me. There are many situations that are claimed to be bubbles such as white people who are claimed to be innocent of racial issues because they only associate with other white people and men in the IT industry who are claimed to be innocent of sexism because they don’t associate with women in the IT industry. But I think they are more of an echo chamber issue, if a white American doesn’t access any of the variety of English language media produced by Afro Americans and realise that there’s a racial problem it’s because they don’t want to see it and deliberately avoid looking at evidence. If a man in the IT industry doesn’t access any of the media produced by women in tech and realise there are problems with sexism then it’s because they don’t want to see it.

When is it OK to Reject a Group?

The Ad Hominem Wikipedia page has a good analysis of different types of Ad Hominem arguments [2]. But the criteria for refuting a point in a debate are very different to the criteria used to determine which sources you should trust when learning about a topic.

For example it’s theoretically possible for someone to be good at computer science while also thinking the world is flat. In a debate about some aspect of computer programming it would be a fallacious Ad Hominem argument to say “you think the Earth is flat therefore you can’t program a computer”. But if you do a Google search for information on computer programming and one of the results is from earthisflat.com then it would probably save time to skip reading that one. If only one person supports an idea then it’s quite likely to be wrong. Good ideas tend to be supported by multiple people and for any good idea you will find a supporter who doesn’t appear to have any ideas that are obviously wrong.

One of the problems we have as a society now is determining the quality of data (ideas, claims about facts, opinions, communication/spam, etc). When humans have to do that it takes time and energy. Shortcuts can make things easier. Some shortcuts I use are that mainstream media articles are usually more reliable than social media posts (even posts by my friends) and that certain media outlets are untrustworthy (like Breitbart). The next step is that anyone who cites a bad site like Breitbart as factual (rather than just an indication of what some extremists believe) is unreliable. For most questions that you might search for on the Internet there is a virtually endless supply of answers, the challenge is not finding an answer but finding a correct answer. So eliminating answers that are unlikely to be correct is an important part of the search.

If someone is citing references to support their argument and they can only cite fringe or extremist sites then I won’t be convinced. Now someone could turn that argument around and claim that a site I reference such as the New York Times is wrong. If I find that my ideas were based on a claim that can only be found on the NYT then I will reconsider the issue. While I think that the NYT is generally accurate they are capable of making mistakes and if they are the sole source for claims that go against other claims then I will be hesitant to accept such claims. Newspapers often have exclusive articles based on their own research, but such articles always inspire investigation from other newspapers so other articles appear either supporting or questioning the claims in the exclusive.

Saving Time When Interacting With Members of Echo Chambers

Convincing a member of a cult or echo chamber of anything is not likely. When in discussions with them the focus should be on the audience and on avoiding wasting much time while also not giving them the impression that you agree with them.

A common thing that members of echo chambers say is “I don’t have time to read about that” when you ask if they have read a research paper or a news article. I don’t have time to listen to people who can’t or won’t learn before speaking, there just isn’t any value in that. Also if someone has a list of memes that takes more than 15 minutes to recite then they have obviously got time for reading things, just not reading outside their echo chamber.

Conversations with members of echo chambers seem to be state free. They make a claim and you reject it, but regardless of the logical flaws you point out or the counter evidence you cite they make the same claim again the next time you speak to them. This seems to be evidence supporting the claim that evangelism is not about converting other people but alienating cult members from the wider society [3] (the original Quora text seems unavailable so I’ve linked to a Reddit copy). Pointing out that they had made a claim previously and didn’t address the issues you had with it seems effective, such discussions seem to be more about performance so you want to perform your part quickly and efficiently.

Be aware of false claims about etiquette. It’s generally regarded as polite not to disagree much with someone who invites you to their home or who has done some favour for you, but that is no reason for tolerating an unwanted lecture about their echo chamber. Anyone who tries to create a situation where it seems rude of you not to listen to them saying things that they know will offend you is being rude, much ruder than telling them you are sick of it.

Look for specific claims that can be disproven easily. The claim that the “Roman Salute” is different from the “Hitler Salute” is one example that is easy to disprove. Then they have to deal with the issue of their echo chamber being wrong about something.

More EVM

This is another post about EVM/IMA whose main purpose is to provide useful web search results for problems. However if reading it on a planet feed inspires someone to play with EVM/IMA then that’s good too, it’s interesting technology.

When using EVM/IMA in the Linux kernel, if dmesg has errors like “op=appraise_data cause=missing-HMAC” then the “missing-HMAC” means that the error code in the kernel source is INTEGRITY_NOLABEL which has a comment “No security.evm xattr”. You can check for the xattr on a file with the following command (this example has the security.evm xattr):

# getfattr -d -m - /etc/fstab 
getfattr: Removing leading '/' from absolute path names
# file: etc/fstab
security.evm=0sAwICqGOsfwCAvgE9y9OP74QxJ/I+3eOSF2n2dM51St98z/7LYHFd9rfGTvssvhTSYL9G8cTdRAH8ozggJu7VCzggW1REoTjnLcPeuMJsrMbW3DwVrB6ldDmJzyenLMjnIHmRDDeK309aRbLVn2ueJZ07aMDcSr+sxhOOAQ/GIW4SW8L1AKpKn4g=
security.ima=0sAT+Eivfxl+7FYI+Hr9K4sE6IieZ+
security.selinux="system_u:object_r:etc_t:s0"

If dmesg has errors like “op=appraise_data cause=invalid-HMAC” the “invalid-HMAC” means that the error code in the kernel source is INTEGRITY_FAIL which has a comment “Invalid HMAC/signature”.

These errors are from the evm_verifyxattr() function in Linux kernel 5.11.14.
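The two causes described above can be looked up with a small helper when reading through logs. This is just a sketch covering those two cases, the function name explain_evm_cause is made up and the test string is the message fragment quoted above, not a complete log line.

```shell
#!/bin/sh
# explain_evm_cause is a hypothetical helper: it maps the cause= field of an
# appraise_data message to the kernel status name and source comment.
explain_evm_cause() {
    case "$1" in
        *cause=missing-HMAC*) echo "INTEGRITY_NOLABEL: No security.evm xattr" ;;
        *cause=invalid-HMAC*) echo "INTEGRITY_FAIL: Invalid HMAC/signature" ;;
        *) echo "unknown cause" ;;
    esac
}

explain_evm_cause "op=appraise_data cause=missing-HMAC"

# Possible use on a live system (needs permission to read the kernel log):
#   dmesg | grep op=appraise_data | while IFS= read -r l; do
#       explain_evm_cause "$l"
#   done
```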

The error “evm: HMAC key is not set” means that the EVM key is not initialised. The key needs to be loaded into the kernel and then EVM is initialised by the command “echo 1 > /sys/kernel/security/evm” (or possibly some equivalent from a utility like evmctl). When the key is loaded the kernel gives the message “evm: key initialized” and after that /sys/kernel/security/evm is read-only. If there is something wrong with the key the kernel gives the message “evm: key initialization failed”. It seems that the way to determine if your key is good is to try writing 1 to /sys/kernel/security/evm and see what happens. After that the command “cat /sys/kernel/security/evm” should return “3”.
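The value read back from /sys/kernel/security/evm appears to be a bitmask. The following sketch decodes the low bits based on my reading of the kernel ABI documentation for that file, which is an assumption and not something taken from the kernel messages above. By that reading “3” would mean that both HMAC and digital signature validation are enabled. The function name decode_evm_state is made up for this example.

```shell
#!/bin/sh
# decode_evm_state is a hypothetical helper that prints which of the low
# bits are set in a value read from /sys/kernel/security/evm.
# Bit meanings are assumed from the kernel ABI documentation.
decode_evm_state() {
    v=$1
    if [ $((v & 1)) -ne 0 ]; then echo "HMAC validation and creation enabled"; fi
    if [ $((v & 2)) -ne 0 ]; then echo "digital signature validation enabled"; fi
    if [ $((v & 4)) -ne 0 ]; then echo "runtime metadata modification permitted"; fi
    return 0
}

f=/sys/kernel/security/evm
if [ -r "$f" ]; then
    decode_evm_state "$(cat "$f")"
else
    echo "$f is not readable (EVM not enabled, securityfs not mounted, or not root)"
fi
```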

The Gentoo wiki has good documentation on how to create and load the keys which has to be done before initialising EVM [1]. I’ll write more about that in another post.