Converting to UEFI

When I got my HP ML110 Gen9 working as a workstation I initially was under the impression that boot wasn’t supported on NVMe and booted it from USB. I found USB booting with legacy boot to be unreliable so decided to try EFI booting and noticed that the NVMe devices were boot candidates with UEFI. Making one of them bootable was more complex than expected because no-one seems to have documented such things. So here’s my documentation, it’s not great but this method has worked once for me.

Before starting major partitioning work it’s best to run “parted -l” and save the output to a file, which can allow you to recreate partitions if you corrupt them. One thing I’m doing on systems I manage is putting “@reboot /usr/sbin/parted -l > /root/parted.log” in the root crontab, then when the system is backed up the backup server gets any recent changes to partitioning (I don’t back up /var/log on all my systems).

Firstly run parted on the device to create the EFI and /boot partitions, note that if you want to copy and paste from this you must do so one line at a time, a block paste seemed to confuse parted.

mklabel gpt
mkpart EFI fat32 1 99
mkpart boot ext3 99 300
toggle 1 boot
toggle 1 esp
p
# Model: CT1000P1SSD8 (nvme)
# Disk /dev/nvme1n1: 1000GB
# Sector size (logical/physical): 512B/512B
# Partition Table: gpt
# Disk Flags: 
#
# Number  Start   End     Size    File system  Name  Flags
#  1      1049kB  98.6MB  97.5MB  fat32        EFI   boot, esp
#  2      98.6MB  300MB   201MB   ext3         boot
q

Here are the commands needed to create the filesystems and install the necessary files. This is almost to the stage of being scriptable. Some minor changes need to be made to convert from NVMe device names to SATA/SAS but nothing serious.

mkfs.vfat /dev/nvme1n1p1
mkfs.ext3 -N 1000 /dev/nvme1n1p2
file -s /dev/nvme1n1p2 | sed -e s/^.*UUID/UUID/ -e "s/ .*$/ \/boot ext3 noatime 0 1/" >> /etc/fstab
file -s /dev/nvme1n1p1 | tr "[a-f]" "[A-F]" |sed -e s/^.*numBEr.0x/UUID=/ -e "s/, .*$/ \/boot\/efi vfat umask=0077 0 1/" >> /etc/fstab
# edit /etc/fstab to put a hyphen between the 2 groups of 4 chars for the VFAT filesystem UUID
mount /boot
mkdir -p /boot/efi /boot/grub
mount /boot/efi
mkdir -p /boot/efi/EFI/debian
apt install efibootmgr shim-unsigned grub-efi-amd64
cp /usr/lib/shim/* /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /boot/efi/EFI/debian
file -s /dev/nvme1n1p2 | sed -e "s/^.*UUID=/search.fs_uuid /" -e "s/ .needs.*$/ root hd0,gpt2/" > /boot/efi/EFI/debian/grub.cfg
echo "set prefix=(\$root)'/boot/grub'" >> /boot/efi/EFI/debian/grub.cfg
echo "configfile \$prefix/grub.cfg" >> /boot/efi/EFI/debian/grub.cfg
grub-install
update-grub
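
The manual hyphen edit for the VFAT UUID could be scripted. The following is a minimal sketch, not part of the procedure that was tested above: the function name vfat_uuid is hypothetical, and it assumes that the serial number reported by “file -s” is a hex value such as 0x1a2b3c4d which may have lost its leading zeros.

```shell
#!/bin/sh
# vfat_uuid SERIAL: turn a hex serial number (eg 0x1a2b3c4d) into the
# XXXX-XXXX form that /etc/fstab expects after "UUID=".
# printf pads to 8 hex digits and upper-cases them, sed adds the hyphen.
vfat_uuid() {
  printf '%08X\n' "$1" | sed 's/^\(....\)\(....\)$/\1-\2/'
}

# Example: vfat_uuid 0x1a2b3c4d
# An alternative that avoids parsing "file -s" output entirely is
# "blkid -s UUID -o value DEVICE", which prints the UUID already formatted.
```

Using blkid is probably the simpler option on a system that has it, the function above only helps if you want to keep the “file -s” approach.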

If someone would like to make a script that can handle the different partition names of regular SCSI/SATA disks, NVMe, CCISS, etc then that would be great. It would be good to have a script in Debian that creates the partitions and sets up the EFI files.
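
As a starting point for such a script, here is a minimal sketch of the partition naming part only. The function name part_dev is hypothetical. It relies on the Linux convention that a “p” goes between the disk name and the partition number when the disk name ends in a digit (NVMe, CCISS, MMC, etc).

```shell
#!/bin/sh
# part_dev DISK NUM: print the device node for partition NUM of DISK.
# Disk names ending in a digit (nvme1n1, cciss/c0d0) get a "p" separator,
# names ending in a letter (sda, vdb) do not.
part_dev() {
  case "$1" in
    *[0-9]) printf '%s\n' "${1}p${2}" ;;
    *)      printf '%s\n' "${1}${2}" ;;
  esac
}

# Example: EFI=$(part_dev /dev/nvme1n1 1) ; BOOT=$(part_dev /dev/nvme1n1 2)
```

With this the mkfs and “file -s” commands above could take a disk name as a parameter instead of having /dev/nvme1n1p1 and /dev/nvme1n1p2 hard coded.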

If you want to have a second bootable device then the following commands will copy a GPT partition table and give it new UUIDs. Make very certain that $DISKB is the one you want to be wiped and refer to my previous mention of “parted -l”. Also note that parted has a rescue command which works very well.

sgdisk /dev/$DISKA -R /dev/$DISKB 
sgdisk -G /dev/$DISKB

To back up a GPT partition table run a command like this. Note that if sgdisk is told to back up an MBR partitioned disk it will say “Found invalid GPT and valid MBR; converting MBR to GPT format” which is probably a viable way of converting MBR format to GPT.

sgdisk -b sda.bak /dev/sda

Strange Apache Reload Issue

I recently had to renew the SSL certificate for my web server, nothing exciting about that but Certbot created a new directory for the key because I had removed some domains (moved to a different web server). This normally isn’t a big deal, change the Apache configuration to the new file names and run the “reload” command. My monitoring system initially said that the SSL certificate wasn’t going to expire in the near future so it looked fine. Then an hour later my monitoring system told me that the certificate was about to expire, apparently the old certificate came back!

I viewed my site with my web browser and the new certificate was being used, it seemed strange. Then I did more tests with gnutls-cli which revealed that exactly half the connections got the new certificate and half got the old one. Because my web server isn’t doing anything particularly demanding the mpm_event configuration only starts 2 servers, and even that may be excessive for what it does. So it seems that the Apache reload command had reloaded the configuration on one mpm_event server but not the other!
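
The following is a minimal sketch of how such a test could be scripted, it is not the set of commands used above. It swaps gnutls-cli for “openssl s_client”, and the function names and the host name in the usage comment are hypothetical. Each call makes a new connection, so with 2 server processes a partially applied reload should show up as 2 different fingerprints.

```shell
#!/bin/sh
# fetch_fingerprint HOST: make one new TLS connection to HOST port 443 and
# print the SHA256 fingerprint of the certificate that was served.
fetch_fingerprint() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -fingerprint -sha256
}

# tally: count how many times each distinct line appears on stdin,
# most common first.
tally() {
  sort | uniq -c | sort -rn
}

# Usage (needs network access):
# for i in $(seq 20); do fetch_fingerprint www.example.com; done | tally
```

If the output has more than one line then different connections are getting different certificates.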

Fortunately this was something that was easy to test and was something that was automatically tested. If the change that didn’t get accepted was something small it would be a particularly insidious bug.

I haven’t yet tried to reproduce this. But if I get the time I’ll do so and file a bug report.

Getting Started With Kali

Kali is a Debian based distribution aimed at penetration testing. I haven’t felt a need to use it in the past because Debian has packages for all the scanning tools I regularly use, and all the rest are free software that can be obtained separately. But I recently decided to try it.

Here’s the URL to get Kali [1]. For a VM you can get VMWare or VirtualBox images, I chose VMWare as it’s the most popular image format and also a much smaller download (2.7G vs 4G). For unknown reasons the torrent for it didn’t work (might be a problem with my torrent client). The download link for it was extremely slow in Australia, so I downloaded it to a system in Germany and then copied it from there.

I don’t want to use either VMWare or VirtualBox because I find KVM/Qemu sufficient to do everything I want and they are in the Main section of Debian, so I needed to convert the image files. Some of the documentation on converting image formats to use with QEMU/KVM says to use a program called “kvm-img” which doesn’t seem to exist, I used “qemu-img” from the qemu-utils package in Debian/Bullseye. The man page qemu-img(1) doesn’t list the types of output format supported by the “-O” option and the examples returned by a web search show using “-O qcow2”. It turns out that the following command will convert the image to “raw” format which is the format I prefer. I use BTRFS for storing all my VM images and that does all the copy-on-write I need.

qemu-img convert Kali-Linux-2021.3-vmware-amd64.vmdk ../kali

After converting it the file was 500M smaller than the VMWare files (10.2 vs 10.7G). Probably the Kali distribution file could be reduced in size by converting it to raw and then back to VMWare format. The Kali VMWare image is compressed with 7zip which has a good compression ratio, I waited almost 90 minutes for zstd to compress it with -19 and the result was 12% larger than the 7zip file.

VMWare apparently likes to use an emulated SCSI controller, I spent some time trying to get that going in KVM. Apparently recent versions of QEMU changed the way this works and therefore older web pages aren’t helpful. Also allegedly the SCSI emulation is buggy and unreliable (but I didn’t manage to get it going so can’t be sure). It turns out that the VM is configured to work with the virtio interface, the initramfs.conf has the configuration option “MODULES=most” which makes it boot on all common configurations (good work by the initramfs-tools maintainers). The image works well with the Spice display interface, the window for the VM works the same way as other windows on my desktop and doesn’t capture the mouse cursor. I don’t know if this level of Spice integration is in Debian now, last time I tested it didn’t work that way.

I also downloaded Metasploitable [2] which is a VM image designed to be full of security flaws for testing the tools that are in Kali. Again it worked nicely after converting from VMWare to raw format. One thing to note about Metasploitable is that you must not make it available on the public Internet. My home network has NAT for IPv4 but all systems get public IPv6 addresses. It’s usually nice that those things just work on VMs but not for this. So I added an iptables command to block IPv6 to /etc/rc.local.

Conclusion

Installing VMs for both these distributions was quite easy. Most of my time was spent downloading from a slow server, trying to get SCSI emulation working, working out how to convert image files, and testing different compression options. The time spent doing stuff once I knew what to do was very small.

Kali has zsh as the default shell, it’s quite nice. I’ve been happy with bash for decades, but I might end up trying zsh out on other machines.

Oracle Cloud Free Tier

It seems that every cloud service of note has a free tier nowadays and the Oracle Cloud is the latest that I’ve discovered (thanks to r/homelab which I highly recommend reading). Here’s Oracle’s summary of what they offer for free [1].

Oracle’s “always free” tier (where presumably “always” is defined as “until we change our contract”) currently offers ARM64 VMs to a total capacity of 4 CPU cores, 24G of RAM, and 200G of storage with a default VM size of 1/4 that (1 CPU core and 6G of RAM). It also includes 2 AMD64 VMs that each have 1G of RAM, but a 64bit VM with 1G of RAM isn’t that useful nowadays.

Web Interface

The first thing to note is that the management interface is a massive pain to use. When a login times out for security reasons it redirects to a web page that gives a 404 error, maybe the redirection works OK if you are using it when it times out, but if you go off and spend an hour doing something else you will return to a 404 page. A web interface should never refer you to a page with a 404.

There doesn’t seem to be a way of bookmarking the commonly used links (as AWS does) and the set of links on the left depend on the section you are in with no obvious way of going between sections. Sometimes I got stuck in a set of pages about authentication controls (the “identity cloud”) and there seems to be no link I could click on to get me back to cloud computing, I had to go to a bookmarked link for the main cloud login page. A web interface should never force the user to type in the main URL or go to a bookmark, you should be able to navigate from every page to every other page in a logical manner. An advanced user might have their own bookmarks in their browser to suit their workflow. But a beginner should be able to go to anywhere without breaking the session.

Some parts of the interface appear to be copied from AWS, but unfortunately not the good parts. The way AWS manages IP access control is not easy to manage and it’s not clear why packets are dropped, Oracle copies all this. On the upside Oracle has some good Datadog style analytics so for a new deployment you can debug IP access control by seeing records of rejected packets. Just to make it extra annoying when you create a rule with multiple ports specified the web interface will expand it out to multiple rules for one port each, having ports 80 and 443 on separate lines doesn’t make things easier. Also it forces you to have IPv4 and IPv6 as separate rules, so if you want HTTP and HTTPS on both IPv4 and IPv6 (a common requirement) then you need 4 separate rules.

One final annoying thing is that the web interface doesn’t make your previous settings a default. As I’ve created many ARM images and haven’t created a single AMD image it should know that the probability that I want to create an AMD image is very low and stop defaulting to that.

Recovery

When trying a new system you will inevitably break things and have to recover things. The way to recover from a configuration error that prevents your VM from booting and getting to a state of allowing a login is to stop the VM, then go to the “Boot volume” section under “Resources” and use the settings button to detach the boot volume. Then you go to another VM (which must be running), go to the “Attached block volumes” menu and attach it as Paravirtualised (not iSCSI and not default which will probably be iSCSI). After some time the block device will appear and you can mount it and do stuff to it. Then after umounting it you detach it from the recovery VM and attach it again to the original VM (where it will still have an entry in the “Boot volume” section) and boot the original VM.

As an aside it’s really annoying that you can’t attach a volume to a VM that isn’t running.

My first attempt at image recovery started with making a snapshot of the Boot volume, this didn’t work well because the image uses EFI and therefore GPT and because the snapshot was larger than the original block device (which incidentally was the default size). I admit that I might have made a mistake making the snapshot, but if so it shouldn’t be so easy to do. With GPT if you have a larger block device then partitioning tools complain about the backup partition table not being found, and they complain even more if you try to go back to the smaller size later on. Generally GPT partition tables are a bad idea for VMs, when I run the host I don’t use partition tables, I have a separate block device for each filesystem or swap space.

Snapshots aren’t needed for recovery, they don’t seem to work very well, and if it’s possible to attach a snapshot to a VM in place of its original “Boot volume” I haven’t figured out how to do it.

Console Connection

If you boot Oracle Linux, a derivative of RHEL that has SE Linux enabled in enforcing mode (yay), then you can go to the “Console connection”. The console is a Javascript console which allows you to log in on a virtual serial console on device /dev/ttyAMA0. It tells you to type “help” but that isn’t accepted, you have a straight Linux console login prompt.

If you boot Ubuntu then you don’t get a working serial console, it tells you to type “help” for help but doesn’t respond to that.

It seems that the Oracle Linux kernel 5.4.17-2102.204.4.4.el7uek.aarch64 is compiled with support for /dev/ttyAMA0 (the default ARM serial device) while the kernel 5.11.0-1016-oracle compiled by Oracle for their Ubuntu VMs doesn’t have it.

Performance

I haven’t done any detailed tests of VM performance. As a quick test I used zstd to compress a 154MB file, on my home workstation (E5-2620 v4 @ 2.10GHz) it took 11.3 seconds of CPU time to compress with zstd -9 and 7.2s to decompress. On the Oracle cloud it took 7.2s and 5.4s. So it seems that for some single core operations the ARM CPU used by the Oracle cloud is about 33% to 57% faster than a E5-2620 v4 (a slightly out of date server processor that uses DDR4 RAM).
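
For anyone who wants to check the arithmetic on those timings, here is a minimal sketch. The function name speedup is hypothetical, and it defines “faster” as the ratio of the two CPU times minus one.

```shell
#!/bin/sh
# speedup SLOW FAST: print how much faster the second time is than the
# first as a whole percentage, ie (SLOW / FAST - 1) * 100.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (slow / fast - 1) * 100 }'
}

speedup 11.3 7.2   # compression: prints 57
speedup 7.2 5.4    # decompression: prints 33
```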

If you ran all the free resources in a single VM that would make a respectable build server. If you want to contribute to free software development and only have a laptop with 4G of RAM then an ARM build/test server with 24G of RAM and 4 cores would be very useful.

Ubuntu Configuration

The advantage of using EFI is that you can manage the kernel from within the VM. The default Oracle kernel for Ubuntu has a lot of modules included and is compiled with a lot of security options including SE Linux.

Competitors

https://aws.amazon.com/free

AWS offers 750 hours (just over 31 days) per month of free usage of a t2.micro or t3.micro EC2 instance (which means 1GB of RAM). But that only lasts for 12 months and it’s still only 1GB of RAM. AWS has some other things that could be useful like 1 million free Lambda requests per month. If you want to run your personal web site on Lambda you shouldn’t hit that limit. They also apparently have some good offers for students.

https://cloud.google.com/free

The Google Cloud Platform (GCP) offers $300 of credit.

https://cloud.google.com/free/docs/gcp-free-tier#free-tier-usage-limits

GCP also has ongoing free tier usage for some services. Some of them are pretty much unlimited use (50GB of storage for “Cloud Source Repositories” is a heap of source code). But for VMs you get the equivalent of 1*e2-micro instance running 24*7. A e2-micro has 1G of RAM. You also only get 30G of storage and 1GB of outbound data. It’s clearly not as generous an offer as Oracle, but Oracle is the underdog so they have to try harder.

https://azure.microsoft.com/en-us/free/

Azure appears to be much the same as AWS, free Linux VM for a year and then other less popular services free forever (or until they change the contract).

https://www.ibm.com/cloud/free

The IBM cloud free tier is the least generous offer, a VM is only free for 30 days. But what they offer for 30 days is pretty decent. If you want to try the IBM cloud and see if it can do what your company needs then this will do well. If you want to have free hosting for your hobby stuff then it’s no good.

Oracle seems like the most generous offer if you want to do stuff, but also one of the least valuable if you want to learn things that will help you at a job interview. For job interviews AWS seems the most useful and then GCP and Azure vying for second place.

Thoughts about RAM and Storage Changes

My first Linux system in 1992 was a 386 with 4MB of RAM and a 120MB hard drive which (for some reason I forgot) only was supported by Linux for about 90MB. My first hard drive was 70MB and could do 500KB/s for contiguous IO, my first Linux hard drive was probably a bit faster, maybe 1MB/s. My current Linux workstation has 64G of RAM and 2*1TB NVMe devices that can sustain about 1.1GB/s. The laptop I’m using right now has 8GB of RAM and a 180GB SSD that can do 380MB/s.

My laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. Currently I don’t even run a VM with less than 4GB of RAM, NB I’m not saying that smaller VMs aren’t useful merely that I don’t happen to be using them now. Modern AMD64 CPUs support 2MB “huge pages”. As a proportion of system RAM if I used 2MB pages everywhere they would be a smaller portion of system RAM than the 4KB pages on my first Linux system!

I am not suggesting using 2MB pages for general systems. For my workstations the majority of processes are using less than 10MB of resident memory and given the different uses for memory mapped shared objects, memory mapped file IO, malloc(), stack, heap, etc there would be a lot of inefficiency in having 2MB as the minimum unit for all allocation. But as systems worked with 4MB of RAM or less and 4K pages it would surely work to have only 2MB pages with 64GB or more of RAM.

Back in the 90s it seemed ridiculous to me to have 256 byte pages on a 68030 CPU, but 4K pages on a modern AMD64 system is even more ridiculous. Apparently AMD64 supports 1GB pages on some CPUs, that seems ridiculously large but when run on a system with 1TB of RAM that’s comparable to 4K pages on my first Linux system. Currently AWS offers 24TB EC2 instances and the Google Cloud Platform offers 12TB virtual machines. It might even make sense to have the entire OS using 1GB pages for some usage scenarios on such systems, wasting tens of GB of RAM to save TLB thrashing might be a good trade-off.
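
The page size comparisons above come down to counting how many pages fit in RAM. Here is a minimal sketch of that arithmetic, the function name pages is hypothetical and both arguments are in KB to keep everything in integers.

```shell
#!/bin/sh
# pages RAM_KB PAGE_KB: print the number of pages of size PAGE_KB that
# fit in RAM_KB of memory.
pages() {
  echo $(( $1 / $2 ))
}

pages $((4 * 1024)) 4                          # 4MB of RAM, 4KB pages: 1024
pages $((8 * 1024 * 1024)) $((2 * 1024))       # 8GB of RAM, 2MB pages: 4096
pages $((1024 * 1024 * 1024)) $((1024 * 1024)) # 1TB of RAM, 1GB pages: 1024
```

So a laptop with 8GB of RAM using only 2MB pages would have 4 times as many pages as a 4MB system with 4KB pages, and 1GB pages on a 1TB system give the same page count as that 4MB system.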

My personal laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. An employer recently assigned me a Thinkpad Carbon X1 Gen6 with an NVMe device that could sustain 5GB/s until the CPU overheated, that’s 5000* the contiguous IO speed of my first Linux hard drive. My first hard drive had a 28ms average access time and my first Linux hard drive probably was a little better, let’s call it 20ms for the sake of discussion. It’s generally quoted that access times for NVMe are at best 10us, that’s 2000* better than my first Linux hard drive. As seek times are the main factor for swap performance a laptop with 8GB of RAM and a fast NVMe device could be expected to give adequate performance with 2000* the swap of my first Linux system. For the work laptop in question I had 8G of swap and my personal laptop has 6G of swap which is somewhat comparable to the 4MB of swap on my first Linux system in that swap is about equal to RAM size, so I guess my personal laptop is performing better than it can be expected to.

These are just some idle thoughts about hardware changes over the years. Don’t take it as advice for purchasing hardware and don’t take it too seriously in general. Also when writing comments don’t restrict yourself to being overly serious, feel free to run the numbers on what systems with petabytes of Optane might be like, speculate on what NUMA systems in laptops might be like, etc. Go wild.

Dell PowerEdge T320 and Linux

I recently bought a couple of PowerEdge T320 servers, so now to learn about setting them up. They are a little newer than the R710 I recently set up (which had iDRAC version 6), they have iDRAC version 7.

RAM Speed

One system has a E5-2440 CPU with 2*16G DDR3 DIMMs and a Memtest86+ speed of 13,043MB/s, the other is essentially identical but with a E5-2430 CPU and 4*16G DDR3 DIMMs and a Memtest86+ speed of 8,270MB/s. I had expected that more DIMMs means better RAM performance but this isn’t what happened. I firstly upgraded the BIOS, as I expected it didn’t make a difference but it’s a good thing to try first.

On the E5-2430 I tried removing a DIMM after it was pointed out on Facebook that the CPU has 3 memory channels (here’s a link to a great site with information on that CPU and many others [1]). When I did that I was prompted to disable advanced ECC (which treats pairs of DIMMs as a single unit for ECC allowing correcting more than 1 bit errors) and I had to move the 3 remaining DIMMs to different slots. That improved the performance to 13,497MB/s. I then put the spare DIMM into the E5-2440 system and the performance increased to 13,793MB/s, when I installed 4 DIMMs in the E5-2440 system the performance remained at 13,793MB/s and the E5-2430 went down to 12,643MB/s.

This is a good result for me, I now have the most RAM and fastest RAM configuration in the system with the fastest CPU. I’ll sell the other one to someone who doesn’t need so much RAM or performance (it will be really good for a small office mail server and NAS).

Firmware Update

BIOS

The first issue is updating the BIOS, unfortunately the first link I found to the Dell web site didn’t have a link to download the Linux installer. It offered a Windows binary, an EFI program, and a DOS binary. I’m not about to install Windows if there is any other option and EFI is somewhat annoying, so that leaves DOS. The first Google result for installing FreeDOS advised using “unetbootin”, that didn’t work at all for me (created a USB image that the Dell BIOS didn’t recognise as bootable) and even if it did it wouldn’t have been a good solution.

I went to the FreeDOS download page [2] and got the “Lite USB” zip file. That contained “FD12LITE.img” which I could just dd to a USB stick. I then used fdisk to create a second 32MB partition, used mkfs.fat to format it, and then copied the BIOS image file to it. I booted the USB stick and then ran the BIOS update program from drive D:. After the BIOS update this became the first system I’ve seen get a totally green result from “spectre-meltdown-checker“!

I found the link to the Linux installer for the new Dell BIOS afterwards, but it was still good to play with FreeDOS.

PERC Driver

I probably didn’t really need to update the PERC (PowerEdge RAID Controller) firmware as I’m just going to run it in JBOD mode. But it was easy to do, a simple bash shell script to update it.

Here are the perccli commands needed to access disks, it’s all hot-plug so you can insert disks and do all this without a reboot:

# show overview
perccli show
# show controller 0 details
perccli /c0 show all
# show controller 0 info with less detail
perccli /c0 show
# clear all "foreign" RAID members
perccli /c0 /fall delete
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

The “perccli /c0 show” command gives the following summary of disk (“PD” in perccli terminology) information amongst other information. The EID is the enclosure, Slt is the “slot” (IE the bay you plug the disk into) and the DID is the disk identifier (not sure what happens if you have multiple enclosures). The allocation of device names (sda, sdb, etc) will be in order of EID:Slt or DID at boot time, and any drives added at run time will get the next letters available.

----------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                     Sp 
----------------------------------------------------------------------------------
32:0      0 Onln   0  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:1      1 Onln   1  465.25 GB SATA SSD Y   N  512B Samsung SSD 850 EVO 500GB U  
32:3      3 Onln   2   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
32:4      4 Onln   3   3.637 TB SATA HDD N   N  512B WDC WD40EURX-64WRWY0      U  
32:5      5 Onln   5 278.875 GB SAS  HDD Y   N  512B ST300MM0026               U  
32:6      6 Onln   6 558.375 GB SAS  HDD N   N  512B AL13SXL600N               U  
32:7      7 Onln   4   3.637 TB SATA HDD N   N  512B ST4000DM000-1F2168        U  
----------------------------------------------------------------------------------

The PERC controller is a MegaRAID with possibly some minor changes, there are reports of Linux MegaRAID management utilities working on it for similar functionality to perccli. The version of MegaRAID utilities I tried didn’t work on my PERC hardware. The smartctl utility works on those disks if you tell it you have a MegaRAID controller (so obviously there’s enough similarity that some MegaRAID utilities will work). Here are example smartctl commands for the first and last disks on my system. Note that the disk device node doesn’t matter as all device nodes associated with the PERC/MegaRAID are equal for smartctl.

# get model number etc on DID 0 (Samsung SSD)
smartctl -d megaraid,0 -i /dev/sda
# get all the basic information on DID 0
smartctl -d megaraid,0 -a /dev/sda
# get model number etc on DID 7 (Seagate 4TB disk)
smartctl -d megaraid,7 -i /dev/sda
# exactly the same output as the previous command
smartctl -d megaraid,7 -i /dev/sdc
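
To check every disk rather than just the first and last, the DID values can be taken from the perccli table shown earlier. This is a minimal sketch and hasn’t been run on PERC hardware: the function name list_dids is hypothetical, and it assumes that the “EID:Slt” column is the first field of each disk line and the DID is the second, as in the output above.

```shell
#!/bin/sh
# list_dids: read "perccli /c0 show" output on stdin and print the DID of
# every physical disk, one per line. Disk lines are recognised by a first
# field of the form enclosure:slot (eg 32:0).
list_dids() {
  awk '$1 ~ /^[0-9]+:[0-9]+$/ { print $2 }'
}

# Usage (needs root and the PERC/MegaRAID controller), -H asks smartctl
# for the overall health status of each disk:
# perccli /c0 show | list_dids | while read -r did; do
#   smartctl -d megaraid,"$did" -H /dev/sda
# done
```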

I have uploaded etbemon version 1.3.5-6 to Debian which has support for monitoring smartctl status of MegaRAID devices and NVMe devices.

IDRAC

To update IDRAC on Linux there’s a bash script with the firmware in the same file (binary stuff at the end of a shell script). To make things a little more exciting the script insists that rpm be available (running “apt install rpm” fixes that for a Debian system). It also creates and runs other shell scripts which start with “#!/bin/sh” but depend on bash syntax. So I had to make /bin/sh a symlink to /bin/bash. You know you need this if you see errors like “typeset: not found” and “[: -eq: unexpected operator” and then the system reboots. Dell people, please test your scripts on dash (the Debian /bin/sh) or just specify #!/bin/bash.

If the IDRAC update works it will take about 8 minutes.

Lifecycle Controller

The Lifecycle Controller is apparently for installing OS and firmware updates. I use Linux tools to update Linux and I generally don’t plan to update the firmware after deployment (although I could do so from Linux if needed). So it doesn’t seem to offer anything useful to me.

Setting Up IDRAC

For extra excitement I decided to try to set up IDRAC from the Linux command-line. To install the RAC setup tool you run “apt install srvadmin-idracadm7 libargtable2-0” (because srvadmin-idracadm7 doesn’t have the right dependencies).

# srvadmin-idracadm7 is missing a dependency
apt install srvadmin-idracadm7 libargtable2-0
# set the IP address, netmask, and gateway for IDRAC
idracadm7 setniccfg -s 192.168.0.2 255.255.255.0 192.168.0.1
# put my name on the front panel LCD
idracadm7 set System.LCD.UserDefinedString "Russell Coker"

Conclusion

This is a very nice deskside workstation/server. It’s extremely quiet with hardly any fan noise and the case is strong enough to contain the noise of hard drives. When running with 3* 3.5″ SATA disks and 2*10k 2.5″ SAS disks on a wooden floor it wasn’t annoyingly loud. Without the SAS disks it was as quiet as you can expect any PC to be, definitely not the volume you expect from a serious server! I bought the T320 systems loaded with SAS disks which made them quite loud, I immediately put the disks on ebay and installed SATA SSDs and hard drives which gives me more performance and more space than the SAS disks with less cost and almost no noise.

8*3.5″ drive bays gives room for expansion. I currently have 2*SATA SSDs and 3*SATA disks, the SSDs are for the root filesystem (including /home) and the disks are for a separate filesystem for large files.

Scanning with a MFC-9120CN on Bullseye

I previously wrote about getting a Brother MFC-9120CN multifunction printer/scanner to print on Linux [1]. I had also got it scanning which I didn’t blog about.

found USB scanner (vendor=0x04f9, product=0x021d) at libusb:003:002

I recently upgraded that Linux system to Debian/Testing (which will soon be released as Debian/Bullseye) and scanning broke. The command sane-find-scanner would find the USB connected scanner (with the above output), but “scanimage -L” didn’t.

It turned out that I had to edit /etc/sane.d/dll.d/hplip which had a single uncommented line of “hpaio” and replace that with “brother3” to make SANE load the driver /usr/lib64/sane/libsane-brother3.so from the brscan3 package (which Brother provided from their web site years ago).

I have the following script to do the scanning (which can run as non-root):

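#!/bin/bash
set -e
if [ -z "$1" ]; then
  echo "specify output filename" >&2
  exit 1
fi

# scanimage writes PNM by default, convert picks the output format from the extension of $1
TMP=$(mktemp)
# remove the temporary file even if scanimage or convert fails
trap 'rm -f "$TMP"' EXIT

scanimage > "$TMP"
convert "$TMP" "$1"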

Final Note

This blog post doesn’t describe everything that needs to be done to set up a scanner, I already had part of it set up from 10 years ago. But for anyone who finds this after having trouble, /etc/sane.d/dll.d is one place you should look for important configuration (especially if sane-find-scanner works and “scanimage -L” fails). Also the Brother drivers are handy to have although I apparently had it working in the past with the hpaio driver from HP (the Brother device emulates a HP device).

HP ML350P Gen8

I’m playing with a HP Proliant ML350P Gen8 server (part num 646676-011). For HP servers “ML” means tower (see the ProLiant Wikipedia page for more details [1]). For HP servers the “generation” indicates how old the server is, Gen8 was announced in 2012 and Gen10 seems to be the current generation.

Debian Packages from HP

wget -O /usr/local/hpePublicKey2048_key1.pub https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub
echo "# HP RAID" >> /etc/apt/sources.list
echo "deb [signed-by=/usr/local/hpePublicKey2048_key1.pub] http://downloads.linux.hpe.com/SDR/downloads/MCP/Debian/ buster/current non-free" >> /etc/apt/sources.list

The above commands will set up the APT repository for Debian/Buster. See the HP Downloads FAQ [2] for more information about their repositories.

hponcfg

This package contains the hponcfg program that configures ILO (the HP remote management system) from Linux. One noteworthy command is “hponcfg -r” to reset the ILO, something you should do before selling an old system.

ssacli

This package contains the ssacli program to configure storage arrays, here are some examples of how to use it:

# list controllers and show slot numbers
ssacli controller all show
# list arrays on controller identified by slot and give array IDs
ssacli controller slot=0 array all show
# show details of one array
ssacli controller slot=0 array A show
# show all disks on one controller
ssacli controller slot=0 physicaldrive all show
# show config of a controller, this gives RAID level etc
ssacli controller slot=0 show config
# delete array B (you can immediately pull the disks from it)
ssacli controller slot=0 array B delete
# create an array type RAID0 with specified drives, do this with one drive per array for BTRFS/ZFS
ssacli controller slot=0 create type=arrayr0 drives=1I:1:1
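
For the one-drive-per-array setup for BTRFS/ZFS, repeating the last command by hand for every disk gets tedious. Below is a minimal sketch that only prints the commands so they can be reviewed first. The function name is made up for this example, and the drive IDs are whatever “physicaldrive all show” reports on your controller.

```shell
# Print (not run) one "create" command per drive, so each disk becomes its own array.
# $1 = controller slot, remaining arguments = drive IDs such as 1I:1:1
print_single_drive_arrays() {
    slot="$1"
    shift
    for drive in "$@"; do
        echo "ssacli controller slot=$slot create type=arrayr0 drives=$drive"
    done
}

# Example: check the output, then pipe it to sh as root to really create the arrays.
#   print_single_drive_arrays 0 1I:1:1 1I:1:2 1I:1:3
```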

When a disk is used in JBOD mode just under 33MB will be used at the end of the disk for the RAID metadata. If you have existing disks with a DOS partition table you can put them in an HP array as JBOD and they will work with all data intact (a GPT partition table is more complicated). When all disks are removed from the server the cooling fans run at high speed, which would be annoying if you wanted to have a diskless workstation or server using only external storage.

ssaducli

This package contains the ssaducli diagnostic utility for storage arrays. The SSD “wear gauge report” doesn’t work for the 2 SSDs I tested it on, maybe it only supports SAS SSDs not SATA SSDs. It doesn’t seem to do anything that I need.

storcli

This package contains both 32bit and 64bit versions of the MegaRAID utility and deletes whichever one doesn’t match the installation in the package postinst, so it fails debsums checks etc. The MegaRAID utility is for a different type of RAID controller to the “Smart Storage Array” (AKA SSA) that the other utilities work with. As an aside it seems that there are multiple types of MegaRAID controller, the management program from the storcli package doesn’t work on a Dell server with MegaRAID. They should have made separate 32bit and 64bit versions of this package.

Recommendations

Here is the HP page for downloading firmware updates (including security updates) [3], you have to log in first and have a warranty. This is legal but poor service. Dell servers have comparable prices (on the second hand market) and comparable features but give free firmware updates to everyone. Dell have overall lower quality of Debian packages for supporting utilities, but a wider range of support, so on balance Dell support seems better. Dell and HP hardware seems of equal quality so overall I think it’s best to buy Dell.

Suggestions for HP

Finding which of the signing keys to use is unreasonably difficult. You should get some HP employees to sign the HP keys used for repositories with their personal keys and then go to LUG meetings and get their personal keys well connected to the web of trust. Then upload the HP keys to the public key repositories. You should also use the same keys for signing all versions of the repositories. Having different keys for the different versions of Debian wastes people’s time.

Please provide firmware for all users, even if they buy systems second hand. It is in your best interests to have systems used long-term and have them run securely. It is not in your best interests to have older HP servers perform badly.

Having all the fans run at maximum speed when power is turned on is a standard server feature. Some servers can throttle the fan when the BIOS is running, it would be nice if HP servers did that. Having ridiculously loud fans until just before GRUB starts is annoying.

Basics of Linux Kernel Debugging

Firstly a disclaimer, I’m not an expert on this and I’m not trying to instruct anyone who is aiming to become an expert. The aim of this blog post is to help someone who has a single kernel issue they want to debug as part of doing something that’s mostly not kernel coding. I welcome comments about the second step to kernel debugging for the benefit of people who need more than this (which might include me next week). Also suggestions for people who can’t use a kvm/qemu debugger would be good.

Below is a command to run qemu with GDB. It should be run from the Linux kernel source directory. You can add other qemu options for a block device and virtual networking if necessary, but the bug I encountered gave an oops from the initrd so I didn’t need to go further. The “nokaslr” is to avoid address space randomisation which deliberately makes debugging tasks harder (from a certain perspective debugging a kernel and compromising a kernel are fairly similar). Loading the bzImage is fine, gdb can map that to the different file it looks at later on.

qemu-system-x86_64 -kernel arch/x86/boot/bzImage -initrd ../initrd-$KERN_VER -curses -m 2000 -append "root=/dev/vda ro nokaslr" -gdb tcp::1200

The command to run GDB is “gdb vmlinux”. At the GDB prompt you can run the command “target remote localhost:1200” to connect to the GDB server on port 1200. Note that there is nothing special about port 1200, it was given in an example I saw and is as good as any other port. It is important that you run GDB against the “vmlinux” file in the main directory, not any of the several stripped and packaged files. GDB can’t handle a bzImage file but that’s OK, it ends up much the same in RAM.

When the “target remote” command is processed the kernel will be suspended by the debugger, if you are looking for a bug early in the boot you may need to be quick about this. Using “qemu-system-x86_64” instead of “kvm” slows things down and can help in that regard. The bug I was hunting happened 1.6 seconds after kernel load with KVM and 7.8 seconds after kernel load with qemu. I am not aware of all the implications of the kvm vs qemu decision on debugging. If your bug is a race condition then trying both would be a good strategy.

After the “target remote” command you can debug the kernel just like any other program.

If you put a breakpoint on print_modules() that will catch the operation of printing an Oops which can be handy.
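
Since you may need to be quick about attaching, it can help to put the GDB commands in a file and pass it with the “-x” option instead of typing them. This is a rough sketch and not something I have tuned: the function name and the /tmp/kgdb.cmds path are made up for this example, and print_modules is just the breakpoint suggested above.

```shell
# Print a GDB command file that attaches to qemu and stops when an Oops is printed.
# $1 = GDB server port (default 1200, matching the qemu command above)
# $2 = function to break on (default print_modules)
kernel_gdb_commands() {
    port="${1:-1200}"
    func="${2:-print_modules}"
    printf 'target remote localhost:%s\n' "$port"
    printf 'break %s\n' "$func"
    printf 'continue\n'
}

# Example, run from the kernel source directory after starting qemu:
#   kernel_gdb_commands 1200 > /tmp/kgdb.cmds
#   gdb -x /tmp/kgdb.cmds vmlinux
```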

Update: Address Space Randomisation

(gdb) b setxattr
Breakpoint 1 at 0xffffffff81332bf0: file fs/xattr.c, line 546.
(gdb) c
Continuing.
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0xffffffff81332bf0

If you get an error like the above while trying to set a breakpoint then it’s probably Address Space Randomisation (known as “KASLR”). Put the parameter “nokaslr” on the kernel command line to stop this. Note that KASLR is a REALLY good thing to have in a normal system as it makes attacks on the kernel security a lot harder, only do this for debugging purposes.
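
If you are not sure whether the parameter made it to the kernel, a quick check of the command line can rule that out. This is a small sketch with a made-up function name. It only looks for the word “nokaslr” in a string you give it, such as the contents of /proc/cmdline in the guest (if the guest gets far enough to give you a shell) or the string you passed to “-append”.

```shell
# Succeed if the given kernel command line contains "nokaslr" as a separate word.
has_nokaslr() {
    case " $1 " in
        *" nokaslr "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, inside the guest:
#   has_nokaslr "$(cat /proc/cmdline)" && echo "KASLR is disabled"
```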

Update: Breakpoints Not Applied

If your gdb session is using a vmlinux that doesn’t match the kernel booted in the VM then things will appear to work, breakpoints can be set, but the running kernel will never break (presumably it would break on some other random kernel code that has the same addresses as the requested function in the other kernel).

So if breakpoints mysteriously don’t work double check that you have a matching vmlinux for the kernel being debugged.
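
One way to do that check is to compare the “Linux version ...” banner embedded in vmlinux with the one the running kernel reports (in /proc/version, or the first line of the boot messages without the timestamp). The sketch below assumes GNU grep and assumes that the first “Linux version” followed by a digit in the file is the banner, which I believe is normally the case but haven’t checked on many kernels. The function names are made up for this example.

```shell
# Print the version banner embedded in an uncompressed kernel image (vmlinux).
kernel_banner() {
    LC_ALL=C grep -a -o -m1 'Linux version [0-9].*' "$1" | head -n 1
}

# Succeed if the banner in file $1 is exactly the string $2.
same_kernel() {
    [ "$(kernel_banner "$1")" = "$2" ]
}

# Example, where the second argument is pasted from the guest's /proc/version:
#   same_kernel vmlinux "Linux version ..." || echo "vmlinux does not match"
```

This won’t work on a bzImage as that is compressed, but the vmlinux is the file that GDB reads anyway.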


PSI and Cgroup2

In the comments on my post about Load Average Monitoring [1] an anonymous person recommended that I investigate PSI. As an aside, why do I get so many great comments anonymously? Don’t people want to get credit for having good ideas and learning about new technology before others?

PSI is the Pressure Stall Information subsystem for Linux that is included in kernels 4.20 and above. If you want to use it in Debian then you need a kernel from Testing or Unstable (Buster has kernel 4.19). The place to start reading about PSI is the main Facebook page about it, as it was originally developed at Facebook [2].

I am a little confused by the actual numbers I get out of PSI, while for the load average I can often see where they come from (EG have 2 processes each taking 100% of a core and the load average will be about 2) it’s difficult to work out where the PSI numbers come from. For my own use I decided to treat them as unscaled numbers that just indicate problems, higher number is worse and not worry too much about what the number really means.
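
For anyone who wants to look at the raw numbers, the system-wide files are /proc/pressure/cpu, /proc/pressure/io, and /proc/pressure/memory. As I understand the kernel documentation, the “some” line is the percentage of time in which at least one task was stalled on that resource, the “full” line is the percentage of time in which all non-idle tasks were stalled, and “total” is the accumulated stall time in microseconds. Below is a minimal sketch for pulling one field out of such a file. The function name psi_field is made up for this example.

```shell
# Read PSI data on stdin and print one field.
# $1 = line type ("some" or "full"), $2 = field name (avg10, avg60, avg300 or total)
psi_field() {
    awk -v kind="$1" -v key="$2" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == key) print kv[2]
            }
        }'
}

# Example:
#   psi_field some avg10 < /proc/pressure/io
```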

With the cgroup2 interface which is supported by the version of systemd in Testing (and which has been included in Debian backports for Buster) you get PSI files for each cgroup. I’ve just uploaded version 1.3.5-2 of etbemon (package mon) to Debian/Unstable which displays the cgroups with PSI numbers greater than 0.5% when the load average test fails.

System CPU Pressure: avg10=0.87 avg60=0.99 avg300=1.00 total=20556310510
/system.slice avg10=0.86 avg60=0.92 avg300=0.97 total=18238772699
/system.slice/system-tor.slice avg10=0.85 avg60=0.69 avg300=0.60 total=11996599996
/system.slice/system-tor.slice/tor@default.service avg10=0.83 avg60=0.69 avg300=0.59 total=5358485146

System IO Pressure: avg10=18.30 avg60=35.85 avg300=42.85 total=310383148314
 full avg10=13.95 avg60=27.72 avg300=33.60 total=216001337513
/system.slice avg10=2.78 avg60=3.86 avg300=5.74 total=51574347007
/system.slice full avg10=1.87 avg60=2.87 avg300=4.36 total=35513103577
/system.slice/mariadb.service avg10=1.33 avg60=3.07 avg300=3.68 total=2559016514
/system.slice/mariadb.service full avg10=1.29 avg60=3.01 avg300=3.61 total=2508485595
/system.slice/matrix-synapse.service avg10=2.74 avg60=3.92 avg300=4.95 total=20466738903
/system.slice/matrix-synapse.service full avg10=2.74 avg60=3.92 avg300=4.95 total=20435187166

Above is an extract from the output of the loadaverage check. It shows that tor is a major user of CPU time (the VM runs a Tor relay node and has close to 100% of one core devoted to that task). It also shows that MariaDB and Matrix are the main users of disk IO. When I installed Matrix the Debian package told me that using SQLite would give lower performance than MySQL, but that didn’t seem like a big deal as the server only has a few users. Maybe I should move Matrix to the MariaDB instance to improve overall system performance.
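
A similar list can be had without etbemon by walking the cgroup2 tree. The following is a rough sketch and not the code from etbemon: it assumes cgroup2 is mounted on /sys/fs/cgroup, it only looks at the “some avg10” value in each cpu.pressure file, and the function name and the choice of avg10 are mine.

```shell
# Print cgroups whose CPU pressure ("some avg10") is above a threshold.
# $1 = root of the cgroup2 tree (default /sys/fs/cgroup)
# $2 = threshold in percent (default 0.5)
cgroups_over_threshold() {
    root="${1:-/sys/fs/cgroup}"
    limit="${2:-0.5}"
    find "$root" -name cpu.pressure | sort | while IFS= read -r f; do
        # Turn the file path into a cgroup name relative to the root
        name="${f#"$root"}"
        name="${name%/cpu.pressure}"
        # Print the avg10 field only if it is above the limit
        val=$(awk -v limit="$limit" '
            $1 == "some" {
                split($2, kv, "=")
                if (kv[2] + 0 > limit + 0) print $2
            }' "$f")
        if [ -n "$val" ]; then
            printf '%s %s\n' "${name:-/}" "$val"
        fi
    done
}

# Example:
#   cgroups_over_threshold /sys/fs/cgroup 0.5
```

Changing cpu.pressure to io.pressure in the find command would give the equivalent list for disk IO.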

So far I have not written any code to display the memory PSI files. I don’t have a lack of RAM on systems I run at the moment and don’t have a good test case for this. I welcome patches from people who have the ability to test this and get some benefit from it.

We are probably about 6 months away from a new release of Debian and this is probably the last thing I need to do to make etbemon ready for that.