Archives

Categories

Video Decoding

I’ve had a saga of getting 4K monitors to work well. My latest issue has been video playing, the dreaded mplayer error about the system being too slow. My previous post about 4K was about using DisplayPort to get more than 30Hz scan rate at 4K [1]. I now have a nice 60Hz scan rate which makes WW2 documentaries display nicely among other things.

But when running a 4K monitor on a 3.3GHz i5-2500 quad-core CPU I can’t get a FullHD video to display properly. Part of the process of decoding the video and scaling it to 4K resolution is too slow, so action scenes in movies lag. When running a 2560*1440 monitor on a 2.4GHz E5-2440 hex-core CPU with the mplayer option “-lavdopts threads=3” everything is great (but it fails if mplayer is run with no parameters). In doing tests with apparent performance it seemed that the E5-2440 CPU gains more from the threaded mplayer code than the i5-2500, maybe the E5-2440 is more designed for server use (it’s in a Dell PowerEdge T320 while the i5-2500 is in a random white-box system) or maybe it’s just because it’s newer. I haven’t tested whether the i5-2500 system could perform adequately at 2560*1440 resolution.

The E5-2440 system has an ATI HD 6570 video card which is old, slow, and only does PCIe 2.1 which gives 5GT/s or 8GB/s. The i5-2500 system has a newer ATI video card that is capable of PCIe 3.0, but “lspci -vv” as root says “LnkCap: Port #0, Speed 8GT/s, Width x16” and “LnkSta: Speed 5GT/s (downgraded), Width x16 (ok)”. So for reasons unknown to me the system with a faster PCIe 3.0 video card is being downgraded to PCIe 2.1 speed. A quick check of the web site for my local computer store shows that all ATI video cards costing less than $300 have PCI3 3.0 interfaces and the sole ATI card with PCIe 4.0 (which gives double the PCIe speed if the motherboard supports it) costs almost $500. I’m not inclined to spend $500 on a new video card and then a greater amount of money on a motherboard supporting PCIe 4.0 and CPU and RAM to go in it.

According to my calculations 3840*2160 resolution at 24bpp (probably 32bpp data transfers) at 30 frames/sec means 3840*2160*4*30/1024/1024=950MB/s. PCIe 2.1 can do 8GB/s so that probably isn’t a significant problem.

I’d been planning on buying a new video card for the E5-2440 system, but due to some combination of having a better CPU and lower screen resolution it is working well for video playing so I can save my money.

As an aside the T320 is a server class system that had been running for years in a corporate DC. When I replaced the high speed SAS disks with SSDs SATA disks it became quiet enough for a home workstation. It works very well at that task but the BIOS is quite determined to keep the motherboard video running due to the remote console support. So swapping monitors around was more pain than I felt like going through, I just got it working and left it. I ordered a special GPU power cable but found that the older video card that doesn’t need an extra power cable performs adequately before the cable arrived.

Here is a table comparing the systems.

2560*1440 works well 3840*2160 goes slow
System Dell PowerEdge T320 White Box PC from rubbish
CPU 2.4GHz E5-2440 3.3GHz i5-2500
Video Card ATI Radeon HD 6570 ATI Radeon R7 260X
PCIe Speed PCIe 2.1 – 8GB/s PCIe 3.0 downgraded to PCIe 2.1 – 8GB/s

Conclusion

The ATI Radeon HD 6570 video card is one that I had previously tested and found inadequate for 4K support, I can’t remember if it didn’t work at that resolution or didn’t support more than 30Hz scan rate. If the 2560*1440 monitor dies then it wouldn’t make sense to buy anything less than a 4K monitor to replace it which means that I’d need to get a new video card to match. But for the moment 2560*1440 is working well enough so I won’t upgrade it any time soon. I’ve already got the special power cable (specified as being for a Dell PowerEdge R610 for anyone else who wants to buy one) so it will be easy to install a powerful video card in a hurry.

First Attempt at Gnocchi-Statsd

I’ve been investigating the options for tracking system statistics to diagnose performance problems. The idea is to track all sorts of data about the system (network use, disk IO, CPU, etc) and look for correlations at times of performance problems. DataDog is pretty good for this but expensive, it’s apparently based on or inspired by the Etsy Statsd. It’s claimed that the gnocchi-statsd is the best implementation of the protoco used by the Etsy Statsd, so I decided to install that.

I use Debian/Buster for this as that’s what I’m using for the hardware that runs KVM VMs. Here is what I did:

# it depends on a local MySQL database
apt -y install mariadb-server mariadb-client
# install the basic packages for gnocchi
apt -y install gnocchi-common python3-gnocchiclient gnocchi-statsd uuid

In the Debconf prompts I told it to “setup a database” and not to manage keystone_authtoken with debconf (because I’m not doing a full OpenStack installation).

This gave a non-working configuration as it didn’t configure the MySQL database for the [indexer] section and the sqlite database that was configured didn’t work for unknown reasons. I filed Debian bug #971996 about this [1]. To get this working you need to edit /etc/gnocchi/gnocchi.conf and change the url line in the [indexer] section to something like the following (where the password is taken from the [database] section).

url = mysql+pymysql://gnocchi-common:PASS@localhost:3306/gnocchidb

To get the statsd interface going you have to install the gnocchi-statsd package and edit /etc/gnocchi/gnocchi.conf to put a UUID in the resource_id field (the Debian package uuid is good for this). I filed Debian bug #972092 requesting that the UUID be set by default on install [2].

Here’s an official page about how to operate Gnocchi [3]. The main thing I got from this was that the following commands need to be run from the command-line (I ran them as root in a VM for test purposes but would do so with minimum privs for a real deployment).

gnocchi-api
gnocchi-metricd

To communicate with Gnocchi you need the gnocchi-api program running, which uses the uwsgi program to provide the web interface by default. It seems that this was written for a version of uwsgi different than the one in Buster. I filed Debian bug #972087 with a patch to make it work with uwsgi [4]. Note that I didn’t get to the stage of an end to end test, I just got it to basically run without error.

After getting “gnocchi-api” running (in a terminal not as a daemon as Debian doesn’t seem to have a service file for it), I ran the client program “gnocchi” and then gave it the “status” command which failed (presumably due to the metrics daemon not running), but at least indicated that the client and the API could communicate.

Then I ran the “gnocchi-metricd” and got the following error:

2020-10-12 14:59:30,491 [9037] ERROR    gnocchi.cli.metricd: Unexpected error during processing job
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 87, in run
    self._run_job()
  File "/usr/lib/python3/dist-packages/gnocchi/cli/metricd.py", line 248, in _run_job
    self.coord.update_capabilities(self.GROUP_ID, self.store.statistics)
  File "/usr/lib/python3/dist-packages/tooz/coordination.py", line 592, in update_capabilities
    raise tooz.NotImplemented
tooz.NotImplemented

At this stage I’ve had enough of gnocchi. I’ll give the Etsy Statsd a go next.

Update

Thomas has responded to this post [5]. At this stage I’m not really interested in giving Gnocchi another go. There’s still the issue of the indexer database which should be different from the main database somehow and sqlite (the config file default) doesn’t work.

I expect that if I was to persist with Gnocchi I would encounter more poorly described error messages from the code which either don’t have Google hits when I search for them or have Google hits to unanswered questions from 5+ years ago.

The Gnocchi systemd config files are in different packages to the programs, this confused me and I thought that there weren’t any systemd service files. I had expected that installing a package with a daemon binary would also get the systemd unit file to match.

The cluster features of Gnocchi are probably really good if you need that sort of thing. But if you have a small instance (EG a single VM server) then it’s not needed. Also one of the original design ideas of the Etsy Statsd was that UDP was used because data could just be dropped if there was a problem. I think for many situations the same concept could apply to the entire stats service.

If the other statsd programs don’t do what I need then I may give Gnocchi another go.

Bandwidth for Video Conferencing

For the Linux Users of Victoria (LUV) I’ve run video conferences on Jitsi and BBB (see my previous post about BBB vs Jitsi [1]). One issue with video conferences is the bandwidth requirements.

The place I’m hosting my video conference server has a NBN link with allegedly 40Mb/s transmission speed and 100Mb/s reception speed. My tests show that it can transmit at about 37Mb/s and receive at speeds significantly higher than that but also quite a bit lower than 100Mb/s (around 60 or 70Mb/s). For a video conference server you have a small number of sources of video and audio and a larger number of targets as usually most people will have their microphones muted and video cameras turned off. This means that the transmission speed is the bottleneck. In every test the reception speed was well below half the transmission speed, so the tests confirmed my expectation that transmission was the only bottleneck, but the reception speed was higher than I had expected.

When we tested bandwidth use the maximum upload speed we saw was about 4MB/s (32Mb/s) with 8+ video cameras and maybe 20 people seeing some of the video (with a bit of lag). We used 3.5MB/s (28Mb/s) when we only had 6 cameras which seemed to be the maximum for good performance.

In another test run we had 4 people all sending video and the transmission speed was about 260KB/s.

I don’t know how BBB manages the small versions of video streams. It might reduce the bandwidth when the display window is smaller.

I don’t know the resolutions of the cameras. When you start sending video in BBB you are prompted for the “quality” with “medium” being default. I don’t know how different camera hardware and different choices about “quality” affect bandwidth.

These tests showed that for the cameras we had available a small group of people video chatting a 100/40 NBN link (the fastest Internet link in Australia that’s not really expensive) a small group of people can be all sending video or a medium size group of people can watch video streams from a small group.

For meetings of the typical size of LUV meetings we won’t have a bandwidth problem.

There is one common case that I haven’t yet tested, where there is a single video stream that many people are watching. If 4 people are all sending video with 260KB/s transmission bandwidth then 1 person could probably send video to 4 for 65KB/s. Doing some simple calculations on those numbers implies that we could have 1 person sending video to 240 people without running out of bandwidth. I really doubt that would work, but further testing is needed.

Qemu (KVM) and 9P (Virtfs) Mounts

I’ve tried setting up the Qemu (in this case KVM as it uses the Qemu code in question) 9P/Virtfs filesystem for sharing files to a VM. Here is the Qemu documentation for it [1].

VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=mapped-xattr,id=zz,writeout=immediate,fmode=0600,dmode=0700,mount_tag=zz"
VIRTFS="-virtfs local,path=/vmstore/virtfs,security_model=passthrough,id=zz,writeout=immediate,mount_tag=zz"

Above are the 2 configuration snippets I tried on the server side. The first uses mapped xattrs (which means that all files will have the same UID/GID and on the host XATTRs will be used for storing the Unix permissions) and the second uses passthrough which requires KVM to run as root and gives the same permissions on the host as on the VM. The advantages of passthrough are better performance through writing less metadata and having the same permissions in host and VM. The advantages of mapped XATTRs are running KVM/Qemu as non-root and not having a SUID file in the VM imply a SUID file in the host.

Here is the link to Bonnie++ output comparing Ext3 on a KVM block device (stored on a regular file in a BTRFS RAID-1 filesystem on 2 SSDs on the host), a NFS share from the host from the same BTRFS filesystem, and virtfs shares of the same filesystem. The only tests that Ext3 doesn’t win are some of the latency tests, latency is based on the worst-case not the average. I expected Ext3 to win most tests, but didn’t expect it to lose any latency tests.

Here is a link to Bonnie++ output comparing just NFS and Virtfs. It’s obvious that Virtfs compares poorly, giving about half the performance on many tests. Surprisingly the only tests where Virtfs compared well to NFS were the file creation tests which I expected Virtfs with mapped XATTRs to do poorly due to the extra metadata.

Here is a link to Bonnie++ output comparing only Virtfs. The options are mapped XATTRs with default msize, mapped XATTRs with 512k msize (I don’t know if this made a difference, the results are within the range of random differences), and passthrough. There’s an obvious performance benefit in passthrough for the small file tests due to the less metadata overhead, but as creating small files isn’t a bottleneck on most systems a 20% to 30% improvement in that area probably doesn’t matter much. The result from the random seeks test in passthrough is unusual, I’ll have to do more testing on that.

SE Linux

On Virtfs the XATTR used for SE Linux labels is passed through to the host. So every label used in a VM has to be valid on the host and accessible to the context of the KVM/Qemu process. That’s not really an option so you have to use the context mount option. Having the mapped XATTR mode work for SE Linux labels is a necessary feature.

Conclusion

The msize mount option in the VM doesn’t appear to do anything and it doesn’t appear in /proc/mounts, I don’t know if it’s even supported in the kernel I’m using.

The passthrough and mapped XATTR modes give near enough performance that there doesn’t seem to be a benefit of one over the other.

NFS gives significant performance benefits over Virtfs while also using less CPU time in the VM. It has the issue of files named .nfs* hanging around if the VM crashes while programs were using deleted files. It’s also more well known, ask for help with an NFS problem and you are more likely to get advice than when asking for help with a virtfs problem.

Virtfs might be a better option for accessing databases than NFS due to it’s internal operation probably being a better map to Unix filesystem semantics, but running database servers on the host is probably a better choice anyway.

Virtfs generally doesn’t seem to be worth using. I had hoped for performance that was better than NFS but the only benefit I seemed to get was avoiding the .nfs* file issue.

The best options for storage for a KVM/Qemu VM seem to be Ext3 for files that are only used on one VM and for which the size won’t change suddenly or unexpectedly (particularly the root filesystem) and NFS for everything else.

Links September 2020

MD5 cracker, find plain text that matches MD5 hash [1].

Debian Quick Image Baker – Debian VM images for various architectures [2].

Insightful article on Scientific American about how dental and orthodontic problems are caused by our modern lifestyle [3].

Cory Doctorow wrote an insightful article for Locus Magazine about Intellectual Property [4]. He makes the case that we are losing our rights due to changes in the legal system that benefits corporations.

Anarcat has an informative blog post about how to stop people using your Mailman installation to attack others [5]. Since version 1:2.1.27-1 the Debian package of Mailman has created a SUBSCRIBE_FORM_SECRET by default on install, but any installation upgraded from an older version of the Debian package won’t have it.

Burning Lithium Ion Batteries

I had an old Nexus 4 phone that was expanding and decided to test some of the theories about battery combustion.

The first claim that often gets made is that if the plastic seal on the outside of the battery is broken then the battery will catch fire. I tested this by cutting the battery with a craft knife. With every cut the battery sparked a bit and then when I levered up layers of the battery (it seems to be multiple flat layers of copper and black stuff inside the battery) there were more sparks. The battery warmed up, it’s plausible that in a confined environment that could get hot enough to set something on fire. But when the battery was resting on a brick in my backyard that wasn’t going to happen.

The next claim is that a Li-Ion battery fire will be increased with water. The first thing to note is that Li-Ion batteries don’t contain Lithium metal (the Lithium high power non-rechargeable batteries do). Lithium metal will seriously go off it exposed to water. But lots of other Lithium compounds will also react vigorously with water (like Lithium oxide for example). After cutting through most of the center of the battery I dripped some water in it. The water boiled vigorously and the corners of the battery (which were furthest away from the area I cut) felt warmer than they did before adding water. It seems that significant amounts of energy are released when water reacts with whatever is inside the Li-Ion battery. The reaction was probably giving off hydrogen gas but didn’t appear to generate enough heat to ignite hydrogen (which is when things would really get exciting). Presumably if a battery was cut in the presence of water while in an enclosed space that traps hydrogen then the sparks generated by the battery reacting with air could ignite hydrogen generated from the water and give an exciting result.

It seems that a CO2 fire extinguisher would be best for a phone/tablet/laptop fire as that removes oxygen and cools it down. If that isn’t available then a significant quantity of water will do the job, water won’t stop the reaction (it can prolong it), but it can keep the reaction to below 100C which means it won’t burn a hole in the floor and the range of toxic chemicals released will be reduced.

The rumour that a phone fire on a plane could do a “China syndrome” type thing and melt through the Aluminium body of the plane seems utterly bogus. I gave it a good try and was unable to get a battery to burn through it’s plastic and metal foil case. A spare battery for a laptop in checked luggage could be a major problem for a plane if it ignited. But a battery in the passenger area seems unlikely to be a big problem if plenty of water is dumped on it to prevent the plastic case from burning and polluting the air.

I was not able to get a result that was even worthy of a photograph. I may do further tests with laptop batteries.

Dell BIOS Updates

I have just updated the BIOS on a Dell PowerEdge T110 II. The process isn’t too difficult, Google for the machine name and BIOS, download a shell script encoded firmware image and GPG signature, then run the script on the system in question.

One problem is that the Dell GPG key isn’t signed by anyone. How hard would it be to get a few well connected people in the Linux community to sign the key used for signing Linux scripts for updating the BIOS? I would be surprised if Dell doesn’t employ a few people who are well connected in the Linux community, they should just ask all employees to sign such GPG keys! Failing that there are plenty of other options. I’d be happy to sign the Dell key if contacted by someone who can prove that they are a responsible person in Dell. If I could phone Dell corporate and ask for the engineering department and then have someone tell me the GPG fingerprint I’ll sign the key and that problem will be partially solved (my key is well connected but you need more than one signature).

The next issue is how to determine that a BIOS update works. What you really don’t want is to have a BIOS update fail and brick your system! So the Linux update process loads the image into something (special firmware RAM maybe) and then reboots the system and the reboot then does a critical part of the update. If the reboot doesn’t work then you end up with the old version of the BIOS. This is overall a good thing.

The PowerEdge T110 II is a workstation with an NVidia video card (I tried an ATI card but that wouldn’t boot for unknown reasons). The Nouveau driver has some issues. One thing I have done to work around some Nouveau issues is to create a file “~/.config/plasma-workspace/env/nouveau-broken.sh” (for KDE sessions) with the following contents:

export LIBGL_ALWAYS_SOFTWARE=1

I previously wrote about using this just for Kmail to stop it crashing [1]. But after doing that I still had other problems with video and disabling all GL on the NVidia card was necessary.

The latest problem I’ve had is that even when using that configuration things don’t go well. When I run the “reboot” command I end up with a kernel message about the GPU not responding and then it doesn’t reboot. That means that the BIOS update doesn’t apply, a hard reboot signals to the system that the new BIOS wasn’t good and I end up with the old BIOS again. I discovered that disabling sddm (the latest xdm program in Debian) from starting on boot meant that a reboot command would work. Then I ran the BIOS update script and it’s reboot command worked and gave a successful BIOS update.

So I’ve gone from a 2013 BIOS to a 2018 BIOS! The update list says that some CVEs have been addressed, but the spectre-meltdown-checker doesn’t report any fewer vulnerabilities.

More About the PowerEdge R710

I’ve got the R710 (mentioned in my previous post [1]) online. When testing the R710 at home I noticed that sometimes the VGA monitor I was using would start flickering when in some parts of the BIOS setup, it seemed that the horizonal sync wasn’t working properly. It didn’t seem to be a big deal at the time. When I deployed it the KVM display that I had planned to use with it mostly didn’t display anything. When the display was working the KVM keyboard wouldn’t work (and would prevent a regular USB keyboard from working if they were both connected at the same time). The VGA output of the R710 also wouldn’t work with my VGA->HDMI device so I couldn’t get it working with my portable monitor.

Fortunately the Dell front panel has a display and tiny buttons that allow configuring the IDRAC IP address, so I was able to get IDRAC going. One thing Dell really should do is allow the down button to change 0 to 9 when entering numbers, that would make it easier to enter 8.8.8.8 for the DNS server. Another thing Dell should do is make the default gateway have a default value according to the IP address and netmask of the server.

When I got IDRAC going it was easy to setup a serial console, boot from a rescue USB device, create a new initrd with the driver for the MegaRAID controller, and then reboot into the server image.

When I transferred the SSDs from the old server to the newer Dell server the problem I had was that the Dell drive caddies had no holes in suitable places for attaching SSDs. I ended up just pushing the SSDs in so they are hanging in mid air attached only by the SATA/SAS connectors. Plugging them in took the space from the above drive, so instead of having 2*3.5″ disks I have 1*2.5″ SSD and need the extra space to get my hand in. The R710 is designed for 6*3.5″ disks and I’m going to have trouble if I ever want to have more than 3*2.5″ SSDs. Fortunately I don’t think I’ll need more SSDs.

After booting the system I started getting alerts about a “fault” in one SSD, with no detail on what the fault might be. My guess is that the SSD in question is M.2 and it’s in a M.2 to regular SATA adaptor which might have some problems. The data seems fine though, a BTRFS scrub found no checksum errors. I guess I’ll have to buy a replacement SSD soon.

I configured the system to use the “nosmt” kernel command line option to disable hyper-threading (which won’t provide much performance benefit but which makes certain types of security attacks much easier). I’ve configured BOINC to run on 6/8 CPU cores and surprisingly that didn’t cause the fans to be louder than when the system was idle. It seems that a system that is designed for 6 SAS disks doesn’t need a lot of cooling when run with SSDs.

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.

Setting Up a Dell R710 Server with DRAC6

I’ve just got a Dell R710 server for LUV and I’ve been playing with the configuration options. Here’s a list of what I’ve done and recommendations on how to do things. I decided not to try to write a step by step guide to doing stuff as the situation doesn’t work for that. I think that a list of how things are broken and what to avoid is more useful.

BIOS

Firstly with a Dell server you can upgrade the BIOS from a Linux root shell. Generally when a server is deployed you won’t want to upgrade the BIOS (why risk breaking something when it’s working), so before deployment you probably should install the latest version. Dell provides a shell script with encoded binaries in it that you can just download and run, it’s ugly but it works. The process of loading the firmware takes a long time (a few minutes) with no progress reports, so best to plug both PSUs in and leave it alone. At the end of loading the firmware a hard reboot will be performed, so upgrading your distribution while doing the install is a bad idea (Debian is designed to not lose data in this situation so it’s not too bad).

IDRAC

IDRAC is the Integrated Dell Remote Access Controller. By default it will listen on all Ethernet ports, get an IP address via DHCP (using a different Ethernet hardware address to the interface the OS sees), and allow connections. Configuring it to be restrictive as to which ports it listens on may be a good idea (the R710 had 4 ports built in so having one reserved for management is usually OK). You need to configure a username and password for IDRAC that has administrative access in the BIOS configuration.

Web Interface

By default IDRAC will run a web server on the IP address it gets from DHCP, you can connect to that from any web browser that allows ignoring invalid SSL keys. Then you can use the username and password configured in the BIOS to login. IDRAC 6 on the PowerEdge R710 recommends IE 6.

To get a ssl cert that matches the name you want to use (and doesn’t give browser errors) you have to generate a CSR (Certificate Signing Request) on the DRAC, the only field that matters is the CN (Common Name), the rest have to be something that Letsencrypt will accept. Certbot has the option “--config-dir /etc/letsencrypt-drac” to specify an alternate config directory, the SSL key for DRAC should be entirely separate from the SSL key for other stuff. Then use the “--csr” option to specify the path of the CSR file. When you run letsencrypt the file name of the output file you want will be in the format “*_chain.pem“. You then have to upload that to IDRAC to have it used. This is a major pain for the lifetime of letsencrypt certificates. Hopefully a more recent version of IDRAC has Certbot built in.

When you access RAC via ssh (see below) you can run the command “racadm sslcsrgen” to generate a CSR that can be used by certbot. So it’s probably possible to write expect scripts to get that CSR, run certbot, and then put the ssl certificate back. I don’t expect to use IDRAC often enough to make it worth the effort (I can tell my browser to ignore an outdated certificate), but if I had dozens of Dells I’d script it.

SSH

The web interface allows configuring ssh access which I strongly recommend doing. You can configure ssh access via password or via ssh public key. For ssh access set TERM=vt100 on the host to enable backspace as ^H. Something like “TERM=vt100 ssh root@drac“. Note that changing certain other settings in IDRAC such as enabling Smartcard support will disable ssh access.

There is a limit to the number of open “sessions” for managing IDRAC, when you ssh to the IDRAC you can run “racadm getssninfo” to get a list of sessions and “racadm closessn -i NUM” to close a session. The closessn command takes a “-a” option to close all sessions but that just gives an error saying that you can’t close your own session because of programmer stupidity. The IDRAC web interface also has an option to close sessions. If you get to the limits of both ssh and web sessions due to network issues then you presumably have a problem.

I couldn’t find any documentation on how the ssh host key is generated. I Googled for the key fingerprint and didn’t get a match so there’s a reasonable chance that it’s unique to the server (please comment if you know more about this).

Don’t Use Graphical Console

The R710 is an older server and runs IDRAC6 (IDRAC9 is the current version). The Java based graphical console access doesn’t work with recent versions of Java. The Debian package icedtea-netx has has the javaws command for running the .jnlp command for the console, by default the web browser won’t run this, you download the .jnlp file and pass that as the first parameter to the javaws program which then downloads a bunch of Java classes from the IDRAC to run. One error I got with Java 11 was ‘Exception in thread “Timer-0” java.util.ServiceConfigurationError: java.nio.charset.spi.CharsetProvider: Provider sun.nio.cs.ext.ExtendedCharsets could not be instantiated‘, Google didn’t turn up any solutions to this. Java 8 didn’t have that problem but had a “connection failed” error that some people reported as being related to the SSL key, but replacing the SSL key for the web server didn’t help. The suggestion of running a VM with an old web browser to access IDRAC didn’t appeal. So I gave up on this. Presumably a Windows VM running IE6 would work OK for this.

Serial Console

Fortunately IDRAC supports a serial console. Here’s a page summarising Serial console setup for DRAC [1]. Once you have done that put “console=tty0 console=ttyS1,115200n8” on the kernel command line and Linux will send the console output to the virtual serial port. To access the serial console from remote you can ssh in and run the RAC command “console com2” (there is no option for using a different com port). The serial port seems to be unavailable through the web interface.

If I was supporting many Dell systems I’d probably setup a ssh to JavaScript gateway to provide a web based console access. It’s disappointing that Dell didn’t include this.

If you disconnect from an active ssh serial console then the RAC might keep the port locked, then any future attempts to connect to it will give the following error:

/admin1-> console com2
console: Serial Device 2 is currently in use

So far the only way I’ve discovered to get console access again after that is the command “racadm racreset“. If anyone knows of a better way please let me know. As an aside having “racreset” being so close to “racresetcfg” (which resets the configuration to default and requires a hard reboot to configure it again) seems like a really bad idea.

Host Based Management

deb http://linux.dell.com/repo/community/ubuntu xenial openmanage

The above apt sources.list line allows installing Dell management utilities (Xenial is old but they work on Debian/Buster). Probably the packages srvadmin-storageservices-cli and srvadmin-omacore will drag in enough dependencies to get it going.

Here are some useful commands:

# show hardware event log
omreport system esmlog
# show hardware alert log
omreport system alertlog
# give summary of system information
omreport system summary
# show versions of firmware that can be updated
omreport system version
# get chassis temp
omreport chassis temps
# show physical disk details on controller 0
omreport storage pdisk controller=0

RAID Control

The RAID controller is known as PERC (PowerEdge Raid Controller), the Dell web site has an rpm package of the perccli tool to manage the RAID from Linux. This is statically linked and appears to have different versions of libraries used to make it. The command “perccli show” gives an overview of the configuration, but the command “perccli /c0 show” to give detailed information on controller 0 SEGVs and the kernel logs a “vsyscall attempted with vsyscall=none” message. Here’s an overview of the vsyscall enmulation issue [2]. Basically I could add “vsyscall=emulate” to the kernel command line and slightly reduce security for the system to allow system calls from perccli that are called from old libc code to work, or I could run code from a dubious source as root.

Some versions of IDRAC have a “racadm raid” command that can be run from a ssh session to perform RAID administration remotely, mine doesn’t have that. As an aside the help for the RAC system doesn’t list all commands and the Dell documentation is difficult to find so blog posts from other sysadmins is the best source of information.

I have configured IDRAC to have all of the BIOS output go to the virtual serial console over ssh so I can see the BIOS prompt me for PERC administration but it didn’t accept my key presses when I tried to do so. In any case this requires downtime and I’d like to be able to change hard drives without rebooting.

I found vsyscall_trace on Github [3], it uses the ptrace interface to emulate vsyscall on a per process basis. You just run “vsyscall_trace perccli” and it works! Thanks Geoffrey Thomas for writing this!

Here are some perccli commands:

# show overview
perccli show
# help on adding a vd (RAID)
perccli /c0 add help
# show controller 0 details
perccli /c0 show
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

When a disk is added to a PERC controller about 525MB of space is used at the end for RAID metadata. So when you create a RAID-0 with a single device as in the above example all disk data is preserved by default except for the last 525MB. I have tested this by running a BTRFS scrub on a disk from another system after making it into a RAID-0 on the PERC.

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.

BBB vs Jitsi

I previously wrote about how I installed the Jitsi video-conferencing system on Debian [1]. We used that for a few unofficial meetings of LUV to test it out. Then we installed Big Blue Button (BBB) [2]. The main benefit of Jitsi over BBB is that it supports live streaming to YouTube. The benefits of BBB are a better text chat system and a “whiteboard” that allows conference participants to draw shared diagrams. So if you have the ability to run both systems then it’s best to use Jitsi when you have so many viewers that a YouTube live stream is needed and to use BBB in all other situations.

One problem is with the ability to run both systems. Jitsi isn’t too hard to install if you are installing it on a VM that is not used for anything else. BBB is a major pain no matter what you do. The latest version of BBB is 2.2 which was released in March 2020 and requires Ubuntu 16.04 (which was released in 2016 and has “standard support” until April next year) and doesn’t support Ubuntu 18.04 (released in 2018 and has “standard support” until 2023). The install script doesn’t check for correct apt repositories and breaks badly with no explanation if you don’t have Ubuntu Multiverse enabled.

I expect that they rushed a release because of the significant increase in demand for video conferencing this year. But that’s no reason for demanding the 2016 version of Ubuntu, why couldn’t they have developed on version 18.04 for the last 2 years? Since that release they have had 6 months in which they could have released a 2.2.1 version supporting Ubuntu 18.04 or even 20.04.

The dependency list for BBB is significant, among other things it uses LibreOffice for the whiteboard. This adds to the pain of installing and maintaining it. It wouldn’t surprise me if some of the interactions between all the different components have security issues.

Conclusion

If you want something that’s not really painful to install and run then use Jitsi.

If you need YouTube live streaming use Jitsi.

If you need whiteboards and a good text chat system or if you generally need to run things like a classroom then BBB is a good option. But only if you can manage it, know someone who can manage it for you, or are happy to pay for a managed service provider to do it for you.