Archives

Categories

Setting Up a Dell R710 Server with DRAC6

I’ve just got a Dell R710 server for LUV and I’ve been playing with the configuration options. Here’s a list of what I’ve done and recommendations on how to do things. I decided not to try to write a step by step guide to doing stuff as the situation doesn’t work for that. I think that a list of how things are broken and what to avoid is more useful.

BIOS

Firstly with a Dell server you can upgrade the BIOS from a Linux root shell. Generally when a server is deployed you won’t want to upgrade the BIOS (why risk breaking something when it’s working), so before deployment you probably should install the latest version. Dell provides a shell script with encoded binaries in it that you can just download and run, it’s ugly but it works. The process of loading the firmware takes a long time (a few minutes) with no progress reports, so best to plug both PSUs in and leave it alone. At the end of loading the firmware a hard reboot will be performed, so upgrading your distribution while doing the install is a bad idea (Debian is designed to not lose data in this situation so it’s not too bad).

IDRAC

IDRAC is the Integrated Dell Remote Access Controller. By default it will listen on all Ethernet ports, get an IP address via DHCP (using a different Ethernet hardware address to the interface the OS sees), and allow connections. Configuring it to be restrictive as to which ports it listens on may be a good idea (the R710 had 4 ports built in so having one reserved for management is usually OK). You need to configure a username and password for IDRAC that has administrative access in the BIOS configuration.

Web Interface

By default IDRAC will run a web server on the IP address it gets from DHCP, you can connect to that from any web browser that allows ignoring invalid SSL keys. Then you can use the username and password configured in the BIOS to login. IDRAC 6 on the PowerEdge R710 recommends IE 6.

To get a ssl cert that matches the name you want to use (and doesn’t give browser errors) you have to generate a CSR (Certificate Signing Request) on the DRAC, the only field that matters is the CN (Common Name), the rest have to be something that Letsencrypt will accept. Certbot has the option “--config-dir /etc/letsencrypt-drac” to specify an alternate config directory, the SSL key for DRAC should be entirely separate from the SSL key for other stuff. Then use the “--csr” option to specify the path of the CSR file. When you run letsencrypt the file name of the output file you want will be in the format “*_chain.pem“. You then have to upload that to IDRAC to have it used. This is a major pain for the lifetime of letsencrypt certificates. Hopefully a more recent version of IDRAC has Certbot built in.

When you access RAC via ssh (see below) you can run the command “racadm sslcsrgen” to generate a CSR that can be used by certbot. So it’s probably possible to write expect scripts to get that CSR, run certbot, and then put the ssl certificate back. I don’t expect to use IDRAC often enough to make it worth the effort (I can tell my browser to ignore an outdated certificate), but if I had dozens of Dells I’d script it.

SSH

The web interface allows configuring ssh access which I strongly recommend doing. You can configure ssh access via password or via ssh public key. For ssh access set TERM=vt100 on the host to enable backspace as ^H. Something like “TERM=vt100 ssh root@drac“. Note that changing certain other settings in IDRAC such as enabling Smartcard support will disable ssh access.

There is a limit to the number of open “sessions” for managing IDRAC, when you ssh to the IDRAC you can run “racadm getssninfo” to get a list of sessions and “racadm closessn -i NUM” to close a session. The closessn command takes a “-a” option to close all sessions but that just gives an error saying that you can’t close your own session because of programmer stupidity. The IDRAC web interface also has an option to close sessions. If you get to the limits of both ssh and web sessions due to network issues then you presumably have a problem.

I couldn’t find any documentation on how the ssh host key is generated. I Googled for the key fingerprint and didn’t get a match so there’s a reasonable chance that it’s unique to the server (please comment if you know more about this).

Don’t Use Graphical Console

The R710 is an older server and runs IDRAC6 (IDRAC9 is the current version). The Java based graphical console access doesn’t work with recent versions of Java. The Debian package icedtea-netx has has the javaws command for running the .jnlp command for the console, by default the web browser won’t run this, you download the .jnlp file and pass that as the first parameter to the javaws program which then downloads a bunch of Java classes from the IDRAC to run. One error I got with Java 11 was ‘Exception in thread “Timer-0” java.util.ServiceConfigurationError: java.nio.charset.spi.CharsetProvider: Provider sun.nio.cs.ext.ExtendedCharsets could not be instantiated‘, Google didn’t turn up any solutions to this. Java 8 didn’t have that problem but had a “connection failed” error that some people reported as being related to the SSL key, but replacing the SSL key for the web server didn’t help. The suggestion of running a VM with an old web browser to access IDRAC didn’t appeal. So I gave up on this. Presumably a Windows VM running IE6 would work OK for this.

Serial Console

Fortunately IDRAC supports a serial console. Here’s a page summarising Serial console setup for DRAC [1]. Once you have done that put “console=tty0 console=ttyS1,115200n8” on the kernel command line and Linux will send the console output to the virtual serial port. To access the serial console from remote you can ssh in and run the RAC command “console com2” (there is no option for using a different com port). The serial port seems to be unavailable through the web interface.

If I was supporting many Dell systems I’d probably setup a ssh to JavaScript gateway to provide a web based console access. It’s disappointing that Dell didn’t include this.

If you disconnect from an active ssh serial console then the RAC might keep the port locked, then any future attempts to connect to it will give the following error:

/admin1-> console com2
console: Serial Device 2 is currently in use

So far the only way I’ve discovered to get console access again after that is the command “racadm racreset“. If anyone knows of a better way please let me know. As an aside having “racreset” being so close to “racresetcfg” (which resets the configuration to default and requires a hard reboot to configure it again) seems like a really bad idea.

Host Based Management

deb http://linux.dell.com/repo/community/ubuntu xenial openmanage

The above apt sources.list line allows installing Dell management utilities (Xenial is old but they work on Debian/Buster). Probably the packages srvadmin-storageservices-cli and srvadmin-omacore will drag in enough dependencies to get it going.

Here are some useful commands:

# show hardware event log
omreport system esmlog
# show hardware alert log
omreport system alertlog
# give summary of system information
omreport system summary
# show versions of firmware that can be updated
omreport system version
# get chassis temp
omreport chassis temps
# show physical disk details on controller 0
omreport storage pdisk controller=0

RAID Control

The RAID controller is known as PERC (PowerEdge Raid Controller), the Dell web site has an rpm package of the perccli tool to manage the RAID from Linux. This is statically linked and appears to have different versions of libraries used to make it. The command “perccli show” gives an overview of the configuration, but the command “perccli /c0 show” to give detailed information on controller 0 SEGVs and the kernel logs a “vsyscall attempted with vsyscall=none” message. Here’s an overview of the vsyscall enmulation issue [2]. Basically I could add “vsyscall=emulate” to the kernel command line and slightly reduce security for the system to allow system calls from perccli that are called from old libc code to work, or I could run code from a dubious source as root.

Some versions of IDRAC have a “racadm raid” command that can be run from a ssh session to perform RAID administration remotely, mine doesn’t have that. As an aside the help for the RAC system doesn’t list all commands and the Dell documentation is difficult to find so blog posts from other sysadmins is the best source of information.

I have configured IDRAC to have all of the BIOS output go to the virtual serial console over ssh so I can see the BIOS prompt me for PERC administration but it didn’t accept my key presses when I tried to do so. In any case this requires downtime and I’d like to be able to change hard drives without rebooting.

I found vsyscall_trace on Github [3], it uses the ptrace interface to emulate vsyscall on a per process basis. You just run “vsyscall_trace perccli” and it works! Thanks Geoffrey Thomas for writing this!

Here are some perccli commands:

# show overview
perccli show
# help on adding a vd (RAID)
perccli /c0 add help
# show controller 0 details
perccli /c0 show
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0

When a disk is added to a PERC controller about 525MB of space is used at the end for RAID metadata. So when you create a RAID-0 with a single device as in the above example all disk data is preserved by default except for the last 525MB. I have tested this by running a BTRFS scrub on a disk from another system after making it into a RAID-0 on the PERC.

Update: It’s a R710 not a T710. I mostly deal with Dell Tower servers and typed the wrong letter out of habit.

1 comment to Setting Up a Dell R710 Server with DRAC6

  • grawity

    iDRAC6 works well with JRE 7. On Linux, I have a wrapper script that starts a custom .jnlp file (tested with the latest iDRAC6 firmware 2.92, I think the paths may have been different in older versions): https://github.com/grawity/code/blob/master/net/pydrac#L14-L81

    The last time I needed it on Windows, though, I didn’t bother with JavaWS and manually downloaded the necessary files so that the client could be started entirely locally: https://nullroute.eu.org/tmp/2020/iDRAC6.zip

    Also I seem to remember that modern-ish TLS support (which the JRE migth need) was also added in a firmware update. https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=kpccc

    As for SSH serial port access, I’ve found that IPMI v2.0 Serial-over-LAN (either ‘ipmiconsole’ or ‘ipmitool sol’) works more reliably and has never caused any “stuck” console ports so far. It’s… decently safe as long as you turn off the legacy IPMI v1.5 auth types (MD2/MD5) and the majority of RMCP ciphersuites (keeping just ‘3’ for AES-CBC-128). You can also configure a preshared encryption key (RMCP+ key) through the GUI.