T320 iDRAC Failure and new HP Z640

The Dell T320

Almost 2 years ago I made a Dell PowerEdge T320 my home server [1]. It was a decent upgrade from the PowerEdge T110 II that I had used previously. One reason for the upgrade was that I needed more RAM. The PowerEdge T1xx series uses unbuffered ECC RAM, which is unreasonably expensive; the DIMMs also tend to be smaller (no Load Reduced DIMMs) and there are only 4 slots. As I had bought two T320s I put all the RAM in a single server to get a total of 96G, then put some cheap DIMMs in the other one and sold it with 48G.

The T320 has all the server reliability features including hot-swap redundant PSUs and hot-swap hard drives. One thing it doesn’t have redundancy on is the motherboard management system known as iDRAC. Three days ago my suburb had a power outage, and when power came back on the T320 gave an error message about a failure to initialise the iDRAC and put all the fans on maximum speed, which is extremely loud. When a T320 is running in a room that’s not particularly hot and it doesn’t have SAS disks it’s a very quiet server, one of the quietest I’ve ever owned. When it goes into emergency cooling mode due to iDRAC failure it’s loud enough to be heard from the other end of the house with doors closed in between.

Googling this failure gave a few possible answers. One was to try various combinations of booting with the iDRAC button held down, turning the system off for a while, booting with the iDRAC button held down again, etc. (this didn’t work). Another was to put an iDRAC firmware file on the SD card so that the iDRAC could automatically load it (I tested this even though I didn’t have the flashing LED which indicates that it is likely to work, but it didn’t do anything). The last was to enable the serial console and configure the iDRAC to load new firmware via TFTP, but I didn’t get any iDRAC message from the serial console, just the regular BIOS output.

So it looks like I’ll have to sell the T320 for parts or find someone who wants to run it in its current form. Currently to boot it I have to press F1 a few times to bypass BIOS messages (someone on the Internet reported making a device to key-jam F1). Then when it boots it’s unreasonably loud, but apparently if you are really keen you can buy fans that have temperature sensors to control their own speed and bypass the motherboard control.

I’d appreciate any advice on how to get this going. At this stage I’m not going to go back to it but if I can get it working properly I can sell it for a decent price.

The HP Z640

I’ve replaced the T320 with a HP Z640 workstation with 32G of RAM which I had recently bought to play with Stable Diffusion. There were hundreds of Z640 workstations with NVidia Quadro M6000 GPUs going on eBay for under $400 each; it looked like a company that did a lot of ML work had either gone bankrupt or upgraded all their employees’ systems. The price for the systems was surprisingly cheap: at regular eBay prices it seems that the GPU and the RAM together go for about the same price as the whole system. It turned out that Stable Diffusion didn’t like the video card in my setup for unknown reasons, but also that the E5-1650v3 CPU could render an image in 15 minutes, which is fast enough to test it out but not fast enough for serious use. I had been planning to blog about that.

When I bought the T320 server the DDR3 Registered ECC RAM it uses cost about $100 for 8*8G DIMMs, with 16G DIMMs being much more expensive. Now the DDR4 Registered ECC RAM used by my Z640 goes for about $120 for 2*16G DIMMs. In the near future I’ll upgrade that system to 64G of RAM. It’s disappointing that the Z640 only has 4 DIMM sockets per CPU so if you get a single-CPU version (as I did) and don’t get the really expensive Load Reduced RAM then you are limited to 64G. So the supposed capacity benefit of going from DDR3 to DDR4 doesn’t seem to apply to this upgrade.
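To make the RAM comparison concrete, here is a minimal sketch of the arithmetic using the prices quoted above. The function names are made up for illustration, and the 16G figure for the largest affordable DIMM is an assumption implied by the 64G limit with 4 sockets.

```python
def price_per_gb(total_price, dimm_count, dimm_size_gb):
    """Price per gigabyte for a set of identical DIMMs."""
    return total_price / (dimm_count * dimm_size_gb)


def max_ram_gb(sockets, largest_dimm_gb):
    """Maximum RAM when every socket holds the largest DIMM considered."""
    return sockets * largest_dimm_gb


# Prices quoted in the post: $100 for 8*8G DDR3, $120 for 2*16G DDR4.
ddr3_per_gb = price_per_gb(100, 8, 8)
ddr4_per_gb = price_per_gb(120, 2, 16)

# Single-CPU Z640: 4 sockets, assuming 16G registered (non Load Reduced) DIMMs.
z640_limit = max_ram_gb(4, 16)

print(f"DDR3: ${ddr3_per_gb:.2f}/G, DDR4: ${ddr4_per_gb:.2f}/G, Z640 limit: {z640_limit}G")
```

By these numbers the DDR4 RAM costs more than twice as much per gigabyte as the DDR3 RAM did, and filling all 4 sockets gives 64G.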

The Z640 I got has 4 bays for hot-swap SAS/SATA 2.5″ SSD/HDDs and 2 internal bays for 3.5″ hard drives. The T320 has 8*3.5″ hot-swap bays and I had 3 hard drives in them in a BTRFS RAID-10 configuration. Currently I’ve got one hard drive attached via USB but that’s obviously not a long-term solution. The 3 hard drives are 4TB each and have been in service since 4TB was a good size. I have a spare 8TB disk so I could buy a second ($179 for a shingled HDD) to make an 8TB RAID-1 array. The other option is to pay $369 for a 4TB SSD (or $389 for a 4TB NVMe device + $10 for the PCIe card) to keep the 3 device RAID-10. As tempting as 4TB SSDs are, I’ll probably get a cheap 8TB disk, which will take capacity from 6TB to 8TB, and I could use some of the extra 4TB disks for backups.
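The capacity figures above can be checked with a rough sketch. This assumes a simplified model where every block is stored on two different devices, and it ignores metadata overhead and the details of how BTRFS actually allocates chunks. The function name is hypothetical.

```python
def two_copy_usable_tb(disk_sizes_tb):
    """Approximate usable space when every block is stored on two devices.

    Simplified model: usable space is half the raw total, but it can never
    exceed what the other disks can mirror for the largest disk.
    """
    total = sum(disk_sizes_tb)
    largest = max(disk_sizes_tb)
    return min(total / 2, total - largest)


# The old array: 3 * 4TB disks.
print(two_copy_usable_tb([4, 4, 4]))

# The planned array: 2 * 8TB disks.
print(two_copy_usable_tb([8, 8]))

# Cost of each option using the prices quoted in the post.
options = {
    "second 8TB HDD": 179,
    "4TB SATA SSD": 369,
    "4TB NVMe + PCIe card": 389 + 10,
}
cheapest = min(options, key=options.get)
print(cheapest, options[cheapest])
```

Under this model the 3 disk array gives 6TB and the pair of 8TB disks gives 8TB, which matches the figures above.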

I haven’t played with the AMT/MEBX features on this system; I presume that they will work the same way as AMT/MEBX on the HP Z420 I’ve used previously [2].

Update:

HP has free updates for the BIOS etc. available here [3]. Unfortunately it seems that applying them requires loading a kernel module supplied by HP. This is a bad thing: kernel code that isn’t in the mainline kernel is either of poor quality or isn’t licensed correctly.

I had to change my monitoring system to alert only on temperatures over 100% of the “high” range, while on the T320 I had it set at 95% of “high” and never got warnings. This is disappointing; enterprise class gear running in a reasonably cool environment (ambient temperature of about 22C) should be able to run all CPU cores at full performance without hitting 95% of the “high” temperature level.
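The threshold check described above can be sketched as follows. This is not the code of my monitoring system, just an illustration of the rule. It assumes that the “high” value is a single temperature reported by the sensor, and the 80C and 78C figures are made-up example values.

```python
def should_alert(temp_c, high_c, percent_of_high):
    """Return True if the reading exceeds the given percentage of "high"."""
    return temp_c > high_c * percent_of_high / 100.0


# Hypothetical sensor: "high" is 80C and the CPU is currently at 78C.
high_c = 80.0
reading_c = 78.0

# The T320 setting: alert above 95% of "high" (76C in this example).
print(should_alert(reading_c, high_c, 95))

# The Z640 setting: alert only above 100% of "high".
print(should_alert(reading_c, high_c, 100))
```

With these example values the 95% setting raises an alert while the 100% setting does not.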
