Ha – etbe – Russell Coker

Dell T320 H310 RAID and IT Mode

etbe — Fri, 22 Aug 2025 15:57:05 +0000

The Problem

Just over 2 years ago my Dell T320 server had a motherboard failure [1]. I recently bought another T320 that had been gutted (no drives, PSUs, or RAM) and put the bits from my one in it.

I installed Debian and the resulting installation wouldn’t boot, I tried installing with both UEFI and BIOS modes with the same result. Then I realised that the disks I had installed were available even though I hadn’t gone through the RAID configuration (I usually make a separate RAID-0 for each disk to work best with BTRFS or ZFS). I tried changing the BIOS setting for SATA disks between “RAID” and “AHCI” modes which didn’t change things and realised that the BIOS setting in question probably applies to the SATA connector on the motherboard and that the RAID card was in “IT” mode which means that each disk is seen separately.

If you are using ZFS or BTRFS you don’t want to use a RAID-1, RAID-5, or RAID-6 on the hardware RAID controller, if there are different versions of the data on disks in the stripe then you want the filesystem to be able to work out which one is correct. To use “IT” mode you have to flash a different unsupported firmware on the RAID controller and then you either have to go to some extra effort to make it bootable or have a different device to boot from.

The Root Causes

Dell has no reason to support unusual firmware on their RAID controllers. Installing different firmware on a device that is designed for high availability is going to have some probability of data loss and perhaps more importantly for Dell some probability of customers returning hardware during the support period and acting innocent about why it doesn’t work. Dell has a great financial incentive to make it difficult to install Dell firmware on LSI cards from other vendors which have equivalent hardware as they don’t want customers to get all the benefits of iDRAC integration etc without paying the Dell price premium.

All the other vendors have similar financial incentives so there is no official documentation or support on converting between different firmware images. Dell’s support for upgrading the Dell version is pretty good, but it aborts if it sees something different.

The Attempts

I tried following the instructions in this document to flash back to Dell firmware [2]. This document is about the H310 RAID card in my Dell T320 AKA a “LSI SAS 9211-8i”. The sas2flash.efi program didn’t seem to do anything, it returned immediately and didn’t give an error message.

This page gives a start of how to get inside the Dell firmware package but doesn’t work [3]. It didn’t cover the case where sasdupie aborts with an error because it detects the current version as “00.00.00.00” not something that the upgrade program is prepared to upgrade from. But it’s a place to start looking for someone who wants to try harder at this.

This forum post has some interesting information, I gave up before trying it, but it may be useful for someone else [4].

The Solution

Dell tower servers have as a standard feature an internal USB port for a boot device. So I created a boot image on a spare USB stick and installed it there and it then loads the kernel and mounts the filesystem from a SATA hard drive. Once I got that working everything was fine. The Debian/Trixie installer would probably have allowed me to install an EFI device on the internal USB stick as part of the install if I had known what was going to happen.

The system is now fully working and ready to sell. Now I just need to find someone who wants “IT” mode on the RAID controller and hopefully is willing to pay extra for it.

Whatever I sell the system for it seems unlikely to cover the hours I spent working on this. But I learned some interesting things about RAID firmware and hopefully this blog post will be useful to other people, even if only to discourage them from trying to change firmware.

Service Setup Difficulties

etbe — Fri, 30 May 2025 07:32:00 +0000

Marco wrote a blog post opposing hyperscale systems which included “We want to use an hyperscaler cloud because our developers do not want to operate a scalable and redundant database just means that you need to hire competent developers and/or system administrators.” [1].

I previously wrote a blog post Why Clusters Usually Don’t Work [2] and I believe that all the points there are valid today – and possibly exacerbated by clusters getting less direct use as clustering is increasingly being done by hyperscale providers.

Take a basic need, a MySQL or PostgreSQL database for example. You want it to run and basically do the job and to have good recovery options. You could set it up locally, run backups, test the backups, have a recovery plan for failures, maybe have a hot-spare server if it’s really important, have tests for backups and hot-spare server, etc. Then you could have documentation for this so if the person who set it up isn’t available when there’s a problem they will be able to find out what to do. But the hyperscale option is to just select a database in your provider and have all this just work. If the person who set it up isn’t available for recovery in the event of failure the company can just put out a job advert for “person with experience on cloud company X” and have them just immediately go to work on it.

I don’t like hyperscale providers as they are all monopolistic companies that do anti-competitive actions. Google should be broken up, Android development and the Play Store should be separated from Gmail etc which should be separated from search and adverts, and all of them should be separated from the GCP cloud service. Amazon should be broken up, running the Amazon store should be separated from selling items on the store, which should be separated from running a video on demand platform, and all of them should be separated from the AWS cloud. Microsoft should be broken up, OS development should be separated from application development all of that should be separated from cloud services (Teams and Office 365), and everything else should be separate from the Azure cloud system.

But the cloud providers offer real benefits at small scale. Running a MySQL or PostgreSQL database for local services is easy, it’s a simple apt command to install it and then it basically works. Doing backup and recovery isn’t so easy. One could say “just hire competent people” but if you do hire competent people do you want them running MySQL databases etc or have them just click on the “create mysql database” option on a cloud control panel and then move on to more important things?

The FreedomBox project is a great project for installing and managing home/personal services [3]. But it’s not about running things like database servers, it’s for a high level running mail servers and other things for the user not for the developer.

The Debian packaging of Open Stack looks interesting [4], it’s a complete setup for running your own hyper scale cloud service. For medium and large organisations running Open Stack could be a good approach. But for small organisations it’s cheaper and easier to just use a cloud service to run things.

The issue of when to run things in-house and when to put them in the cloud is very complex. I think that if the organisation is going to spend less money on cloud services than on the salary of one sysadmin then it’s probably best to have things in the cloud. When cloud costs start to exceed the salary of one person who manages systems then having them spend the extra time and effort to run things locally starts making more sense. There is also an opportunity cost in having a good sysadmin work on the backups for all the different systems instead of letting the cloud provider just do it. Another possibility of course is to run things in-house on low end hardware and just deal with the occasional downtime to save money. Knowingly choosing less reliability to save money can be quite reasonable as long as you have considered the options and all the responsible people are involved in the discussion.

The one situation that I strongly oppose is having hyper scale services setup by people who don’t understand them. Running a database server on a cloud service because you don’t want to spend the time managing it is a reasonable choice in many situations. Running a database server on a cloud service because you don’t understand how to setup a database server is never a good choice. While the cloud services are quite resilient there are still ways of breaking the overall system if you don’t understand it. Also while it is quite possible for someone to know how to develop for databases including avoiding SQL injection etc but be unable to setup a database server that’s probably not going to be common, probably if someone can’t set it up (a generally easy task) then they can’t do the hard tasks of making it secure.

SSD Endurance

etbe — Sun, 16 Jan 2022 05:33:04 +0000

I previously wrote about the issue of swap potentially breaking SSD [1]. My conclusion was that swap wouldn’t be a problem as no normally operating systems that I run had swap using any significant fraction of total disk writes. In that post the most writes I could see was 128GB written per day on a 120G Intel SSD (writing the entire device once a day).

My post about swap and SSD was based on the assumption that you could get many thousands of writes to the entire device which was incorrect. Here’s a background on the terminology from WD [2]. So in the case of the 120G Intel SSD I was doing over 1 DWPD (Drive Writes Per Day) which is in the middle of the range of SSD capability, Intel doesn’t specify the DWPD or TBW (Tera Bytes Written) for that device.

The most expensive and high end NVMe device sold by my local computer store is the Samsung 980 Pro which has a warranty of 150TBW for the 250G device and 600TBW for the 1TB device [3]. That means that the system which used to have an Intel SSD would have exceeded the warranty in 3 years if it had a 250G device.

My current workstation has been up for just over 7 days and has averaged 110GB written per day. It has some light VM use and the occasional kernel compile, a fairly typical developer workstation. It’s storage is 2*Crucial 1TB NVMe devices in a BTRFS RAID-1, the NVMe devices are the old series of Crucial ones and are rated for 200TBW which means that they can be expected to last for 5 years under the current load. This isn’t a real problem for me as the performance of those devices is lower than I hoped for so I will buy faster ones before they are 5yo anyway.

My home server (and my wife’s workstation) is averaging 325GB per day on the SSDs used for the RAID-1 BTRFS filesystem for root and for most data that is written much (including VMs). The SSDs are 500G Samsung 850 EVOs [4] which are rated at 150TBW which means just over a year of expected lifetime. The SSDs are much more than a year old, I think Samsung stopped selling them more than a year ago. Between the 2 SSDs SMART reports 18 uncorrectable errors and “btrfs device stats” reports 55 errors on one of them. I’m not about to immediately replace them, but it appears that they are well past their prime.

The server which runs my blog (among many other things) is averaging over 1TB written per day. It currently has a RAID-1 of hard drives for all storage but it’s previous incarnation (which probably had about the same amount of writes) had a RAID-1 of “enterprise” SSDs for the most written data. After a few years of running like that (and some time running with someone else’s load before it) the SSDs became extremely slow (sustained writes of 15MB/s) and started getting errors. So that’s a pair of SSDs that were burned out.

Conclusion

The amounts of data being written are steadily increasing. Recent machines with more RAM can decrease storage usage in some situations but that doesn’t compare to the increased use of checksummed and logged filesystems, VMs, databases for local storage, and other things that multiply writes. The amount of writes allowed under warranty isn’t increasing much and there are new technologies for larger SSD storage that decrease the DWPD rating of the underlying hardware.

For the systems I own it seems that they are all going to exceed the rated TBW for the SSDs before I have other reasons to replace them, and they aren’t particularly high usage systems. A mail server for a large number of users would hit it much earlier.

RAID of SSDs is a really good thing. Replacement of SSDs is something that should be planned for and a way of swapping SSDs to less important uses is also good (my parents have some SSDs that are too small for my current use but which work well for them). Another thing to consider is that if you have a server with spare drive bays you could put some extra SSDs in to spread the wear among a larger RAID-10 array. Instead of having a 2*SSD BTRFS RAID-1 for a server you could have 6*SSD to get a 3* longer lifetime than a regular RAID-1 before the SSDs wear out (BTRFS supports this sort of thing).

Based on these calculations and the small number of errors I’ve seen on my home server I’ll add a 480G SSD I have lying around to the array to spread the load and keep it running for a while longer.

Some Ideas About Storage Reliability

etbe — Mon, 31 May 2021 11:00:28 +0000

Hard Drive Brands

When people ask for advice about what storage to use they often get answers like “use brand X, it works well for me and brand Y had a heap of returns a few years ago”. I’m not convinced there is any difference between the small number of manufacturers that are still in business.

One problem we face with reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix and even a fault that causes unreliable operation usually won’t spread it’s damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk).

Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can’t test the new storage technology for even 3 years before releasing it so their ability to prove the reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don’t have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn’t mean the next series of disks from the same manufacturer will have problems.

The issues with SSDs are similar to the issues with hard drives but a little different. I’m not sure how much of the improvements in SSDs recently have been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don’t expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in, presumably a “SSHD” or “Hybrid Drive” gets all the potential failures of hard drives and SSDs.

One theoretical issue with SSDs is that filesystems have been (in theory at least) designed to cope with hard drive failure modes not SSD failure modes. The problem with that theory is that most filesystems don’t cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn’t production ready last time I tested it).

Some people are concerned that their filesystem won’t support “wear levelling” for SSD use. When a flash storage device is exposed to the OS via a block interface like SATA there isn’t much possibility of wear levelling. If flash storage exposes that level of hardware detail to the OS then you need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device.

Another common concern about SSD is that it will wear out from too many writes. Lots of people are using SSD for the ZIL (ZFS Intent Log) on the ZFS filesystem, that means that SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a problem with SSDs wearing out I expect that ZFS users would be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD [1] (conclusion – it won’t). Apart from the nameless brand SSD I mentioned previously all of my SSDs in question are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1), I have yet to determine if this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about “Wear_Leveling_Count” (old age).

Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs, they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays.

My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren’t issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

NVMe

I think that NVMe isn’t very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

Future Technology

The latest thing for high end servers is Optane Persistent memory [2] also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket that gives performance somewhere between NVMe and RAM and capacity similar to NVMe. One of the ways of using this is “Memory Mode” where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level), this could make multiple terabytes of “RAM” not ridiculously expensive. Another way of using it is “App Direct Mode” where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is “Mixed Memory Mode” which has some DCPMM in “Memory Mode” and some in “App Direct Mode”.

This has much potential for use of backups and to make things extra exciting “App Direct Mode” has RAID-0 but no other form of RAID.

Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware and new software needed to talk to new hardware interfaces will have bugs and sometimes those bugs will lose data.

Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good, this is a very common failure mode.

Backups are a good thing.

SMART and SSDs

etbe — Sun, 20 Dec 2020 04:50:30 +0000

The Hetzner server that hosts my blog among other things has 2*256G SSDs for the root filesystem. The smartctl and smartd programs report both SSDs as in FAILING_NOW state for the Wear_Leveling_Count attribute. I don’t have a lot of faith in SMART. I run it because it would be stupid not to consider data about possible drive problems, but don’t feel obliged to immediately replace disks with SMART errors when not convenient (if I ordered a new server and got those results I would demand replacement before going live).

Doing any sort of SMART scan will cause service outage. Replacing devices means 2 outages, 1 for each device.

I noticed the SMART errors 2 weeks ago, so I guess that the SMART claims that both of the drives are likely to fail within 24 hours have been disproved. The system is running BTRFS so I know there aren’t any unseen data corruption issues and it uses BTRFS RAID-1 so if one disk has an unreadable sector that won’t cause data loss.

Currently Hetzner Server Bidding has ridiculous offerings for systems with SSD storage. Search for a server with 16G of RAM and SSD storage and the minimum prices are only 2E cheaper than a new server with 64G of RAM and 2*512G NVMe. In the past Server Bidding has had servers with specs not much smaller than the newest systems going for rates well below the costs of the newer systems. The current Hetzner server is under a contract from Server Bidding which is significantly cheaper than the current Server Bidding offerings, so financially it wouldn’t be a good plan to replace the server now.

Monitoring

I have just released a new version of etbe-mon [1] which has a new monitor for SMART data (smartctl). It also has a change to the sslcert check to search all IPv6 and IPv4 addresses for each hostname, makes freespace check look for filesystem mountpoint, and makes the smtpswaks check use latest swaks command-line.

For the new smartctl check there is an option to treat “Marginal” alert status from smartctl as errors and there is an option to name attributes that will be treated as marginal even if smartctl thinks they are significant. So now I have my monitoring system checking the SMART data on the servers of mine which have real hard drives (not VMs) and aren’t using RAID hardware that obscures such things. Also it’s not alerting me about the Wear_Leveling_Count on that particular Hetzner server.

[1] https://doc.coker.com.au/projects/etbe-mon/

Load Average Monitoring

etbe — Sun, 02 Feb 2020 09:10:46 +0000

For my ETBE-Mon [1] monitoring system I recently added a monitor for the Linux load average. The Unix load average isn’t a very good metric for monitoring system load, but it’s well known and easy to use. I’ve previously written about the Linux load average and how it’s apparently different from other Unix like OSs [2]. The monitor is still named loadavg but I’ve now made it also monitor on the usage of memory because excessive memory use and load average are often correlated.

For issues that might be transient it’s good to have a monitoring system give a reasonable amount of information about the problem so it can be diagnosed later on. So when the load average monitor gives an alert I have it display a list of D state processes (if any), a list of the top 10 processes using the most CPU time if they are using more than 5%, and a list of the top 10 processes using the most RAM if they are using more than 2% total virtual memory.

For documenting the output of the free(1) command (or /proc/meminfo when writing a program to do it) the best page I found was this StackExchange page [3]. So I compare MemAvailable+SwapFree to MemTotal+SwapTotal to determine the percentage of virtual memory used.

Any suggestions on how I could improve this?

The code is in the recent releases of etbemon, it’s in Debian/Unstable, on the project page on my site, and here’s a link to the loadave.monitor script in the Debian Salsa Git repository [4].