Dell T320 H310 RAID and IT Mode

The Problem

Just over 2 years ago my Dell T320 server had a motherboard failure [1]. I recently bought another T320 that had been gutted (no drives, PSUs, or RAM) and transferred the parts from my old server into it.

I installed Debian, but the resulting installation wouldn’t boot. I tried installing in both UEFI and BIOS modes with the same result. Then I realised that the disks I had installed were visible even though I hadn’t gone through the RAID configuration (I usually make a separate RAID-0 for each disk to work best with BTRFS or ZFS). I tried changing the BIOS setting for SATA disks between “RAID” and “AHCI” modes, which made no difference, and realised that the setting in question probably only applies to the SATA connectors on the motherboard and that the RAID card was in “IT” mode, which means that each disk is presented to the OS separately.

If you are using ZFS or BTRFS you don’t want to use RAID-1, RAID-5, or RAID-6 on the hardware RAID controller: if there are different versions of the data on the disks then you want the filesystem to be able to work out which one is correct. To use “IT” mode you have to flash different, unsupported firmware onto the RAID controller, and then you either have to go to some extra effort to make it bootable or boot from a different device.

The Root Causes

Dell has no reason to support unusual firmware on their RAID controllers. Installing different firmware on a device that is designed for high availability carries some probability of data loss, and perhaps more importantly for Dell, some probability of customers returning hardware during the support period and acting innocent about why it doesn’t work. Dell also has a strong financial incentive to make it difficult to install Dell firmware on equivalent LSI cards from other vendors, as they don’t want customers to get all the benefits of iDRAC integration etc without paying the Dell price premium.

All the other vendors have similar financial incentives so there is no official documentation or support on converting between different firmware images. Dell’s support for upgrading the Dell version is pretty good, but it aborts if it sees something different.

The Attempts

I tried following the instructions in this document to flash back to Dell firmware [2]. This document is about the H310 RAID card in my Dell T320, AKA an “LSI SAS 9211-8i”. The sas2flash.efi program didn’t seem to do anything; it returned immediately and didn’t give an error message.

This page gives a start on how to get inside the Dell firmware package, but the procedure didn’t work for me [3]. It doesn’t cover the case where sasdupie aborts with an error because it detects the current version as “00.00.00.00”, which isn’t something the upgrade program is prepared to upgrade from. But it’s a starting point for someone who wants to try harder at this.

This forum post has some interesting information; I gave up before trying it, but it may be useful for someone else [4].

The Solution

Dell tower servers have, as a standard feature, an internal USB port for a boot device. So I installed a boot image on a spare USB stick plugged into that port; it loads the kernel, which then mounts the root filesystem from a SATA hard drive. Once I got that working everything was fine. If I had known what was going to happen, the Debian/Trixie installer would probably have let me put the EFI boot partition on the internal USB stick as part of the install.
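As a rough illustration of doing the same thing by hand on an already installed BIOS-mode system, something like the following would work (device names are assumptions, adjust to suit; a UEFI install would use an EFI system partition on the stick and grub-install without a device argument instead):

mkfs.ext4 /dev/sdX1           # partition on the internal USB stick
mount /dev/sdX1 /mnt
cp -a /boot/* /mnt/           # copy the existing kernels and initramfs images
umount /mnt
mount /dev/sdX1 /boot         # also add a UUID entry for it to /etc/fstab
grub-install /dev/sdX         # put the boot loader on the USB stick
update-grub                   # root stays on the SATA disk, referenced by UUID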

The system is now fully working and ready to sell. Now I just need to find someone who wants “IT” mode on the RAID controller and hopefully is willing to pay extra for it.

Whatever I sell the system for it seems unlikely to cover the hours I spent working on this. But I learned some interesting things about RAID firmware and hopefully this blog post will be useful to other people, even if only to discourage them from trying to change firmware.

Service Setup Difficulties

Marco wrote a blog post opposing reliance on hyperscale cloud services, which argued that saying “we want to use an hyperscaler cloud because our developers do not want to operate a scalable and redundant database” just means that you need to hire competent developers and/or system administrators [1].

I previously wrote a blog post Why Clusters Usually Don’t Work [2] and I believe that all the points there are valid today – and possibly exacerbated by clusters getting less direct use as clustering is increasingly being done by hyperscale providers.

Take a basic need, a MySQL or PostgreSQL database for example. You want it to run and basically do the job and to have good recovery options. You could set it up locally, run backups, test the backups, have a recovery plan for failures, maybe have a hot-spare server if it’s really important, have tests for backups and hot-spare server, etc. Then you could have documentation for this so if the person who set it up isn’t available when there’s a problem they will be able to find out what to do. But the hyperscale option is to just select a database in your provider and have all this just work. If the person who set it up isn’t available for recovery in the event of failure the company can just put out a job advert for “person with experience on cloud company X” and have them just immediately go to work on it.

I don’t like hyperscale providers as they are all monopolistic companies that engage in anti-competitive actions. Google should be broken up: Android development and the Play Store should be separated from Gmail etc, which should be separated from search and adverts, and all of them should be separated from the GCP cloud service. Amazon should be broken up: running the Amazon store should be separated from selling items on the store, which should be separated from running a video on demand platform, and all of them should be separated from the AWS cloud. Microsoft should be broken up: OS development should be separated from application development, all of that should be separated from cloud services (Teams and Office 365), and everything else should be separate from the Azure cloud system.

But the cloud providers offer real benefits at small scale. Running a MySQL or PostgreSQL database for local services is easy: it’s a simple apt command to install it and then it basically works. Doing backup and recovery isn’t so easy. One could say “just hire competent people”, but if you do hire competent people do you want them spending their time running MySQL databases etc, or clicking the “create database” option on a cloud control panel and then moving on to more important things?
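For example, on a Debian system the local install really is a single command (package names as in recent Debian releases):

apt install postgresql        # or mariadb-server for a MySQL compatible database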

The FreedomBox project is a great project for installing and managing home/personal services [3]. But it’s not about running things like database servers; it operates at a higher level, running mail servers and other services for the user, not for the developer.

The Debian packaging of OpenStack looks interesting [4]; it’s a complete setup for running your own hyperscale cloud service. For medium and large organisations running OpenStack could be a good approach. But for small organisations it’s cheaper and easier to just use a cloud service to run things.

The issue of when to run things in-house and when to put them in the cloud is very complex. I think that if the organisation is going to spend less money on cloud services than on the salary of one sysadmin then it’s probably best to have things in the cloud. When cloud costs start to exceed the salary of one person who manages systems then having them spend the extra time and effort to run things locally starts making more sense. There is also an opportunity cost in having a good sysadmin work on the backups for all the different systems instead of letting the cloud provider just do it. Another possibility of course is to run things in-house on low end hardware and just deal with the occasional downtime to save money. Knowingly choosing less reliability to save money can be quite reasonable as long as you have considered the options and all the responsible people are involved in the discussion.

The one situation that I strongly oppose is having hyperscale services set up by people who don’t understand them. Running a database server on a cloud service because you don’t want to spend the time managing it is a reasonable choice in many situations. Running a database server on a cloud service because you don’t understand how to set up a database server is never a good choice. While the cloud services are quite resilient there are still ways of breaking the overall system if you don’t understand it. Also, while it is quite possible for someone to know how to develop for databases (including avoiding SQL injection etc) but be unable to set up a database server, that’s probably not going to be common; if someone can’t set up a database server (a generally easy task) then they probably can’t do the harder task of making it secure.


SSD Endurance

I previously wrote about the issue of swap potentially breaking SSD [1]. My conclusion was that swap wouldn’t be a problem, as none of the normally operating systems I run had swap accounting for any significant fraction of total disk writes. In that post the most writes I could see was 128GB written per day on a 120G Intel SSD (writing the entire device once a day).

My post about swap and SSD was based on the assumption that you could get many thousands of writes of the entire device, which was incorrect. Here’s some background on the terminology from WD [2]. So in the case of the 120G Intel SSD I was doing over 1 DWPD (Drive Writes Per Day), which is in the middle of the range of SSD capability; Intel doesn’t specify the DWPD or TBW (Terabytes Written) for that device.

The most expensive and high end NVMe device sold by my local computer store is the Samsung 980 Pro which has a warranty of 150TBW for the 250G device and 600TBW for the 1TB device [3]. That means that the system which used to have an Intel SSD would have exceeded the warranty in 3 years if it had a 250G device.
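As a rough sanity check on those numbers (assuming the 128GB/day write rate of the Intel SSD mentioned earlier and the usual 5 year warranty period), the arithmetic is:

echo $(( 128 * 365 * 3 ))                             # 140160, about 140TB written in 3 years, close to the 150TBW rating
echo "scale=3; 150 * 1000 / 250 / (5 * 365)" | bc     # about 0.33 drive writes per day for the 250G device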

My current workstation has been up for just over 7 days and has averaged 110GB written per day. It has some light VM use and the occasional kernel compile, a fairly typical developer workstation. Its storage is 2*Crucial 1TB NVMe devices in a BTRFS RAID-1; the NVMe devices are the older series of Crucial ones and are rated for 200TBW, which means that they can be expected to last for 5 years under the current load. This isn’t a real problem for me as the performance of those devices is lower than I hoped for, so I will buy faster ones before they are 5yo anyway.
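For reference, here is a minimal sketch of how that sort of daily write rate can be estimated from /proc/diskstats (field 10 is 512-byte sectors written since boot; the device name is an assumption and this isn’t necessarily the exact method used for the numbers above):

awk '$3 == "nvme0n1" { printf "%.1f GB written since boot\n", $10 * 512 / 2^30 }' /proc/diskstats
awk '{ printf "%.1f days of uptime\n", $1 / 86400 }' /proc/uptime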

My home server (and my wife’s workstation) is averaging 325GB per day on the SSDs used for the BTRFS RAID-1 filesystem holding root and most of the frequently written data (including VMs). The SSDs are 500G Samsung 850 EVOs [4] which are rated at 150TBW, which means just over a year of expected lifetime. The SSDs are much more than a year old; I think Samsung stopped selling them more than a year ago. Between the 2 SSDs SMART reports 18 uncorrectable errors and “btrfs device stats” reports 55 errors on one of them. I’m not about to immediately replace them, but it appears that they are well past their prime.
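For anyone who hasn’t used those tools, a quick sketch of the commands in question (device name and mount point are assumptions):

smartctl -A /dev/sda | grep -i uncorrect     # SMART attribute for uncorrectable errors
btrfs device stats /                         # per-device read/write/corruption error counters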

The server which runs my blog (among many other things) is averaging over 1TB written per day. It currently has a RAID-1 of hard drives for all storage, but its previous incarnation (which probably had about the same amount of writes) had a RAID-1 of “enterprise” SSDs for the most-written data. After a few years of running like that (and some time running with someone else’s load before it) the SSDs became extremely slow (sustained writes of 15MB/s) and started getting errors. So that’s a pair of SSDs that were burned out.

Conclusion

The amounts of data being written are steadily increasing. Recent machines with more RAM can decrease storage usage in some situations but that doesn’t compare to the increased use of checksummed and logged filesystems, VMs, databases for local storage, and other things that multiply writes. The amount of writes allowed under warranty isn’t increasing much and there are new technologies for larger SSD storage that decrease the DWPD rating of the underlying hardware.

For the systems I own it seems that they are all going to exceed the rated TBW for the SSDs before I have other reasons to replace them, and they aren’t particularly high usage systems. A mail server for a large number of users would hit it much earlier.

RAID of SSDs is a really good thing. Replacement of SSDs is something that should be planned for and a way of swapping SSDs to less important uses is also good (my parents have some SSDs that are too small for my current use but which work well for them). Another thing to consider is that if you have a server with spare drive bays you could put some extra SSDs in to spread the wear among a larger RAID-10 array. Instead of having a 2*SSD BTRFS RAID-1 for a server you could have 6*SSD to get a 3* longer lifetime than a regular RAID-1 before the SSDs wear out (BTRFS supports this sort of thing).

Based on these calculations and the small number of errors I’ve seen on my home server I’ll add a 480G SSD I have lying around to the array to spread the load and keep it running for a while longer.
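A minimal sketch of adding a device to an existing BTRFS RAID-1 and spreading the data across all of the devices (device name and mount point are assumptions):

btrfs device add /dev/sdc /      # add the spare SSD to the mounted filesystem
btrfs balance start /            # redistribute the data over all devices, keeping the RAID-1 profile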


Some Ideas About Storage Reliability

Hard Drive Brands

When people ask for advice about what storage to use they often get answers like “use brand X, it works well for me and brand Y had a heap of returns a few years ago”. I’m not convinced there is any difference between the small number of manufacturers that are still in business.

One problem we face with reliability of computer systems is that the rate of change is significant, so every year there will be new technological developments to improve things and every company will take advantage of them. Storage devices are unique among computer parts for their requirement for long-term reliability. For most other parts in a computer system a fault that involves total failure is usually easy to fix, and even a fault that causes unreliable operation usually won’t spread its damage too far before being noticed (except in corner cases like RAM corruption causing corrupted data on disk).

Every year each manufacturer will bring out newer disks that are bigger, cheaper, faster, or all three. Those disks will be expected to remain in service for 3 years in most cases, and for consumer disks often 5 years or more. The manufacturers can’t test the new storage technology for even 3 years before releasing it so their ability to prove the reliability is limited. Maybe you could buy some 8TB disks now that were manufactured to the same design as used 3 years ago, but if you buy 12TB consumer grade disks, the 20TB+ data center disks, or any other device that is pushing the limits of new technology then you know that the manufacturer never tested it running for as long as you plan to run it. Generally the engineering is done well and they don’t have many problems in the field. Sometimes a new range of disks has a significant number of defects, but that doesn’t mean the next series of disks from the same manufacturer will have problems.

The issues with SSDs are similar to the issues with hard drives but a little different. I’m not sure how much of the improvements in SSDs recently have been due to new technology and how much is due to new manufacturing processes. I had a bad experience with a nameless brand SSD a couple of years ago and now stick to the better known brands. So for SSDs I don’t expect a great quality difference between devices that have the names of major computer companies on them, but stuff that comes from China with the name of the discount web store stamped on it is always a risk.

Hard Drive vs SSD

A few years ago some people were still avoiding SSDs due to the perceived risk of new technology. The first problem with this is that hard drives have lots of new technology in them. The next issue is that hard drives often have some sort of flash storage built in; presumably an “SSHD” or “Hybrid Drive” gets all the potential failure modes of both hard drives and SSDs.

One theoretical issue with SSDs is that filesystems have been (in theory at least) designed to cope with hard drive failure modes not SSD failure modes. The problem with that theory is that most filesystems don’t cope with data corruption at all. If you want to avoid losing data when a disk returns bad data and claims it to be good then you need to use ZFS, BTRFS, the NetApp WAFL filesystem, Microsoft ReFS (with the optional file data checksum feature enabled), or Hammer2 (which wasn’t production ready last time I tested it).
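As an illustration of what the checksums buy you, a BTRFS scrub re-reads all data and metadata, verifies it against the stored checksums, and repairs from the other copy in a RAID-1 where possible (a minimal sketch; the mount point is an assumption):

btrfs scrub start -B /      # -B runs in the foreground and prints a summary at the end
btrfs scrub status /        # check progress or the result of the last scrub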

Some people are concerned that their filesystem won’t support “wear levelling” for SSD use. When a flash storage device is exposed to the OS via a block interface like SATA there isn’t much possibility of wear levelling. If flash storage exposes that level of hardware detail to the OS then you need a filesystem like JFFS2 to use it. I believe that most SSDs have something like JFFS2 inside the firmware and use it to expose what looks like a regular block device.

Another common concern about SSD is that it will wear out from too many writes. Lots of people are using SSD for the ZIL (ZFS Intent Log) on the ZFS filesystem, which means that the SSD devices become the write bottleneck for the system and in some cases are run that way 24*7. If there was a general problem with SSDs wearing out I would expect ZFS users to be complaining about it. Back in 2014 I wrote a blog post about whether swap would break SSD [1] (conclusion – it won’t). Apart from the nameless brand SSD I mentioned previously, all of my SSDs in question are still in service. I have recently had a single Samsung 500G SSD give me 25 read errors (which BTRFS recovered from the other Samsung SSD in the RAID-1); I have yet to determine whether this is an ongoing issue with the SSD in question or a transient thing. I also had a 256G SSD in a Hetzner DC give 23 read errors a few months after it gave a SMART alert about “Wear_Leveling_Count” (old age).
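For reference, a minimal sketch of how an SSD gets dedicated to the ZIL (pool and device names are assumptions):

zpool add tank log /dev/nvme0n1                 # single SSD as a separate intent log device
zpool add tank log mirror /dev/sda /dev/sdb     # or a mirrored pair if losing the last few seconds of writes matters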

Hard drives have moving parts and are therefore inherently more susceptible to vibration than SSDs, they are also more likely to cause vibration related problems in other disks. I will probably write a future blog post about disks that work in small arrays but not in big arrays.

My personal experience is that SSDs are at least as reliable as hard drives even when run in situations where vibration and heat aren’t issues. Vibration or a warm environment can cause data loss from hard drives in situations where SSDs will work reliably.

NVMe

I think that NVMe isn’t very different from other SSDs in terms of the actual storage. But the different interface gives some interesting possibilities for data loss. OS, filesystem, and motherboard bugs are all potential causes of data loss when using a newer technology.

Future Technology

The latest thing for high end servers is Optane Persistent Memory [2], also known as DCPMM. This is NVRAM that fits in a regular DDR4 DIMM socket and gives performance somewhere between NVMe and RAM with capacity similar to NVMe. One of the ways of using this is “Memory Mode” where the DCPMM is seen by the OS as RAM and the actual RAM caches the DCPMM (essentially this is swap space at the hardware level); this could make multiple terabytes of “RAM” not ridiculously expensive. Another way of using it is “App Direct Mode” where the DCPMM can either be a simulated block device for regular filesystems or a byte addressable device for application use. The final option is “Mixed Memory Mode” which has some DCPMM in “Memory Mode” and some in “App Direct Mode”.

This has a lot of potential for backup use, and to make things extra exciting “App Direct Mode” has RAID-0 but no other form of RAID.

Conclusion

I think that the best things to do for storage reliability are to have ECC RAM to avoid corruption before the data gets written, use reasonable quality hardware (buy stuff with a brand that someone will want to protect), and avoid new technology. New hardware and new software needed to talk to new hardware interfaces will have bugs and sometimes those bugs will lose data.

Filesystems like BTRFS and ZFS are needed to cope with storage devices returning bad data and claiming it to be good, this is a very common failure mode.

Backups are a good thing.

SMART and SSDs

The Hetzner server that hosts my blog among other things has 2*256G SSDs for the root filesystem. The smartctl and smartd programs report both SSDs as being in the FAILING_NOW state for the Wear_Leveling_Count attribute. I don’t have a lot of faith in SMART. I run it because it would be stupid not to consider data about possible drive problems, but I don’t feel obliged to immediately replace disks with SMART errors when it’s not convenient (if I ordered a new server and got those results I would demand replacement before going live).
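For anyone who wants to check this on their own systems, a minimal sketch (the device name is an assumption):

smartctl -A /dev/sda | grep -i wear      # current value and threshold for Wear_Leveling_Count
smartctl -H /dev/sda                     # the overall health assessment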

Doing any sort of SMART scan would cause a service outage, and replacing the devices means 2 outages, 1 for each device.

I noticed the SMART errors 2 weeks ago, so I guess that the SMART claims that both of the drives are likely to fail within 24 hours have been disproved. The system is running BTRFS so I know there aren’t any unseen data corruption issues and it uses BTRFS RAID-1 so if one disk has an unreadable sector that won’t cause data loss.

Currently Hetzner Server Bidding has ridiculous offerings for systems with SSD storage. Search for a server with 16G of RAM and SSD storage and the minimum prices are only 2 Euros cheaper than a new server with 64G of RAM and 2*512G NVMe. In the past Server Bidding has had servers with specs not much smaller than the newest systems going for rates well below the cost of the newer systems. The current Hetzner server is under a contract from Server Bidding which is significantly cheaper than the current Server Bidding offerings, so financially it wouldn’t be a good plan to replace the server now.

Monitoring

I have just released a new version of etbe-mon [1] which has a new monitor for SMART data (smartctl). It also has a change to the sslcert check to search all IPv6 and IPv4 addresses for each hostname, makes the freespace check look at the filesystem mountpoint, and makes the smtpswaks check use the latest swaks command-line options.

For the new smartctl check there is an option to treat “Marginal” alert status from smartctl as errors and there is an option to name attributes that will be treated as marginal even if smartctl thinks they are significant. So now I have my monitoring system checking the SMART data on the servers of mine which have real hard drives (not VMs) and aren’t using RAID hardware that obscures such things. Also it’s not alerting me about the Wear_Leveling_Count on that particular Hetzner server.


Load Average Monitoring

For my ETBE-Mon [1] monitoring system I recently added a monitor for the Linux load average. The Unix load average isn’t a very good metric for monitoring system load, but it’s well known and easy to use. I’ve previously written about the Linux load average and how it’s apparently different from other Unix-like OSs [2]. The monitor is still named loadavg, but I’ve now made it also monitor memory usage, because excessive memory use and high load average are often correlated.

For issues that might be transient it’s good to have a monitoring system give a reasonable amount of information about the problem so it can be diagnosed later on. So when the load average monitor gives an alert I have it display a list of D state processes (if any), a list of the top 10 processes using the most CPU time if they are using more than 5%, and a list of the top 10 processes using the most RAM if they are using more than 2% total virtual memory.
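A minimal sketch of gathering that sort of information with standard tools (not the actual etbemon code):

ps -eo state,pid,user,comm | awk '$1 == "D"'                                  # processes in uninterruptible sleep
ps -eo pid,user,pmem,comm --sort=-pmem | awk 'NR==1 || $3 > 2' | head -n 11   # top RAM users above 2%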

For documentation of the output of the free(1) command (or /proc/meminfo when writing a program to do it) the best page I found was this StackExchange page [3]. So I compare MemAvailable+SwapFree to MemTotal+SwapTotal to determine the percentage of virtual memory used.
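A minimal sketch of that calculation in shell (the same arithmetic against /proc/meminfo, not the actual etbemon code):

awk '/^MemTotal|^MemAvailable|^SwapTotal|^SwapFree/ { m[$1] = $2 }
     END { total = m["MemTotal:"] + m["SwapTotal:"]
           avail = m["MemAvailable:"] + m["SwapFree:"]
           printf "%.1f%% of virtual memory used\n", 100 * (total - avail) / total }' /proc/meminfo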

Any suggestions on how I could improve this?

The code is in the recent releases of etbemon, it’s in Debian/Unstable, on the project page on my site, and here’s a link to the loadave.monitor script in the Debian Salsa Git repository [4].

Process Monitoring

Since forking the Mon project to etbemon [1] I’ve been spending a lot of time working on the monitor scripts. Actually monitoring something is usually quite easy, deciding what to monitor tends to be the hard part. The process monitoring script ps.monitor is the one I’m about to redesign.

Here are some of my ideas for monitoring processes. Please comment if you have any suggestions for how to do things better.

For people who don’t use mon, the monitor scripts return 0 if everything is OK and 1 if there’s a problem, along with using stdout to display an error message. While I’m not aware of anyone hooking mon scripts into a different monitoring system, that would be easy to do. One thing I plan to work on in the future is interoperability between mon and other systems such as Nagios.

Basic Monitoring

ps.monitor tor:1-1 master:1-2 auditd:1-1 cron:1-5 rsyslogd:1-1 dbus-daemon:1- sshd:1- watchdog:1-2

I’m currently planning some sort of rewrite of the process monitoring script. The current functionality is to have a list of process names on the command line with minimum and maximum numbers for the instances of the process in question. The above is a sample of the configuration of the monitor. There are some limitations to this: the “master” process in this instance refers to the main process of Postfix, but other daemons use the same process name (it’s one of those names that’s wrong because it’s so obvious). One obvious solution to this is to give the option of specifying the full path so that /usr/lib/postfix/sbin/master can be differentiated from all the other programs named master.

The next issue is processes that may run on behalf of multiple users. With sshd there is a single process to accept new connections running as root and a process running under the UID of each logged in user. So the number of sshd processes running as root will be one greater than the number of root login sessions. This means that if a sysadmin logs in directly as root via ssh (which is controversial and not the topic of this post – merely something that people do which I have to support) and the master process then crashes (or the sysadmin stops it either accidentally or deliberately) there won’t be an alert about the missing process. Of course the correct thing to do is to have a monitor talk to port 22 and look for the string “SSH-2.0-OpenSSH_”. Sometimes there are multiple instances of a daemon running under different UIDs that need to be monitored separately. So obviously we need the ability to monitor processes by UID.
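A minimal sketch of the sort of checks this implies, using the standard pgrep tool (the actual ps.monitor syntax will differ):

pgrep -c -x master                           # count of processes named exactly "master"
pgrep -c -f /usr/lib/postfix/sbin/master     # count of processes matching the full command path
pgrep -c -x -u root sshd                     # sshd processes running as root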

In many cases process monitoring can be replaced by monitoring of service ports. So if something is listening on port 25 then it probably means that the Postfix “master” process is running regardless of what other “master” processes there are. But for my use I find it handy to have multiple monitors; if I get a Jabber message about being unable to send mail to a server immediately followed by a Jabber message from that server saying that “master” isn’t running, I don’t need to fully wake up to know where the problem is.

SE Linux

One feature that I want is monitoring SE Linux contexts of processes in the same way as monitoring UIDs. While I’m not interested in writing tests for other security systems I would be happy to include code that other people write. So whatever I do I want to make it flexible enough to work with multiple security systems.

Transient Processes

Most daemons have a second process of the same name running during the startup process. This means if you monitor for exactly 1 instance of a process you may get an alert about 2 processes running when “logrotate” or something similar restarts the daemon. Also you may get an alert about 0 instances if the check happens to run at exactly the wrong time during the restart. My current way of dealing with this on my servers is to not alert until the second failure event with the “alertafter 2” directive. The “failure_interval” directive allows specifying the time between checks when the monitor is in a failed state, setting that to a low value means that waiting for a second failure result doesn’t delay the notification much.

To deal with this I’ve been thinking of making the ps.monitor script automatically check again after a specified delay. I think that solving the problem with a single parameter to the monitor script is better than using 2 configuration directives to mon to work around it.

CPU Use

Mon currently has a loadavg.monitor script to check the load average. But that won’t catch the case of a single process using too much CPU time but not enough to raise the system load average. Also it won’t catch the case of a CPU hungry process going quiet (EG when the SETI at Home server goes down) while another process goes into an infinite loop. One way of addressing this would be to have the ps.monitor script have yet another configuration option to monitor CPU use, but this might get confusing. Another option would be to have a separate script that alerts on any process that uses more than a specified percentage of CPU time over its lifetime or over the last few seconds, unless it’s in a whitelist of processes and users who are exempt from such checks. Probably every regular user would be exempt from such checks because you never know when they will run a file compression program. Also there would be a short list of daemons that are excluded (like BOINC) and system processes (like gzip which is run from several cron jobs).
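A minimal sketch of that kind of check with standard tools, showing processes above a CPU threshold (the %CPU figure from ps is averaged over the process lifetime, which is only part of what a proper monitor script would need):

ps -eo pid,user,pcpu,comm --sort=-pcpu | awk 'NR==1 || $3 > 5'    # processes using more than 5% CPU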

Monitoring for Exclusion

A common programming mistake is to call setuid() before setgid() which means that the program doesn’t have permission to call setgid(). If return codes aren’t checked (and people who make such rookie mistakes tend not to check return codes) then the process keeps elevated permissions. Checking for processes running as GID 0 but not UID 0 would be handy. As an aside a quick examination of a Debian/Testing workstation didn’t show any obvious way that a process with GID 0 could gain elevated privileges, but that could change with one chmod 770 command.
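A minimal sketch of such a check using standard tools (looking for processes whose effective GID is 0 while the effective UID is not):

ps -eo pid,euid,egid,comm | awk '$3 == 0 && $2 != 0'    # GID 0 without UID 0 is suspicious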

On a SE Linux system there should be only one process running with the domain init_t. Currently that doesn’t happen in Stretch systems running daemons such as mysqld and tor due to policy not matching the recent functionality of systemd as requested by daemon service files. Such issues will keep occurring so we need automated tests for them.
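A minimal sketch of checking that on an SE Linux system (the count should normally be exactly 1, for PID 1):

ps -eZ | awk '$1 ~ /:init_t:/ { count++ } END { print count+0 " processes running in the init_t domain" }'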

Automated tests for configuration errors that might impact system security is a bigger issue, I’ll probably write a separate blog post about it.


Apache Mesos on Debian

I decided to try packaging Mesos for Debian/Stretch. I had a spare system with a i7-930 CPU, 48G of RAM, and SSDs to use for building. The i7-930 isn’t really fast by today’s standards, but 48G of RAM and SSD storage mean that overall it’s a decent build system – faster than most systems I run (for myself and for clients) and probably faster than most systems used by Debian Developers for build purposes.

There’s a GitHub issue about the lack of an upstream package for Debian/Stretch [1]. That upstream issue could probably be worked around by adding Jessie sources to the APT sources.list file, but a package for Stretch is what is needed anyway.

Here is the documentation on building for Debian [2]. The list of packages it gives as build dependencies is incomplete; it also needs zlib1g-dev libapr1-dev libcurl4-nss-dev openjdk-8-jdk maven libsasl2-dev libsvn-dev. So BUILDING this software requires Java + Maven, Ruby, and Python along with autoconf, libtool, and all the usual Unix build tools. It also requires the FPM (Fucking Package Management) tool; I take the choice of name as an indication of the professionalism of the author.

Building the software on my i7 system took 79 minutes, which includes 76 minutes of CPU time (I didn’t use the -j option to make). At the end of the build it turned out that I had mistakenly failed to install the Fucking Package Management “gem” and it aborted. At this stage I gave up on Mesos; the pain involved exceeds my interest in trying it out.

How to do it Better

One of the aims of Free Software is that bugs are more likely to get solved if many people look at them. There aren’t many people who will devote 76 minutes of CPU time on a moderately fast system to investigate a single bug. To deal with this software should be prepared as components. An example of this is the SE Linux project which has 13 source modules in the latest release [3]. Of those 13 only 5 are really required. So anyone who wants to start on SE Linux from source (without considering a distribution like Debian or Fedora that has it packaged) can build the 5 most important ones. Also anyone who has an issue with SE Linux on their system can find the one source package that is relevant and study it with a short compile time. As an aside I’ve been working on SE Linux since long before it was split into so many separate source packages and know the code well, but I still find the separation convenient – I rarely need to work on more than a small subset of the code at one time.

The requirement of Java, Ruby, and Python to build Mesos could be partly due to language interfaces for calling Mesos from Ruby and Python. One solution to that is to have the C libraries and header files to call Mesos and have separate packages that depend on those libraries and headers to provide the bindings for other languages. Another solution is to have autoconf detect that some languages aren’t installed and just not try to compile bindings for them (this is one of the purposes of autoconf).

The use of a tool like Fucking Package Management means that you don’t get help from experts in the various distributions in making better packages. When there is a FOSS project with a debian subdirectory that makes barely functional packages then you are likely to have an experienced Debian Developer offer a patch to improve it (I’ve offered patches for such things on many occasions). When there is a FOSS project that uses a tool that is never used by Debian developers (or developers of Fedora and other distributions) then the only patches you will get will be from inexperienced people.

A software build process should not download anything from the Internet. The source archive should contain everything that is needed and there should be dependencies on external software. Any downloads from the Internet need to be protected from MITM attacks, which means that a responsible software developer has to read through the build system and make sure that appropriate PGP signature checks etc are performed. It could be that the files the Mesos build downloaded from the Apache site had appropriate PGP checks performed – but it would take me extra time and effort to verify this and I can’t distribute software without being sure of it. Also reproducible builds are one of the latest things we aim for in the Debian project; this means we can’t just download files from web sites because the next build might get a different version.

Finally the fpm (Fucking Package Management) tool is a Ruby Gem that has to be installed with the “gem install” command. Any time you specify a gem install command you should include the -v option to ensure that everyone is using the same version of that gem; otherwise there is no guarantee that people who follow your documentation will get the same results. Also a quick Google search didn’t indicate whether gem install checks PGP keys or verifies data integrity in other ways. If I’m going to compile software for other people to use I’m concerned about getting unexpected results with such things. A Google search indicates that Ruby people were worried about such things in 2013 but doesn’t indicate whether they solved the problem properly.
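For example (the version number here is just a placeholder to show the syntax, not a recommendation):

gem install fpm -v 1.14.2     # pin the gem version so everyone building gets the same result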


Monitoring of Monitoring

I was recently asked to get data from a computer that controlled security cameras after a crime had been committed. Due to the potential issues I refused to collect the computer and insisted on performing the work at the office of the company in question. Hard drives are vulnerable to damage from vibration and there is always a risk involved in moving hard drives or systems containing them. A hard drive with evidence of a crime provides additional potential complications. So I wanted to stay within view of the man who commissioned the work just so there could be no misunderstanding.

The system had a single IDE disk. The fact that it had an IDE disk is an indication of the age of the system. One of the benefits of SATA over IDE is that swapping disks is much easier, SATA is designed for hot-swap and even systems that don’t support hot-swap will have less risk of mechanical damage when changing disks if SATA is used instead of IDE. For an appliance type system where a disk might be expected to be changed by someone who’s not a sysadmin SATA provides more benefits over IDE than for some other use cases.

I connected the IDE disk to a USB-IDE device so I could read it from my laptop. But the disk just made repeated buzzing sounds while failing to spin up. This is an indication that the drive was probably experiencing “stiction” which is where the heads stick to the platters and the drive motor isn’t strong enough to pull them off. In some cases hitting a drive will get it working again, but I’m certainly not going to hit a drive that might be subject to legal action! I recommended referring the drive to a data recovery company.

The probability of getting useful data from the disk in question seems very low. It could be that the drive had stiction for months or years. If the drive is recovered it might turn out to have data from years ago and not the recent data that is desired. It is possible that the drive only got stiction after being turned off, but I’ll probably never know.

Doing it Properly

Ever since RAID was introduced there has been no excuse for having a single disk on its own with important data. Linux Software RAID didn’t support online rebuild when 10G was a large disk, but since the late 90’s it has worked well and there’s no reason not to use it. The probability of a single IDE disk surviving long enough on its own to capture useful security data is not particularly good.
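For anyone building such a system today, a minimal sketch of creating a Linux software RAID-1 with mdadm (device names are assumptions):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0       # then make a filesystem on the mirrored device as usual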

Even with 2 disks in a RAID-1 configuration there is a chance of data loss. Many years ago I ran a server at my parents’ house with 2 disks in a RAID-1 and both disks had errors on one hot summer. I wrote a program that’s like ddrescue but which would read from the second disk if the first gave a read error and ended up not losing any important data AFAIK. BTRFS has some potential benefits for recovering from such situations but I don’t recommend deploying BTRFS in embedded systems any time soon.

Monitoring is a requirement for reliable operation. For desktop systems you can get by without specific monitoring, but that is because you are effectively relying on the user monitoring it themselves. Since I started using mon (which is very easy to set up) I’ve had it notify me of some problems with my laptop that I wouldn’t have otherwise noticed. I think that ideally for desktop systems you should have monitoring of disk space, temperature, and certain critical daemons that need to be running but which the user wouldn’t immediately notice if they crashed (such as cron and syslogd).

There are some companies that provide 3G SIMs for embedded/IoT applications with rates that are significantly cheaper than any of the usual phone/tablet plans if you use small amounts of data or SMS. For a reliable CCTV system the best thing to do would be to have a monitoring contract and have the monitoring system trigger an event if there’s a problem with the hard drive etc and also if the system fails to send an “I’m OK” message for a certain period of time.

I don’t know if people are selling CCTV systems without monitoring to compete on price or if companies are cancelling monitoring contracts to save money. But whichever is happening it’s significantly reducing the value derived from monitoring.


Basics of Backups

I’ve recently had some discussions about backups with people who aren’t computer experts, so I decided to blog about this for the benefit of everyone. Note that this post will deliberately avoid issues that require great knowledge of computers. I have written other posts that will benefit experts.

Essential Requirements

Everything that matters must be stored in at least 3 places. Every storage device will die eventually. Every backup will die eventually. If you have 2 backups then you are covered for the primary storage failing and the first backup failing. Note that I’m not saying “only have 2 backups” (I have many more) but 2 is the bare minimum.

Backups must be in multiple places. One way of losing data is if your house burns down, if that happens all backup devices stored there will be destroyed. You must have backups off-site. A good option is to have backup devices stored by trusted people (friends and relatives are often good options).

It must not be possible for one event to wipe out all backups. Some people use “cloud” backups; there are many ways of doing this with Dropbox, Google Drive, etc. Some of these even have free options for small amounts of storage, for example Google Drive appears to have 15G of free storage which is more than enough for all your best photos and all your financial records. The downside to cloud backups is that a computer criminal who gets access to your PC can wipe it and the backups. Cloud backup can be a part of a sensible backup strategy but it can’t be relied on (also see the paragraph about having at least 2 backups).

Backup Devices

USB flash “sticks” are cheap and easy to use. The quality of some of those devices isn’t too good, but the low price and small size means that you can buy more of them. It would be quite easy to buy 10 USB sticks for multiple copies of data.

Stores that sell office-supplies sell USB attached hard drives which are quite affordable now. It’s easy to buy a couple of those for backup use.

The cheapest option for backing up moderate amounts of data is to get a USB-SATA device. This connects to the PC by USB and has a cradle to accept a SATA hard drive. That allows you to buy cheap SATA disks for backups and even use older disks as backups.

When choosing backup devices consider the environment that they will be stored in. If you want to store a backup in the glove box of your car (which could be good when travelling) then an SD card or USB flash device would be a good choice because they are resistant to physical damage. Note that if you have no other options for off-site storage then the glove box of your car will probably survive if your house burns down.

Multiple Backups

It’s not uncommon for data corruption or mistakes to be discovered some time after they happen. Also in recent times there is a variety of malware that encrypts files and then demands a ransom payment for the decryption key.

To address these problems you should have older backups stored. It’s not uncommon in a corporate environment to have backups every day stored for a week, backups every week stored for a month, and monthly backups stored for some years.

For a home use scenario it’s more common to make backups every week or so and take backups to store off-site when it’s convenient.

Offsite Backups

One common form of off-site backup is to store backup devices at work. If you work in an office then you will probably have some space in a desk drawer for personal items. If you don’t work in an office but have a locker at work then that’s good for storage too, if there is high humidity then SD cards will survive better than hard drives. Make sure that you encrypt all data you store in such places or make sure that it’s not the secret data!

Banks have a variety of ways of storing items. Bank safe deposit boxes can be used for anything that fits, and hard drives fit easily. If you have a mortgage your bank might give you free storage of “papers” as part of the service (Commonwealth Bank of Australia used to offer that). A few USB sticks or SD cards in an envelope could fit the “papers” criteria. An accounting firm may also store documents for free for you.

If you put a backup on USB or SD storage in your wallet then that can also be a good offsite backup. For most people losing data from disk is more common than losing their wallet.

A modern mobile phone can also be used for backing up data while travelling. For a few years I’ve been doing that. But note that you have to encrypt all data stored on a phone so an attacker who compromises your phone can’t steal it. In a typical phone configuration the mass storage area is much less protected than application data. Also note that customs and border control agents for some countries can compel you to provide the keys for encrypted data.

A friend suggested burying a backup device in a sealed plastic container filled with desiccant. That would survive your house burning down and in theory should work. I don’t know of anyone who’s tried it.

Testing

On occasion you should try to read the data from your backups and compare it to the original data. It sometimes happens that backups are discovered to be useless after years of operation.

Secret Data

Before starting a backup it’s worth considering which of the data is secret and which isn’t. Data that is secret needs to be treated differently and a mixture of secret and less secret data needs to be treated as if it’s all secret.

One category of secret data is financial data. If your accountant provides document storage then they can store that, generally your accountant will have all of your secret financial data anyway.

Passwords need to be kept secret but they are also very small. So making a written or printed copy of the passwords is part of a good backup strategy. There are options for backing up paper that don’t apply to data.

One category of data that is not secret is photos. Photos of holidays, friends, etc are generally not that secret and they can also comprise a large portion of the data volume that needs to be backed up. Apparently some people have a backup strategy for such photos that involves downloading from Facebook to restore; that will help with some problems but it’s not adequate overall. But any data that is on Facebook isn’t that secret and can be stored off-site without encryption.

Backup Corruption

With the amounts of data that are used nowadays the probability of data corruption is increasing. If you use any compression program with the data that is backed up (even data that can’t be compressed such as JPEGs) then errors will be detected when you extract the data. So if you have backup ZIP files on 2 hard drives and one of them gets corrupt you will easily be able to determine which one has the correct data.
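For example, most archive tools have a test option that verifies an archive without extracting it (the filename here is just an example):

unzip -tq photos-backup.zip     # test the archive and report any checksum errors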

Failing Systems – update 2016-08-22

When a system starts to fail it may limp along for years and work reasonably well, or it may totally fail soon. At the first sign of trouble you should immediately make a full backup to separate media. Use different media to your regular backups in case the data is corrupt so you don’t overwrite good backups with bad ones.

One traditional sign of problems has been hard drives that make unusual sounds. Modern drives are fairly quiet so this might not be loud enough to notice. Another sign is hard drives that take unusually large amounts of time to read data. If a drive has some problems it might read a sector hundreds or even thousands of times until it gets the data which dramatically reduces system performance. There are lots of other performance problems that can occur (system overheating, software misconfiguration, and others), most of which are correlated with potential data loss.

A modern SSD storage device (as used in a lot of recent laptops) doesn’t tend to go slow when it nears the end of its life. It is more likely to just randomly fail entirely and then work again after a reboot. There are many causes of systems randomly hanging or crashing (of which overheating is common), but they are all correlated with data loss so a good backup is a good idea.

When in doubt make a backup.

Any Suggestions?

If you have any other ideas for backups by typical home users then please leave a comment. Don’t comment on expert issues though, I have other posts for that.