Monitoring of Monitoring

I was recently asked to get data from a computer that controlled security cameras after a crime had been committed. Due to the potential issues I refused to collect the computer and insisted on performing the work at the office of the company in question. Hard drives are vulnerable to damage from vibration and there is always a risk involved in moving hard drives or systems containing them. A hard drive with evidence of a crime provides additional potential complications. So I wanted to stay within view of the man who commissioned the work just so there could be no misunderstanding.

The system had a single IDE disk. The fact that it had an IDE disk is an indication of the age of the system. One of the benefits of SATA over IDE is that swapping disks is much easier, SATA is designed for hot-swap and even systems that don’t support hot-swap will have less risk of mechanical damage when changing disks if SATA is used instead of IDE. For an appliance type system where a disk might be expected to be changed by someone who’s not a sysadmin SATA provides more benefits over IDE than for some other use cases.

I connected the IDE disk to a USB-IDE device so I could read it from my laptop. But the disk just made repeated buzzing sounds while failing to spin up. This is an indication that the drive was probably experiencing “stiction” which is where the heads stick to the platters and the drive motor isn’t strong enough to pull them off. In some cases hitting a drive will get it working again, but I’m certainly not going to hit a drive that might be subject to legal action! I recommended referring the drive to a data recovery company.

The probability of getting useful data from the disk in question seems very low. It could be that the drive had stiction for months or years. If the drive is recovered it might turn out to have data from years ago and not the recent data that is desired. It is possible that the drive only got stiction after being turned off, but I’ll probably never know.

Doing it Properly

Ever since RAID was introduced there was never an excuse for having a single disk on it’s own with important data. Linux Software RAID didn’t support online rebuild when 10G was a large disk. But since the late 90’s it has worked well and there’s no reason not to use it. The probability of a single IDE disk surviving long enough on it’s own to capture useful security data is not particularly good.

Even with 2 disks in a RAID-1 configuration there is a chance of data loss. Many years ago I ran a server at my parents’ house with 2 disks in a RAID-1 and both disks had errors on one hot summer. I wrote a program that’s like ddrescue but which would read from the second disk if the first gave a read error and ended up not losing any important data AFAIK. BTRFS has some potential benefits for recovering from such situations but I don’t recommend deploying BTRFS in embedded systems any time soon.

Monitoring is a requirement for reliable operation. For desktop systems you can get by without specific monitoring, but that is because you are effectively relying on the user monitoring it themself. Since I started using mon (which is very easy to setup) I’ve had it notify me of some problems with my laptop that I wouldn’t have otherwise noticed. I think that ideally for desktop systems you should have monitoring of disk space, temperature, and certain critical daemons that need to be running but which the user wouldn’t immediately notice if they crashed (such as cron and syslogd).

There are some companies that provide 3G SIMs for embedded/IoT applications with rates that are significantly cheaper than any of the usual phone/tablet plans if you use small amounts of data or SMS. For a reliable CCTV system the best thing to do would be to have a monitoring contract and have the monitoring system trigger an event if there’s a problem with the hard drive etc and also if the system fails to send a “I’m OK” message for a certain period of time.

I don’t know if people are selling CCTV systems without monitoring to compete on price or if companies are cancelling monitoring contracts to save money. But whichever is happening it’s significantly reducing the value derived from monitoring.

3 comments to Monitoring of Monitoring

  • Adam

    Why did they turn off the computer with the IDE drive?!?

  • Adam: They probably weren’t aware of the potential problems in turning off a long running system. That said while it’s possible that the system was working correctly until it was turned off and cooled down that seems unlikely, it probably wasn’t working for a while.

    If the system was configured to record in a loop and only had disk space for a few days then it would be important to turn it off to stop the needed data being overwritten.

  • David Zuccaro

    Very smart move Russell, decisions like that can only result from years of experience working in IT.