RAID Pain

One of my clients has a NAS device. Last week they tried to do what should have been a routine RAID operation: they added a new, larger disk as a hot-spare and told the RAID array to replace one of the active disks with the hot-spare. The aim was to replace the disks one at a time to grow the array. But one of the other disks had an error during the rebuild and things fell apart.

I was called in after the NAS had been rebooted when it was refusing to recognise the RAID. The first thing that occurred to me is that maybe RAID-5 isn’t a good choice for the RAID. While it’s theoretically possible for a RAID rebuild to not fail in such a situation (the data that couldn’t be read from the disk with an error could have been regenerated from the disk that was being replaced) it seems that the RAID implementation in question couldn’t do it. As the NAS is running Linux I presume that at least older versions of Linux have the same problem. Of course if you have a RAID array that has 7 disks running RAID-6 with a hot-spare then you only get the capacity of 4 disks. But RAID-6 with no hot-spare should be at least as reliable as RAID-5 with a hot-spare.
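The capacity comparison above is simple arithmetic, and a small shell sketch makes it explicit. The function name usable_disks is made up for this illustration, and it assumes that all disks are the same size.

```shell
#!/bin/sh
# usable_disks TOTAL SPARES PARITY
# Prints how many disks' worth of capacity an array provides, assuming
# equal-sized disks (hypothetical helper, for illustration only).
usable_disks() {
    total=$1 spares=$2 parity=$3
    echo $(( total - spares - parity ))
}

usable_disks 7 1 2   # RAID-6 (2 parity disks) with a hot-spare: prints 4
usable_disks 7 1 1   # RAID-5 (1 parity disk) with a hot-spare: prints 5
usable_disks 7 0 2   # RAID-6 with no hot-spare: prints 5
```

RAID-5 with a hot-spare and RAID-6 with no hot-spare give the same usable capacity from 7 disks, which is why the two are worth comparing on reliability.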

Whenever you recover from disk problems the first thing you want to do is to make a read-only copy of the data. Then you can’t make things worse. This is a problem when you are dealing with 7 disks; fortunately they were only 3TB disks and each only had 2TB in use. So I found some space on a ZFS pool and bought a few 6TB disks which I formatted as BTRFS filesystems. For this task I only wanted filesystems that support snapshots so that I could work on snapshots and not on the original copy.

I expect that at some future time I will be called in when an array of 6+ disks of the largest available size fails. This will be a more difficult problem to solve as I don’t own any system that can handle so many disks.

I copied a few of the disks to a ZFS filesystem on a Dell PowerEdge T110 running kernel 3.2.68. Unfortunately that system seems to have a problem with USB: when copying from 4 disks at once each disk was reading about 10MB/s, and when copying from 3 disks each disk was reading about 13MB/s. It seems that the system has an aggregate USB bandwidth of about 40MB/s, which is about what a single USB 2.0 bus manages in practice (the nominal 480Mbit/s is 60MB/s). This made the process take longer than expected.
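As a rough sketch of what those rates mean for copy time, the following assumes that 2TB has to be read from each disk (the amount in use mentioned earlier) and uses decimal units (1TB = 1,000,000MB). copy_hours is a hypothetical helper, not a real tool.

```shell
#!/bin/sh
# copy_hours SIZE_MB RATE_MB_PER_SEC
# Prints the whole number of hours needed to read SIZE_MB at the given rate
# (integer arithmetic, so the result is rounded down).
copy_hours() {
    size_mb=$1 rate=$2
    echo $(( size_mb / rate / 3600 ))
}

copy_hours 2000000 10   # 4 disks at once, ~10MB/s each: prints 55
copy_hours 2000000 13   # 3 disks at once, ~13MB/s each: prints 42
```

Because the aggregate rate stays near 40MB/s either way, reading fewer disks in parallel shortens each copy without shortening the job as a whole.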

One of the disks had a read error, which was presumably the cause of the original RAID failure. dd has the option conv=noerror to make it continue after a read error. This initially seemed good but the resulting file was smaller than the source partition. It seems that conv=noerror on its own doesn’t seek the output file to maintain input and output alignment. If I had a hard drive filled with plain ASCII that MIGHT even be useful, but for a filesystem image it’s worse than useless. Adding sync (conv=noerror,sync) makes dd pad each unreadable input block with NULs, which keeps the offsets aligned. What I did was to repeatedly run dd with matching skip and seek options, incrementing by 1K until it had passed the section with errors.
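The approach of running dd repeatedly with matching skip and seek values can be sketched as a loop. This is a minimal illustration and not the exact commands that were used. The function name copy_range and the device and file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# copy_range SRC DST START_KB COUNT_KB
# Copy COUNT_KB blocks of 1K starting at block START_KB, writing each block
# to the same offset in DST so that input and output stay aligned.
copy_range() {
    src=$1 dst=$2 start=$3 count=$4
    n=$start
    end=$(( start + count ))
    while [ "$n" -lt "$end" ]; do
        # conv=notrunc stops dd from truncating DST at the seek offset.
        if ! dd if="$src" of="$dst" bs=1024 skip="$n" seek="$n" count=1 \
                conv=notrunc 2>/dev/null; then
            # A failed read writes nothing; later blocks still land at the
            # right offset, which leaves a hole where the bad block was.
            echo "unreadable 1K block at offset ${n}K" >&2
        fi
        n=$(( n + 1 ))
    done
}

# Usage (hypothetical names): step through 64K around a bad area.
# copy_range /dev/sdX1 disk3.img 123456 64
```

Copying 1K per dd invocation is slow, so it only makes sense for the region around the errors. The rest of the partition can be copied with a large block size using the same skip and seek values.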

for n in /dev/loop[0-6] ; do echo "$n" ; mdadm --examine -v -v --scan "$n" | grep Events ; done

Once I had all the images I had to assemble them. The Linux Software RAID didn’t like the array because not all the devices had the same event count. The way Linux Software RAID (and probably most RAID implementations) works is that each member of the array has an event counter that is incremented when disks are added, removed, and when data is written. If there is an error then after a reboot only disks with matching event counts will be used. The above command shows the Events count for all the disks.
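Spotting which members have fallen behind is easier with a small filter. The sketch below reads "DEVICE EVENTS" pairs and prints the devices whose count is lower than the highest one seen. The name stale_members is made up for this example, and the usage comment assumes that the count is the third field of the "Events" line in the mdadm --examine output, which may differ between metadata versions.

```shell
#!/bin/sh
# stale_members: read "DEVICE EVENTS" pairs on stdin and print
# "DEVICE EVENTS HIGHEST" for each device that is behind the highest count.
stale_members() {
    awk '{
             dev[NR] = $1
             ev[NR] = $2 + 0
             if (ev[NR] > max) max = ev[NR]
         }
         END {
             for (i = 1; i <= NR; i++)
                 if (ev[i] < max) print dev[i], ev[i], max
         }'
}

# Usage (assumes a line of the form "Events : 1234"):
# for n in /dev/loop[0-6]; do
#     printf '%s %s\n' "$n" "$(mdadm --examine "$n" | awk '/Events/ {print $3}')"
# done | stale_members
```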

Fortunately different event numbers aren’t going to stop us. After assembling the array (which failed to run) I ran “mdadm -R /dev/md1” which kicked some members out. I then added them back manually and forced the array to run. Unfortunately attempts to write to the array failed (presumably due to mismatched event counts).

Now my next problem is that I can make a 10TB degraded RAID-5 array which is read-only but I can’t mount the XFS filesystem because XFS wants to replay the journal. So my next step is to buy another 2*6TB disks to make a RAID-0 array to contain an image of that XFS filesystem.

Finally backups are a really good thing…

Smart Phones Should Measure Charge Speed

My first mobile phone lasted for days between charges. I never really found out how long its battery would last because there was no way that I could use it to deplete the charge in any time that I could spend awake. Even if I had managed to run the battery out, the phone was designed to accept 4*AA batteries (its rechargeable battery pack was exactly that size) so I could buy spare batteries at any store.

Modern phones are quite different in physical design (phones that weigh less than 4*AA batteries aren’t uncommon), functionality (fast CPUs and big screens suck power), and use (games really drain your phone battery). This requires much more effective chargers. When some phones are used intensively (e.g. playing an action game with Wifi enabled) they can’t be charged because they use more power than the plug-pack supplies. I’ve previously blogged some calculations about resistance and thickness of wires for phone chargers [1]; it’s obvious that there are some technical limitations to phone charging based on the decision to use a long cable at ~5V.

My calculations about phone charge rate were based on the theoretical resistance of wires based on their estimated cross-sectional area. One problem with such analysis is that it’s difficult to determine how thick the insulation is without destroying the wire. Another problem is that after repeated use of a charging cable some conductors break due to excessive bending. This can significantly increase the resistance and therefore increase the charging time. Recently a charging cable that used to be really good suddenly became almost useless. My Galaxy Note 2 would claim that it was being charged even though the reported level of charge in the battery was not increasing. It seems that the cable only supplied enough power to keep the phone running, not enough to actually charge the battery.

I recently bought a USB current measurement device which is really useful. I have used it to diagnose power supplies and USB cables that didn’t work correctly. But one significant way in which it fails is in the case of problems with the USB connector. Sometimes a cable performs differently when connected via the USB current measurement device.

The CurrentWidget program [2] on my Galaxy Note 2 told me that all of the dedicated USB chargers (the 12V one in my car and all the mains powered ones) supply 1698mA (including the ones rated at 1A) while a PC USB port supplies ~400mA. I don’t think that the Note 2 measurement is particularly reliable. On my Galaxy Note 3 it always says 0mA, so I guess that feature isn’t implemented. An old Galaxy S3 reports 999mA of charging even when the USB current measurement device says ~500mA. It seems to me that the method CurrentWidget uses to get the current isn’t accurate, if it even works at all.

Android 5 on the Nexus 4/5 phones will tell you the amount of time until the phone is charged in some situations (on the Nexus 4 and Nexus 5 that I used for testing it didn’t always display it and I don’t know why). This is useful but it’s still not good enough.

I think that what we need is to have the phone measure the current that’s being supplied and report it to the user. Then a phone that charges slowly because apps are using some of the power won’t be mistaken for a phone that charges slowly due to a defective cable or connector.
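On Linux-based devices the kernel’s power_supply class can expose a current_now attribute, documented as a value in microamps. The sketch below converts such a reading to milliamps. The function name charge_current_ma and the path in the usage comment are assumptions: the directory name varies between devices, and as the measurements above suggest, not every driver reports a meaningful value or uses the same sign convention.

```shell
#!/bin/sh
# charge_current_ma FILE
# Read a current value in microamps from FILE and print it in milliamps
# (integer division, so fractions of a milliamp are dropped).
charge_current_ma() {
    ua=$(cat "$1") || return 1
    echo $(( ua / 1000 ))
}

# Usage (the path is device-specific and is only an assumption here):
# charge_current_ma /sys/class/power_supply/battery/current_now
```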

One Android Phone Per Child

I was asked for advice on whether children should have access to smart phones. It’s an issue that many people are discussing and seems worthy of a blog post.

Claimed Problems with Smart Phones

The first thing that I think people should read is this XKCD post with quotes about the demise of letter writing from 99+ years ago [1]. Given the lack of evidence cited by people who oppose phone use I think we should consider to what extent the current concerns about smart phone use are just reactions to changes in society. I’ve done some web searching for reasons that people give for opposing smart phone use by kids and addressed the issues below.

Some people claim that children shouldn’t get a phone when they are so young that it will just be a toy. That’s interesting given the dramatic increase in the amount of money spent on toys for children in recent times. It’s particularly interesting when parents buy game consoles for their children but refuse mobile phone “toys” (I know someone who did this). I think this is more of a social issue regarding what is a suitable toy than any real objection to phones used as toys. Obviously the educational potential of a mobile phone is much greater than that of a game console.

It’s often claimed that kids should spend their time reading books instead of using phones. When visiting libraries I’ve observed kids using phones to store lists of books that they want to read, this seems to discredit that theory. Also some libraries have Android and iOS apps for searching their catalogs. There are a variety of apps for reading eBooks, some of which have access to many free books but I don’t expect many people to read novels on a phone.

Cyber-bullying is the subject of a lot of anxiety in the media. At least with cyber-bullying there’s an electronic trail, anyone who suspects that their child is being cyber-bullied can check that while old-fashioned bullying is more difficult to track down. Also while cyber-bullying can happen faster on smart phones the victim can also be harassed on a PC. I don’t think that waiting to use a PC and learn what nasty thing people are saying about you is going to be much better than getting an instant notification on a smart phone. It seems to me that the main disadvantage of smart phones in regard to cyber-bullying is that it’s easier for a child to participate in bullying if they have such a device. As most parents don’t seem concerned that their child might be a bully (unfortunately many parents think it’s a good thing) this doesn’t seem like a logical objection.

Fear of missing out (FOMO) is claimed to be a problem, apparently if a child has a phone then they will want to take it to bed with them and that would be a bad thing. But parents could have a policy about when phones may be used and insist that a phone not be taken into the bedroom. If it’s impossible for a child to own a phone without taking it to bed then the parents are probably dealing with other problems. I’m not convinced that a phone in bed is necessarily a bad thing anyway, a phone can be used as an alarm clock and instant-message notifications can be turned off at night. When I was young I used to wait until my parents were asleep before getting out of bed to use my PC, so if smart-phones were available when I was young it wouldn’t have changed my night-time computer use.

Some people complain that kids might use phones to play games too much or talk to their friends too much. What do people expect kids to do? In recent times the fear of abduction has led to children playing outside a lot less. It used to be that 6yos would play with other kids in their street and 9yos would be allowed to walk to the local park. Now people aren’t allowing 14yo kids to walk to the nearest park alone. Playing games and socialising with other kids has to be done over the Internet because kids aren’t often allowed out of the house. Play and socialising are important learning experiences that have to happen online if they can’t happen offline.

Apps can be expensive. But it’s optional to sign up for a credit card with the Google Play store and the range of free apps is really good. Also the default configuration of the app store is to require a password entry before every purchase. Finally it is possible to give kids pre-paid credit cards and let them pay for their own stuff, such pre-paid cards are sold at Australian post offices and I’m sure that most first-world countries have similar facilities.

Electronic communication is claimed to be somehow different and lesser than old-fashioned communication. I presume that people made the same claims about the telephone when it first became popular. The only real difference between email and posted letters is that email tends to be shorter because the reply time is smaller, you can reply to any questions in the same day not wait a week for a response so it makes sense to expect questions rather than covering all possibilities in the first email. If it’s a good thing to have longer forms of communication then a smart phone with a big screen would be a better option than a “feature phone”, and if face to face communication is preferred then a smart phone with video-call access would be the way to go (better even than old fashioned telephony).

Real Problems with Smart Phones

The majority opinion among everyone who matters (parents, teachers, and police) seems to be that crime at school isn’t important. Many crimes that would result in jail sentences if committed by adults receive either no punishment or something trivial (such as lunchtime detention) if committed by school kids. Introducing items that are both intrinsically valuable and which have personal value due to the data storage into a typical school environment is probably going to increase the amount of crime. The best options to deal with this problem are to prevent kids from taking phones to school or to home-school kids. Fixing the crime problem at typical schools isn’t a viable option.

Bills can potentially be unexpectedly large due to kids’ inability to restrain their usage and telcos deliberately making their plans tricky to profit from excess usage fees. The solution is to only use pre-paid plans, fortunately many companies offer good deals for pre-paid use. In Australia Aldi sells pre-paid credit in $15 increments that lasts a year [2]. So it’s possible to pay $15 per year for a child’s phone use, have them use Wifi for data access and pay from their own money if they make excessive calls. For older kids who need data access when they aren’t at home or near their parents there are other pre-paid phone companies that offer good deals, I’ve previously compared prices of telcos in Australia, some of those telcos should do [3].

It’s expensive to buy phones. The solution to this is to not buy new phones for kids: give them an old phone that was used by an older relative or buy an old phone on eBay. Also let kids petition wealthy relatives for a phone as a birthday present. If grandparents want to buy the latest smart-phone for a 7yo then there’s no reason to stop them IMHO (this isn’t a hypothetical situation).

Kids can be irresponsible and lose or break their phone. But the way kids learn to act responsibly is by practice. If they break a good phone and get a lesser phone as a replacement or have to keep using a broken phone then it’s a learning experience. A friend’s son head-butted his phone and cracked the screen – he used it for 6 months after that, I think he learned from that experience. I think that kids should learn to be responsible with a phone several years before they are allowed to get a “learner’s permit” to drive a car on public roads, which means that they should have their own phone when they are 12.

I’ve seen an article about a school finding that tablets didn’t work as well as laptops which was touted as news. Laptops or desktop PCs obviously work best for typing. Tablets are for situations where a laptop isn’t convenient and when the usage involves mostly reading/watching, I’ve seen school kids using tablets on excursions which seems like a good use of them. Phones are even less suited to writing than tablets. This isn’t a problem for phone use, you just need to use the right device for each task.

Phones vs Tablets

Some people think that a tablet is somehow different from a phone. I’ve just read an article by a parent who proudly described their policy of buying “feature phones” for their children and tablets for them to do homework etc. Really a phone is just a smaller tablet, once you have decided to buy a tablet the choice to buy a smart phone is just about whether you want a smaller version of what you have already got.

The iPad doesn’t appear to be able to make phone calls (but it supports many different VOIP and video-conferencing apps) so that could technically be described as a difference. AFAIK all Android tablets that support 3G networking also support making and receiving phone calls if you have a SIM installed. It is awkward to use a tablet to make phone calls but most usage of a modern phone is as an ultra portable computer not as a telephone.

The phone vs tablet issue doesn’t seem to be about the capabilities of the device. It’s about how portable the device should be and the image of the device. I think that if a tablet is good then a more portable computing device can only be better (at least when you need greater portability).

Recently I’ve been carrying a 10″ tablet around a lot for work, as sometimes a tablet will do for emergency work when a phone is too small and a laptop is too heavy. Even though tablets are thin and light they are still inconvenient to carry, and the issue of size and weight is a greater problem for kids. 7″ tablets are a lot smaller and lighter, but that’s getting close to a 5″ phone.

Benefits of Smart Phones

Using a smart phone is good for teaching children dexterity. It can also be used for teaching art in situations where more traditional art forms such as finger painting aren’t possible (I have met a professional artist who has used a Samsung Galaxy Note phone for creating art work).

There is a huge range of educational apps for smart phones.

The Wikireader (that I reviewed 4 years ago) [4] has obvious educational benefits. But a phone with Internet access (either 3G or Wifi) gives Wikipedia access including all pictures and is a better fit for most pockets.

There are lots of educational web sites and random web sites that can be used for education (Googling the answer to random questions).

When it comes to preparing kids for “the real world” or “the work environment” people often claim that kids need to use Microsoft software because most companies do (regardless of the fact that most companies will be using radically different versions of MS software by the time current school kids graduate from university). In my typical work environment I’m expected to be able to find the answer to all sorts of random work-related questions at any time and I think that many careers have similar expectations. Being able to quickly look things up on a phone is a real work skill, and a skill that’s going to last a lot longer than knowing today’s version of MS-Office.

There are a variety of apps for tracking phones. There are non-creepy ways of using such apps for monitoring kids. Also with two-way monitoring kids will know when their parents are about to collect them from an event and can stay inside until their parents are in the area. This combined with the phone/SMS functionality that is available on feature-phones provides some benefits for child safety.

iOS vs Android

Rumour has it that iOS is better than Android for kids diagnosed with Low Functioning Autism. There are apparently apps that help non-verbal kids communicate with icons and for arranging schedules for kids who have difficulty with changes to plans. I don’t know anyone who has a LFA child so I haven’t had any reason to investigate such things. Anyone can visit an Apple store and a Samsung Experience store as they have phones and tablets you can use to test out the apps (at least the ones with free versions). As an aside the money the Australian government provides to assist Autistic children can be used to purchase a phone or tablet if a registered therapist signs a document declaring that it has a therapeutic benefit.

I think that Android devices are generally better for educational purposes than iOS devices because Android is a less restrictive platform. On an Android device you can install apps downloaded from a web site or from a 3rd party app download service. Even if you stick to the Google Play store there’s a wider range of apps to choose from because Google is apparently less restrictive.

Android devices usually allow installation of a replacement OS. The Nexus devices are always unlocked and have a wide range of alternate OS images, and the other commonly used devices can usually have an alternate OS installed. This allows kids who have the interest and technical skill to extensively customise their device and learn all about its operation. iOS devices are designed to be sealed against the user. Admittedly there probably aren’t many kids with the skill and desire to replace the OS on their phone, but I think it’s good to have the option.

Android phones have a range of sizes and features while Apple only makes a few devices at any time and there’s usually only a couple of different phones on sale. iPhones are also a lot smaller than most Android phones, according to my previous estimates of hand size the iPhone 5 would be a good tablet for a 3yo or good for side-grasp phone use for a 10yo [5]. The main benefits of a phone are for things other than making phone calls so generally the biggest phone that will fit in a pocket is the best choice. The tiny iPhones don’t seem very suitable.

Also buying one of each is a viable option.

Conclusion

I think that mobile phone ownership is good for almost all kids even from a very young age (there are many reports of kids learning to use phones and tablets before they learn to read). There are no real down-sides that I can find.

I think that Android devices are generally a better option than iOS devices. But in the case of special needs kids there may be advantages to iOS.

BTRFS Status June 2015

The version of btrfs-tools in Debian/Jessie is incapable of creating a filesystem that can be mounted by the kernel in Debian/Wheezy. If you want to use a BTRFS filesystem on Jessie and Wheezy (which isn’t uncommon with removable devices) the only options are to use the Wheezy version of mkfs.btrfs or to use a Jessie kernel on Wheezy. I recently got bitten by this issue when I created a BTRFS filesystem on a removable device with a lot of important data (which is why I wanted metadata duplication and checksums) and had to read it on a server running Wheezy. Fortunately KVM in Wheezy works really well so I created a virtual machine to read the disk. Setting up a new KVM isn’t that difficult, but it’s not something I want to do while a client is anxiously waiting for their data.

BTRFS has been working well for me apart from the Jessie/Wheezy compatibility issue (which was an annoyance but didn’t stop me doing what I wanted). I haven’t written a BTRFS status report for a while because everything has been OK and there has been nothing exciting to report.

I regularly get errors from the cron jobs that run a balance, which supposedly runs out of free space. I have the cron jobs because of past problems with BTRFS running out of metadata space. In spite of the jobs often failing the systems keep working, so I’m not too worried at the moment. I think this is a bug, but there are many more important bugs.

Linux kernel version 3.19 was the first version to have working support for RAID-5 recovery. This means version 3.19 was the first version to have usable RAID-5 (I think there is no point even having RAID-5 without recovery). It wouldn’t be prudent to trust your important data to a new feature in a filesystem. So at this stage if I needed a very large scratch space then BTRFS RAID-5 might be a viable option, but for anything else I wouldn’t use it. BTRFS has still had little performance optimisation. This doesn’t matter much for SSDs or for single-disk filesystems, but for a RAID-5 of hard drives it would probably hurt a lot. Maybe BTRFS RAID-5 would be good for a scratch array of SSDs. The reports of problems with RAID-5 don’t surprise me at all.

I have a BTRFS RAID-1 filesystem on 2*4TB disks which is giving poor performance on metadata: simple operations like “ls -l” on a directory with ~200 subdirectories take many seconds to run. I suspect that part of the problem is due to the filesystem being written by cron jobs with files accumulating over more than a year. The “btrfs filesystem” command (see btrfs-filesystem(8)) allows defragmenting files and directory trees, but unfortunately it doesn’t support recursively defragmenting directories without also defragmenting the files in them. I really wish there was a way to get BTRFS to put all metadata on SSD and all data on hard drives. On the BTRFS mailing list Sander suggested the following command to defragment directories:

find / -xdev -type d -execdir btrfs filesystem defrag -c {} +

Below is the output of “zfs list -t snapshot” on a server I run. It’s often handy to know how much space is used by snapshots, but unfortunately BTRFS has no direct equivalent.

NAME                        USED  AVAIL  REFER  MOUNTPOINT
hetz0/be0-mail@2015-03-10  2.88G      -   387G  -
hetz0/be0-mail@2015-03-11  1.12G      -   388G  -
hetz0/be0-mail@2015-03-12  1.11G      -   388G  -
hetz0/be0-mail@2015-03-13  1.19G      -   388G  -

Hugo pointed out on the BTRFS mailing list that the following command will give the amount of space used for snapshots. $SNAPSHOT is the name of a snapshot and $LASTGEN is the generation number of the previous snapshot you want to compare with.

btrfs subvolume find-new $SNAPSHOT $LASTGEN | awk '{total = total + $7}END{print total}'

One upside of the BTRFS implementation in this regard is that the above btrfs command without being piped through awk shows you the names of files that are being written and the amounts of data written to them. Through casually examining this output I discovered that the most written files in my home directory were under the “.cache” directory (which wasn’t exactly a surprise).
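That casual examination can be turned into a per-file total with a little awk. This is a sketch under an assumption about the find-new output layout: each data line starts with "inode", has the length as the 7th field (the same field the command above sums) and has the file name starting at the 17th field. The function name written_per_file is made up for this example.

```shell
#!/bin/sh
# written_per_file: read "btrfs subvolume find-new" output on stdin and
# print "BYTES PATH" for each file, largest total first.
written_per_file() {
    awk '$1 == "inode" && $6 == "len" {
             # Rejoin the path in case it contains spaces.
             path = $17
             for (i = 18; i <= NF; i++) path = path " " $i
             bytes[path] += $7
         }
         END { for (p in bytes) print bytes[p], p }' | sort -rn
}

# Usage, with the same variables as the command above:
# btrfs subvolume find-new $SNAPSHOT $LASTGEN | written_per_file | head
```

Lines that don’t start with "inode", such as the trailing transid marker, are skipped instead of being counted as zero.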

Now I am configuring workstations with a separate subvolume for ~/.cache for the main user. This means that ~/.cache changes don’t get stored in the hourly snapshots and less disk space is used for snapshots.

Conclusion

My observation is that things are going quite well with BTRFS. It’s more than 6 months since I had a noteworthy problem which is pretty good for a filesystem that’s still under active development. But there are still many systems I run which could benefit from the data integrity features of ZFS and BTRFS that don’t have the resources to run ZFS and need more reliability than I can expect from an unattended BTRFS system.

At this time the only servers I run with BTRFS are located within a reasonable drive from my home (not the servers in Germany and the US) and are easily accessible (not the embedded systems). ZFS is working well for some of the servers in Germany. Eventually I’ll probably run ZFS on all the hosted servers in Germany and the US, I expect that will happen before I’m comfortable running BTRFS on such systems. For the embedded systems I will just take the risk of data loss/corruption for the next few years.

Anti-Systemd People

For the Technical People

This post isn’t really about technology. I’ll cover the technology briefly; skip to the next section if you aren’t interested in Linux programming or system administration.

I’ve been using the Systemd init system for a long time, I first tested it in 2010 [1]. I use Systemd on most of my systems that run Debian/Wheezy (which means most of the Linux systems I run which aren’t embedded systems). Currently the only systems where I’m not running Systemd are some systems on which I don’t have console access, while Systemd works reasonably well it wasn’t a standard init system for Debian/Wheezy so I don’t run it everywhere. That said I haven’t had any problems with Systemd in Wheezy, so I might have been too paranoid.

I recently wrote a blog post about systemd, just some basic information on how to use it and why it’s not a big deal [2]. I’ve been playing with Systemd for almost 5 years and using it in production for almost 2 years and it’s performed well. The most serious bug I’ve found in systemd is Bug #774153 which causes a Wheezy->Jessie upgrade to hang until you run “systemctl daemon-reexec” [3].

I know that some people have had problems with systemd, but any piece of significant software will cause problems for some people, there are bugs in all software that is complex enough to be useful. However the fact that it has worked so well for me on so many systems suggests that it’s not going to cause huge problems, it should be covered in the routine testing that is needed for a significant deployment of any new version of a distribution.

I’ve been using Debian for a long time. The transitions from libc4 to libc5 and then libc6 were complex but didn’t break much. The use of devfs in Debian caused some issues and then the removal of devfs caused other issues. The introduction of udev probably caused problems for some people too. Doing major updates to Debian systems isn’t something that is new or which will necessarily cause significant problems, I don’t think that the change to systemd by default compares to changing from a.out binaries to ELF binaries (which required replacing all shared objects and executables).

The Social Issue of the Default Init

Recently the Debian technical committee determined that Systemd was the best choice for the default init system in Debian/Jessie (the next release of Debian which will come out soon). Decisions about which programs should be in the default install are made periodically and it’s usually not a big deal. Even when the choice is between options that directly involve the user (such as the KDE and GNOME desktop environments) it’s not really a big deal because you can just install a non-default option.

One of the strengths of Debian has always been the fact that any Debian Developer (DD) can just add any new package to the archive if they maintain it to a suitable technical standard and if copyright and all other relevant laws are respected. Any DD who doesn’t like any of the current init systems can just package a new one and upload it. Obviously the default option will get more testing, so the non-default options will need more testing by the maintainer. This is particularly difficult for programs that have significant interaction with other parts of the system, I’ve had difficulties with this over the course of 14 years of SE Linux development but I’ve also found that it’s not an impossible problem to solve.

It’s generally accepted that making demands of other people’s volunteer work is a bad thing, which to some extent is a reasonable position. There is a problem when this is taken to extremes, Debian has over 1000 developers who have to work together so sometimes it’s a question of who gets to do the extra work to make the parts of the distribution fit together. The issue of who gets to do the work is often based on what parts are the defaults or most commonly used options. For my work on SE Linux I often have to do a lot of extra work because it’s not part of the default install and I have to make my requests for changes to other packages be as small and simple as possible.

So part of the decision to make Systemd be the default init is essentially a decision to impose slightly more development effort on the people who maintain SysVInit if they are to provide the same level of support – of course given the lack of overall development on SysVInit the level of support provided may decrease. It also means slightly less development effort for the people who maintain Systemd as developers of daemon packages MUST make them work with it. Another part of this issue is the fact that DDs who maintain daemon packages need to maintain init.d scripts (for SysVInit) and systemd scripts, presumably most DDs will have a preference for one init system and do less testing for the other one. Therefore the choice of systemd as the default means that slightly less developer effort will go into init.d scripts. On average this will slightly increase the amount of sysadmin effort that will be required to run systems with SysVInit as the scripts will on average be less well tested. This isn’t going to be a problem in the short term as the current scripts are working reasonably well, but over the course of years bugs may creep in and a proposed solution to this is to have SysVInit scripts generated from systemd config files.

We did have a long debate within Debian about the issue of default init systems and many Debian Developers disagree about this. But there is a big difference between volunteers debating about their work and external people who don’t contribute but believe that they are entitled to tell us what to do. Especially when the non-contributors abuse the people who do the work.

The Crowd Reaction

In a world filled with reasonable people who aren’t assholes there wouldn’t be any more reaction to this than there has been to decisions such as which desktop environment should be the default (which has caused some debate but nothing serious). The issue of which desktop environment (or which version of a desktop environment) to support has a significant effect on users that can’t be avoided, so I could understand people being a little upset about that. But the init system isn’t something that most users will notice – apart from the boot time.

For some reason the men in the Linux community who hate women the most seem to have taken a dislike to systemd. I understand that being “conservative” might mean not wanting changes to software as well as not wanting changes to inequality in society but even so this surprised me. My last blog post about systemd has probably set a personal record for the amount of misogynistic and homophobic abuse I received in the comments. More gender and sexuality related abuse than I usually receive when posting about the issues of gender and sexuality in the context of the FOSS community! For the record this doesn’t bother me, when I get such abuse I’m just going to write more about the topic in question.

While the issue of which init system to use by default in Debian was being discussed we had a lot of hostility from unimportant people who for some reason thought that they might get their way by being abusive and threatening people. As expected that didn’t give the result they desired, but it did result in a small trend towards people who are less concerned about the reactions of users taking on development work related to init systems.

The next thing that they did was to announce a “fork” of Debian. Forking software means maintaining a separate version due to a serious disagreement about how it should be maintained. Doing that requires a significant amount of work in compiling all the source code and testing the results. The sensible option would be to just maintain a separate repository of modified packages as has been done many times before. One of the most well known repositories was the Debian Multimedia repository, it was controversial due to flouting legal issues (the developer produced code that was legal where they lived) and due to confusion among users. But it demonstrated that you can make a repository containing many modified packages. In my work on SE Linux I’ve always had a repository of packages containing changes that haven’t been accepted into Debian, which included changes to SysVInit in about 2001.

The latest news on the fork-Debian front seems to be the call for donations [4]. Apparently most of the money that was spent went to accounting fees and buying a laptop for a developer. The amount of money involved is fairly small. Forbes has an article about how awful people can use “controversy” to get crowd-funding windfalls [5].

MikeeUSA is an evil person who hates systemd [6]. This isn’t any sort of evidence that systemd is great (I’m sure that evil people make reasonable choices about software on occasion). But it is a significant factor in support for non-systemd variants of Debian (and other Linux distributions). Decent people don’t want to be associated with people like MikeeUSA, the fact that the anti-systemd people seem happy to associate with him isn’t going to help their cause.

Conclusion

Forking Debian is not the correct technical solution to any problem you might have with a few packages. Filing bug reports and possibly forking those packages in an external repository is the right thing to do.

Sending homophobic and sexist abuse is going to make you as popular as the GamerGate and GodHatesAmerica.com people. It’s not going to convince anyone to change their mind about technical decisions.

Abusing volunteers who might consider donating some of their time to projects that you like is generally a bad idea. If you abuse them enough you might get them to volunteer less of their time, but the most likely result is that they just don’t volunteer on anything associated with you.

Abusing people who write technical blog posts isn’t going to convince them that they made an error. Abuse is evidence of the absence of technical errors.

SE Linux Play Machine Over Tor

I work on SE Linux to improve security for all computer users. I think that my work has gone reasonably well in that regard in terms of directly improving security of computers and helping developers find and fix certain types of security flaws in apps. But a large part of the security problems we have at the moment are related to subversion of Internet infrastructure. The Tor project is a significant step towards addressing such problems. So to achieve my goals in improving computer security I have to support the Tor project. So I decided to put my latest SE Linux Play Machine online as a Tor hidden service. There is no real need for it to be hidden (for the record it’s in my bedroom), but it’s a learning experience for me and for everyone who logs in.

A Play Machine is what I call a system with root as the guest account with only SE Linux to restrict access.

Running a Hidden Service

A Hidden Service in Tor is just a cryptographically protected address that forwards to a regular TCP port. It’s not difficult to set up and the Tor project has good documentation [1]. For Debian the file to edit is /etc/tor/torrc.

I added the following 3 lines to my torrc to create a hidden service for SSH. I forwarded port 80 for test purposes because web browsers are easier to configure for SOCKS proxying than ssh.

HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 22 192.168.0.2:22
HiddenServicePort 80 192.168.0.2:22

Generally when setting up a hidden service you want to avoid using an IP address that gives anything away. So it’s a good idea to run a hidden service on a virtual machine that is well isolated from any public network. My Play machine is hidden in that manner not for secrecy but to prevent it being used for attacking other systems.
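
Once Tor has been restarted with a HiddenServiceDir setting, it writes the generated .onion address to a file named hostname in that directory. The following is a minimal sketch for reading it. The function name onion_hostname is made up for illustration, and the default path assumes the HiddenServiceDir value used in the torrc lines above.

```shell
# Print the .onion address that Tor generated for a hidden service.
# Assumption: the directory argument matches HiddenServiceDir in torrc.
# onion_hostname is a hypothetical helper name, not part of Tor.
onion_hostname() {
    dir="${1:-/var/lib/tor/hidden_service}"
    if [ -r "$dir/hostname" ]; then
        cat "$dir/hostname"
    else
        echo "cannot read $dir/hostname (is Tor running, and are you root?)" >&2
        return 1
    fi
}
```

Running onion_hostname as root with no arguments should print the address to give to people who want to connect. The directory is normally readable only by the user Tor runs as and by root.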

SSH over Tor

Howtoforge has a good article on setting up SSH with Tor [2]. That has everything you need for setting up Tor for a regular ssh connection, but the tor-resolve program only works for connecting to services on the public Internet. By design the .onion addresses used by Hidden Services have no mapping to anything that resembles an IP address, and tor-resolve breaks on them. I believe that the fact that tor-resolve breaks things in this situation is a bug, so I have filed Debian bug report #776454 requesting that tor-resolve allow such things to just work [3].

Host *.onion
ProxyCommand connect -5 -S localhost:9050 %h %p

I use the above ssh configuration (which can go in ~/.ssh/config or /etc/ssh/ssh_config) to tell the ssh client how to deal with .onion addresses. I also had to install the connect-proxy package which provides the connect program.
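
For a one-off test without editing any configuration file, the same ProxyCommand can be passed to ssh on the command line with -o. This is only a sketch: onion_ssh is a made-up wrapper name, and it assumes that Tor is listening for SOCKS connections on localhost:9050 and that the connect program is installed, as in the configuration above.

```shell
# One-off ssh connection through Tor's SOCKS5 port without touching
# ~/.ssh/config. onion_ssh is a hypothetical wrapper name.
# Assumptions: Tor listens on localhost:9050 and the connect program
# (Debian package connect-proxy) is installed.
onion_ssh() {
    ssh -o 'ProxyCommand=connect -5 -S localhost:9050 %h %p' "$@"
}
```

All arguments are passed through to ssh unchanged, so "onion_ssh root@ADDRESS.onion" behaves like the plain ssh command with the Host *.onion stanza in place.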

ssh root@zp7zwyd5t3aju57m.onion
The authenticity of host 'zp7zwyd5t3aju57m.onion (<no hostip for proxy command>)' can't be established.
ECDSA key fingerprint is 3c:17:2f:7b:e2:f6:c0:c2:66:f5:c9:ab:4e:02:45:74.
Are you sure you want to continue connecting (yes/no)?

I now get the above message when I connect, which shows that the ssh developers have dealt with connecting via a proxy that doesn’t have an IP address.

Also see the general information page about my Play Machine, that information page has the root password [4].

Systemd Notes

A few months ago I gave a lecture about systemd for the Linux Users of Victoria. Here are some of my notes reformatted as a blog post:

Scripts in /etc/init.d can still be used, they work the same way as they do under sysvinit for the user. You type the same commands to start and stop daemons.

To get a result similar to changing runlevel use the “systemctl isolate” command. Runlevels were never really supported in Debian (unlike Red Hat where they were used for starting and stopping the X server) so for Debian users there’s no change here.

The command systemctl with no params shows a list of loaded services and highlights failed units.

The command “journalctl -u UNIT-PATTERN” shows journal entries for the unit(s) in question. The pattern uses shell-style wildcards, not regular expressions.

The systemd journal includes the stdout and stderr of all daemons. This solves the problem of daemons that don’t log all errors to syslog and leave the sysadmin wondering why they don’t work.

The command “systemctl status UNIT” gives the status and last log entries for the unit in question.

A program can use ioctl(fd, TIOCSTI, …) to push characters into a tty buffer. If the sysadmin runs an untrusted program with the same controlling tty then it can cause the sysadmin shell to run hostile commands. The system call setsid() to create a new terminal session is one solution but managing which daemons can be started with it is difficult. The way that systemd manages start/stop of all daemons solves this. I am glad to be rid of the run_init program we used to use on SE Linux systems to deal with this.

Systemd has a mechanism to ask for passwords for SSL keys and encrypted filesystems etc. There have been problems with that in the past but I think they are all fixed now. While there is some difficulty during development the end result of having one consistent way of managing this will be better than having multiple daemons doing it in different ways.

The commands “systemctl enable” and “systemctl disable” enable/disable daemon start at boot which is easier than the SysVinit alternative of update-rc.d in Debian.

Systemd has built in seat management, which is not more complex than consolekit which it replaces. Consolekit was installed automatically without controversy so I don’t think there should be controversy about systemd replacing consolekit.

Systemd improves performance by parallel start and autofs style fsck.

The command systemd-cgtop shows resource use for cgroups it creates.

The command “systemd-analyze blame” shows what delayed the boot process and “systemd-analyze critical-chain” shows the critical path in boot delays.

Systemd also has security features such as service private /tmp and restricting service access to directory trees.
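
As an illustration of those security features, a drop-in file can turn them on for an existing service. This is a sketch only: write_hardening_dropin and the file name hardening.conf are names made up for this example, and the directories listed are examples, not recommendations. PrivateTmp=, ReadOnlyDirectories= and InaccessibleDirectories= are the directive names in systemd of this era; newer releases call the latter two ReadOnlyPaths= and InaccessiblePaths=.

```shell
# Write a drop-in that gives a service a private /tmp and restricts its
# view of the directory tree.
# Usage: write_hardening_dropin UNIT [BASEDIR]
# BASEDIR defaults to /etc/systemd/system; pass a scratch directory to
# inspect the result without touching the live system.
# write_hardening_dropin and hardening.conf are hypothetical names.
write_hardening_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dropin_dir="$base/$unit.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/hardening.conf" <<'EOF'
[Service]
# Private /tmp and /var/tmp, invisible to other services
PrivateTmp=yes
# Example paths only; choose ones that suit the daemon in question
ReadOnlyDirectories=/etc
InaccessibleDirectories=/home
EOF
}
# After writing to the real location, run "systemctl daemon-reload"
# and restart the service for the settings to take effect.
```

For example "write_hardening_dropin example.service /tmp/scratch" creates /tmp/scratch/example.service.d/hardening.conf, which can be reviewed before doing the same under /etc/systemd/system.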

Conclusion

For basic use things just work, you don’t need to learn anything new to use systemd.

It provides significant benefits for boot speed and potentially security.

It doesn’t seem more complex than other alternative solutions to the same problems.

https://wiki.debian.org/systemd

http://freedesktop.org/wiki/Software/systemd/Optimizations/

http://0pointer.de/blog/projects/security.html

Conference Suggestions

LCA 2015 is next week so it seems like a good time to offer some suggestions for other delegates based on observations of past LCAs. There’s nothing LCA specific about the advice, but everything is based on events that happened at past LCAs.

Don’t Oppose a Lecture

Question time at the end of a lecture isn’t the time to demonstrate that you oppose everything about the lecture. Discussion time between talks at a mini-conf isn’t a time to demonstrate that you oppose the entire mini-conf. If you think a lecture or mini-conf is entirely wrong then you shouldn’t attend.

The conference organisers decide which lectures and mini-confs are worthy of inclusion and the large number of people who attend the conference are signalling their support for the judgement of the conference organisers. The people who attend the lectures and mini-confs in question want to learn about the topics in question and people who object should be silent. If someone gives a lecture about technology which appears to have a flaw then it might be OK to ask one single question about how that issue is resolved, apart from that the lecture hall is for the lecturer to describe their vision.

The worst example of this was between talks at the Haecksen mini-conf last year when an elderly man tried at great length to convince me that everything about feminism is wrong. I’m not sure to what degree the Haecksen mini-conf is supposed to be a feminist event, but I think it’s quite obviously connected to feminism – which of course was why he wanted to pull that stunt. After he discovered that I was not going to be convinced and that I wasn’t at all interested in the discussion he went to the front of the room to make a sexist joke and left.

Consider Your Share of Conference Resources

I’ve previously written about the length of conference questions [1]. Question time after a lecture is a resource that is shared among all delegates. Consider whether you are asking more questions than the other delegates and whether the questions are adding benefit to other people. If not then send email to the speaker or talk to them after their lecture.

Note that good questions can add significant value to the experience of most delegates. For example when a lecturer appears to be having difficulty in describing their ideas to the audience then good questions can make a real difference, but it takes significant skill to ask such questions.

Dorm Walls Are Thin

LCA is one of many conferences that is typically held at a university with dorm rooms offered for delegates. Dorm rooms tend to have thinner walls than hotel rooms so it’s good to avoid needless noise at night. If one of your devices is going to make sounds at night please check the volume settings before you start it. At one LCA I was startled at about 2AM by the sound of a very loud porn video from a nearby dorm room. The volume was reduced within a few seconds, but it’s difficult to get to sleep quickly after that sort of surprise.

If you set an alarm then try to avoid waking other people. If you set an early alarm and then just get up then other people will get back to sleep, but pressing “snooze” repeatedly for several hours (as has been done in the past) is anti-social. Generally I think that an alarm should be at a low volume unless it is set for less than an hour before the first lecture – in which case waking people in other dorm rooms might be doing them a favor.

Phones in Lectures

Do I need to write about this? Apparently I do because people keep doing it!

Phones can be easily turned to vibrate mode, most people who I’ve observed taking calls in LCA lectures have managed this but it’s worth noting for those who don’t.

There are very few good reasons for actually taking a call when in a lecture. If the hospital calls to tell you that they have found a matching organ donor then it’s a good reason to take the call, but I can’t think of any other good example.

Many LCA delegates do system administration work and get calls at all times of the day and night when servers have problems. But that isn’t an excuse for having a conversation in the middle of the lecture hall while the lecture is in progress (as has been done). If you press the green button on a phone you can then walk out of the lecture hall before talking, it’s expected that mobile phone calls sometimes have signal problems at the start of the call so no-one is going to be particularly surprised if it takes 10 seconds before you say hello.

As an aside, I think that the requirement for not disturbing other people depends on the number of people who are there to be disturbed. In tutorials there are fewer people and the requirements for avoiding phone calls are less strict. In BoFs the requirements are less strict again. But the above is based on behaviour I’ve witnessed in mini-confs and main lectures.

Smoking

It is the responsibility of people who consume substances to ensure that their actions don’t affect others. For smokers that means smoking far enough away from lecture halls that it’s possible for other delegates to attend the lecture without breathing in smoke. Don’t smoke in the lecture halls or near the doorways.

Also using an e-cigarette is still smoking, don’t do it in a lecture hall.

Photography

Unwanted photography can be harassment. I don’t think there’s a need to ask for permission to photograph people who harass others or break the law. But photographing people who merely break the social agreement as to what should be done in a lecture probably isn’t justified. At a previous LCA a man wanted to ask so many questions at a keynote lecture that he had a page of written notes (seriously). That was obviously outside the expected range of behaviour – but probably didn’t justify the many people who photographed him.

A Final Note

I don’t think that LCA is in any way different from other conferences in this regard. Also I don’t think that there’s much that conference organisers can or should do about such things.

DNSSEC

reason=”verification failed; insecure key”

I’ve recently noticed OpenDKIM on systems I run giving the above message when trying to verify a DKIM message from my own domain. According to Google searches this is due to DNSSEC not being enabled. I’m not certain that I really need DNSSEC for this reason (I can probably make DKIM work without it), but the lack of it does decrease the utility of DKIM and DNSSEC is generally a good thing to have.

Client (Recursive) Configuration

The Debian Wiki page about DNSSEC is really good for setting up recursive resolvers [1]. Basically if you install the bind9 package on Debian/Wheezy (current stable) it will work by default. If you have upgraded from an older release then it might not work (i.e. if you modified the BIND configuration and didn’t allow the upgrade to overwrite your changes). The Debian Wiki page is also quite useful if you aren’t using Debian, as most of it is more Linux specific than Debian specific.

dig +short test.dnssec-or-not.net TXT | tail -1

After you have enabled DNSSEC on a recursive resolver the above command should return “Yes, you are using DNSSEC”.

dig +noall +comments dnssec-failed.org

The above command queries a zone that’s deliberately misconfigured, so the query will fail (dig reports a SERVFAIL status) if DNSSEC validation is working correctly.
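
Another check is to look for the “ad” (authenticated data) flag in the header of a dig reply for a signed zone, which a validating resolver sets on answers it has verified. The helper below is a sketch that reads dig output on standard input; the name dnssec_validated is made up, and the zone in the usage comment is only an example, so substitute any zone you know to be signed.

```shell
# Succeed if the dig output on stdin has the "ad" (authenticated data)
# flag set in its header line, e.g.
#   ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, ...
# dnssec_validated is a hypothetical helper name.
dnssec_validated() {
    # Extract the space-separated flag list between "flags:" and ";"
    flags=$(sed -n 's/^;; flags: \([^;]*\);.*/\1/p')
    case " $flags " in
        *" ad "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (the zone name is an assumption, use any signed zone):
#   dig +dnssec isc.org SOA | dnssec_validated && echo validated
```

Parsing the flags line instead of searching the whole output avoids false matches on text elsewhere in the reply.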

Signing a Zone

Digital Ocean has a reasonable tutorial on signing a zone [2].

dnssec-keygen -a NSEC3RSASHA1 -b 2048 -n ZONE example.com

The above command creates a Zone Signing Key.

dnssec-keygen -f KSK -a NSEC3RSASHA1 -b 4096 -n ZONE example.com

The above command creates a Key Signing Key. This will take a very long time if you don’t have a good entropy source, on my systems it took a couple of days. Run this from screen or tmux.

$INCLUDE ksk/Kexample.com.+123+12345.key
$INCLUDE zsk/Kexample.com.+123+34567.key

When you have created the ZSK and KSK you need to add something like the above to your zone file to include the DNSKEY records.

all: example.com.signed

%.signed: %
        dnssec-signzone -A -3 $(shell head -c 100 /dev/random | sha1sum | cut -b 1-16) -k $(shell echo ksk/K$<*.key) -N INCREMENT -o $< -t $< $(shell echo zsk/K$<*.key)
        rndc reload

Every time you change your signed zone you need to create a new signed zone file. Above is the Makefile I’m currently using to generate the signed file. This relies on storing the KSK files in a directory named ksk/ and the ZSK files in a directory named zsk/. Then BIND needs to be configured to use example.com.signed instead of example.com.
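
The salt expression in the Makefile can be tried on its own to see what it produces. This sketch differs from the Makefile in one respect that is my substitution: it reads /dev/urandom instead of /dev/random so that it never blocks waiting for entropy. The name nsec3_salt is made up for illustration.

```shell
# Generate a 16 hex digit (8 byte) salt of the kind passed to
# "dnssec-signzone -3". nsec3_salt is a hypothetical helper name.
# Assumption: /dev/urandom is used here instead of /dev/random so the
# command never blocks waiting for entropy.
nsec3_salt() {
    head -c 100 /dev/urandom | sha1sum | cut -b 1-16
}
```

Each call prints a different 16 character string of lowercase hex digits, which is why the Makefile produces a new salt every time the zone is signed.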

The Registrar

Every time you sign the zone a file with a name like dsset-example.com. will be created, it will have the same contents every time which are the DS entries you send to the registrar to have your zone publicly known as being signed.

Many registrars don’t support DNSSEC, if you use such a registrar (as I do) then you need to transfer your zone before you can productively use DNSSEC. Without the DS entries being signed by a registrar and included in the TLD no-one will recognise your signatures on zone data.

ICANN has a list of registrars that support DNSSEC [3]. My next task is to move some of my domains to such registrars, unfortunately they cost more so I probably won’t transfer all my zones. Some of my zones don’t do anything that’s important enough to need DNSSEC.

wp-spamshield

Yesterday I installed the wp-spamshield plugin for WordPress [1]. It blocks automated comment spam systems by using JavaScript and cookies, apparently most spammers can’t handle that. Before I installed it I was getting hundreds of spam comments per day even with the block spam by math plugin enabled. Now I’ve had it running for 24 hours without any spam. The real advantage of this is that now when a legitimate comment gets flagged as spam I’ll notice it, previously I was deleting hundreds or thousands of comments at a time without reading them.

deb http://www.coker.com.au wheezy wordpress

The above repository has the wordpress-wp-spamshield package for Debian/Wheezy. I have no immediate plans for uploading it to Debian because the security support for WordPress plugins doesn’t fit in with the Debian model. I am prepared to negotiate about this if someone has good reasons for including it or any of the other WordPress plugins I’ve packaged.

My packaging work is under the GPL (of course) so any DD who disagrees with me could just rebuild the package and upload it. Within Debian there is no rule against taking another DD’s GPL’d code that they decided not to upload and then uploading it. There is a consensus that such things are not appropriate without permission, but anyone who wishes can take this blog post as permission.