lifetime failures (LF)

This morning at LCA Andrew Tanenbaum gave a talk about Minix 3 and his work on creating reliable software.

He cited examples of consumer electronics devices such as TVs that supposedly don’t crash. However, I have power-cycled TVs in the past when they didn’t behave as desired (I’m not sure whether it was a software crash, but that seems like a reasonable possibility), and I have had a DVD player crash when dealing with damaged disks.

It seems to me that there are two reasons that TV and DVD failures aren’t regarded as a serious problem. One is that there is hardly any state in such devices, and most of that is not often changed (long-term state such as frequencies used for station tuning is almost never written and therefore unlikely to be lost on a crash). The other is that the reboot time is reasonably short (generally less than two seconds). So when (not if) a TV or DVD player crashes the result is a service interruption of two seconds plus the time taken to get to the power point and no loss of important data. If this sort of thing happens less than once a month then it’s likely that it won’t register as a failure with someone who is used to rebooting their PC once a day!

Another example that was cited was cars. I have been wondering whether there are any crash situations for a car electronic system that could result in the engine stalling. Maybe sometimes when I try to start my car and it stalls it’s really doing a warm-boot of the engine control system.

Later in his talk Andrew presented the results of killing some Minix system processes, which showed minimal interruption to service (killing an Ethernet device driver every two seconds decreased network performance by about 10%). He also described how some service state is stored so that it can be used if the service is restarted after a crash. Although he didn’t explicitly mention it in his talk, it seems that he has followed the same combination of minimal data loss and fast recovery that we are used to seeing in TVs and DVD players.
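To make the restart-with-saved-state idea concrete, here is a minimal sketch in Python. It is not how Minix implements it; `StateStore`, `Counter` and `supervise` are names I made up for illustration, and the "crash" is just an exception. The point is only that if the state lives outside the service, a freshly restarted instance can pick up where the dead one left off.

```python
class StateStore:
    """Holds service state outside the service (hypothetical stand-in)."""

    def __init__(self):
        self._data = {}

    def save(self, service, state):
        self._data[service] = dict(state)

    def load(self, service):
        return dict(self._data.get(service, {}))


class Counter:
    """Toy service: counts handled requests, checkpointing after each one."""

    def __init__(self, store):
        self.store = store
        # Recover saved state if there is any, otherwise start from zero.
        self.state = store.load("counter") or {"handled": 0}

    def handle(self, request):
        if request == "bad":
            raise RuntimeError("simulated crash")
        self.state["handled"] += 1
        self.store.save("counter", self.state)


def supervise(requests, store):
    """Feed requests to the service, restarting it whenever it crashes."""
    service = Counter(store)
    restarts = 0
    for request in requests:
        try:
            service.handle(request)
        except RuntimeError:
            service = Counter(store)  # new instance reloads the saved state
            restarts += 1
    return service.state["handled"], restarts
```

Calling `supervise(["a", "bad", "b", "bad", "c"], StateStore())` restarts the toy service twice, and the count of handled requests survives both restarts.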

The design of Minix also has some good features for security. When a process issues a read request it will grant the filesystem driver access to the memory region that contains the read buffer, and nothing else. It seems likely that many types of kernel security bug that would compromise systems such as Linux would not be a serious problem on Minix. Compromising a driver for a filesystem that is mounted nosuid and nodev would not allow any direct attacks on applications.
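A rough way to picture the grant mechanism is as a window onto the caller's buffer that rejects anything outside its bounds. The Python below is only a toy model under that assumption: `Grant` and `driver_read` are hypothetical names, and in a real system the check is enforced by the kernel rather than by the honesty of the code holding the object.

```python
class Grant:
    """A window onto part of a caller's memory; writes outside it fail."""

    def __init__(self, memory, start, length):
        self._memory = memory
        self._start = start
        self._length = length

    def write(self, offset, data):
        # Refuse any write that would touch bytes outside the granted region.
        if offset < 0 or offset + len(data) > self._length:
            raise PermissionError("access outside granted region")
        begin = self._start + offset
        self._memory[begin:begin + len(data)] = data


def driver_read(grant, data):
    """Hypothetical filesystem driver: it can only fill the granted buffer."""
    grant.write(0, data)
```

For example, with `memory = bytearray(16)` and `Grant(memory, 4, 4)`, the driver can fill bytes 4 to 7 but an attempt to write past the end of that window raises `PermissionError` and leaves the rest of the memory untouched.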

Every delegate of LCA was given a CD with Minix 3. I’ll have to install it on one of my machines and play with it. I may put a public access Minix machine online at some time if there is interest.
