I just wrote about the system administration issues related to the recent Debian SSL/SSH security flaw . The next thing we need to consider is how we can change things to reduce the incidence of such problems.
The problem we just had was due to the most important part of the entropy supply for the random number generator not being used due to a mistake in commenting out some code. The only entropy that was used was the PID number of the process which uses the SSL library code which gives us 15 bits of entropy. It seems to me that if we had zero bits of entropy the problem would have been discovered a lot sooner (almost certainly before the code was released in a stable version). Therefore it seems that using a second-rate source of entropy (which was never required) masked the problem that the primary source of entropy was not working. Would it make sense to have a practice of not using such second-rate sources of entropy to reduce the risk of such problems being undetected for any length of time? Is this a general issue or just a corner case?
Joss makes some suggestions for process improvements . He suggests that having a single .diff.gz file (the traditional method for maintaining Debian packages) that directly contains all patches can obscure some patches. The other extreme is when you have a patch management system with several dozen small patches and the reader has to try and discover what each of them does. For an example of this see the 43 patches which are included in the Debian PAM package for Etch, also note that the PAM system is comprised of many separate shared objects (modules), this means that the patching system lends itself to having one patch per module and thus 43 patches for PAM isn’t as difficult to manage as 43 patches for a complex package which is not comprised of multiple separate components might be. That said I think that there is some potential for separating out patches. Having a distinction between different types of patches might help. For example we could have a patch for Makefiles etc (including autoconf etc), a patch for adding features, and a patch for fixing bugs. Then people reviewing the source for potential bugs could pay a lot of attention to bug fixes, a moderate amount of attention to new features, and casually skim the Makefile stuff.
The problem began with this mailing list discussion . Kurt’s first message starts with “When debbuging applications” and ends with “What do you people think about removing those 2 lines of code?“. The reply he received from Ulf (a member of the OpenSSL development team) is “If it helps with debugging, I’m in favor of removing them“. It seems to me that there might have been a miscommunication there, Ulf may have believed that the discussion only concerned a debugging built and not a build that would eventually end up on millions of machines.
It seems possible that the reaction would have been different if Kurt had mentioned that he wanted to have a single source tree for both debugging and for regular use. It also seems likely that his proposed change may have received more inspection if he had clearly stated that he was doing to include it in Debian where it would be used by millions of people. When I am doing Debian development I generally don’t mention all the time “this code will be used by millions of people so it’s important that we get it right“, although I do sometimes make such statements if I feel that my questions are not getting the amount of consideration from upstream that a binary package destined for use by millions of people deserves. Maybe it would be a good practice to clarify such things in the case of important packages. For a package that is relevant to the security of the entire distribution (and possibly to other machines around the net – as demonstrated in this case) it doesn’t seem unreasonable to include a post-script mentioning the scope of the code use (it could be done with an alternate SIG if using a MUA that allows selecting from multiple SIGs in a convenient manner).
In the response from the OpenSSL upstream  it is claimed that the mailing list used was not the correct one. Branden points out that the openssl-team mailing list address seems not to be documented anywhere . One thing to be learned from this is that distribution developers need to be proactive in making contact with upstream developers. You might think that building packages for a major distribution and asking questions about it on the mailing list would result in someone from the team noticing and mentioning any other things that you might need to do. But maybe it would make sense to send private mail to one of the core developers, introduce yourself, and ask for advice on the best way to manage communication to avoid this type of confusion.
I think that it is ideal for distribution developers to have the phone numbers of some of the upstream developers. If the upstream work is sponsored by the employer of one of the upstream developers then it seems reasonable to ask for their office phone number. Sometimes it’s easier to sort things out by phone than by email.
Gunnar Wolf describes how the way this bug was discovered and handled shows that the Debian processes work . A similar bug in proprietary software would probably not be discovered nearly as quickly and would almost certainly not be fixed in such a responsible manner.
Update: According to the OpenSSL project about page , Ulf is actually not a “core” member, just a team member. I had used the term “core” in a slang manner based on the fact that Ulf has an official @openssl.org email address.
-  http://etbe.coker.com.au/2008/05/18/debian-ssh-problems/
-  http://np237.livejournal.com/17981.html
-  http://marc.info/?t=114651088900003&r=1&w=2
-  http://www.links.org/?p=328
-  http://advogato.org/person/branden/diary/5.html
-  http://gwolf.org/node/1743
-  http://www.openssl.org/about/