
Journalism, Age, and Mono

Daniel Stone has criticised the IT journalist Sam Varghese for writing a negative article about a college student [1].

The student in question is 21 years old, which means he is legally an adult in almost every modern jurisdiction that I am aware of (the exception being Italy, where you must be 25 years old to vote in senatorial elections [2]). It’s well known that college students often do stupid things; it’s not at all uncommon for college parties to end up involving the police. When 21yo college students do foolish things that involve breaking the law, is it common for people to defend them because they are only students? I’m pretty sure that the legal system won’t accept such a defense. So while Sam was rather harsh in his comments (and did go a bit far with implying links to GNOME/Mono people), I don’t think his article was inappropriate on the basis of age. That said, a casual glance at life insurance premium tables will show that men who are less than 25 years old are prone to doing silly things, so I won’t hold this against the student in question as I’m sure he will be more sensible in future – I haven’t included his name in this post.

It’s often said that “you shouldn’t do anything that you would be ashamed of if it was described on the front page of a newspaper“, while I think that statement is a little extreme I do think it’s reasonable to try and avoid writing blog posts that you would be ashamed of if a popular blogger linked to it with a negative review. You have to expect that a post titled “Fuck You X” where X is the name of some famous person will get a significant reaction, and no-one can reasonably claim to have not wanted to offend anyone with such a post. Personally I would prefer that when people disagree with me they provide a list of reasons (as Sam did) rather than just a short negative comment with no content (as is more often the case).

Here is Sam’s article [3].

Here is the original version of “Fuck you, Richard Stallman and other GNU/Trolls” [4].

Here is an updated version titled “On Mono and the GPL” [5].

Here is a good rebuttal of the points made in the article [6] by “Cranky Old Nutcase”. Note that this rebuttal is linked from reference [5]; it is a positive sign when someone links to documents that oppose their ideas to allow the reader to get all the facts. One significant fact that Cranky Old Nutcase pointed out, and which was missed by Sam, is that the Indian student wrote “A mentor of mine told me that patents are to prevent companies from getting sued, not to sue companies” – while the Microsoft case against TomTom is conclusive proof that patents ARE for the purpose of suing other companies and they ARE used in such a manner by Microsoft! I wonder whether the “mentor” in question is a Microsoft employee…

On the topic of Mono, I think that Alexander Reichle-Schmehl has the most reasonable and sensible description of the situation regarding Mono in Debian [7].

In spite of the nice dinner they gave me I still don’t trust Microsoft [8].

DomainKeys and OpenSSL have Defeated Me

I have previously written about an error that valgrind reported in the STL when some string operations were performed by the DKIM library [1]. This turned out to be a bug; Jonathan Wakely filed GCC bug report #40518 [2] about it. Jonathan is one of many very skillful people who commented on that post.

deb http://www.coker.com.au lenny gcc

I’m still not sure whether that bug could actually harm my program; Nathan Myers strongly suggested that it would not impact the correct functionality of the program but mentioned a possible performance issue (which will hurt me as the target platform is 8 or 12 core systems). Jaymz Julian seems to believe that the STL code in question can lead to incorrect operation and suggested stlport as an alternative. As I’m not taking any chances I built GCC with a patch from Jonathan’s bug report for my development machines and then built libdkim with that GCC. I created the above APT repository for my patched GCC packages. I also included version 3.4.1 of Valgrind (back-ported from Debian/Unstable) in that repository.

Nathan Myers also wrote: “Any program that calls strtok() even once may be flagged as buggy regardless of any thread safety issues. Use of strtok() (or strtok_r()) is a marker not unlike gets() of ill thought out coding.” I agree, I wrote a program to find such code and have eliminated all such code where it is called from my program [3].

I think it’s unfortunate that I have to rebuild all of GCC for a simple STL patch. My blog post about the issue of the size and time required to rebuild those packages [4] received some interesting comments; probably the most immediately useful one was the suggestion (from Jonathan Wakely) to use --disable-bootstrap to get a faster GCC build. Joe Buck noted that the source is available in smaller packages upstream; this is interesting, but unless the Debian developers package it in the same way I will have to work with the large Debian source packages.

I have filed many bug reports against the OpenSSL packages in Debian based on the errors reported by Valgrind [5]. I didn’t report all the issues related to error handling as there were too many. Now my program is often crashing when DomainKeys code is calling those error functions, so one of the many Valgrind/Helgrind issues I didn’t report may be the cause of my problems. But I can’t report too many bugs at once; I need to give people time to work on the current bug list first.

Another problem I have is that sometimes the libdkim code will trigger a libc assertion on malloc() or free() if DomainKeys code has been previously called. So it seems that the DomainKeys code (or maybe the OpenSSL code it calls) is corrupting the heap.

So I have given up on the idea of getting DomainKeys code working in a threaded environment. Whenever I need to validate a DomainKeys message my program will now fork a child process to do that. If it corrupts the heap while doing so it’s no big deal as the child process calls exit(0) after it has returned the result over a pipe. This causes a performance loss, but the slowdown appears to be less than a factor of 3, which isn’t too bad. From a programming perspective this was fairly easy to implement because a thread of the main program prepares all the data and then the child process can operate on it – it would be a lot harder to implement such things on an OS which doesn’t have fork().

DomainKeys has been obsoleted by DKIM for some time, so all new deployments of signed email should be based on DKIM and systems that currently use DomainKeys should be migrating soon. So the performance loss on what is essentially a legacy feature shouldn’t impact the utility of my program.

I am considering uploading my libdomainkeys package to Debian. I’m not sure how useful it would be as DomainKeys is hopefully going away. But as I’ve done a lot of work on it already I’m happy to share if people are interested.

Thanks again to all the people who wrote great comments on my posts.


Web Hosting After Death

Steve Kemp writes about his concerns for what happens to his data after death [1]. Basically everything will go away when bills stop being paid. If you have hosting on a monthly basis (IE a Xen DomU) then when the bank account used for the bill payment is locked (maybe a week after death) the count-down to hosting expiry starts. As noted in Steve’s post it is possible to pay for things in advance, but everything will run out eventually.

One option is to have relatives keep the data online. With hard drives getting bigger all the time it wouldn’t be difficult to backup the web sites for everyone in your family to a USB flash device and then put it online at a suitable place. Of course that relies on having relatives with the skill and interest necessary.

The difficult part is links, if the domain expires then links will be broken. One way of alleviating this would be to host content with Blogger, Livejournal, or other similar services. But then instead of the risk of a domain being lost you have the risk of a hosting company going bankrupt.

It seems to me that the ideal solution would be to have a hosting company take over the web sites of deceased people and put adverts on them to cover the hosting costs. As the amount of money being spent on Internet advertising will only increase while the costs of hosting steadily go down it seems that collecting a lot of content for advertising purposes would be a good business model. If the web sites of dead people are profitable then they will remain online.

It wouldn’t be technically difficult to extract the data from a blog server such as WordPress (either from a database dump or by crawling the web site), change the intra-site links to point to a different domain name, and then put it online as static content with adverts. If a single company (such as Google) had a large portion of the market of hosting the web sites of dead people then when someone died and had their web site transferred, the links on the other sites maintained by the same company could be automatically adjusted to match. A premium service from such a company could be to manage the domain. If they were in the domain registrar business it would be easy to allow someone to pay for 10 or 20 years after their death, possibly with a portion of the advertising revenue going towards extending the domain registration. I think that this idea has some business potential, but I don’t have the time or energy to implement it myself and my clients are busy on other things, so I’m offering it to the world.

Cory Doctorow has written an article for the Guardian about a related issue – how to allow the next of kin to access encrypted data when someone is dead [2]. One obvious point that he missed is the possibility that he might forget his own password, a small injury from a car accident could cause that problem.

It seems strange to me that someone would have a great deal of secret data that needs strong encryption but which still has some value after they are dead. Archives of past correspondence to/from someone who is dead are one category of secret data that is really of little use to anyone unless the deceased was particularly famous. Probably the majority of encrypted data from a dead person would be best wiped.

For the contents of personal computers the best strategy would probably be to start by dividing the data into categories according to the secrecy requirements. Publish the things that aren’t secret, store a lot of data unencrypted (things that are not really secret but you merely don’t want to share them with the world), have a large encrypted partition that will have its contents lost when you die, and have a very small encrypted device that holds bank passwords and other data that is actually useful for the executors of the will.

One thing that we really need is to have law firms that have greater technical skills. It would be good if the law firms that help people draw up wills could advise them on such issues and act as a repository for such data. It seems to me that the technical skills that are common within law firms are not adequate for the task of guarding secret electronic data for clients.


Valgrind and OpenSSL

I’ve just filed Debian bug report #534534 about Valgrind/Helgrind reporting “Possible data race during write” [1]. I included a patch that seems to fix that problem (by checking whether a variable is not zero before setting it to zero). But on further testing with Valgrind 3.4.1 (backported from Debian/Unstable) it seems that my patch is not worth using, I expect that Valgrind related patches won’t be accepted into the Lenny version of OpenSSL.

I would appreciate suggestions on how to fix this. The problem is basically having a single static variable that is initialised to the value 1 but set to 0 the first time one of the malloc functions is called. Using a lock for this is not desirable as it will add overhead to every malloc operation. However, without the lock it does seem possible to have a race condition if one thread calls CRYPTO_set_mem_functions() and then, before that operation is finished, a time slice is given to a thread that is allocating memory. So in spite of the overhead I guess that using a lock is the right thing to do.

deb http://www.coker.com.au lenny gcc

For the convenience of anyone who is testing these things on Debian and wants to use the latest valgrind, the above Debian repository has Valgrind 3.4.1 and a build of GCC to fix the problem I mentioned in my previous blog post about Valgrind [2].

if (default_RSA_meth == NULL)
default_RSA_meth=RSA_PKCS1_SSLeay();

I have also filed bug #534656 about another reported race condition in the OpenSSL libraries [3]. Above is the code in question (with some C preprocessor stuff removed). This seems likely to be a problem only on an architecture for which assignment of a pointer is not an atomic operation; I don’t know if we even have any architectures that work in such a way.

static void impl_check(void)   {
        CRYPTO_w_lock(CRYPTO_LOCK_EX_DATA);
        if(!impl)
                impl = &impl_default;
        CRYPTO_w_unlock(CRYPTO_LOCK_EX_DATA);
}
#define IMPL_CHECK if(!impl) impl_check();

A similar issue is my bug report bug #534683 [4] which is due to a similar issue with the above code. If the macro is changed to just call impl_check() then the problem will go away, but at some performance cost.

I filed bug report #534685 about a similar issue with the EX_DATA_CHECK macro [5].

I filed bug report #534687 about some code that has CRYPTO_w_lock(CRYPTO_LOCK_EX_DATA); before it [6], so it seems that the code may be safe and it may be an issue with how Valgrind recognises problems (maybe a Valgrind bug or an issue with how Valgrind interprets what the OpenSSL code is doing). Valgrind 3.3.1 reported many more issues that were similar to this, so it appears that version 3.4.1 improved the analysis of this but didn’t do quite enough.

I filed bug report #534706 about the cleanse_ctr global variable that is used as a source of pseudo-randomness for the OPENSSL_cleanse() function without locking [7]. It seems that they have the idea that memset() is not adequate for clearing memory. Does anyone know of a good research paper about recovering the contents of memory after memset()? I doubt that we need such things.

I filed bug report #534699 about what appears to be a potential race condition in int_new_ex_data() [8]. The def_get_class() function obtains a lock before returning a pointer to a member of a hash table. It seems possible for an item to be deleted from the hash table (and its memory freed) after def_get_class() has returned the pointer but before int_new_ex_data() accesses the memory in question.

I filed bug report #534889 about int_free_ex_data() and int_new_ex_data(), which call def_get_class() before obtaining a lock and then use the data returned from that function inside the locked region [9] (it seems that obtaining the lock earlier would solve this).

I filed bug report #534892 about another piece of code which would have a race condition if pointer assignment isn’t atomic, this time in err_fns_check() [10]. In my first pass I didn’t bother filing bug reports about most of the issues helgrind raised with the error handling code (there were so many that I just hoped that there was some subtle locking involved that eluded helgrind and my brief scan of the source). But a new entry in my core file collection suggests that this may be a problem area for my code.

I think that it is fairly important to get security related libraries to be clean for use with valgrind and other debugging tools – if only to allow better debugging of the code that calls them. I would appreciate any assistance that people can offer in terms of fixing these problems. I know that there are security risks in terms of changing code in such important libraries, but there are also risks in leaving potential race conditions in such code.

As an aside, I’ve filed a wishlist bug report #534695 requesting that valgrind would have a feature to automatically add entries to the suppressions file [11]. As a function that is considered to be unsafe can be called from different contexts, and code that is considered unsafe can be in a macro that is called from multiple functions there can be many different suppressions needed. Pasting them all into the suppressions file is tedious.
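For reference, each entry in the suppressions file has this shape (the name on the first line is arbitrary and the frame names here are hypothetical, not from a real report):

```
{
   openssl_err_race
   Helgrind:Race
   fun:ERR_get_state
   fun:ERR_clear_error
}
```

One entry like this is needed per distinct call chain, which is why pasting them in by hand gets tedious so quickly.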


Valgrind/Helgrind and STL string

I am trying to track down a thread-safety problem in one of my programs. Valgrind, when run as “valgrind --tool=helgrind ./thread-test”, claims that there is a problem with the following program (the Valgrind errors are at the end of the post). The SGI documents state [1]: “The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.”

My interpretation of the SGI document is that different STL strings can be manipulated independently. When I have two threads running I have two stacks, so strings that are allocated on the stack will be “distinct containers”. So this looks like a bug in the STL or a bug in Valgrind. I am hesitant to file a bug report because I remember the advice that a wise programmer once gave me: “the people who develop compilers and tool chains are much better at coding than you; any time you think there’s a bug in their code there is probably a bug in yours”. I know that my blog is read by some really great programmers, so I look forward to someone explaining what is wrong with my code or confirming that I have found a bug in Valgrind or the STL.

#define NUM_THREADS 2
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#include <string>

using namespace std;

void whatever()
{
  string str;
  str.erase();
}

struct thread_data
{
  int thread_id;
};

void *do_work(void *data)
{
  struct thread_data *td = (struct thread_data *)data;
  printf("%d:stack:%p\n", td->thread_id, (void *)&td);
  while(1)
    whatever();
}

int main(int argc, char **argv)
{
  pthread_t *thread_info = (pthread_t *)calloc(NUM_THREADS, sizeof(*thread_info));
  pthread_attr_t attr;
  if(pthread_attr_init(&attr) || pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) || pthread_attr_setstacksize(&attr, 32*1024))
    fprintf(stderr, "Can't set thread attributes.\n");

  int t;
  struct thread_data td[NUM_THREADS];
  for(t = 0; t < NUM_THREADS; t++)
  {
    printf("created thread %d\n", t);
    td[t].thread_id = t;
    int p = pthread_create(&thread_info[t], &attr, do_work, (void *)&td[t]);
    if(p)
    {
      fprintf(stderr, "Can't create thread %d\n", t);
      exit(1);
    }
  }
  pthread_attr_destroy(&attr);
  void *value_ptr;
  for(t = 0; t < NUM_THREADS; t++)
    pthread_join(thread_info[t], &value_ptr);
  free(thread_info);

  return 0;
}

==30622== Possible data race during write of size 4 at 0x414FED0
==30622==    at 0x40FAD9A: std::string::_Rep::_M_set_sharable() (basic_string.h:201)
==30622==    by 0x40A945E: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:206)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: shared-readonly by threads #2, #3
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no consistent locks
==30622==  Location 0x414FED0 has never been protected by any lock
==30622==
==30622== Possible data race during write of size 4 at 0x414FEC8
==30622==    at 0x40A9465: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:207)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: shared-readonly by threads #2, #3
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no consistent locks
==30622==  Location 0x414FEC8 has never been protected by any lock
==30622==
==30622== Possible data race during write of size 1 at 0x414FED4
==30622==    at 0x40A9012: std::char_traits<char>::assign(char&, char const&) (char_traits.h:246)
==30622==    by 0x40A9488: _ZNSs4_Rep26_M_set_length_and_sharableEj@@GLIBCXX_3.4.5 (basic_string.h:208)
==30622==    by 0x40FD6B6: std::string::_M_mutate(unsigned, unsigned, unsigned) (basic_string.tcc:471)
==30622==    by 0x40FDBEE: std::string::erase(unsigned, unsigned) (basic_string.h:1133)
==30622==    by 0x8048AA8: whatever() (thread-test.cpp:16)
==30622==    by 0x8048B0A: do_work(void*) (thread-test.cpp:29)
==30622==    by 0x402641B: mythread_wrapper (hg_intercepts.c:193)
==30622==    by 0x40464BF: start_thread (in /lib/i686/cmov/libpthread-2.7.so)
==30622==    by 0x42696DD: clone (in /lib/i686/cmov/libc-2.7.so)
==30622==  Old state: owned exclusively by thread #2
==30622==  New state: shared-modified by threads #2, #3
==30622==  Reason:    this thread, #3, holds no locks at all


Lies and Online Dating

Separating Fact From Fiction: An Examination of Deceptive Self-Presentation in Online Dating Profiles is a really interesting paper by Catalina L. Toma and Jeffrey Hancock of Cornell University and Nicole Ellison of Michigan State University [1]. People who don’t use the Internet much regard online dating as an area that is filled with liars – largely due to a small number of gross liars who are well publicised (EG people conducting net-relationships while lying about their gender) – but the evidence suggests that this is not the case.

The first point this paper made which I found to be particularly interesting is that anyone who desires a meeting in person will be limited in their lies (EG they have to post a real photograph of themselves). So it seems that someone who doesn’t want to be lied to should insist on phone calls and meetings in person without much delay – the worst horror stories about online dating seem to concern people who interact over the net for many months before meeting. Maybe it would be a good strategy for users of singles sites to have a paragraph stating “I want to talk to you after receiving X messages” where X is some number significantly less than 10.

The next point was that connections to a real-world persona decrease the ability to lie; lying on a dating site that is only read by other singles is going to be easier than lying on your favorite social networking site, personal web-site, blog, or any other online resource that is read by people who know you. So someone who didn’t want to be lied to could ask for links to the other online resources of the person they are corresponding with.

Also the ability to track different versions of the profile decreases the ability to lie. It would be interesting if some of the singles sites added a history tracking feature so that it was possible to see the previous versions of someone’s profile. But even without such a feature the need to have a single profile greatly decreases the ability to lie. When trying to impress someone in a bar I believe that it is standard practice to pretend to be interested in whatever interests them, but on a profile page it is necessary to provide a short list of interests that remains relatively static.

Later in the paper there is an analysis of the average discrepancies of the profiles, which in most cases are small enough that they would not be noticed when meeting in person. But it does note that there were a few extreme lies (such as someone misrepresenting their age by 11 years).

Near the end of the paper there is a brief description of some related research. One interesting point is that even when there is no way for a liar to be caught (EG online discussion forums in which no meeting in person is planned) most people will still tend not to lie much. Great lies require a change in self-concept, and most people don’t want to think of themselves as liars.

It seems to me that this paper provides strong evidence to show that no-one should be afraid of being lied to on an Internet singles site and it can also be used to form strategies to avoid being the victim of a lie. So for those of you who are single and afraid of singles sites, fear no more!


A “Well Rounded” CV

When discussing career advice one idea that occasionally comes up is that someone should be “well rounded” and should demonstrate this by listing skills that are entirely unrelated to the job in question. Something along the lines of “I’m applying for your C programmer position, and I like spending my spare time playing tennis and golf“.

I suspect that the bad idea in question originated in the days when it was not uncommon to work for the same company for 20+ years and when there were company picnics etc. In that social environment employing someone implied socialising with them outside work so it would be a benefit to have something in common with your employees other than working for the same company. Also in those times there were few laws about discrimination in the hiring process.

It is often claimed that participation in team sports teaches people how to do well in team activities in a work environment. I have previously described the ways in which software development is a team sport [1]. Like most analogies this one is good in some ways and bad in others. Team-work is required in software development but it’s not quite the same as the team-work in sports. One significant difference is that most team sports have a single ball, and the person who has the ball (or who is about to catch it, hit it, etc) is (for a moment) the most important person on the field. There have been many sporting debacles when two players from the same team tried to catch a ball at the same time, so the rule in team sports is that you don’t compete with a colleague. In a work environment there are many situations where it’s necessary for tasks to be passed between colleagues at short notice. For example when a deadline is imminent tasks often need to be reassigned to the most skilled people. A junior programmer needs to know that they aren’t an athlete who is running with the ball, their teamwork involves having difficult tasks being reassigned from them at short notice.

Another significant difference between sports and work is the amount of aggression that is tolerated. In most sports some level of harassment of opposing players is tolerated. But in the modern workplace using a single naughty word can be considered as just cause for instantly sacking an employee. So it seems that exposure to an aggressive sporting environment would be a bad thing if it actually makes any difference.

One thing that is sometimes ignored is the teamwork that is involved with hobbyist computer work. Being involved with a software development team for fun will surely give teamwork experience that is more relevant to paid software development work than any sport!

One of the reasons cited for being “well rounded” is the ability to have a “work life balance“. I might almost believe such a claim if it wasn’t made in connection with the IT industry. But given how common it is to demand 60 hour working weeks (or longer) and the number of people who are required to have mobile phones turned on when they aren’t at work it seems that the general trend in the IT industry is against a work-life balance. When hiring people to work in cultures where a strict 40 hour working week is well accepted it seems that hiring people who are willing to work as long as required is important. When I worked in the Netherlands I lost count of the number of times I worked until 10PM or later to fix a broken system after all my Dutch colleagues departed at 5PM.

I have also seen the bizarre claim that consumption of alcohol leads to developing better social skills. It seems really strange to me that anyone would want to work in a company where social skills that are relevant to a bar would be useful (I am reminded of a company that was named after the founder’s penis – I declined to send my CV to that company). Also of course there is the fact that in most countries where I would want to live it is illegal to discriminate against hiring someone for refusing to drink alcohol.

It is quite common for the geekiest people to do a significant portion of their socialising via email and instant/short-messaging (formerly IRC, now Jabber, Twitter, and other services). It seems to me that this experience is more relevant to most aspects of the modern work environment (where most communication that matters is via email and instant-messaging) than any form of socialising that happens in a sports club or a bar. In fact people who are used to face-to-face dealings might have difficulty fitting in to an environment where most communication is electronic.

Now employers seem to have worked these things out. Recruiting agents (who reject most job applicants) have told me that they want to see nothing on a CV that doesn’t relate to a job. That is an extreme position, but seems to represent the desires of the hiring managers who will see the CVs that get past the recruiters. Hiring managers often don’t even read a CV before an interview, they often entirely rely on recruiting agents to determine who they will interview. So it seems that an effective CV will in most cases list as many keywords as possible, demonstrate experience in the technologies that were listed in the job advert, show years of work with no long breaks, and have little else.

Finally, the IT industry is distinguished by having a significant number of people whose work and hobby are almost identical; those people tend to be significantly more skilled than average. It seems to be a bad idea to pass up the potential of hiring some of the most skilled people.

To a large extent your career success depends on what you learn from your colleagues, so if you end up working in a team of people with low skills then it is bad for your career. Therefore it seems that anyone who wants to have a successful career will strive to avoid working for a company whose hiring process had any criteria other than the ability to do the job well and the ability to not be a jerk. So when it comes to the technical part of a job interview (where the hiring manager brings his most technical people to grill the candidate) it probably makes sense to ask those technical people what their hobbies are. If their hobby is something other than computers then it indicates that the employer might be a bad one – so at least you should ask for more money as compensation for not having highly skilled colleagues.


Finding Thread-unsafe Code

One problem that I have had on a number of occasions when developing Unix software is libraries with non-reentrant code being called from threaded programs. For example, if a function such as strtok() is used (which is implemented with a static variable to allow subsequent calls to operate on the same string) then calling it from a threaded program may result in a SEGV (if, for example, thread A calls strtok() and then frees the memory before thread B makes a second call to strtok()). Another problem is that a multithreaded program may have multiple threads performing operations on data of different sensitivity levels; for example a threaded milter may operate on email destined for different users at the same time. In that case use of a library call which is not thread safe may result in data being sent to the wrong destination.

One potential solution is to use a non-threaded programming model (i.e. a state machine or multiple processes). State machines don’t work with libraries based on a callback model (e.g. libmilter), can’t take advantage of the CPU power available in a system with multiple CPU cores, and require asynchronous implementations of DNS name resolution. Multiple processes will often give less performance and are badly received by users who don’t want to see hundreds of processes in ps output.

So the question is how to discover whether a library that is used by your program has code that is not reentrant. Obviously a library could implement its own functions that use static variables – I don’t have a solution to that. But a more common problem is a library that uses strtok() and other non-reentrant libc functions – simply because they are more convenient. Trying to examine the program with nm and similar tools doesn’t seem viable as libraries tend to depend on other libraries, so it’s not uncommon to have 20 shared objects linked in at run-time. There is also the potential problem of code that isn’t called: if library function foo() happens to call strtok() but I only call function bar() from that library, then even though the symbol strtok is resolved at run-time it shouldn’t be a problem for me.

So the obvious step is to use an LD_PRELOAD hack to override all the undesirable functions with code that will assert() or otherwise notify the developer. Bruce Chapman of Sun did a good job of this in 2002 for Solaris [1]. His code is very feature-complete but has a limited list of unsafe functions.

Instead of using his code I wrote a minimal implementation of the same concept which searches the section 3 man pages installed on the system for functions which have a _r variant. In addition to that list of functions I added some functions from Bruce’s list which did not have a _r variant. That way I got a list of 72 functions compared to the 40 that Bruce uses. Of course with my method the number of functions that are intercepted will depend on the configuration of the system used to build the code – but that is OK, if the man pages are complete then that will cover all functions that can be called from programs that you write.

Now there is one significant disadvantage to my code: the case where unsafe functions are called before any child threads are created. Such code will be aborted even though in production it won’t cause any problems. One thing I am idly considering is writing code to parse the man pages for the various functions so that it can use the correct parameters when proxying the library calls with dlsym(RTLD_NEXT, function_name). The other option would be to hand-code each of the 72 functions (and use more hand-coding for each new library function I wanted to add).

To run my code you simply compile the shared object and then run “LD_PRELOAD=./thread.so ./program_to_test” and the program will abort and generate a core dump if the undesirable functions are called.

Here’s the source of gen.sh, the script that generates the thread.c source:

#!/bin/bash
cat > thread.c << END
#undef NDEBUG
#include <assert.h>
END
OTHERS="getservbyname getservbyport getprotobyname getnetbyname getnetbyaddr getrpcbyname getrpcbynumber getrpcent ctermid tempnam gcvt getservent"
for n in $OTHERS $(ls -1 /usr/share/man/man3/*_r.*|sed -e "s/^.*\///" -e "s/_r\..*$//"|grep -v ^lgamma|sort -u) ; do
  cat >> thread.c << END
void $n()
{
  assert(0);
}
END
done

Here is the Makefile, probably the tabs will be munged by my blog but I’m sure you know where they go:

all: thread.so

thread.c: gen.sh Makefile
	./gen.sh

thread.so: thread.c
	gcc -shared -o thread.so -fPIC thread.c

clean:
	rm thread.so thread.c

Update:
Simon Josefsson wrote an interesting article in response to this [2].

2

Linux Rate-Limiting of an ADSL Link

After great pain I’ve got tc working on some Linux routers. The difficulty with limiting an ADSL link is that the ADSL modem has significant buffers and the link between the Linux machine and the modem is significantly faster than the ADSL upstream channel. This means that the transmission speed needs to be artificially limited; a speed of about 95% of the maximum channel speed is often recommended. As ADSL upstream speed often varies (at least in my experience), that means you must limit the transmission speed to 95% of the lowest speed that you expect to see – which of course means a significant drop in performance when the ADSL link is performing well.
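The 95% rule of thumb is trivial to compute; as a sketch (my own, using the rough measurements mentioned below – note that in the actual configuration I simply use the lowest observed speed):

```shell
#!/bin/bash -e
# Pick 95% of the lowest measured upstream sync speed (in kbit)
# as a candidate rate limit.
SPEEDS="680 550"
LOW=$(echo $SPEEDS | tr ' ' '\n' | sort -n | head -1)
LIMIT=$((LOW * 95 / 100))
echo "limit to ${LIMIT}kbit"
```

With measurements of 550kbit and 680kbit this suggests a ceiling of 522kbit.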

I use the HTB queuing discipline to limit the transmission rate. My transmission speed varies between 550kbit and 680kbit in my rough tests. So I start by limiting the overall device performance to 550kbit. Then I have three child classes with IDs 1:10, 1:20, and 1:30 with rates of 64kbit, 480kbit, and 128kbit respectively. It is often recommended that the child classes have a total bandwidth allowance equal to the allowance for the overall link, but I have the three child classes allocated a total of 672kbit – I think this will work as it will be quite rare for all classes to be in operation at the same time, and fairly unlikely that all three will ever run at maximum speed at once. I will be interested to see any comments about this, I might have misunderstood the issues related to this.

Each class has an SFQ queuing discipline associated with it for fair queuing within the class. It might be overkill: I expect to have only one data channel in operation on the VOIP class, so it probably does no good there, and my usage pattern is such that if the 480kbit class is anywhere near busy then it’s due to a single large transfer. But with the power of a P3 CPU applied to the task of routing at ADSL speeds it really doesn’t matter if some CPU time is wasted.

Then the tc filter lines associate iptables marks with the classes.

Now this is only a tiny fraction of what tc can do. But I think that this basic configuration, with the rate limits changed, will suit many ADSL router configurations; it may not be an ideal configuration for most ADSL routers but it will probably be a viable one that is better than having no traffic shaping. Below is the shell script that I am using:

#!/bin/bash -e

DEV=ppp0

tc qdisc del dev $DEV parent root handle 1:0 2> /dev/null || true
tc qdisc add dev $DEV parent root handle 1:0 htb default 30

# limit the rate to slightly lower than DSL line speed
tc class add dev $DEV parent 1:0 classid 1:1 htb rate 550kbit prio 1

# sub classes for each traffic type
# 10 is VOIP, 20 is default, 30 is the test network
tc class add dev $DEV parent 1:1 classid 1:10 htb rate 64kbit burst 6k prio 2
tc class add dev $DEV parent 1:1 classid 1:20 htb rate 480kbit burst 12k prio 3
tc class add dev $DEV parent 1:1 classid 1:30 htb rate 128kbit burst 12k prio 4

# use an sfq under each class to share the bandwidth
tc qdisc add dev $DEV parent 1:10 handle 10: sfq
tc qdisc add dev $DEV parent 1:20 handle 20: sfq
tc qdisc add dev $DEV parent 1:30 handle 30: sfq

tc filter add dev $DEV parent 1: protocol ip prio 1 handle 1 fw classid 1:10
tc filter add dev $DEV parent 1: protocol ip prio 2 handle 2 fw classid 1:20
tc filter add dev $DEV parent 1: protocol ip prio 3 handle 3 fw classid 1:30

iptables -t mangle -F POSTROUTING
iptables -t mangle -A POSTROUTING -j MARK --set-mark 2
iptables -t mangle -A POSTROUTING -p tcp --sport 22 -j MARK --set-mark 3
# VOIPSERVER must be set to the address of the VOIP server before running
iptables -t mangle -A POSTROUTING -d $VOIPSERVER -j MARK --set-mark 1
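To check that packets are being classified as intended, the per-class statistics can be inspected after the script has run. A quick check along these lines (my own suggestion, not from the original script – these commands need root on the router, with ppp0 configured as above):

```shell
# Show the qdiscs and classes with their packet/byte counters -- the
# counters for 1:10, 1:20, and 1:30 reveal which class traffic hits.
tc -s qdisc show dev ppp0
tc -s class show dev ppp0
# Show the mangle rules with hit counts to confirm the marks are applied.
iptables -t mangle -L POSTROUTING -n -v
```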

4

LG U990 Viewty

[photos: the front and the back of the Viewty]

I have just got an LG U990 “Viewty” mobile phone [1]. It’s a 3G phone and came free on the $29 monthly cap plan from “Three” (the minimum monthly spend is $29 – the phone is free if you use $29 per month). My previous plan was a $29 cap with a minimum spend of $20 per month; as I never happened to use less than $29 per month I am not paying any more than before.

For a modern mobile phone the actual phone functionality is a sideline. If a device was strictly designed to be a phone then I think it would be very similar to the Nokia phones that were available 5+ years ago – the Nokia I had in 1999 performed every phone function that I desired of it.

Like most modern phones the LG U990 “Viewty” suffers in its phone functionality from the desire to make it do non-phone tasks and from the desire to cripple it to meet the desires of the carriers (not those of the users). For example the “home” screen will always have at least two buttons for paid Three services and I have no configuration option to remove them. Replacing them with speed dial options for a couple of numbers that I regularly call would be handy. As the main screen is a touch-screen there is no excuse for this; they should allow the software to be reconfigured with more useful options. Eventually Android will kill most of the other phones and this problem of phones being designed to suit the telephone companies instead of the users will be solved.

One of the most annoying mis-features of the phone is that it doesn’t properly handle address book entries with multiple phone numbers. One of my friends turns his mobile off when he is at home, so I regularly call his mobile and then immediately call his home number if the mobile is unavailable. With my previous two mobile phones I could press the “dial” button to bring up the list of previous calls and then use the arrow buttons to select from the other numbers that are attached to the same address-book entry, so with three button presses I would be dialing his other number. With the Viewty I have to go back to the address book.

The compelling feature of the Viewty is the camera and display. It has a 5MP camera which makes it the second highest resolution camera-phone offered by Three – the LG Renoir (KC910) has 8MP but needs a $99 plan for it to be free. It also has a 240*320 resolution touch-screen display (in a quick search in January when I bought my phone the best resolution display I could find on a phone is 240*400 in the LG Renoir).

While the camera is documented as being 5MP there are no specs available about the resolution of the CCD. I want to use the native resolution of the CCD for pictures (I think interpolation is a waste of space). The CCD might actually be 5MP: a picture of the 1400*1050 resolution screen on my Thinkpad allows me to read all the text even when small fonts are in use, so the CCD resolution must be significantly greater than 1400*1050. This is a really important feature as the Viewty will work well for making screen-shots for bug reports about crashed computers (several of my clients have expressed interest in getting one after seeing such a demonstration). One annoying problem is that the camera software takes a while to load; my trusty Sony digital camera starts a lot faster, and the LG U890 phone I used for the past two years is also faster and more convenient. So the Viewty won’t work well for photographing unexpected events. When I am traveling by public transport I photograph the relevant pages of my street directory, as I can zoom in to the photos of the maps and read the street names – it saves some weight when traveling. The 2G micro-SD card (which incidentally cost $10 from OfficeWorks) will allow me to store a lot of maps.

One interesting feature is the video recording capabilities. It can do 640*480 resolution at 30fps (which is pretty good) and 320*240 resolution at 120fps (they claim that you can film a balloon popping). In my quick tests the standard 640*480*30fps mode works well, but the 120fps mode requires much brighter light than most of my test environments, so I have not yet got it working properly.

The phone has a reasonable voice recording function: it can record considerably more than 34 hours of audio and the quality is reasonably good if you use an external microphone. It is however quite poor if you use the built-in microphone for a dictaphone function; quality seems to be poor at any distance. I had wanted to record my LCA mini-conf talks with my phone but unfortunately forgot to bring the adapter for the external microphone. It’s a pity that the phone doesn’t have a standard microphone socket, as I have misplaced my Viewty microphone; when designing the phone they should have assumed that misplacing attachments is a common occurrence and designed it to use common parts.

I recently spoke to a journalist who uses his mobile phone to record interviews. He said that his phone supported phone calls, voice recording, and taking pictures – all the essential tasks for his work. It seems that the Viewty would be better than most phones for journalistic work apart from the issue of low quality voice recording when you have misplaced your external microphone.

The text editor is unfriendly in the keyboard mode (I have not tried hand-writing recognition), one thing I don’t like is the fact that the letters jump when you press them. This does allow changing a letter by moving the stylus before releasing the press (some people consider this a great feature). There are no cursor control keys (which is a serious omission), and the keyboard doesn’t resemble a real keyboard. My iPaQ is far better for writing (I once wrote a full-length magazine article on an iPaQ).

The stylus is quite strange and interesting. In the picture of the front of the phone the stylus is compacted with its lid on. In the picture of the back of the phone the stylus is extended with the lid off. The end of the stylus clips in to the lid so that removing the lid drags the central part out of the body. It’s an interesting design, and with the string on the lid it allows the stylus to be attached to the phone when it’s not being used. But I have never used it. Even with an iPaQ (which had a proper stylus that attached firmly inside the body of the device) I often used a fingernail on the touch screen. I have never felt the need to use a stylus with my Viewty.

One final noteworthy thing is the support for Google services. It seems to have client support for YouTube, Google Maps, GMail, and Blogger. This seems to be a major win for Google: the Viewty is one of the most popular phones at the moment and I expect that lots of people who buy them will now have an incentive to use the Google services. Between these sorts of deals and Android I think that it will be necessary to have some sort of anti-trust action against Google. Google are generally doing good things for the users. I have been quite satisfied to use Google search, Google advertising on my blog, and Gmail. Also I have been moderately happy with Blogger (it was good when I started blogging) and Google Maps is useful on occasion. So generally I am happy with Google, but monopolies are bad for the users, so I think that if things continue on their current trend then Google may have to be split into several little Googlets some time in the next few years.