wp-spamshield

Yesterday I installed the wp-spamshield plugin for WordPress [1]. It blocks automated comment spam systems by using JavaScript and cookies; apparently most spammers can’t handle that. Before I installed it I was getting hundreds of spam comments per day even with the Block Spam by Math plugin enabled. Now I’ve had it running for 24 hours without any spam. The real advantage of this is that now when a legitimate comment gets flagged as spam I’ll notice it. Previously I was deleting hundreds or thousands of comments at a time without reading them.

deb http://www.coker.com.au wheezy wordpress

The above repository has the wordpress-wp-spamshield package for Debian/Wheezy. I have no immediate plans for uploading it to Debian because the security support for WordPress plugins doesn’t fit in with the Debian model. I am prepared to negotiate about this if someone has good reasons for including it or any of the other WordPress plugins I’ve packaged.

My packaging work is under the GPL (of course) so any DD who disagrees with me could just rebuild the package and upload it. Within Debian there is no rule against taking another DD’s GPL’d code that they decided not to upload and then uploading it. There is a consensus that such things are not appropriate without permission, but anyone who wishes can take this blog post as permission.
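As a rough sketch of how the repository line above could be put to use, the following function writes it to a file that APT reads. The function name and the default file name are assumptions made for illustration, they are not from the package or the repository.

```shell
#!/bin/bash
# Sketch only: write the APT source line from this post to a file.
# The function name and the default file name are hypothetical.
write_wordpress_source() {
  local target="${1:-/etc/apt/sources.list.d/coker-wordpress.list}"
  echo "deb http://www.coker.com.au wheezy wordpress" > "$target"
}
```

After running write_wordpress_source as root, "apt-get update" followed by "apt-get install wordpress-wp-spamshield" would be the usual next steps. If APT asks about package authentication then that depends on how the repository is set up, which this post doesn’t cover.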

Blog Comments

John Scalzi wrote an insightful post about the utility of blog comments with the way the Internet works nowadays [1]. He starts out focusing on hate comments that could reasonably be described as terrorism (death threats with the aim of preventing people writing about politics meet any reasonable definition of “terrorism”). Terrorists on the Internet are a significant problem but it’s one that doesn’t get much attention as it generally only affects people who aren’t straight-acting white men.

Blogging About Technology

One corner case that John doesn’t seem to consider is that of writing about technology. Issues related to programming often aren’t related to politics and are often testable so comments will be based on things that have been shown to work rather than stuff people invent or want to believe. I’ve received many useful and educational comments on my technical posts with little hostility. Even getting a snarky comment is rare when writing a strictly technical blog post.

The comments problem for technology blogging is spam. I’ve been using the WordPress plugin Block Spam by Math [2] (which is obsolete but still works) for years. Initially it stopped almost all spam, but now I’m getting at least 20 spam comments a day.

Extremely Popular Blogs

The comments section of a blog is sometimes described as a “conversation”. When a blog post gets comments from fewer than 10 people it is possible for them to have something that resembles a conversation with the author that is of benefit to other readers and doesn’t take excessive amounts of time for the author. When a blog is very popular and every post gets comments from 50+ people it’s not really possible. So a traditional blog comment section seems to work best when the blog is primarily read by a small well connected group of people who sometimes comment and some casual readers who never comment (but sometimes find value in the comments of others).

Discussions of blog comment systems usually include a reference to a post written by someone who disabled comments on their blog and found it to be a good thing. It always seems that the person who writes such a post has a large and varied audience whose comments would take a lot of time to moderate. John followed the usual form in this regard by linking to a reasonably popular SF author who would presumably have a lot of fans with good net access.

I’m not going to criticise anyone for disabling comments when their blog becomes really popular, but any advice that they have to offer about such things won’t apply to the vast majority of blogs. Due to the long-tail effect the small blogs would probably comprise the majority of all comments so in terms of the way the blog environment works I don’t think it makes much difference when the small minority of very popular blogs disable comments. The vast majority of blogs that I regularly read only have a small number of comments.

One thing that should be noted is that getting a lot of readers shouldn’t be the only factor for writing a successful blog. For example some of my blog posts about SE Linux are aimed at a small audience of Linux programmers and have an even smaller number of people who are qualified to comment. When I write a post that can only receive comments other than “please explain more because I don’t understand” from a few dozen people that doesn’t make it any less important. Sometimes the few dozen people who know a topic well need to work together to educate the few thousand who can implement the ideas for the benefit of millions of users of the software.

Disabling Comments on Contentious Posts

One interesting method John uses is to disable comments early when posting about contentious issues. It’s a general practice when running a blog to disable comments on posts after a certain period of time (3 months to 1 year seem to be common time limits for comments). This means that the moderators can concentrate on recent posts and not be bothered with spam bots hitting ancient posts as the interest in writing legitimate comments on an old post is vanishingly small. John has a practice of disabling comments after a couple of days when the comments start to lose quality.

No matter how contentious the issue is I’m not likely to get the 400+ comments that John gets. But the idea of closing comments quickly still has some merit for my blog and other blogs with less traffic.

Not Accepting Comments While Asleep

John has a practice of closing comments while he’s asleep to avoid allowing a troll to get 8 hours of viewing for a nasty comment. The most immediate down-side to that is that it inconveniences people who don’t want to wait 8 hours to comment and prioritises comments from people in the same time zone. This makes me think of Cory Doctorow’s novel Eastern Standard Tribe (which is available for free download and I highly recommend reading it) [3]. It seems that a better solution to that problem would be to have a team of moderators to watch things 24*7, which is what a lot of popular blogs that allow comments do. The WordPress capabilities model doesn’t support granting a user no special privileges other than moderating comments [4]. As WordPress is the most popular self-hosted blog software this limits the possibilities for people moderating comments on other people’s blogs.

No variation of this would work for me. I have lots of things that require my ongoing attention and don’t want to add my blog to the list. If I have other things to work on for a few days I want to just not bother with my blog. This means that my blog needs to be able to run on autopilot for days at a time – however I do monitor my blog closely after publishing a post that is likely to attract nasty comments. One extra problem that I have is that the Android client for WordPress has problems in synchronising comments.

Using a Forum for Comments

Popular Planet installations such as Planet Debian and Planet Linux Australia syndicate more than a few blogs that have comments disabled. A forum installation for such a Planet would be useful to allow people to comment on all posts and also support bloggers who are thinking of disabling comments. While the use of a forum for blog comments has been proven to work well for Boing Boing, forums have their own issues of spam and anti-social behavior.

Debian already has a forum [5]. If a section of that was devoted to discussing blog posts from Planet Debian then it shouldn’t make much of an increase to the work of the forum administrators while providing a benefit to the community. Also if the Debian forum had such a section it would probably attract use from more Debian Developers. I would use that forum if it was a place to comment on blogs that don’t have a comment section and I might also comment on other forum discussions.

It would be good if there was a forum for discussing Linux in Australia. I’m not volunteering to run it but I would help out if someone else wants to be the main sysadmin and I can offer free hosting.

Creating WordPress Packages

deb http://www.coker.com.au wheezy wordpress

I maintain Debian packages of a number of WordPress themes and plugins for my personal use which I am not planning to upload to Debian due to the maintenance and security issues. Generally the way things work with WordPress packages (and apparently most things in PHP) is that new versions are released whenever the author feels like it with little documentation and often no way of determining whether it’s a security issue. When there is a security issue it’s often fixed in a new version that includes new features, giving no good option for someone who was happy with the old functionality and just wants a secure system. This isn’t the way we like to do things in Debian.

The result of this is that I maintain a number of packages for my personal use (and for the benefit of any interested people on the Internet) that often get new updates. I’ve written the below script to create a new version of a Debian package. It searches my repository for the most recent .debian.tar.gz file for the package, applies that, runs dch -i to update the changelog, and then builds the package. So far this has only been tested on one package, I expect that I’ll have to put a sed command in there to cover the case where the zip file name doesn’t match what I want as the package name and I’ll probably find other bugs in future, but I think it’s good enough to publish now.

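#!/bin/bash
# Takes one argument, a plugin zip file named like package.1.2.3.zip
set -e
# Directory holding earlier builds, including the .debian.tar.gz files
REPOSITORY=/home/whatever
unzip "$1"
FILE=$(basename "$1")
# The package name is everything before the first dot
PACKAGE=$(echo "$FILE" | sed -e 's/\..*$//')
# wc -c counts the trailing newline, so adding 1 skips the dot after the name
LEN=$(($(echo "$PACKAGE" | wc -c)+1))
VER=$(echo "$FILE" | cut -c "${LEN}"-200 | sed -e 's/\.zip$//')
DIRNAME=wordpress-${PACKAGE}-${VER}
mv "$PACKAGE" "$DIRNAME"
tar czf "wordpress-${PACKAGE}_${VER}.orig.tar.gz" "$DIRNAME"
cd "$DIRNAME"
# Unpack the debian directory from the most recently modified earlier package
tar xzf "$(ls -tr "${REPOSITORY}"/wordpress-"${PACKAGE}"_*.debian.tar.gz | tail -1)"
dch -i
dpkg-buildpackage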

Any suggestions for improvement will be welcome, I don’t claim to be the world’s greatest shell scripter. But please note that I generally aim to write shell scripts that can be understood by people who aren’t experts. So if you can replace the program with a single line of Perl I will be impressed but I won’t implement your solution.
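The file name handling in the script can be tried on its own without unpacking or building anything. The following is only a sketch under the same assumption the script makes, that the zip file is named like package.1.2.3.zip. The function names are hypothetical and nothing here comes from a tested package build.

```shell
#!/bin/bash
# Sketch only, not part of the packaging script. Function names are hypothetical.
# Assumes zip names shaped like "name.1.2.3.zip".

# Print the package name: everything before the first dot.
package_from_zip() {
  basename "$1" | sed -e 's/\..*$//'
}

# Print the version: remove ".zip", then the package name, then the dot.
# Prints an empty line when the file name has no version in it.
version_from_zip() {
  basename "$1" | sed -e 's/\.zip$//' -e 's/^[^.]*//' -e 's/^\.//'
}
```

For example "version_from_zip /tmp/wp-spamshield.1.5.9.zip" strips the directory, the suffix, and the name in turn. A case where the zip file name doesn’t match the wanted package name would still need an extra sed expression in package_from_zip.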

Targeted Advertising

Don Marti has written another blog post about targeted advertising [1]. His main point is that when a company uses the most targeted adverts (such as Google advertising) everyone knows that they are paying a small number of cents per click and nothing for the people who don’t click. This compares to TV adverts which cost a lot of money and for which most viewers either leave the room or use fast-forward. Therefore using Google adverts doesn’t send a signal about the amount of money invested in the products. Don also cited an example of a company sponsoring an OK Go film clip. That was a great idea, it shows that the company can do expensive things which are also a bit creative and fans will thank them (watch all the OK Go videos on YouTube, they are great).

The next question is how else companies can advertise. One thing I’d really like to see is sponsorship of authors. Pick an author and pay them a salary with paid editorial services for releasing a book a year for free in HTML and ebook formats. Having a fixed salary is a significant benefit when it comes time to apply for a mortgage or plan a holiday and being able to freely distribute books would be a significant benefit for an author who hasn’t got a large fan base.

In the computer industry it seems that there’s a lot of potential for sponsoring people who produce free things. That ranges from free software and designs for free hardware to blog posts and documentation. Five years ago Sun had a blogging contest and my friend Dave Hall won a server that was worth $21K [2]. It would be nice if some other companies started doing similar things and if Sun did a repeat so some other people I like could get some free kit.

Guest/Link Post Spam

I’ve been getting a lot of spam recently from people wanting to write guest posts or have their site included in a future links post.

Guest Posts

For guest posts the social convention for the planets which aggregate my blog seems to be that random guest posts are unacceptable. I could change my blog feed to have some posts excluded from the planet feeds but that’s too much effort – and I don’t want random guest posts anyway.

The only situations in which I will accept guest posts are when someone is writing what I might have written (IE they generally agree with me) or if they are a member of the free software community who doesn’t have their own blog but has something relevant to say. Any applicant for a guest post who runs a business that is useless and/or evil IMHO (EG anything related to the TSA) is going to get rejected firmly. Any applicant who tells me that they can “write on a wide variety of topics” probably isn’t capable of making me an offer that I would accept. Tell me that you can write about Linux programming or computer security and I’ll be interested, but you have to provide links to your previous work.

An application to write a guest post that starts with something like “I liked your recent post at the above URL but I think you missed some important points, as I have some experience in that field I think I could write a guest post that would help educate your readers” may be accepted. Probably the best thing to do however is to write comments on my blog, if you can write informative comments and offer to make a longer comment into a guest post then there’s a good chance that I’ll be interested.

Generally though I will only offer the opportunity to write a guest post to someone who writes something really informative in private email or on a closed mailing list. If you can solve a technical problem about Linux that has me stumped then I will almost certainly be willing to accept a guest post about it!

Links Posts

There are many sites which consist of nothing but links to other sites. In the 90’s such sites were really useful but since the rise of Google their value has declined dramatically. As an aside when I ran an Internet Cafe in the 90’s I had every web browser start with the main page of my “Hot Stuff” list, the purpose of this was to increase the hit rate of my Squid cache by having most of the customers visit the same sites. Even back then I probably wouldn’t have bothered if it wasn’t for Squid hits.

It must seem like an easy way to make money to create pages of links to articles with short summaries and then hope for advertising revenue, a domain sale, or the launch of some sort of profitable business once people start using the site. I wonder whether that ever works out for people. It doesn’t seem like a good business model, doing something that requires little skill, which can be done by almost anyone, and which is done by many people. Maybe there are sweat-shops dedicated to this.

Anyway my links posts are unlikely to ever link to any such links pages, I don’t think that the people who read my blog want double indirection. I won’t entirely rule out linking to a links page, but it would have to be of very high quality and related to something very technical about computers. Basically anyone who reads this should give up on the idea of submitting a links page to me, if it’s good enough to make me break my policy of not linking to such things then I’ll probably find it myself.

Conclusion

As a general rule if you want someone to publish your work then you need to look at what they are publishing and make sure that your work fits. With the recent requests for links and guest posts I’ve been getting I have to wonder whether the people making the requests have even read my blog. That method of operating is unlikely to give any success at a blog that has any reasonable number of readers.

I’ll probably link to this from my about page or something. It might discourage some of the spammers.

WordPress Maintainability

For a while I’ve been maintaining my own WordPress packages. I use quite a few plugins that weren’t included in Debian. Some of them have unclear licenses so they can’t go in Debian, while the rest would have to go in Volatile at best because they update regularly and often have little or no information in the changelog to describe the reason for the update – so we have to assume there is a potential security issue and update it reasonably quickly. As I’m maintaining plugin packages it seems most reasonable to keep maintaining my own packages of WordPress itself, which I started doing some time ago when the version in Debian became outdated.

Now WordPress isn’t a convenient package to maintain, the design of it is that a user will upload it to their web space via FTP or whatever, it’s not designed to be managed by a packaging system with the option of rolling back upgrades that don’t work, tracking dependencies, etc. One example of this is the fact that it comes with a couple of plugins included in the package, of which Akismet is widely used. The Akismet package is periodically updated asynchronously from the updates to the WordPress package with the apparent expectation that you can just FTP the files. Of course I have to build a new WordPress package whenever Akismet is changed.

Now there is a new default theme for WordPress called TwentyTen [1]. This theme ships with WordPress and again has updates asynchronously. Just over a week ago my blog prompted me for an update to the theme even though I hadn’t consciously installed it – I have to update because I don’t know whether one of the other users on the same system has chosen it and because having a message about an update being required is annoying.

The Themes update page has no option for visiting the web site for the theme and only offered to send it to my server via FTP or SFTP. Of course I’m not going to give WordPress access to change its own PHP files (and thus allow a trojan to be installed). So I had to do some Google searching to find the download page for TwentyTen – which happens to not be in the first few results from a Google search (even though those pages look like they should have a link to it and thus waste the time of anyone who just wants to download it).

After downloading the theme I had to build a new WordPress package containing it – I could have split it out into a separate package and have the WordPress package depend on it, but I’ve got enough little WordPress packages already. It doesn’t seem worth-while to put too much effort into my private repository of WordPress packages that possibly aren’t used by anyone other than me.

Plugins aren’t as bad, the list of plugins gives you a link to the main web page for each plugin which allows you to download it.

I wonder what portion of the WordPress user-base install via FTP to a server that they don’t understand and what portion of them use servers that are maintained properly with a packaging system, my guess is that with the possible exception of WordPress.com most bloggers are running on packaged code. It seems to me that optimising for Debian and CentOS is the smart thing to do for anyone who is developing a web service nowadays. That includes files managed by the packaging system, an option to downgrade (as well as upgrade) the database format (which changes with almost every release), and an option for upgrading the database from the command-line (so it can be done once for dozens or hundreds of users).

deb http://www.coker.com.au lenny wordpress

I have a repository of WordPress packages that anyone can use with the above APT sources.list line. There is no reason why they shouldn’t work with Testing or Unstable (the packaging process mostly involves copying PHP files to the correct locations) but I only test them on Lenny.

Link Within

Good Things about LinkWithin

For the last 10 weeks I’ve been using the LinkWithin.com service to show links to other blog posts at the end of each post (the links are only shown to visitors of my blog not in the RSS feed, so people who read my posts through RSS syndication will miss this). The service shows excerpts of pictures from my blog at the bottom of each post to entice readers into reading other posts.

When you click on a LinkWithin icon on my blog you visit a LinkWithin page that redirects you back to my page, so people who use that show up as new visitors being referred by LinkWithin. Currently this month LinkWithin is the fourth highest referrer to my blog, below Google, Reddit, and WebWombat.com.au – so it is clearly doing some good in enticing people to read other posts that they might not otherwise read!

Bad Things about LinkWithin

The first problem with LinkWithin is that the WordPress plugin was written by people who don’t know much about WordPress. Unlike every other plugin I use it doesn’t support configuration options for the user in the database but instead has it hard coded in the PHP code! When you download a plugin from their web site it creates a custom zip that includes the PHP code automatically generated just for you! I have some friends using the same web server as me for running blogs, and I have to tell them “you can install any plugin apart from LinkWithin – it works for no-one but me”.

The next problem is that LinkWithin advertises that it will display related posts, it seems to be doing a poor job of that on my blog although admittedly only a minority of my blog posts have pictures so this limits what it has to work with. This has however inspired me to use more pictures in my posts.

Another problem is that it produces web pages that are not valid XHTML, the following patch fixes this.

--- /tmp/linkwithin.php        2010-02-22 02:09:47.000000000 +0000
+++ ./wp-content/plugins/linkwithin/linkwithin.php        2010-02-22 02:17:29.000000000 +0000
@@ -15,7 +15,7 @@
        global $post, $wp_query, $linkwithin_code_start, $linkwithin_code_end;

        $permalink = get_permalink($post->ID);
-        $content .= '<div class="linkwithin_hook" id="'.$permalink.'"></div>';
+        $content .= '<div class="linkwithin_hook" id="'.str_replace('/','-',$permalink).'"></div>';
        $content = linkwithin_add_code($content);
    }
    return $content;
@@ -26,13 +26,13 @@
        global $post, $wp_query, $linkwithin_code_start, $linkwithin_code_end;

        if ($wp_query->current_post + 1 == $wp_query->post_count) {
-            $embed_code = '<script>
+            $embed_code = '<script type="text/javascript">
<!-- //LinkWithinCodeStart
var linkwithin_site_id = 151382;
var linkwithin_div_class = "linkwithin_hook";
//LinkWithinCodeEnd -->
</script>
-<script src="http://www.linkwithin.com/widget.js"></script>
+<script src="http://www.linkwithin.com/widget.js" type="text/javascript"></script>
<a href="http://www.linkwithin.com/"><img src="http://www.linkwithin.com/pixel.png" alt="Related Posts with Thumbnails" style="border: 0" /></a>';
            $content .= $embed_code;
        }

Finally LinkWithin shows up badly on the Firebug analysis (see my previous post about using Firebug to speed up my blog) [1] – see the below picture for details. As an aside, given that Google recommend Firebug it is rather ironic that Google Adsense related URLs cover the majority of the Firebug issues that are not caused by LinkWithin.

[Picture: demonstration of how LinkWithin lowers my page speed]

What Next?

I’ve sent email to the LinkWithin people about all these issues other than the Firebug problem reports. Given that they haven’t responded to some suggestions for over 10 weeks it seems hardly worth my effort in informing them of other issues.

I’m thinking of trying OutBrain.com again. 18 months ago I tried OutBrain but never got it working due to technical issues and then forgot about it. It has some similar features and may work better – at least it has tech support people who respond to queries!

Web Server Performance

We Have to Make Our Servers Faster

Google have just announced that they have made site speed part of their ranking criteria for search results [1]. This means that we now need to put a lot of effort into making our servers run faster.

I’ve just been using the Page Speed Firefox Plugin [2] (which incidentally requires the Firebug Firefox Plugin [3]) to test my blog.

Image Size

One thing that Page Speed recommends is to specify the width and height of images in the img tag so the browser doesn’t have to change the layout of the window every time it loads a picture. The following script generates the HTML that I’m now using for my blog posts. I run “BASE=http://www.coker.com.au/blogpics/2010 jpeg.sh foo.jpg bar.jpg” and it generates HTML code that merely needs the text for the alt attribute to be added. Note that this script relies on a scheme where there are files like foo-big.jpg that have maximum resolution and foo.jpg which has the small version. Anyone with some shell coding skills can change this of course, but I expect that some people will change the naming scheme that they use for new pictures.

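#!/bin/bash
set -e
while [ "$1" != "" ]; do
  # identify prints "file JPEG 640x480 ...", the third field is the resolution
  RES=$(identify "$1" | cut -f3 -d' ')
  # The width and height attributes take plain pixel counts with no "px" suffix
  WIDTH=$(echo "$RES" | cut -f1 -dx)
  HEIGHT=$(echo "$RES" | cut -f2 -dx)
  # foo.jpg is the small version, foo-big.jpg is the full resolution version
  BIG=$(echo "$1" | sed -e 's/\.jpg$/-big.jpg/')
  echo "<a href=\"$BASE/$BIG\"><img src=\"$BASE/$1\" width=\"$WIDTH\" height=\"$HEIGHT\" alt=\"\" /></a>"
  shift
done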

Thanks to Brett Pemberton for the tip about using identify from imagemagick to discover the resolution.

Apache and Cache Expiry

Page Speed complained that my static URLs didn’t specify a cache expiry time, this didn’t affect things for my own system as my Squid server forcibly caches some things without being told to but would be a problem for some others. I first ran the command “a2enmod expires ; a2enmod headers” to configure my web server to use the expires and headers Apache modules. Then I created a file named /etc/apache2/conf.d/expires with the following contents:

ExpiresActive On
ExpiresDefault "access plus 1 day"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/css "access plus 1 day"
# Set up caching on media files for 1 year (forever?)
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
ExpiresDefault "access plus 1 year"
Header append Cache-Control "public"
</FilesMatch>
# Set up caching on media files for 1 month
<FilesMatch "\.(gif|jpg|jpeg|png|swf)$">
ExpiresDefault "access plus 1 month"
Header append Cache-Control "public"
</FilesMatch>
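
One way to check that a configuration like the above has taken effect is to look at the response headers for a static file. The following is a minimal sketch, the function name is hypothetical and the URL in the usage note is a placeholder.

```shell
#!/bin/bash
# Sketch only: keep just the caching-related lines from HTTP response headers
# read on standard input. The function name is hypothetical.
cache_headers() {
  grep -i -E '^(expires|cache-control):'
}
```

It could be used as “curl -s -I http://www.example.com/pic.jpg | cache_headers”. If nothing is printed then the server sent neither an Expires nor a Cache-Control header for that URL.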

DNS Lookups

Page Speed complains about DNS names that are used for only one URL. One example of this was the Octofinder service [4], it’s a service to find blogs based on tags, but I don’t seem to get any traffic from it so I just turned it off. In this case it was the only sensible thing to do to have a single URL from their web site, but I had been considering removing the Octofinder link for a while anyway. As an aside I will be interested to see if there are comments from anyone who has found Octofinder to be useful.

I’ve also disabled the widget that used to display my score from Technorati.com, it wasn’t doing what it used to do, the facility of allowing someone to list my blog as a favorite didn’t seem to provide any benefit, and it was taking extra DNS lookups and data transfers. I might put something from Technorati on my blog again in future as they used to be useful.
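
A rough way to find such host names without Page Speed is to list the URLs in a saved copy of a page and count them per host. This is only a sketch and not what Page Speed itself does. The function name is hypothetical, and it only sees absolute http and https URLs that appear in the HTML text.

```shell
#!/bin/bash
# Sketch only: read HTML on standard input and print each host name that
# appears in exactly one distinct URL. The function name is hypothetical.
hosts_used_once() {
  grep -o -E "https?://[^\"' >]+" |
    sort -u |
    sed -e 's|^[a-z]*://||' -e 's|/.*$||' |
    sort | uniq -c |
    awk '$1 == 1 { print $2 }'
}
```

It could be used as “hosts_used_once < saved-page.html”. URLs that are only loaded by scripts at run time will not be found this way.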

Cookies

If you have static content (such as images) on a server that uses cookies then the cookie data is sent with every request. This requires transferring more data and breaks caching. So I modified the style-sheet for my theme to reference icons on a different web server, this will supposedly save about 4K of data transfer for a page load while also giving better caching.

The down-side of this is that I have my static content on a different virtual server so now updating my WordPress theme will require updating two servers, this isn’t a problem for the theme (which doesn’t get updated often) but will be a problem if I do it with plugins.
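
To get some idea of whether a server for static content is free of cookies, the response headers can be checked for Set-Cookie lines. The following is a minimal sketch with a hypothetical function name. A count of zero for one response doesn’t prove that browsers will never send cookies to that host, as a cookie could be set for the same domain elsewhere.

```shell
#!/bin/bash
# Sketch only: count the Set-Cookie lines in HTTP response headers read on
# standard input. The function name is hypothetical.
count_set_cookie() {
  # grep -c prints 0 but returns failure when nothing matches, so ignore that
  grep -c -i '^set-cookie:' || true
}
```

It could be used as “curl -s -I http://static.example.com/icon.png | count_set_cookie”, where the URL is a placeholder.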

Conclusion

The end result is that my blog now gets a rating of 95% for Page Speed when previously it got a rating of 82%. Now most of the top references that are flagged by Page Speed come from Google, although there is still work for me to do.

Also it seems that Australia is now generally unsuitable for hosting web sites for viewing in other countries. I will advise all my clients who do International business to consider hosting in the US or the EU.

Why Comments?

Russ Allbery has described why he doesn’t support comments on his blog [1].

I respect his opinion and I’m not going to try and convince him to do otherwise. But I think it’s worth describing why I want comments on my blog and feel that they are worth having for many (possibly most) other blogs.

Types of Blog

The first thing to consider is the type of post on the blog. Some blogs are not well suited to comments. I have considered turning off comments on my documents blog [2] because it gets a small number of readers and is designed as reference material rather than something you might add to a public Planet feed or read every week as it has a small number of posts that are updated. So conversations in the blog comments are unlikely to happen. One thing that has made me keep comments open on my documents blog is the fact that I am using blog posts as the main reference pages for some of my projects and some people are using the comments facility for bug reports. I may make this the main bug reporting facility – I will delete the comments when I release a version of the software with the bugs fixed.

One particular corner case is a blog which has comments as a large part of its purpose. Some blogs have a regular “open thread” where anyone can comment about any topic. Blogs which do such things have the owners act more like editors than writers. One example of this is the Making Light blog by Teresa and Patrick Nielsen Hayden [3] – who are both professional editors.

The next issue is the content of the post. If I was to create a separate blog for authoritative posts about SE Linux then there wouldn’t be much point in allowing comments, there are very few people who could correct me when I make a mistake and they would probably be just as happy to use email. When I write about issues where there is no provably correct answer (such as in this post) the input of random people on the net is more useful.

Another content issue is that of posts of a personal nature. Some people allow comments on most blog posts apart from when they announce some personal matter. I question the wisdom of blogging about any topic for which you would find comments intolerable, but if you are going to do so then turning off comments makes sense.

Finally there is the scale of the blog. If you don’t get enough readers to have a discussion in the comments then there is less benefit in having the facility turned on – the ratio of effort required to deal with spam to the benefit in comments isn’t good enough. In his FAQ about commenting [4] Russ claims that controlling spam “can take a tremendous amount of time or involve weird hoop-jumping required for commenters”. I have found the Block Spam by Math [5] WordPress plugin to be very effective in dealing with the spam, so for this blog it’s a clear benefit to allow comments. Since using that plugin my spam problem has decreased enough that I now allow comments on posts which are less than 1 year old – previously comments were closed after 90 days. The plugin is a little annoying but I changed the code to give an error message that describes the situation and prevents a comment from being lost so the readers don’t seem too unhappy.

The Purpose of Comments

Russ considers the purpose of comments to be “meaningfully addressed to the original post author or show intent to participate in a discussion”. That’s a reasonable opinion, but I believe that in most cases it’s best if comments are not addressed to the author of the post and are instead directed towards the general readers. I believe that participating in a discussion and helping random people who arrive as the result of a Google search are the main reasons for commenting. For my blog an average post will get viewed about 500 times a year and the popular posts get viewed more than 200 times per month. So when more than 1000 people read the comments on a post over the course of a year (which is probably common for one of my posts), 99.9% of readers are not me and commentators might want to direct their comments accordingly. Of course a comment can still be addressed to the blog author so that the unknown audience can enjoy watching the discussion.

For some of my technical posts I don’t have time to respond to all comments. If I have developed a solution to a technical problem that is good enough I may not feel inclined to invest some extra work in developing an ideal solution. So when a reader suggests a better option I sometimes don’t test that out and therefore can’t respond to the comment. But the comment is still valuable to the 1000+ other people who read the comment section. So a commentator should not assume that I will always entirely read a comment on a technical matter.

Comment threads can end up being a little like mailing lists. I don’t think that general discussions really work well in comment threads and don’t aim for such things. But if a conversation starts then I think you might as well continue as long as it’s generally interesting.

Generally for most blogs I think that providing background information, supporting evidence, and occasionally evidence of errors is a major part of the purpose of blog comments. But entertainment is always welcome. I would be happy to see some poems in the comments section of technical posts; sometimes a limerick or haiku could really help make a technical point.

Political blog posts can be a difficult area. Generally the people who feel inclined to write political blog posts or comment on them are not going to be convinced to entirely change course, but as there are many people who can’t seem to understand this fact a significant portion of the comments on political blog posts consists of different ways of saying “you’re wrong”. The solution to this is to moderate the comments aggressively; too many political blogs have comments sections that are all heat and no light. I’m happy for people to go off on tangents when commenting on my political posts or to suggest a compromise between my position and their preferred option. But my tolerance of comments that simply disagree is quite small. Generally I think that blogs which directly advocate a certain political position should have the comments moderated accordingly: people will read a site in the expectation of certain content and I believe that the comments should also meet that expectation to some degree. Comments on political posts can provide insights into different points of view and help discover compromise positions if moderated well.

How to Provide Feedback

Russ advocates commenting to the blog author via email – it is now the only option he accepts. My observation is that the number of people who are prepared to comment via email (which generally involves giving away their identity) is vastly smaller than the number who use blog comment facilities. This means that you will miss some good comments. One of the most valuable commentators on my blog uses the name “Anonymous” and has never felt inclined to identify themself to me. I wouldn’t want to miss the input of that person, or of the other people who have useful things to say but who don’t want to identify themselves. I have previously written about how not all opinions are equal and anonymous comments are given a lower weight [6]. That post inspired at least one blogger to configure their blog to refuse anonymous comments. It was not my intent to inspire such reactions (although they are logical actions based on a different opinion of the facts I presented). I believe that someone who is anonymous can gain authority by repeatedly producing quality work.

Another option is for people to write their own blog posts referencing the post in question. I don’t believe that my core reader base desires short posts so I won’t write a blog post unless I have something significant to say. I expect that many other people believe that the majority of their blog comments would not meet the level of quality that their readers expect from their posts (posts are expected to be more detailed and better researched than comments). As an aside, forcing people to comment via blog posts will tend to increase your Technorati rating. :-#

A final option is for people to use services such as Twitter to provide short comments on posts. While Twitter is well designed for publishing short notes the problem with this is that it’s a different medium. There are many people who like reading and discussing blog posts but who don’t like Twitter and thus using a different service excludes them from the conversation.

For my blog I prefer comments for short responses and blog posts for the longer ones. If you write a blog post that references one of my posts then please enter a comment to inform me and the readers of my blog. Email is not preferred but anyone who wants to send me some is welcome to do so.

If this post inspires you to change your blog comment policy then please let me know. I would like to know whether I inspire people to allow or deny blog comments.

Citing Wikipedia

A meme that has been going around is that you can’t cite Wikipedia.

You Can’t Cite Wikipedia Academically

Now it’s well known and generally agreed that you can’t cite Wikipedia for a scientific paper or other serious academic work. This makes sense firstly because Wikipedia changes, both in the short term (including vandalism) and in the long term (due to changes in technology, new archaeological discoveries, current events, etc). But you can link to a particular version of a Wikipedia page: just click on the history tab at the top of the screen and then click on the date of the version for which you want a direct permanent link.
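For anyone who wants to generate such permanent links in a script rather than by clicking, here is a minimal Python sketch. It assumes the usual form of a Wikipedia revision link, where the page title and a numeric revision ID are passed as the title and oldid parameters of index.php. The function name and the revision ID below are made up for illustration, so substitute the ID shown in the address bar after clicking a date in the page history.

```python
from urllib.parse import urlencode


def wikipedia_permalink(title, oldid, lang="en"):
    """Build a URL that points at one fixed revision of a Wikipedia page.

    title: page title as shown on Wikipedia (spaces are turned into underscores)
    oldid: numeric revision ID taken from the page history
    lang:  language subdomain, "en" for the English Wikipedia
    """
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"https://{lang}.wikipedia.org/w/index.php?{query}"


# 123456789 is a placeholder revision ID, not a real revision of this page.
print(wikipedia_permalink("Guinness World Records", 123456789))
```

A link built this way keeps pointing at the same text even if the article is later edited or vandalised, which addresses the first objection above but not the one about citing original research.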

The real reason for not linking to Wikipedia articles in academic publications is that you want to reference the original research, not a report on it, which really makes sense. Of course the down-side is that you might reference some data that is in the middle of a 100 page report, in which case you might have to mention the page number as well. Also the summary of the data you desire often simply isn’t available anywhere else; someone might for example take some facts from 10 different pages of a government document and summarise them neatly in a single paragraph on Wikipedia. This isn’t a huge obstacle, it just takes more time to create your own summary with references.

When Wikipedia is Suitable

The real issue, however, is how serious the document you are writing is and how much time you are prepared to spend on it. If I’m writing a message to a mailing list or a comment on a blog post then I probably won’t bother reading all the primary sources of Wikipedia pages; it would just waste too much of my time. Wikipedia is adequate for the vast majority of mailing list discussions.

If I’m discussing several choices for software with some colleagues we will probably start by reading the Wikipedia pages, if one option doesn’t appear to have the necessary features (according to Wikipedia) then we may ask the vendor if those features are really missing and if so whether they will be added in the next version – but we may decide that we don’t really need the features in question and modify our deployment plans. Many business decisions are made with incomplete data, time is money and there often isn’t time to do everything you want to do. Using Wikipedia as a primary source for business decisions is a way of trading off a little accuracy for a huge time saving. This is significantly better than the old fashioned approach of comparing products by reading their brochures – companies LIE in their advertising!

When writing blog posts the choice of whether to use Wikipedia as a reference depends on the point that you are trying to make and how serious the post is. If the post isn’t really serious or contentious or if the Wikipedia reference is for some facts that are not likely to be disputed then Wikipedia will probably do. For some posts a reference to a primary source will be better.

A blog post that references data that is behind a pay-wall (such as a significant portion of academic papers and news articles) is practically of less use than a post that cites Wikipedia. In most cases Wikipedia references free primary sources on the Internet (although it does sometimes refer to dead tree products and data that is behind a pay-wall). In the minority of cases where the primary references for a Wikipedia page are not available for free on the Internet there will be people searching for freely available references to replace the non-free ones. So if you refer to a Wikipedia page with non-free references a future reader might find that someone has added free references to it.

The Annoying People

One thing that often happens is that an Internet discussion contains no references for anything – it’s all just unsupported assertions. Then if anyone cites Wikipedia someone jumps in with “you can’t cite Wikipedia”. If you want to criticise Wikipedia references then please first start by criticising people who state opinions as fact and people who provide numbers without telling anyone where they came from! The Guinness Book of Records (now known as “Guinness World Records”) was devised as a reference to cite in debates in pubs [1]. It seems that most of the people who dismiss references to Wikipedia on the net would prefer that Internet debates have lower requirements for references than a pub debate.

When Wikipedia is cited in an online discussion it is usually a matter of one mouse click to check the references for the data in question. If Wikipedia happens to be wrong then anyone who cares can correct it. Saying “the Wikipedia page you cited had some transcription errors in copying data from primary sources and some of the other data was not attributed, I’ve corrected the numbers and noted that it contains original research” would be a very effective rebuttal to an argument that relies on data in Wikipedia. Saying “you can’t cite Wikipedia” means little, particularly if you happen to be strongly advocating an opposing position while not providing any references.

If one person cites an academic paper and someone else cites Wikipedia then it seems reasonable to assume that the academic paper is the better reference. But when it’s a choice between Wikipedia and no reference then surely Wikipedia should win! Also references to non-free data are not much good for supporting an argument: they are really just unverified claims as far as most people can determine. The issue therefore becomes how much the person citing the non-free reference can be trusted to correctly understand and summarise the non-free data.

Also it has to be considered that not all primary sources are equal. Opinion pieces should be considered to have a fairly low value, and while they are authoritative for representing the opinion of the person who wrote them they often prove little else – unless they happen to cite good references, which brings them to the same level as Wikipedia. The main benefit of linking to opinion pieces is that it saves time typing and gives a better product for the readers – it’s sometimes easier to find someone else expressing an opinion well than to express it yourself.

So please, don’t criticise me for citing Wikipedia unless others in the discussion are citing better references. If most people are not citing any references or only citing opinion pieces then a Wikipedia page may be the best reference that is being provided!