
WordPress and Thumbnails

I have just had a lot of trouble with Thumbnails on one of my blogs. It turned out that I had to install the package php5-gd and restart Apache before thumbnails would even be generated. The package php5-gd (or php4-gd) is “suggested” by the Debian WordPress package and it’s not a dependency, so the result of apt-get install wordpress will be that thumbnails won’t work.

I’ve filed Debian bug report 447492 [1] requesting that php5-gd be a dependency. Another slightly controversial issue is the fact that the MySQL server is not a dependency. I believe that it’s correct to only suggest MySQL, as the database server is commonly run on a different host and WordPress will clearly inform you if it can’t access the database.

An alternate way of solving this bug report would be to have WordPress give a warning such as “Thumbnails disabled due to lack of php-gd support” which would allow users to make requests of their sys-admins that can be easily granted.

Lazyweb Posts

A common practice in the blog space is to write posts that ask a question in the hope that someone else will answer it via a comment or a post. This is known as a “Lazyweb Post”.

It seems to me that the way of managing such posts could be improved with a little informal cooperation. From now on I plan to tag each Lazyweb post with a Lazyweb tag, so that any reader of my blog can with a single click see all the unanswered Lazyweb posts that I have written (I will remove the tag once an adequate answer has been provided or I have discovered and documented the solution myself).

Almost all bloggers want to get more traffic to their blogs; the question is how to get traffic of the nature that you desire. Links from blogs that you like are a preferred source of traffic. If a blogger that you would like to receive a link from has a lazyweb tag or category then it provides a good list of ideas for post topics that will get you the links you desire. Such lists would also be good for determining what information is not generally available and can therefore be used for the topics of original posts.

Such tags or categories should also be good for getting answers to lazyweb posts. I’ll start doing this and see how well it takes off.


Blog Copyright Infringement

I have previously written about some of my efforts to counter sploggers [1].

Since then I have had a particularly brazen splogger copy one of my posts entirely and claim to have written it. The only reason I noticed the copyright violation (my blog license is on my About Page [2]) was because the post in question linked to other posts of mine and I saw the links. I was offended by the flagrant violation of all aspects of copyright law (breaking the license and infringing my moral rights by not attributing me as the author) and by the fact that the splog in question was hosted by Dreamhost (who have offended me by refusing a DMCA take-down request). So I decided that merely issuing a DMCA take-down was not enough. I went through the splog and identified content copied from several major journals (including by a journalist I regard as a friend) as well as by one multi-national corporation – and I notified all the relevant people.

The splog in question deleted all its old content the next day, and immediately started copying new articles from other blogs. I have informed the people who appear to be copyright holders for some of the new articles…

I recommend that other people who deal with sploggers also go to the extra effort of notifying other victims. It’s usually quite easy to do: you just select a random bit of text from the copied article and paste it into your favourite search engine – usually you get only a single result. Some of the splog posts are edited in small ways so the first search may fail – if so then you merely need to search for a second piece of text. If you only request that your own illegally copied material be taken down then the splogger still has a good business model. They can keep copying content in violation of the license, occasionally take a post down when they get caught, and both the splogger and the ISP continue to make money. If you notify other victims (many of whom won’t have the skills to find the content themselves or the background knowledge sufficient to recognise the benefits in having it removed unless you explain it to them) then the splogger loses a lot of content at one go and the ISP will have a more difficult time claiming to be innocent of the process.
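The search step described above is easy to script. Here is a minimal sketch in Python, assuming you have the text of the copied article as a plain string. The function name pick_search_phrase and the default of eight words are my own choices for illustration, not anything a search engine requires.

```python
import random


def pick_search_phrase(article_text, words=8, rng=random):
    """Return a randomly chosen run of consecutive words, wrapped in
    double quotes so a search engine treats it as an exact phrase."""
    tokens = article_text.split()
    if len(tokens) <= words:
        # Short text: use all of it.
        return '"' + " ".join(tokens) + '"'
    start = rng.randrange(len(tokens) - words + 1)
    return '"' + " ".join(tokens[start:start + words]) + '"'


if __name__ == "__main__":
    sample = "Paste the text of the copied article here as one long string"
    # Run it again to get a second phrase if the first search fails.
    print(pick_search_phrase(sample, words=5))
```

If the first phrase finds nothing (because the splogger edited that part of the post) then running it again gives a different piece of text to try.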

Also when you notify multi-national corporations you can expect that they have some decent lawyers and a budget assigned to such work. While I would be extremely unlikely to sue an ISP that repeatedly hosts unauthorised copies of my copyrighted material, the same can’t be said for a corporation.

For more information on splogging see the Wikipedia entry [3].


Dreamhost and the DMCA

Dreamhost have refused my request (under the DMCA) to be correctly identified as the author of content copied from my blog. I am publishing this so that anyone else who deals with them will know what to expect. Also if someone wishes to sue Dreamhost in regard to content that they host this may help demonstrate a pattern of behaviour.

The situation is quite obviously the result of a broken script used by a splogger that doesn’t correctly match author names with articles. The fact that the official Dreamhost policy is to disregard the requirement that the author(s) of copyright material be correctly identified is reprehensible. It also seems likely to open them to the risk of legal action. If you know how to contact a director of Dreamhost then please give them a link to this post and explain the risks to them.

For anyone who wants the detail the messages are below.

I Am #40 in Don Marti’s List

Don Marti has written his own equivalent to Technorati based on links from blogs that he reads, and my blog comes in at #40 in the list (last place) [1].

Don does note the fact that such lists mean little and links to a post by Doc Searls [2] which makes the same point more strongly. But it’s still interesting to note.

Also if Don chose to release the Perl script in question (or host it as a cgi-bin script) to allow other people to make their own top 40 list then I’m sure that many people would appreciate it (I’ll write such a script myself and release it if he doesn’t). He notes when publishing his list that a blog may be included even if he never reads it. I believe that if a blog is highly ranked according to links from blogs that you read then it’s quite likely that you will be interested in reading it. It may be a blog that is opposed to your beliefs (for example I’m sure I’m not the only person in the Linux community to link to an official Microsoft blog) but if it gets enough links then it would be worth reading at least once (if only to discover why you don’t like it). While I can’t tell Don to read every blog on his top 40 list I’ll certainly read every blog on my top 40 list at least once when I create the list.
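To give an idea of what such a script might look like, here is a rough sketch in Python. It is not Don’s script (which is in Perl and which I have not seen). It assumes that the links have already been extracted from the blogs you read into a dictionary, and it counts one vote per linking blog no matter how many times that blog links to the same host. The function name and the input format are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def rank_linked_blogs(links_by_blog, top=40):
    """links_by_blog maps the host name of each blog you read to the
    list of URLs it links to. Returns (host, votes) pairs, where votes
    is the number of different blogs linking to that host."""
    counts = Counter()
    for source, urls in links_by_blog.items():
        targets = {urlsplit(url).hostname for url in urls}
        targets.discard(None)    # relative or malformed links
        targets.discard(source)  # ignore links a blog makes to itself
        counts.update(targets)
    # Most votes first, ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]


if __name__ == "__main__":
    sample = {
        "a.example": ["http://c.example/1", "http://b.example/x"],
        "b.example": ["http://c.example/3"],
    }
    for host, votes in rank_linked_blogs(sample):
        print(votes, host)
```

Fetching the feeds and pulling the links out of each post is the larger part of the job and is left out here.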


Blogging and Documents

It seems that the majority of blog traffic (at least in blogs I read) is time-based. It is personal-diary entry posts, references to current events, predictions about future events, bug reports, and other things that either become obsolete or for which it’s important to know the date. For such posts it makes sense to have the date be part of the Permalink URL, and in the normal course of events such posts will tend not to be updated after release.

Another type of blog traffic is posts that have ongoing reference value which will (ideally) be actively maintained to keep them current. For such posts it makes sense to have no date stamp in the Permalink – for example if I update a post about installing SE Linux on Etch once Lenny is released (a significant update) I don’t want people ignoring it when it comes up in search engines (or worse having search engines score it down) because the URL indicates that it was written some time before the release of Etch.

WordPress supports Pages as separate entities to Posts, and the names of Pages are direct links under the root of the WordPress installation. However there is no RSS feed for Pages (AFAIK – I may have missed something) and the WordPress themes often treat Pages differently (which may not be what you want for timeless posts). Also it is not unreasonable to have Pages and timeless posts.

I’m thinking of creating a separate WordPress installation for posts that I intend to manage for long periods of time with updates (such as documenting some aspects of software I have written). The management options for a blog server program provide significant benefits over editing HTML files. The other option would be to use a different CMS (a blog server being a sub-category of a CMS) to store such things.

What I want is a clear way of presenting the data with minimal effort from me (an advantage of WordPress for this is that I have already invested a significant amount of effort in learning how it works) and editing from remote sites (the offline blog editing tools that are just coming out are a positive point for using a blog server – particularly as I could use the same editor for blog posts and documents).

Any suggestions as to how to do this?

Then of course there’s the issue of how to syndicate this. For my document blog (for want of a better term) I am thinking of updating the time-stamp on a post every time I make a significant change. If you subscribe to the document feed then that would be because you want to receive new copies of the documents as they are edited. The other option would be to not change the time-stamp and then include the feed along with my regular blog feed (making two feeds be served as one is not a technical challenge). If I were to update the time-stamps then I would have to write posts announcing the release of new documents.
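As an illustration of why serving two feeds as one is not much of a challenge, here is a minimal sketch in Python. It assumes that each feed has already been parsed into a list of dictionaries with a title and an updated time-stamp. That structure and the function name merge_feeds are made up for the example, and the parsing and re-serialising of RSS is not shown.

```python
from datetime import datetime


def merge_feeds(*feeds, limit=10):
    """Combine the entries of several feeds, newest first, keeping at
    most `limit` entries as a typical feed would."""
    entries = [entry for feed in feeds for entry in feed]
    entries.sort(key=lambda entry: entry["updated"], reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    blog = [{"title": "Daily post", "updated": datetime(2007, 10, 20)}]
    documents = [{"title": "A howto", "updated": datetime(2007, 10, 22)}]
    for entry in merge_feeds(blog, documents):
        print(entry["updated"].date(), entry["title"])
```

With this approach a document only rises to the top of the combined feed when its time-stamp changes, which is the behaviour the two options above differ on.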

Does anyone know of someone who writes essays or howto documents in a similar manner to Rick Moen [1] or Paul Graham [2] who also does daily blog posts? I’d like to see some examples of how others have solved these problems (if there are any).


Upgraded to WordPress 2.3

I just upgraded to WordPress 2.3. When using Konqueror (my favourite browser) the comment approval is slightly broken (when I tag a comment as spam it usually just turns red and doesn’t disappear from the main Comments tab) and I have to refresh that window more often than usual to make sure I got the result I desired. Also the Sidebar Widget editing is totally broken in Konqueror, so I guess I’ll have to log in with Firefox to get the new tags feature working.

Also I have got a few WordPress errors about the table “$table_prefix” . “post2cat” not existing. The table in question doesn’t exist in any of my blogs.

So far this is the worst WordPress upgrade experience I’ve had (I started on 2.0).


Blogger is Not for Serious Blogging

When I started blogging I used Blogger [1]. After some time I decided that it did not offer me the freedom I desired. I could not easily make changes (I could have created new themes, but it would have taken an unreasonable amount of work). I currently use WordPress. It’s still a lot of work to change themes, but at least it’s CSS and PHP coding which can be used for other things. Blogger offers no statistics on web use (I tried adding support for Google Analytics but couldn’t get it to work properly). What I want is Webalizer or something similar (which is easy to do when running your own server).

Blogger is a reasonable way of starting blogging, but if you use it then you want to make it easy to change to something that you own. Blogger has a feature of using a DNS name in a domain that you own for the name of your blog (which is much less obvious than it once was). I regret not using that feature as I still have my old posts on Blogger and don’t want to break the links.

Blogger has in the past had problems with time-stamps on posts. When I used Blogger I had some complaints that my posts were staying at the top of Planet listings for unreasonable amounts of time (I never tracked this down before switching to my own platform).

Hosting your own blog is not as difficult as you might expect (initially at least). It becomes difficult when you want to install lots of plug-ins, but then any blogging solution would be difficult if you want to do that. The WordPress [2] package in Debian works well and has good support for multiple WordPress blogs. There is a separate product named WordPress-MU [3] which is designed for people who want to run a service in competition with Blogger. Some people recommend that you use WordPress-MU if you want to set up blogs for several people. I disagree. If you are setting up blogs for a small number of people then you can use the standard WordPress package and create a file named /etc/wordpress/config-whatever.example.com.php which contains the configuration for whatever.example.com and then create a new WordPress blog by using the web-based interface to do the rest. It would not be difficult to create the configuration file in question with an M4 script if you have a moderate number of blogs to host (maybe a hundred or so). I think that it’s only if you want to host thousands of blogs that you need the features of WordPress-MU. Note that MU is not as well packaged as the base WordPress and has some rough edges. Last time I tried to set up MU I was not successful.
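Here is a rough sketch of generating such configuration files, written in Python rather than M4 for the sake of illustration. The file name pattern is the one given above. DB_NAME, DB_USER, DB_PASSWORD and DB_HOST are the usual WordPress database settings, but exactly which settings the Debian package expects in this file is an assumption on my part, as are the function names and the example database names.

```python
TEMPLATE = """<?php
define('DB_NAME', '{db_name}');
define('DB_USER', '{db_user}');
define('DB_PASSWORD', '{db_password}');
define('DB_HOST', '{db_host}');
$table_prefix = 'wp_';
?>
"""


def php_quote(value):
    """Escape a value for use inside a single-quoted PHP string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def config_path(hostname):
    """Per-host configuration file name used by the Debian package."""
    return f"/etc/wordpress/config-{hostname}.php"


def render_config(db_name, db_user, db_password, db_host="localhost"):
    """Return the text of a configuration file for one blog."""
    return TEMPLATE.format(
        db_name=php_quote(db_name),
        db_user=php_quote(db_user),
        db_password=php_quote(db_password),
        db_host=php_quote(db_host),
    )


if __name__ == "__main__":
    # Print rather than write, so the result can be reviewed first.
    print("#", config_path("whatever.example.com"))
    print(render_config("whatever_blog", "whatever_user", "secret"))
```

Looping over a list of host names and writing each result to its config_path would cover the hundred-blog case.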

This is not to say that WordPress is inherently the best program; there are many other free software blogging platforms out there. WordPress is the one that I use and am happy to recommend but if your requirements differ from mine then another platform may be better for you. I also suggest that WordPress be used as the base-line for comparing blogging software.

Hosting your own blog does not require significant resources. A virtual host with 256M of RAM should be more than adequate to run WordPress plus MySQL. Such virtual hosts are getting quite cheap nowadays, and one such host could easily be shared by a number of bloggers. My blog uses about 1.2G of data transfer per month. vpsland.com offers virtual hosts starting at 150G per month data transfer with 192M of RAM being the minimum. Prices start at $US15 per month. While I can’t compare vpsland.com to other virtual hosting providers (having never used any other such service) I can say that they work reasonably well and I have a client who is happy with them. So it seems that a minimal plan with vpsland.com would host 20 blogs with the same traffic as mine (with RAM being the limiting factor) and a slightly larger plan (with more RAM and more bandwidth) that costs $US30 or $US40 per month could handle 100 or more blogs that are similar to mine. If you get together with some friends and share a virtual server then blogging would not be expensive. Incidentally I had previously read a blog comment about people being hesitant to share servers with their friends (as they apparently would rather grant some unknown people at a faceless corporation the ability to snoop on them than people that they know). The advantage of a blog server in this regard is that everything is public anyway!
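The arithmetic behind those numbers can be checked with a few lines of Python. The figures are the ones quoted above. The estimate of 20 blogs on the minimal plan is my own guess based on RAM and is not computed here, and the function names are made up for the example.

```python
from fractions import Fraction


def blogs_by_bandwidth(plan_gb_per_month, blog_gb_per_month):
    """How many blogs fit in the plan if only data transfer matters.
    Fractions avoid floating point rounding on values such as 1.2."""
    return int(Fraction(plan_gb_per_month) // Fraction(blog_gb_per_month))


def monthly_cost_per_blog(plan_price, blogs):
    """Share of the monthly plan price paid for each blog."""
    return float(Fraction(plan_price) / blogs)


if __name__ == "__main__":
    # 150G per month plan, 1.2G per month per blog.
    print(blogs_by_bandwidth(150, "1.2"))
    # $US15 per month shared between the 20 blogs that RAM allows.
    print(monthly_cost_per_blog(15, 20))
```

Data transfer alone would allow 125 blogs like mine on the minimal plan, which is why RAM is the limiting factor, and at 20 blogs the cost works out to $US0.75 per blog per month.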

If you have good technical skills then I recommend using WordPress as your first blogging platform. If you find that you don’t like it for some reason then you can convert to another platform if you own the domain. If you are setting up a blog for a less technical user then WordPress is also a good choice. My sister uses WordPress, not that she made much of a choice (I had set up a Blogger account for her some time ago which she never used – I guess that could be considered as a choice to not use Blogger) but that I set up a WordPress blog for her and she seemed to like using it.


Multiple DNS Names

There are many situations where multiple DNS names for a single IP address that runs a single service are useful. One common example is with business web servers that have both www.example.com and example.com being active, so whichever a customer hits they will get the right content (the last thing you want is for a potential customer to make some trivial mistake and then give up).

Having both DNS names be equal and separate is common. One example of this is the way http://planet.ubuntulinux.org/ and http://planet.ubuntu.com/ both have the same content. It seems to me that planet.ubuntu.com is the more official name as the wiki for adding yourself to the Planet is wiki.ubuntu.com. Another example of this is the way http://planet.debian.org/ and http://planet.debian.net/ both have the same content. So far this month I have had 337 referrals to my blog from planet.debian.org and 147 from planet.debian.net. So even though I can’t find any official reason for preferring one over another, the fact that more than 2/3 of the referrals from that planet come from the planet.debian.org address indicates that most people regard it as the canonical one.

In times past there was no problem with such things. It was quite routine to have web servers with multiple names and no-one cared about this (until of course one name went away and a portion of the user-base had broken links). Now there are three main problems with having two names visible:

  1. Confusion for users. When a post on thedebianuser.org referred to my post about Planet Ubuntu it used a different URL to the one I had used. I was briefly worried that I had missed half (or more) of the content by getting my links from the wrong blog – but it turned out that the same content was on both addresses.
  2. More confusing web stats for the people who run sites that are referenced (primarily the bloggers in the case of a Planet installation). This also means a lower ranking as the counts are split. In my Webalizer logs planet.debian.org is in position #5 and planet.debian.net is in position #14. If they were combined they would get position #3. One thing to keep in mind is that the number of hits that you get has some impact on the content. If someone sees repeated large amounts of traffic coming from planet.debian.org then they are likely to write more content that appeals to those users.
  3. Problems with sites that have strange security policies. Some bloggers configure their servers to only serve images if the referrer field in the HTTP protocol has an acceptable value (to prevent bandwidth theft by unethical people who link to their pictures). My approach to this problem is reactive (I rename the picture to break the links when it happens) because I have not had it happen often enough to do anything else. But I can understand why some people want to do more. If we assume that an increasing number of bloggers do this, it would be good to make things easier for them by keeping the number of referrer URLs as small as possible. It would suck for the readers to find that planet.debian.org has the pictures but planet.debian.net doesn’t.

The solution to this is simple: one name should redirect to the other. Having something like the following in the Apache virtual host configuration (or the .htaccess file) for the least preferred name should redirect all access to the other name.

RewriteEngine On
# ^/? allows for the leading slash that is present in virtual host context but not in .htaccess
RewriteRule ^/?(.*)$ http://planet.example.com/$1 [R=301,L]
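Once a redirect like this is in place it is worth checking that it does what you expect. Below is a minimal sketch in Python using only the standard library. The name planet.example.net stands in for the least preferred name and is hypothetical, as are the function names. The check_redirect function makes a real HTTP request so it will only work against a server you have actually configured.

```python
import http.client
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, canonical_host):
    """The URL that a request for `url` should be redirected to:
    same path and query string, but on the preferred host name."""
    parts = urlsplit(url)
    return urlunsplit(("http", canonical_host, parts.path or "/", parts.query, ""))


def check_redirect(url, canonical_host):
    """Request `url` and report whether the server answers with a 301
    pointing at the matching URL on the preferred host name."""
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.hostname, timeout=10)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return (response.status == 301
                and response.getheader("Location") == canonical_url(url, canonical_host))
    finally:
        conn.close()


if __name__ == "__main__":
    print(canonical_url("http://planet.example.net/rss20.xml", "planet.example.com"))
```

Testing a few different paths (including the root and one with a query string) should catch the most common mistakes, such as a doubled slash in the destination.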

In my posts last night I omitted the URLs for the Planet Searches from the email version (by not making them human readable). Here they are:

Categories for Best and Most Popular Posts

I have just added two new categories to my blog. One is for the most popular posts [1] (as indicated by the number of hits on the permalink pages). The other is for the best posts [2]. My criteria for adding a post to the best-posts list are that it provides some information that is new or some analysis that others do not appear to have performed, that it doesn’t get refuted by someone else (sometimes an idea seems good but someone points out a flaw), and that there is some level of interest in it from readers (based on page hits, comments, and links from other blogs).

Both of these categories may be added to posts some days or weeks after they are published. So adding the feeds for them to a syndication configuration might not be a good idea as they will always include posts that are old. I expect that a typical Planet configuration would never display posts from those feeds.

I suggest that other people consider adding similar categories to their blogs. It will allow readers who quickly browse your blog to see the posts that you regard as your best content and other bloggers in the same space to see what gets the most hits (which is worth-while if you don’t consider blogging to be a zero sum game).

I expect that someone will suggest that I only write posts that are eligible for the best-posts category. However this is one example of a post which I don’t consider to be eligible but which will still be useful to some people.