A significant problem with the old-fashioned media is that as a general rule they don’t cite references for anything. Some of the better TV documentaries and non-fiction books cite references, but this is the exception, not the norm. Often documentaries only cite references in DVD extras, which is good for the people who like the documentary enough to buy it but not for people who want to rebut it (few people will pay for a resource if they doubt the truth and accuracy of its claims).
I can understand newspapers not wanting to publish much in the way of background information in the paper version, as every extra line of text in an article is a line of advertising that they can’t sell. So they have financial pressure to produce less content, and the number of people like me who want to check the facts and figures used in articles is probably a small portion of the readership. Another issue with newspapers is that they are often considered to be primary authoritative sources (by themselves and by the readers). It is often the case that journalists will interview people who have first-hand knowledge of an issue, and the resulting article will be authoritative and a primary source, in which case all they need to do is note that they interviewed the subject. However the majority of articles published will be sourced from elsewhere (news agencies [ http://en.wikipedia.org/wiki/News_agency ] such as Reuters are commonly used). Also articles will often be written based on press releases. It is very interesting to read press releases and see how little work is done by some media outlets to convert them to articles; through a well written press release a corporation or interest group can almost write its own articles for publication in the old media.
One way of partially addressing the problem of citing references in old media would be to create a web site of references, then every article could have a URL that is a permanent link to the references and calculations to support the claims and numbers used. Such a URL could be produced by any blogging software, and a blog would be an ideal way of doing this.
For bloggers, however, it’s much easier to cite references, and readers have much higher expectations of links to other sites to support claims and of mathematical calculations shown to indicate how numbers are determined. But there is still room for improvement. Here are some of the most common mistakes that I see in posts by people who are trying to do the right thing:
Indirect links. When you refer to a site you want to refer to it directly. In email (which is generally considered a transient medium) a service such as TinyURL [ www.TinyURL.com ] can be used to create short URLs to refer to pages that have long URLs. This is really good for email, as there are occasions when people will want to write the address down and type it into another computer. For blogging you should assume that your reader has access to browse the web (which is the case most of the time). Another possibility is to have the textual description of a link include a reference to the TinyURL service but to have the HREF refer to the real address. Any service on the net may potentially go away at some future time. Any service on the net may have transient outages, and any reader of your blog may have routing problems that make parts of the net unavailable to them. If accessing a reference requires using TinyURL (or a similar service) as well as the target site then there are two potential things that might break and prevent your readers from accessing it.
One situation where indirect links are acceptable is for the printed version. So you could have a link in the HTML code for readers to click on to get to the reference page directly and a TinyURL link for people who have a printed version and need to type it in.
Also when linking to a blog it’s worth considering the fact that a track-back won’t work via TinyURL and track-backs may help you get more readers…
Links that expire. For example never say “there’s a good article on the front page of X” (where X is a blog or news site). Instead say “here’s a link to a good article which happens to be on the front page now” so that someone who reads your post in a couple of years’ time can see the article that you reference.
Another problem is links to transient data. For example if you want to comment on the features of a 2007 model car you should try to avoid linking to the car manufacturer’s page, as next year they will release a new car and delete the old data from their site.
A potential problem related to this is the Google cache pages, which translate PDF to HTML and highlight relevant terms and can make it much easier to extract certain information from web pages. It can provide value to readers to use such links, but AFAIK there is no guarantee that they will remain forever. I suggest that if you use them you should also provide the authoritative link, so that if the Google link breaks at some future time the reader will still be able to access the data.
Not giving the URLs of links in human readable form. Print-outs of blog pages will lose links, and blog reading by email will also generally lose links (although it would be possible to preserve them). This accounts for a small part of your readership, but there’s no reason not to support their needs by also including links as text (either in the body or at the end of the post). I suggest including the URL in brackets. The most important thing is that no non-URL text touches the ends of the URL (don’t put it in quotes, and keep the brackets spaced from it). Email clients can generally launch a web browser if the URL is clear. Note that prior to writing this post I have done badly in this regard; while thinking about the best advice for others I realised that my own blogging needed some improvement.
I am not certain that the practice I am testing in this post of citing URLs inline will work. Let me know what you think via comments. I may change to numbering the citations and providing a list of links in the footer.
Non-specific links. For example saying “Russell Coker wrote a good post about SE Linux” and referring to my main blog URL is not very helpful to your readers, as I have written many posts on that topic and plan to write many more (and there is a chance that some of my future posts on that topic may not meet your criteria of being “good”). Saying “here is a link to a good post by Russell Coker, his main blog URL is here” is more useful: it gives both the specific link (indicating which post you were referring to) and the general information (for people who aren’t able to find it themselves, for the case of deleted/renamed posts, and for Google). The ideal form would be “<a href="http://etbe.coker.com.au/whatever">here is a link to a good post by Russell Coker [ http://etbe.coker.com.au/whatever ]</a>, his main blog URL is <a href="http://etbe.coker.com.au/">[ http://etbe.coker.com.au ]</a>” (note that this is an example of HTML code as a guide for people who are writing their own HTML; people who use so-called WYSIWYG editors will need to do something different).
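For people who generate their HTML from a script, the form described above could be sketched roughly as follows in Python. The helper name cite_link and its parameters are hypothetical, invented only for this illustration. The idea is that the HREF points at the real address while the visible text repeats the URL in spaced brackets, so it survives printing and email.

```python
from html import escape


def cite_link(description: str, url: str) -> str:
    """Return an HTML anchor whose visible text also shows the URL.

    Hypothetical helper for illustration only. The URL is wrapped in
    brackets with a space on each side so no non-URL text touches it.
    """
    href = escape(url, quote=True)                      # safe inside the attribute
    text = escape(f"{description} [ {url} ]", quote=False)  # safe as element text
    return f'<a href="{href}">{text}</a>'


print(cite_link("here is a link to a good post by Russell Coker",
                "http://etbe.coker.com.au/whatever"))
```

The URL is escaped separately for the attribute and for the visible text, so an address containing an ampersand still produces valid HTML.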
Links that are likely to expire. As a rule of thumb if a link is not human readable then the chance of it remaining long-term is low. Companies with content management systems are notorious for breaking links.
Referencing data that you can’t find. If you use data sourced from a web site and the site owner takes it down then you may be left with no evidence to support your assertions. If data is likely to be removed then you should keep a private copy offline (keeping it online might be an infringement of copyright) for future reference. It won’t let you publish the original data but will at least let you discuss it with readers.
Referencing non-public data. The Open Access movement [ http://en.wikipedia.org/wiki/Open_access ] aims to make scholarly material free for unrestricted access. If you cite papers that are not open access then you deny your readers the ability to verify your claims and also encourage the companies that deny access to research papers.
An insidious problem is with web sites such as the New York Times [ www.nytimes.com ] which need a login and store cookies. As I have logged in to their site at some time in the past I get immediate access to all their articles. But if I reference them in a blog post many readers will be forced to register (some readers will object to this). With the NYT this isn’t such a problem as it’s free to register so anyone who is really interested can do so (with a fake name if they wish). But I still have to keep thinking about the readers for such sites.
I should probably preview my blog posts from a different account without such cookies.
Failing to provide calculations. My current procedure is to include the maths in my post. For example, if you have a 32-bit data type used to store a number of milliseconds then it can store 2^32/1000 seconds, which is 2^32/1000/60/60/24 = 49.7 days. In this example you can determine with little guessing what each of the numbers represents. For more complex calculations an appendix could be used. A common feature of blogs is the ability to have a partial post sent to the RSS feed, and the user has the ability to determine where the post gets cut. So you could cut the post before the calculations: the people who want to see them will find it’s only one click away, and the people who are happy to trust you will have a shorter post.
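For readers who would rather check the arithmetic than trust it, the 32-bit millisecond example can be written as a few lines of Python. This is only a sketch: the function name rollover_days is made up for illustration, and it assumes the counter is unsigned so that all 2^32 values are available.

```python
MS_PER_SECOND = 1000
SECONDS_PER_DAY = 60 * 60 * 24  # seconds -> minutes -> hours -> days


def rollover_days(bits: int) -> float:
    """Days until an unsigned millisecond counter of the given width wraps.

    Hypothetical helper; assumes the counter starts at zero and is unsigned.
    """
    distinct_values = 2 ** bits
    seconds = distinct_values / MS_PER_SECOND
    return seconds / SECONDS_PER_DAY


print(f"{rollover_days(32):.1f} days")  # prints "49.7 days"
```

Each constant corresponds to one of the divisors in the calculation above, which is what makes the working easy to audit.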
Linking with little reason. Having a random word appear highlighted with an underline in a blog post is often not very helpful for a reader. It sometimes works for Wikipedia links where you expect that most readers will know what the word means but you want to link to a reference for the few who don’t (my link for the word Wikipedia is an example). In the case where most readers are expected to know what you are referring to then citing the link fully (with a description of the link and a human-readable form for an email client) is overkill and reduces the readability of the text.
The blogging style of “see here and here for examples” does not work via email and does not explain why a reader should visit the sites. If you want to include random links in a post then having a section at the footer of related links would probably be best.
Linking to a URL as received. Many bloggers paste URLs from Google, email, and RSS feeds into their blog posts. This is a bad idea because it might miss a redirection to a different site. If a Google search or an email gives you a URL that is about to go away then it might redirect to a different site. In that case citing the new URL instead of the old one is a service to your readers and will decrease the number of dead links in your blog over the long term. Also, using services such as www.feedburner.com may cause redirects that you want to avoid when citing a blog post; see my previous post about Feedburner [ http://etbe.coker.com.au/2007/08/20/feedburner-item-link-clicks/ ].
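One way to check where a pasted URL really ends up is to let an HTTP client follow the redirects and report the final address. The following is a minimal sketch using Python’s standard urllib.request module. The function name resolve_final_url and its opener parameter are assumptions added for illustration (the opener makes it possible to substitute a stub when no network is available). It only follows HTTP-level redirects, not redirects done in JavaScript or meta tags.

```python
import urllib.request


def resolve_final_url(url: str, opener=urllib.request.urlopen,
                      timeout: float = 10.0) -> str:
    """Return the URL reached after following any HTTP redirects.

    Hypothetical helper for illustration. urlopen follows redirects by
    default, and the response object records the address it ended at.
    """
    with opener(url, timeout=timeout) as response:
        return response.url
```

Calling resolve_final_url with the address you were given returns the address you should consider citing instead. It is still worth looking at the result by hand, as a redirect to a login page or a site’s front page is not a better citation.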
Here are some less common problems in citing posts:
Inappropriately citing yourself. Obviously if there is a topic that you frequently blog about then there will be benefit to linking to old posts instead of covering all the background material, and as long as you don’t go overboard there should not be any problems (links to your own blog are assumed to have the same author so there is no need for a disclaimer). If you write authoritative content on a topic that is published elsewhere then you will probably want to blog about it (and your readers will be interested). But you must mention your involvement to avoid giving the impression that you are trying to mislead anyone. This is particularly important if you are part of a group that prepares a document: your name may not end up on the list of authors, but you have a duty to your readers to declare your involvement.
Any document that you helped prepare cannot be used by itself as support for claims that you make in a blog post. You can certainly say “I have previously demonstrated how to solve this problem, see the following reference”. But links with comments such as “here is an example of why X is true” are generally interpreted as partly demonstrating the popular support for an idea.
Citing secret data. The argument “if you knew what I know then you would agree with me” usually won’t be accepted well. There are of course various levels of secrecy that are appropriate. For example offering career advice without providing details of how much money you have earned (evidence of one aspect of career success) is acceptable as the readers understand the desire for some degree of financial secrecy (and of course in any game a coach doesn’t need to be a good player). Arguing the case for a war based on secret data (as many bloggers did) is not acceptable (IMHO), neither is arguing the case for the use of a technology without explaining the science or maths behind it.
Not reading the context of a source. For example I was reading the blog of a well regarded expert in an area of computer science, and he linked to another blog to support one of his claims. I read the blog in question (more than just the post he cited) and found some content that could be considered to be racially offensive and much of the material that I read contained claims that were not adequately supported by facts or logic. I find it difficult to believe that the expert in question (for whom I have a great deal of respect) even casually inspected the site in question. In future I will pay less attention to his posts because of this. I expect a blogger to pay more attention to the quality of their links than I do as a reader of their blog.
While writing this post I realised that my own blogging can be improved in this regard. Many of my older posts don’t adequately cite references. If you believe that any of my future posts fail in this regard then please let me know.