Web Site Validation

Over the last few days I’ve got this blog and my documents blog to conform to valid XHTML according to the W3C validation service [1].

One significant change that I made was to use lower-case for HTML tags. For about 15 years I’ve been using capitals for tags to make them stand out from content and my blogs are the latest in a long line of web sites with that. Naturally I wasn’t going to correct 900 posts manually so I ran a series of SQL commands such as the following on my database server (where X is the WordPress table prefix):

update X_wp_posts set post_content = replace(post_content,'<PRE>','<pre>');

But make sure you have a good backup of your database before running SQL search and replace commands on your blog data.

After running such commands about 90% of my blog posts conformed, so I only needed to edit about 90 posts to correct things. This process gave some real benefits. One issue is that an apostrophe in a URL must be quoted, otherwise some browsers will link to the desired URL and some will link to a truncated URL. Fixing a couple of variations of this problem resulted in some broken links being fixed. Another issue is that you can’t have paragraphs (<p> tags) within list items, fixing this made some of my posts align correctly – it was a tricky fix, in some cases I had to use <br/> to break up text in a list item and sometimes I replaced lists with different sections delimited by <h3> headings (which apparently is rumored to give better SEO).

It would make a really nice WordPress feature to be able to do W3C validation as part of the publishing process, ideally an attempt to publish or schedule a post would result in a message saying “saved as a draft because it’s not valid XHTML” if the checks failed. The source to the W3C validation software is significantly larger than WordPress [2], but it seems to me that there are two main types of WordPress installations, small ones for personal use (which tend to be on fairly idle servers) and big ones that have so much traffic that the resource usage of validation would be nothing compared to the ongoing load.

As there seems to be no way of validating my posts before publication my best option is the W3C button I now have on my blog. This allows me to validate the page at a click so while I can’t entirely avoid the risk of publishing a post with invalid XHTML I can at least fix it rapidly enough that hardly anyone will notice.

It also seems like a useful feature to have aggregators like Venus [3] check for valid HTML and not display posts unless they are valid. It’s not a feature that could be enabled immediately (I’m sure that if you click on this link to the W3C validation service [1] from a Planet feed you will see lots of errors and warnings), but once bloggers have time to fix their installation it would allow preventing some of the common annoyances of Planet installations. It’s not uncommon on popular Planets to have unmatched tags in a post which results in significant amounts of the content being bold, underlined, in italics, or for the greatest annoyance struck-out. I know that this may be a controversial suggestion, but please consider why you are blogging – if you are blogging for the benefit of your readers (which seems to be the case for everyone other than sploggers) then it seems that the readers will benefit more by not having a broken post syndicated than they would benefit from having it syndicated and thus messing up the display of many following posts.

The next thing on my todo list in this regard is to do some tests of accessibility. The work that I have done to pass the XHTML validation tests has helped to some degree – if nothing else the images now all have alt= descriptions, but I expect that it will be a lot of work. The WordPress Codex has a page about accessibility, I haven’t read all of it yet [4].

Does anyone have any recommendations for free automated systems that check web sites for accessibility? What would be ideal is a service that allows different levels of warnings, so instead of trying to fix all problems at once I could start by quickly fixing the most serious problems on the most popular posts and finish the job at some later date.

3 comments to Web Site Validation

  • Do yourself a favour Russell and validate to HTML5. :-)

    XHTML has too many problems to mention.

  • Have a look at this:


    There are validation targets that use xmlstarlet and a
    standalone XHTML/1.1 DTD (all I need as everything of
    mine uses it) to validate. Of course, the W3C validator
    *may* do additional things, but this is a basic test I
    use before committing.

    There’s some magic, especially in the former (as it con-
    verts these *.inc files to XML first) but also in the
    latter. The reason for this is that xmlstarlet fails if
    there is an URI given in the doctype, which however must
    be used in the on-site copy. The www/data/Makefile is a
    snippet that can be used well for validating only HTML
    snippets instead of whole pages.


  • Planet aggregators do not have to do a full html validation; they only have to ensure tags are balanced within a post. It’s really inexcusible for such an aggregator not to do so.

    My own aggregator (ikiwiki) provides two ways to do it, either by using the HTML::TreeBuilder module to build a html parse tree and re-emitting with balanced tags, or by calling out to tidy(1), which will go all the way to generate valid html (or in rare cases, give up and declare the post unparseable).