I often want to write blog posts about HTML code and about source code in various languages. One problem I have is that the characters I want to use have special meanings (EG < and >), another is that I indent source code to make it readable and I don’t want the spaces trimmed from the start of lines.
I initially wrote a simple Perl script to replace characters such as < with HTML codes. I then had to extend it to escaping quote characters because WordPress tries to get smart and change quotes in a way that might look nice when dealing with plain text, but is just a pain when dealing with code.
The next problem I had is that when I used the <PRE> tag around some text to preserve the white-space WordPress would double-space the text (IE insert a blank line between every two lines of code). This was annoying when reading it and in some situations would change the meaning of the code! The solution I have found to these problems is to use the below script and not use the <PRE> tag. Also I tried using the <CODE> tag, but it made no difference to the end result as far as I could see.
The below script is what I am currently using. It is working well with shell scripts, HTML, and XML so far.
Update: The way that -- is munged by WordPress to – is something that I find particularly annoying. I already had this in the script but forgot to mention it in the post.
s/ / /g;