For some time people have been telling me about the benefits of SpamAssassin (SA). I have installed it once for a client (at their demand and against my recommendation) but was not satisfied with the result (managing the spam folder was too complex for their users).
The typical configuration of SA has it run after mail has been accepted by the server. Messages that it regards as spam are put into a spam folder. This means that when someone phones you about some important message you didn’t receive then you have to check that folder. Someone who sends mail to a user who has such a SA configuration can not expect that the message will either be received or rejected (thus giving them a bounce message).
Even worse it seems to be quite common for technical users to train the Bayesian part of SA on messages from the spam folder – without reviewing them! Submitting a folder of spam that has been carefully reviewed for Bayesian training can increase the accuracy of classification (including taking account for locality and language differences in spam). Submitting a folder which is not reviewed means that when a false-positive gets into that folder (which will eventually happen) it is used as training for spam recognition thus increasing the incidence of false-positives!
Spam has been becoming more of a problem for me recently, on a typical day between 20 and 40 spam messages would get past the array of DNSBL services I use and be re-sent to pass the grey-listing. Also I have been receiving complaints from people who want to send email to me about some of the DNSBL and RHSBL services I use (the rfc-ignorant.org service gets a lot of complaints – there are a huge number of ignorant and lazy people running mail servers).
So now I have installed spamassassin-milter to have SA run during the SMTP protocol. Then if SA checks indicate that the message is SPAM my mail server can just reject the message with a 55x which will cause the sending mail server to generate a local bounce (if it’s a legitimate message) or to just be discard it in the case of a spam server. Here is how to set it up on Debian/Lenny and CentOS 5:
Install the package yum install spamass-milter or apt-get install spamass-milter spamassassin spamc (spamassassin seems to be installed by default on CentOS). On a Debian system the milter will be setup and running. On CentOS you have to run the following commands:
useradd -m -c "Spamassassin Milter" -s /bin/false spamass-milter
chown spamass-milter /var/run/spamass-milter
chmod 711 /var/run/spamass-milter
echo SOCKET="/var/run/spamass-milter/spamass.sock" >> /etc/sysconfig/spamass-milter
On CentOS edit /etc/init.d/spamass-milter and change the daemon start line to ‘runuser – spamass-milter -s /bin/bash -c "/usr/sbin/spamass-milter -p $SOCKET -f $EXTRA_FLAGS"‘ Then add the following lines below it:
chown postfix:postfix /var/run/spamass-milter/spamass.sock
chmod 660 /var/run/spamass-milter/spamass.sock
The spamass-milter program talks to the SpamAssassin daemon spamd.
On both Debian and CentOS run the command “useradd -c Spamassassin -m -s /bin/false spamassassin” to create an account for SA. The Debian bug #486914  has a request to have SA not run as root by default.
On CentOS it seems that SA wants to use a directory under the spamass-milter home directory, the following commands alllow this. It would be good to have it not do that, or maybe it would be better to have the one Unix account used for SA and the milter.
chmod 711 ~spamass-milter
chown spamassassin ~spamassassin/.spamassassin
On Debian edit the file /etc/default/spamassassin and add “-u spamassassin -g spamassassin” to the OPTIONS line. On CentOS edit the file /etc/sysconfig/spamassassin and add “-u spamassassin -g spamassassin” to the SPAMDOPTIONS line.
To enable the daemons, on CentOS you need to run “chkconfig spamass-milter on ; chkconfig spamassassin on“, on Debian edit the file /etc/default/spamassassin and set ENABLED=1.
Now start the daemons, on CentOS use the command “service spamassassin start ; service spamass-milter start“, on Debian use the command “/etc/init.d/spamassassin start“.
Now you have to edit the mail server configuration, for Postfix on CentOS the command “postconf -e smtpd_milters=unix:/var/run/spamass-milter/spamass.sock” will do it, for Postfix on Debian the command “postconf -e smtpd_milters=unix:/var/spool/postfix/spamass/spamass.sock” will do it.
Now restart Postfix and it should be working.
For correct operation you need to ensure that the score needed for a bounce is specified as the same number in both the spamass-milter and SA configuration. If you have a lower number for the spamass-milter configuration (as is the default in Debian) then bounces can be generated – you should never generate a bounce for a spam. The config file /etc/default/spamass-milter allows you to specify the score for rejecting mail, I am currently using a score of 5. Any changes to the score need matching changes to /etc/mail/spamassassin/local.cf (which has a default required_score of 5 in Debian).
You can grep for “spamd..result..Y” in your mail log to see entries for messages that were rejected.
One problem that I have with this configuration on Debian (not on CentOS) is that spamd is logging messages such as “spamd: handle_user unable to find user: ‘russell’“. I don’t want it to look for ~russell when processing mail for firstname.lastname@example.org because I have a virtual domain set up and the delivery mailbox has a different name. Ideally I could configure it to know the mapping between users and mailboxes (maybe by parsing the /etc/postfix/virtual.db file). But having it simply not attempt to access per-user configuration would be good too. Any suggestions would be appreciated.
Now that I have SpamAssassin running it seems that I am getting about 5 spams a day, the difference is significant. The next thing I will do is make some of the DNSBL checks that are prone to false-positives become SpamAssassin scores instead.
When I started writing this post I was not planning to compare the sys-admin experiences of CentOS and Debian. But it does seem that there is less work involved in the task of installing Debian packages.