SpamAssassin During SMTP

For some time people have been telling me about the benefits of SpamAssassin (SA). I once installed it for a client (at their demand and against my recommendation) but was not satisfied with the result (managing the spam folder was too complex for their users).

The typical configuration of SA has it run after mail has been accepted by the server. Messages that it regards as spam are put into a spam folder. This means that when someone phones you about an important message you didn’t receive, you have to check that folder. Someone who sends mail to a user with such an SA configuration cannot expect that the message will either be received or be rejected (in which case they would get a bounce message).

Even worse, it seems to be quite common for technical users to train the Bayesian part of SA on messages from the spam folder without reviewing them! Submitting a carefully reviewed folder of spam for Bayesian training can increase the accuracy of classification (including taking account of locality and language differences in spam). Submitting a folder which has not been reviewed means that when a false positive gets into that folder (which will eventually happen) it is used as training for spam recognition, thus increasing the incidence of false positives!
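
A minimal sketch of training only on reviewed mail, assuming the reviewed spam and the known-good messages have been saved as mbox files (both paths are hypothetical). Run it as the account whose Bayes database spamd actually uses, otherwise the training goes into a different database.

```shell
#!/bin/sh
# Hypothetical paths: mbox files that a human has already reviewed.
REVIEWED_SPAM=/home/user/mail/reviewed-spam
REVIEWED_HAM=/home/user/mail/reviewed-ham

# Teach the Bayesian classifier from checked spam only...
sa-learn --spam --mbox "$REVIEWED_SPAM"
# ...and balance it with known-good mail so it also learns what ham looks like.
sa-learn --ham --mbox "$REVIEWED_HAM"

# Show the message and token counts now stored in the Bayes database.
sa-learn --dump magic
```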

Spam has become more of a problem for me recently: on a typical day between 20 and 40 spam messages would get past the array of DNSBL services I use and be re-sent to pass the grey-listing. I have also been receiving complaints from people who want to send email to me about some of the DNSBL and RHSBL services I use (the rfc-ignorant.org service gets a lot of complaints, as there are a huge number of ignorant and lazy people running mail servers).

So now I have installed spamass-milter to have SA run during the SMTP protocol. If the SA checks indicate that the message is spam, my mail server can simply reject it with a 55x code. This causes the sending mail server to generate a local bounce (if it’s a legitimate message) or to just discard it in the case of a spam server. Here is how to set it up on Debian/Lenny and CentOS 5:

Install the packages: on CentOS run “yum install spamass-milter”, on Debian run “apt-get install spamass-milter spamassassin spamc” (spamassassin seems to be installed by default on CentOS). On a Debian system the milter will then be set up and running. On CentOS you have to run the following commands:
useradd -m -c "Spamassassin Milter" -s /bin/false spamass-milter
mkdir /var/run/spamass-milter
chown spamass-milter /var/run/spamass-milter
chmod 711 /var/run/spamass-milter
echo 'SOCKET="/var/run/spamass-milter/spamass.sock"' >> /etc/sysconfig/spamass-milter

On CentOS edit /etc/init.d/spamass-milter and change the daemon start line to the following:
runuser - spamass-milter -s /bin/bash -c "/usr/sbin/spamass-milter -p $SOCKET -f $EXTRA_FLAGS"
Then add the following lines below it:
chown postfix:postfix /var/run/spamass-milter/spamass.sock
chmod 660 /var/run/spamass-milter/spamass.sock
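
Because the -f option makes spamass-milter fork into the background, there is a possible race where the socket does not exist yet when the chown line runs. A hedged sketch of a small wait loop that could go between the start line and the two permission commands; wait_for_socket and its 10 second default are hypothetical, not part of the packaged init script.

```shell
#!/bin/sh
# wait_for_socket PATH [SECONDS]: hypothetical helper that polls once a second
# until PATH exists as a Unix domain socket, giving up after SECONDS (default 10).
# Returns 0 if the socket appeared, 1 on timeout.
wait_for_socket() {
    sock=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if [ -S "$sock" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Possible use in the init script, after the runuser line:
# if wait_for_socket /var/run/spamass-milter/spamass.sock; then
#     chown postfix:postfix /var/run/spamass-milter/spamass.sock
#     chmod 660 /var/run/spamass-milter/spamass.sock
# fi
```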

The spamass-milter program talks to the SpamAssassin daemon spamd.

On both Debian and CentOS run the command “useradd -c Spamassassin -m -s /bin/false spamassassin” to create an account for SA. The Debian bug #486914 [1] has a request to have SA not run as root by default.

On CentOS it seems that SA wants to use a directory under the spamass-milter home directory; the following commands allow this. It would be good to have it not do that, or maybe it would be better to use a single Unix account for both SA and the milter.
chmod 711 ~spamass-milter
mkdir ~spamass-milter/.spamassassin
chown spamassassin ~spamass-milter/.spamassassin

On Debian edit the file /etc/default/spamassassin and add “-u spamassassin -g spamassassin” to the OPTIONS line. On CentOS edit the file /etc/sysconfig/spamassassin and add “-u spamassassin -g spamassassin” to the SPAMDOPTIONS line.
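
For reference, the edited lines might look like this. Only the “-u spamassassin -g spamassassin” part comes from the instructions above; the options before it are typical packaged defaults (an assumption), so keep whatever your files already contain and just append.

```shell
# /etc/default/spamassassin (Debian): append the user and group to OPTIONS.
# The options before "-u" are assumed typical defaults; keep your own.
OPTIONS="--create-prefs --max-children 5 --helper-home-dir -u spamassassin -g spamassassin"

# /etc/sysconfig/spamassassin (CentOS): the same change on SPAMDOPTIONS.
# The options before "-u" are assumed typical defaults; keep your own.
SPAMDOPTIONS="-d -c -m5 -H -u spamassassin -g spamassassin"
```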

To enable the daemons on CentOS, run “chkconfig spamass-milter on ; chkconfig spamassassin on”. On Debian, edit the file /etc/default/spamassassin and set ENABLED=1.

Now start the daemons: on CentOS use the command “service spamassassin start ; service spamass-milter start”, on Debian use the command “/etc/init.d/spamassassin start”.

Now you have to edit the mail server configuration. For Postfix on CentOS the command “postconf -e smtpd_milters=unix:/var/run/spamass-milter/spamass.sock” will do it; for Postfix on Debian the command “postconf -e smtpd_milters=unix:/var/spool/postfix/spamass/spamass.sock” will do it.
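
A sketch of the Postfix side, using the socket paths given above. The milter_default_action line is an optional addition that is not part of the original setup: Postfix defaults it to tempfail, so mail is deferred whenever the milter is unreachable, while accept lets mail through unscanned instead. Running postconf with parameter names prints their current values, and postfix reload makes the running daemons pick up the change.

```shell
#!/bin/sh
# Pick the socket path for your distribution (values from the text above).
SOCK=unix:/var/run/spamass-milter/spamass.sock          # CentOS
# SOCK=unix:/var/spool/postfix/spamass/spamass.sock     # Debian

postconf -e "smtpd_milters=$SOCK"

# Optional, a local policy choice: accept mail unscanned instead of
# deferring it (the default, tempfail) when the milter is down.
# postconf -e milter_default_action=accept

# Print the values now in main.cf, then make Postfix re-read them.
postconf smtpd_milters milter_default_action
postfix reload
```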

Now restart Postfix and it should be working.

For correct operation you need to ensure that the score at which mail is rejected is set to the same number in both the spamass-milter and SA configurations. If you have a lower number in the spamass-milter configuration (as is the default in Debian) then bounces can be generated, and you should never generate a bounce for spam. The config file /etc/default/spamass-milter allows you to specify the score for rejecting mail; I am currently using a score of 5. Any change to that score needs a matching change to /etc/mail/spamassassin/local.cf (which has a default required_score of 5 in Debian).
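
A small hypothetical helper (classify is not part of spamass-milter or SpamAssassin) that shows what happens to a message of a given score under the two settings. Any gap between the milter's reject score and SA's required_score creates a band of scores where the two disagree; with both set to 5, only reject and accept can occur.

```shell
#!/bin/sh
# classify SCORE MILTER_REJECT_SCORE SA_REQUIRED_SCORE
# Hypothetical helper: prints what happens to a message with SCORE.
# awk is used because SpamAssassin scores are decimals.
classify() {
    awk -v s="$1" -v m="$2" -v r="$3" 'BEGIN {
        s += 0; m += 0; r += 0                          # force numeric comparison
        if (s >= m && s >= r) print "reject"            # both agree it is spam
        else if (s >= m)      print "reject-below-sa"   # milter rejects, SA says ham
        else if (s >= r)      print "accept-tagged"     # accepted, then filed as spam
        else                  print "accept"            # both agree it is ham
    }'
}

classify 7.5 5 5    # prints: reject
classify 6.0 10 5   # prints: accept-tagged (the band a higher milter score creates)
```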

You can grep for “spamd..result..Y” in your mail log to see entries for messages that were rejected.
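
The pattern matches spamd log lines that typically look like “spamd: result: Y 12 - …”, i.e. messages SpamAssassin scored as spam. A minimal sketch wrapping that grep in a hypothetical count_rejected function; the default log path is the usual Debian location (an assumption, CentOS normally logs mail to /var/log/maillog).

```shell
#!/bin/sh
# count_rejected [LOGFILE]: hypothetical helper that counts spamd
# "result: Y" lines, i.e. messages scored as spam (and, with matching
# thresholds, rejected by the milter). Defaults to the Debian mail log.
count_rejected() {
    grep -c 'spamd..result..Y' "${1:-/var/log/mail.log}"
}

# Example: count_rejected /var/log/maillog   (CentOS)
```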

One problem that I have with this configuration on Debian (not on CentOS) is that spamd is logging messages such as “spamd: handle_user unable to find user: ‘russell’”. I don’t want it to look for ~russell when processing mail for russell@coker.com.au because I have a virtual domain set up and the delivery mailbox has a different name. Ideally I could configure it to know the mapping between users and mailboxes (maybe by parsing the /etc/postfix/virtual.db file). But having it simply not attempt to access per-user configuration would be good too. Any suggestions would be appreciated.
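
The user-to-mailbox mapping wished for here already exists in the Postfix virtual table. A minimal sketch of looking an address up, assuming the plain-text source /etc/postfix/virtual is available; virtual_lookup is a hypothetical helper and nothing in spamd calls it. For the compiled virtual.db, the Postfix command “postmap -q russell@coker.com.au hash:/etc/postfix/virtual” performs the same lookup.

```shell
#!/bin/sh
# virtual_lookup ADDRESS [TABLE]: hypothetical helper that prints the
# destination for ADDRESS from a plain-text Postfix virtual table.
# Only exact address matches are found: no @domain catch-all handling
# and no support for continuation lines.
virtual_lookup() {
    awk -v addr="$1" '
        NF == 0 || $1 ~ /^#/ { next }    # skip blank lines and comments
        $1 == addr {
            $1 = ""                      # drop the key, keep the destination(s)
            sub(/^[ \t]+/, "")
            print
            exit
        }' "${2:-/etc/postfix/virtual}"
}
```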

Now that I have SpamAssassin running it seems that I am getting about 5 spams a day instead of 20 to 40, which is a significant difference. The next thing I will do is turn some of the DNSBL checks that are prone to false positives into SpamAssassin scores instead.
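
As a sketch of that plan, a DNSBL can be given a SpamAssassin score with a rule like the one below. The rule name RCVD_IN_EXAMPLE_BL, the zone dnsbl.example.org and the score of 3.0 are placeholders, not the lists used here. With a required_score of 5, one listing alone (3.0) would not cause a rejection, but two lists weighted 3.0 each (6.0) would, and content rules can still move the total either way.

```shell
#!/bin/sh
# Append a placeholder DNSBL rule to the site config. RCVD_IN_EXAMPLE_BL and
# dnsbl.example.org stand in for whichever list is being demoted; 3.0 is an
# example score below the required_score of 5.
cat >> /etc/mail/spamassassin/local.cf <<'EOF'
header   RCVD_IN_EXAMPLE_BL  eval:check_rbl('examplebl', 'dnsbl.example.org.')
describe RCVD_IN_EXAMPLE_BL  Relay is listed in dnsbl.example.org
tflags   RCVD_IN_EXAMPLE_BL  net
score    RCVD_IN_EXAMPLE_BL  3.0
EOF

# Check the configuration for errors, then restart spamd (Debian init script).
spamassassin --lint && /etc/init.d/spamassassin restart
```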

When I started writing this post I was not planning to compare the sys-admin experiences of CentOS and Debian. But it does seem that there is less work involved in the task of installing Debian packages.

4 comments to SpamAssassin During SMTP

  • Did you try policyd-weight yet?

    Content filtering will always be error prone as content alone does not a spam email make.

  • etbe

    http://www.policyd-weight.org/
    Simon, the above page says “This is different from SpamAssassin or amavisd-new: for scoring or filtering with these programs, mail needs to be accepted and queued” which contradicts what I have just done.

    What benefits has policyd-weight got over SA? SA can add scores for the same criteria that policyd-weight uses as well as content scores. If I wasn’t going to use SA then policyd-weight would be very interesting to me as it seems to manage DNSBLs better than my current setup. But it seems that SA can be even better than that.

  • You can use SA in the same manner as policyd-weight. But people inevitably start using content filtering, and content filtering is inherently error prone and can be gamed.

    You probably don’t want to accept spam from dodgy IP address X, because they happened to include some key phrase in the spam (say “Debian”), which your filters have previously regarded as “almost sacrosanct”.

    Hence sometimes you can make a better decision using information that can’t be faked or is expensive to fake/forge, versus considering both that data, and data that is produced by the spammer.

    Perhaps it is the varied interests of our clients, but I find the error prone nature of content filtering frustrating.

    Worse yet a certain distro’s mailing list server is very fussy and seems to get annoyed when you reject too much of the spam it forwards. That led to trouble when I weakened, and allowed some content based filtering, and rejected emails that other people’s SpamAssassin instances had already tagged as spam (incoming messages that claim to be spam often are!). I should have guessed those admins weren’t too bright, spending CPU resource to find it is spam and sending it anyway. So rejecting or delaying email from mostly good servers may cause other issues.

    Our most spammed server is still using Spamhaus, greylisting, ix.dnsbl.manitu.net, which is not as good as it was but still VERY effective. I’ve still to go over the statistics from policyd-weight for that server.

    I use content filtering in icedove, but it mostly deals with rubbish from “trusted” servers, and I get to hand train it for MY personal preferences. My own email server has policyd-weight, and so far only one known false positive in its default config.

  • etbe

    Simon: Currently I have the SA threshold set at 5.0. I was thinking of assigning weights of 3 or 4 to some of the DNSBLs I use. I currently don’t have it set up for learning (unless it does so automatically) so I hope it would be difficult for content to make the score fall below 5 if a couple of those DNSBLs are hit.

    Anyway I’ll see how it goes. If it doesn’t seem effective enough then I can always add other things. Thanks for the suggestion, I’ll consider policyd-weight as a backup option if my current configuration seems inadequate.