This is the first in a multipart series on better SPAM fighting through log parsing. I have found that better Systems Administration can usually be achieved through proper log handling and analysis. In fact, I will use the data from one of the secondary mail servers in my personal mail setup in order to demonstrate this data analysis. I will do this by going through the report generated by amavisd-logwatch piece meal until complete.
I previously posted about a program that parses your amavisd-new SPAM log file called amavisd-logwatch. Now I am going to give you some tutorials of how to make efficient use of the results. I am assuming that you have access your SpamAssassin scoring config files. I am also assuming that you have access to the log parsing results. I have mine sent via email daily.
One item I would like to mention is that when making changes to SPAMAssassin, ensure that you make them in a separate file from the default configuration files. I use /etc/spamassassin/local_tests.cf. I strongly recommend this setup as this makes it easier to segment your configuration files by type when your rule sets and modifications start to get larger and larger.
Section: Bayes Probability
First things first, skip the majority of the summary sections and go right down to the section on Bayes probability:
You’ll notice that of the 14,627 times that the Bayesian filter was run on messages, that it came up with BAYES_99 11,825 of those times (or 80.85%) . You’ll also notice that all the subsequent BAYES_XX probability tests were extremely low (2nd and 3rd place being 5.4% and 4.5% respectively).
Conclusion: Assuming that you are relatively happy with your current level of SPAM filtering, that would mean that your Bayes filter is doing fairly well (in general). You may not need to tweak it. If you are feeling frisky though, to tweak the impact that the BAYES_99. To change this, open up your local_test.cf and add the line:
score BAYES_99 (1.25)
This increases your BAYES_99 score by 1.25 points from its base. It doesn’t have to be 1.25 points, start small to see what you are comfortable with and slowly work your way up. Be careful as too high a jump will cause false positives which makes for angry users.
Section: SPAM Score Frequency
The SPAM score frequency refers to how often a piece of email scores within a given range.
Conclusion: Taking note of the fact that nearly 60% of the emails scored a 30 or higher, and assuming again that you are comfortable with your SPAM filter, you can adjust the SPAM kill score threshold in amavisd-new accordingly. I trust my SPAM filter, but I have written many rules and made many tweaks to it. So I have set my SPAM kill threshold low enough (15.8 to be exact). As you can see, this is pretty close to the middle of the set of numbers (also known as the median). This eliminates the delivery of the vast majority of the obvious SPAM.
Stay tuned for the next part in the series where we will tweak the individual scores based on the results report.