Spam-fighting content filters

Tue, 2008-01-15 15:08 by Hans

I just received an email bounce with an error message saying:

Error 554 – Message could not be sent due to inappropriate content

It was an important email. I had worked into the night to prepare some source code from a programmer in Nigeria for somebody else to work on. I was not amused to be told that my email was "inappropriate".

Some investigation revealed that the combination of the words "Nigerian" and "money" had apparently triggered a content filter with enough authority to block the email single-handedly.

When I contacted the recipient by other means, he proposed that the person running the mail server could tweak the filter. I didn't agree.

It is a fundamental misconception that content filtering like this can be made reliable by tweaking. This mistake is just one of a billion possible mistakes. Even if the mail server operator could know all of them in advance, a lifetime wouldn't be enough to tweak them all away.

If the filter trips at the mere mention of Nigeria and money, how utterly stupid is that? Nigeria is the most populous country in Africa. It has 130 million people, it has oil, it has high-tech companies, and it has many computer programmers.

What can we learn from this? A content filter that single-handedly blocks an email is a lost cause. The only approach I can accept, and the one I actually use, is to let the content filter make a small contribution to an overall score, with the main contribution coming from reliable IP address blacklists like Spamhaus and, with a lower weight, perhaps from less reliable ones like SpamCop.
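To make the idea concrete, here is a minimal sketch of such a scoring scheme in Python. The weights, the threshold, the word list and all function names are made up for illustration and are not taken from any real filter. The blacklist results are passed in as plain booleans, so no actual lookup is performed.

```python
import re

# Hypothetical weights, chosen only to illustrate the relative importance.
WEIGHTS = {
    "spamhaus_listed": 5.0,  # reliable IP blacklist: main contribution
    "spamcop_listed": 2.0,   # less reliable blacklist: lower weight
    "content_match": 1.0,    # keyword filter: small contribution only
}

# Hypothetical threshold: a content match alone can never reach it.
REJECT_THRESHOLD = 5.0

# Hypothetical word combinations a naive content filter might look for.
SUSPICIOUS_COMBOS = [{"nigerian", "money"}]


def content_match(body: str) -> bool:
    """Return True if all words of any suspicious combination occur in body."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    return any(combo <= words for combo in SUSPICIOUS_COMBOS)


def spam_score(body: str, spamhaus_listed: bool, spamcop_listed: bool) -> float:
    """Add up the weights of all signals that fired for this message."""
    signals = {
        "spamhaus_listed": spamhaus_listed,
        "spamcop_listed": spamcop_listed,
        "content_match": content_match(body),
    }
    return sum(WEIGHTS[name] for name, fired in signals.items() if fired)


def should_reject(body: str, spamhaus_listed: bool, spamcop_listed: bool) -> bool:
    """Reject only when the combined score reaches the threshold."""
    return spam_score(body, spamhaus_listed, spamcop_listed) >= REJECT_THRESHOLD
```

With these made-up numbers, an email that mentions a Nigerian programmer and money but comes from an unlisted server scores 1.0 and is delivered. The same text from a server listed by Spamhaus scores 6.0 and is rejected.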

If you rely on content filtering, sooner or later you will miss an important email for no good reason. You'll anger potential clients, and with some bad luck you may even lose a contract.

Good email filters can never rely on content filtering alone. They have to draw on centralized spam databases, on honeypots, and on reports from multiple users.

Big email companies like Google are in a relatively good position to detect and filter spam reliably. If you want an automated spam filter that doesn't need tweaking or other babysitting, route your email through a Google Mail account or use a good client filter that connects to at least one central spam database.