Re: Careful with those spamtools.....

Shelby Moore <coolpage@earthlink.net> · Mon, 15 Sep 2003 08:21:00 +0800

> Indeed. These open relay blacklist sites were always a highly questionable
> source for mail filtering. Quite obviously, open relays have no relationship to
> spam...

Agreed, and this point is already publicly documented:

http://www.imc.org/ube-relay.html

A similar criterion, conceptually more closely correlated with spam, would be a blacklist of relays that are dishonest about the previous IP address in the Received header chain.  I do not think http://www.rfc-ignorant.org/ currently maintains a database of such non-compliance.

Then again, such a hypothetical database would be mostly useless in practice, because dishonest proxies come and go faster than we could add them to a database.  We could test them in real time, but tests can be lied to.
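To make the Received header chain concrete, here is a minimal sketch (not any existing tool) that pulls the bracketed IPv4 address each hop claims to have received the message from, newest hop first. The function name received_chain and the regex are my own illustrative assumptions, and real Received headers vary far more than this handles (IPv6, no brackets, odd MTAs).

```python
from __future__ import annotations

import re
from email import message_from_string

# Matches a dotted-quad IPv4 address in square brackets, e.g. "[192.0.2.7]",
# which is how many MTAs record the connecting peer in a Received header.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_chain(raw_message: str) -> list[str | None]:
    """Return the first bracketed IPv4 address found in each Received
    header, newest hop first (each relay prepends its header, so the
    top one is the most recent). None marks a header with no match."""
    msg = message_from_string(raw_message)
    chain = []
    for header in msg.get_all("Received", []):
        match = BRACKETED_IPV4.search(header)
        chain.append(match.group(1) if match else None)
    return chain
```

Only the first entry, written by your own MTA, is known to be honest; every later entry is whatever the upstream relay chose to write, which is exactly what a dishonest relay would falsify. That is why the headers alone cannot reveal the lie, and why a database or a real-time test would be needed.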

There are (some proprietary) reliable ways to detect dishonest proxies, but I agree with Dean: it is much better to just detect the spam directly.

In terms of detecting spam directly, per-message filters based solely on content have a high false-positive cost and can be subverted through content:

http://citeseer.nj.nec.com/androutsopoulos00learning.html (see page 9 of the PDF linked at the top)
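As a toy illustration of both problems, here is a minimal multinomial naive Bayes content filter (my own sketch, not the classifier from the cited paper) with a cost threshold. The assumption is that blocking a legitimate message costs lam times as much as letting a spam through, so a message is blocked only when P(spam | text) > lam / (1 + lam). The class name NaiveBayes and function is_spam are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes over whitespace tokens with
    add-one (Laplace) smoothing. Toy code, not a production filter."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def p_spam(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            score = math.log(self.docs[label] / total_docs)  # log prior
            for w in text.lower().split():
                if w not in vocab:
                    continue  # ignore tokens never seen in training
                count = self.words[label][w]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


def is_spam(p, lam):
    """Block only if the expected cost of a false positive is outweighed."""
    return p > lam / (1.0 + lam)
```

Trained on two spam and two legitimate toy messages, "cheap pills now" scores about 0.96: blocked at lam = 1, but not at lam = 999, which shows how a high false-positive cost forces the filter to let most spam through. Padding the same spam with a few words typical of legitimate mail ("meeting agenda monday monday team") drops it to about 0.33, which is the content subversion mentioned above.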

Filters based on bulk correlation of content (DCC) require whitelist maintenance and can be subverted through content.  Filters that require your senders to opt in are inherently expensive for the email system, generate many false positives, and can be subverted with forged headers (not to mention being patented).  A brief taxonomy is here:

http://www.imc.org/ube-sol.html
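The bulk-correlation idea, and why it needs a whitelist, can be sketched roughly as follows. This is a hypothetical toy, not DCC itself: it uses an exact SHA-256 hash of a lowercased, whitespace-collapsed body, whereas DCC's real checksums are deliberately fuzzier. The class BulkCounter and its threshold and whitelist parameters are illustrative assumptions.

```python
import hashlib
import re
from collections import Counter


class BulkCounter:
    """Toy bulk-correlation filter: counts how often each normalized body
    has been seen and flags bodies seen at least `threshold` times,
    unless the sender is whitelisted (e.g. a legitimate mailing list)."""

    def __init__(self, threshold, whitelist=()):
        self.threshold = threshold
        self.whitelist = set(whitelist)
        self.counts = Counter()

    @staticmethod
    def checksum(body):
        # Lowercase and collapse whitespace so trivial variations hash alike.
        normalized = re.sub(r"\s+", " ", body.lower()).strip()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def observe(self, sender, body):
        """Record one copy of `body`; return True if it should be treated as bulk."""
        digest = self.checksum(body)
        self.counts[digest] += 1
        if sender in self.whitelist:
            return False  # legitimate bulk mail must be exempted by hand
        return self.counts[digest] >= self.threshold
```

With threshold=3, the third identical "Buy now!" is flagged, a whitelisted newsletter never is, and a spammer who appends a unique token to each copy (say "ref 0", "ref 1", ...) produces a different checksum every time and is never flagged. Fuzzier checksums raise the bar, but the spammer can still vary the content enough to defeat them.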

Even if the filter types above have not yet been subverted at high rates, they can be:

http://www1.ietf.org/mail-archive/ietf/Current/msg22190.html

We are working on a filtering mechanism that does not suffer from these sorts of issues, because it actually looks at what is unique about spam, not merely at side effects that are only sometimes correlated with spam, as the filters above do.

I agree with Dean, and I think that conceptually ALL existing anti-spam methods (among those publicly known, as far as I am aware) are useless and, as Dean points out, even harmful in the long run, because they filter things that are not spam, only things that are sometimes (even if, so far, most of the time) correlated with spam.

I've been making points like this for a long time:

http://ixazon.dynip.com/pipermail/nilsimsa/2002-December/000041.html
(my warnings on the dangers of Bayesian anti-spam filtering, which imo caused Paul Graham to eventually add a disclaimer to his web page)

Shelby Moore
http://AntiViotic.com