Re: [OT] Tom's/Marc's spam filters?

Joe Conway <mail@joeconway.com> · Fri, 23 Apr 2004 22:50:51 -0700

Marc G. Fournier wrote:
> On Mon, 19 Apr 2004, Joe Conway wrote:
>> Marc G. Fournier wrote:
>>> Huh?  I just use Spamassassin myself, with Razor/Pyzor/DCC and Bayes all
>>> enabled ...
>>
>> I use exactly the same setup. But recently I've noticed that the
>> spammers are getting smarter -- I think 20% of it is slipping by the
>> filters. I'm going to need something better.
>
> do you force learn those spam that get through the cracks?  I get about 20
> or 30 messages that slip through the cracks, which I process through with
> sa-learn nightly ...

Sorry to drag this OT thread on even longer, but it seems to be a topic 
many are interested in ;-)

I wanted to report back that after just 2 days of forced (supervised) 
learning, the Bayesian filter is now nailing about 99% of all spam. 
*Many, many thanks* for the suggestion.

But I wonder why the autolearn feature is so conservative? At this point 
I'm getting lots of stuff like this:

X-Spam-Status: Yes, hits=5.8 required=2.5 tests=BAYES_99,HTML_FONT_BIG,
	HTML_MESSAGE autolearn=no version=2.63
X-Spam-Report:
	*  0.1 HTML_MESSAGE BODY: HTML included in message
	*  0.3 HTML_FONT_BIG BODY: HTML has a big font
	*  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
	*      [score: 1.0000]

Notice that, even though I get a hit on BAYES_99, I still get 
autolearn=no. Ah well, I guess I should be asking that question of the 
SpamAssassin guys. Also notice that this sucker would have gotten 
through with a score of only 0.4 had it not been for the Bayesian filter.
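
For anyone who wants to check the arithmetic, here is a rough Python sketch that splits the X-Spam-Report lines above into Bayes and non-Bayes points. The function names are made up for illustration. The 12.0 default is only my assumption about the stock autolearn spam threshold, and the idea that autolearn leaves the BAYES_* points out of its decision is likewise my understanding rather than something confirmed in this thread, so check the documentation for your version.

```python
import re

# Report lines copied from the X-Spam-Report header quoted above.
REPORT = """\
*  0.1 HTML_MESSAGE BODY: HTML included in message
*  0.3 HTML_FONT_BIG BODY: HTML has a big font
*  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*      [score: 1.0000]
"""

# Matches "*  <score> <RULE_NAME> ..."; continuation lines such as
# "*      [score: 1.0000]" have no leading number and are skipped.
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def parse_report(text):
    """Return a list of (rule_name, score) pairs found in the report text."""
    hits = []
    for line in text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            hits.append((match.group(2), float(match.group(1))))
    return hits


def split_scores(hits):
    """Return (total, non_bayes_total), each rounded to one decimal place."""
    total = sum(score for _, score in hits)
    non_bayes = sum(score for name, score in hits
                    if not name.startswith("BAYES_"))
    return round(total, 1), round(non_bayes, 1)


def would_autolearn_as_spam(hits, threshold=12.0):
    """Hypothetical check: compare the non-Bayes points with a threshold.

    ASSUMPTION: 12.0 is a guess at the default autolearn spam threshold,
    and excluding BAYES_* rules here is also an assumption.
    """
    _, non_bayes = split_scores(hits)
    return non_bayes >= threshold


if __name__ == "__main__":
    hits = parse_report(REPORT)
    print(split_scores(hits))
    print(would_autolearn_as_spam(hits))
```

If those assumptions hold, the 0.4 non-Bayes points fall far short of the threshold, which would be consistent with the autolearn=no in the header.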

Again, thanks.

Joe
