Re: Why Spam is a problem

Frank Solensky <fsolensky@premonitia.com> writes:

> Just posted on slashdot: a Bayesian approach to the problem that reports
> to have rates of 0.5% on false positives and 0% false negative:
> http://www.paulgraham.com/spam.html

Nice short-term approach.

Unfortunately, it is easily defeated by just appending (perhaps as an
HTML comment) a long innocent-looking fragment (e.g., a 30KB piece from
a random book).

Further, in its *present* form, where unfamiliar words are given a 0.2
spam probability, it is easily defeated by just adding a lot of randomly
generated `words' like 9nscS9Ft, iuiF0kKw, 6AycPEbU, nsUdjGeP, etc.
Given enough of these, the Bayesian probability formula will declare
even a piece of mail that consists of a sales pitch for a pornographic
web site to have a probability of being spam that is arbitrarily close
to 0.2.
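
A rough sketch of the padding effect, not the filter's actual code. It
assumes the naive Bayes combining rule prod(p) / (prod(p) + prod(1 - p))
is applied to every token in the message; a filter that scores only a
handful of the most extreme tokens would behave differently. The
function names and the five 0.99 "spammy" tokens are made up for
illustration; only the 0.2 value for unfamiliar words comes from the
text above. Under these assumptions the score starts near 1, and with
enough padding it sinks to 0.2 and keeps falling.

```python
from math import prod

# Spam probability given to unfamiliar words, as quoted above.
UNKNOWN_WORD_PROB = 0.2


def combined_spam_prob(probs):
    """Combine per-token spam probabilities with the naive Bayes rule.

    Assumption: all tokens are combined, with no cap on how many count.
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def padded_score(spam_word_probs, n_padding):
    """Score a message after appending n_padding unfamiliar 'words'."""
    padding = [UNKNOWN_WORD_PROB] * n_padding
    return combined_spam_prob(list(spam_word_probs) + padding)


# Hypothetical message: five strongly spammy tokens at 0.99 each.
spammy = [0.99] * 5

for n in (0, 5, 20, 50):
    print(f"{n:3d} padding words -> score {padded_score(spammy, n):.6f}")
```

Each padding word multiplies the spam-to-ham odds by 0.2 / 0.8 = 0.25,
so a fixed number of spammy tokens is eventually outweighed no matter
how high their individual probabilities are.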

-- 
Stanislav Shalunov		http://www.internet2.edu/~shalunov/

"Which one is worse?  Both are worse."		-- V. I. Lenin

