Frank Solensky <fsolensky@premonitia.com> writes:

> Just posted on slashdot: a Bayesian approach to the problem that reports
> to have rates of 0.5% on false positives and 0% false negative:
> http://www.paulgraham.com/spam.html

Nice short-term approach.  Unfortunately, it is easily defeated by
appending (perhaps as an HTML comment) a long innocent-looking fragment
(e.g., a 30KB piece from a random book).

Further, in its *present* form, where unfamiliar words are given a 0.2
spam probability, it is easily defeated by adding a lot of randomly
generated `words' like 9nscS9Ft, iuiF0kKw, 6AycPEbU, nsUdjGeP, etc.
Given enough of these, the Bayesian probability formula will declare
even a piece of mail that consists of a sales pitch for a pornographic
web site to have a probability of being spam that is arbitrarily close
to 0.2.

-- 
Stanislav Shalunov		http://www.internet2.edu/~shalunov/

"Which one is worse?  Both are worse."  -- V. I. Lenin
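
P.S.  A rough sketch of the random-token padding described above, in
Python.  It assumes the filter combines *every* token's probability with
the naive Bayes product rule, prod(p) / (prod(p) + prod(1 - p)); a filter
that looks only at a fixed number of its most extreme tokens would behave
differently.  The five 0.99 values standing in for the sales pitch and
the function names are made up for illustration; only the 0.2 for
unfamiliar words comes from the text above.

```python
import math

UNKNOWN_WORD_PROB = 0.2          # probability given to unfamiliar words
SALES_PITCH_PROBS = [0.99] * 5   # hypothetical: five strongly spammy tokens


def combined_spam_probability(token_probs):
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    # Sum logs instead of multiplying so long messages do not underflow.
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    x = log_spam - log_ham
    # Numerically stable logistic function of the log-odds x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_with_padding(n_random_tokens):
    """Score the sales pitch plus n unfamiliar random tokens."""
    padding = [UNKNOWN_WORD_PROB] * n_random_tokens
    return combined_spam_probability(SALES_PITCH_PROBS + padding)


if __name__ == "__main__":
    for n in (0, 5, 10, 15, 20, 40):
        print(f"{n:3d} random tokens -> {score_with_padding(n):.6f}")
```

Under this particular rule the score only falls as more tokens are
appended; where a real filter's score ends up depends on how it selects
and combines tokens.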