stanislav shalunov wrote:

>Further, in its *present* form, where unfamiliar words are given 0.2
>spam probability, easily defeated by just adding a lot of randomly
>generated `words' like 9nscS9Ft, iuiF0kKw, 6AycPEbU, nsUdjGeP, etc.
>Given enough of these, the Bayesian probability formula will declare
>even a piece of mail that consists of a sales pitch for a pornographic
>web site have a probability of being spam that is arbitrarily close to
>0.2.

Is that true? The approach described looks only at the 15 words
furthest from 0.5; it seems likely that most messages that would rank
at 0.9 or above would have enough spam-words that words at 0.2 wouldn't
show up.

One thing that would be necessary, and that the author doesn't mention,
would be to decode content-encodings before applying the filter;
otherwise spammers could just base64 all their messages.

-- 
/=============================================================\
|John Stracke      |jstracke@centivinc.com                    |
|Principal Engineer|http://www.centivinc.com                  |
|Centiv            |My opinions are my own.                   |
|=============================================================|
|*BOOM* "Thank you, Beaker. Now we know that is definitely too|
|much gunpowder." -- Dr. Bunsen Honeydew                      |
\=============================================================/
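
A rough sketch of both points, in Python. It assumes the filter combines the 15 word probabilities furthest from 0.5 with the usual naive-Bayes product formula, which this thread does not spell out. The names `spam_probability` and `body_text`, and the 0.99 figure for spammy words, are made up for illustration; only the 0.2 and the 15 come from the discussion.

```python
import base64
import math
from email import message_from_string

UNKNOWN_PROB = 0.2  # probability given to unfamiliar words, as quoted in the thread
TOP_N = 15          # how many "most interesting" words the filter looks at


def spam_probability(word_probs, top_n=TOP_N):
    """Combine the top_n probabilities furthest from 0.5.

    Assumption: the combination is prod(p) / (prod(p) + prod(1 - p)).
    """
    interesting = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = math.prod(interesting)
    ham = math.prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


def body_text(raw_message):
    """Return the body of a single-part message with its transfer encoding undone.

    Multipart messages would need a walk over the parts; that is left out here.
    """
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # bytes, base64/quoted-printable decoded
    charset = msg.get_content_charset() or "ascii"
    return payload.decode(charset, errors="replace")


if __name__ == "__main__":
    # A message with 20 strongly spammy words (0.99 is a hypothetical value),
    # then the same message padded with 1000 random strings scored at 0.2.
    spammy = [0.99] * 20
    padded = spammy + [UNKNOWN_PROB] * 1000
    print(spam_probability(spammy), spam_probability(padded))

    # A base64-encoded body must be decoded before tokenizing.
    raw = (
        "Content-Type: text/plain; charset=us-ascii\n"
        "Content-Transfer-Encoding: base64\n"
        "\n"
        + base64.b64encode(b"buy now").decode("ascii")
        + "\n"
    )
    print(body_text(raw))
```

Under these assumptions, padding a message that already has 15 or more strongly spammy words leaves the score unchanged, because a word at 0.2 is only 0.3 away from 0.5 and never makes it into the top 15. A message with only a handful of spammy words is a different matter, since the padding then fills the remaining slots.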