Re: spambayes

On Wed, 2006-05-10 at 13:12 +0100, James Wilkinson wrote:
> Aaron Konstam wrote:
> > But, though no one asked, it is based on a mistaken assumption: that
> > it is useful to have mail identified as unknown in addition to spam
> > and ham. I don't think they call it unknown, but that is the purpose.
> > I can't go into the whole argument, but to me this tri-classification
> > is not only unnecessary but more trouble to deal with.
> 
> I, on the other hand, find it excellent. The program has the honesty to
> ask for help when it gets stuck.
> 
> What we'd all *like*, ideally, is an antispam program that could
> identify what we considered to be spam with 100% accuracy.
> 
> That turns out to be practically impossible. There will be e-mails that
> are border-line, e-mails that "look" like spam but are actually wanted
> (false positives), e-mails that "look" wanted but are really spam (false
> negatives), and ones that are pretty impossible to automatically
> classify.
> 
> The "unsure" category provides a place for the border-line and the Hard
> Cases, and massively reduces false positives and negatives (they usually
> end up in "unsure", instead of "good" or "spam").
> 
> So you get "good" folders that you can be pretty certain are good. You
> get "spam" folders that *very* *very* rarely have good e-mail in them.
> And you have a folder *marked* "dodgy". So you can quickly deal with it
> when you want, with the expectation that it's probably spam.
> 
> Of course, since the program is based on a modified Bayesian algorithm,
> you are expected to train on errors. You are expected to put a little
> bit of time into helping the program. "Unsure" is simply where e-mails
> go if the program needs to be trained on them.
> 
> James.
All this is too technical a matter to deal with here. Training on the
unsure is no different from training on the ham that should be spam and
the spam that should be ham. In any case, you can't be overconfident
with spambayes. You still have to check for spam that is misclassified
and ham that is misclassified. So now you have three streams to check
rather than two. That to me is an extra pain.
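
For anyone following along, the two-versus-three stream question can be
sketched in a few lines of Python. This is only an illustration, not
spambayes code: the function names, the sample scores, and the cutoff
values (0.20 and 0.90) are assumptions made up for the example. It takes
a spam score between 0 and 1 and files the message as ham, unsure, or
spam depending on two cutoffs.

```python
# Illustrative sketch only: names and cutoff values are assumptions,
# not taken from the spambayes source.
HAM_CUTOFF = 0.20   # scores below this are filed as ham
SPAM_CUTOFF = 0.90  # scores at or above this are filed as spam


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def route(scored_messages):
    """Sort (name, score) pairs into the three folders."""
    folders = {"ham": [], "unsure": [], "spam": []}
    for name, score in scored_messages:
        folders[classify(score)].append(name)
    return folders


if __name__ == "__main__":
    sample = [("newsletter", 0.05), ("invoice", 0.55), ("pills", 0.99)]
    for folder, names in route(sample).items():
        print(folder, names)
```

Setting both cutoffs to the same value, for example
classify(0.55, 0.5, 0.5), leaves no room for "unsure", so every message
lands in ham or spam and there are only two streams to check. The
borderline messages do not go away in that case; they are simply filed
in one of the other two folders.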

-- 
Aaron Konstam <akonstam@xxxxxxxxxxxxx>

-- 
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list