On Tue, 17 Feb 2004, Vernon Schryver wrote:

> The enclosed example of how not to filter spam is offered for those
> who might want to preemptively add accuspam.com or downloadfast.com
> to their blacklists.
>
> It is also a classic example of what is wrong with the MUA filtering
> tactics Robert Brown advocates.

Vernon, I think you misunderstand what I was saying in my previous note -- the problem was likely mine as I may not have been very clear. If so, I apologize. I actually agree with much of what you say (and thought that was clear on the basis of my earlier posts on the subject). I was objecting to precisely this kind of filtering, and almost forwarded the bounce to the list myself. You might recall that I predicted that my message to the IETF list would generate bounces just like this because it contains a few key phrases -- obviously your reply contained the same phrases. In fact, it was in objection to this sort of automated bounce generation for both spam and viruses (especially the latter) that I joined this list and have been participating in this discussion.

I also think you misunderstand SpamAssassin, at least the way it is used here. We as a department don't "filter" anybody's email at all, in the sense of rejecting mail addressed to them before they have the opportunity to see it. Duke as an institution does filter some incoming mail (not mine) for specific viruses (not spam), as viruses have a comparatively unambiguous signature. In the department, where we do USE SpamAssassin, no bounce messages are generated except when mail fails for one of the standard reasons unrelated to filtering of any sort.

SA is implemented so that it passively scores each piece of mail as it arrives, in fields added to the message header -- in fact this one (to which I am replying) has:

  Subject: How Not To Filter Spam
  References: <Pine.LNX.4.44.0402171555160.20733-100000@xxxxxxxxxxxxxxxxxxxxxx>
  Sender: owner-ietf@xxxxxxxx
  Precedence: bulk
  X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on mail
  X-Spam-Status: No, hits=-4.6 required=5.0 tests=AWL,BAYES_00,FOR_FREE
        autolearn=no version=2.63
  X-Spam-Level:

SpamAssassin is pretty smart -- it correctly identified this as non-spam (negative total hit count) with a very high level of confidence. Empirically VERY few messages that score +5 or better are not spam, although a few percent of the spam messages that SA sees end up with scores of between 3 and 5 and get through. SA is smart, but cannot compare with the human eye.

When it DOES identify a message as spam it STILL does nothing. It does not act as a mail accept/reject mechanism per se. It just adds a header such as:

  Subject: Learn about this investment and get the results you are seeking tgm
  Sender: ylxArnold J <ijci@xxxxxxxxxxx>
  Mime-Version: 1.0
  Content-Type: text/plain; charset="iso-8859-1"
  Date: Sun, 15 Feb 2004 00:25:53 -0800
  X-Mailer: Microsoft Outlook Build 10.0.2627
  Message-Id: <20040215082535.5A9E2A77D1@xxxxxxxxxxxxxxxxx>
  X-Spam-Flag: YES
  X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on mail
  X-Spam-Report:
        *  2.7 NOT_ADVISOR BODY: Not registered investment advisor
        *  4.1 FORGED_RCVD_NET_HELO Host HELO'd using the wrong IP network
        *  3.0 MSGID_FROM_MTA_SHORT Message-Id was added by a relay
        *  0.1 RCVD_IN_SORBS RBL: SORBS: sender is listed in SORBS
        *      [24.99.103.83 listed in dnsbl.sorbs.net]
        *  0.7 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
        *      [<http://dsbl.org/listing?ip=24.99.103.83>]
        *  1.5 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
        *      [Blocked - see <http://www.spamcop.net/bl.shtml?24.99.103.83>]
        *  2.6 RCVD_IN_DYNABLOCK RBL: Sent directly from dynamic IP address
        *      [24.99.103.83 listed in dnsbl.sorbs.net]
  X-Spam-Status: Yes, hits=14.7 required=5.0 tests=FORGED_RCVD_NET_HELO,
        MSGID_FROM_MTA_SHORT,NOT_ADVISOR,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,
        RCVD_IN_DYNABLOCK,RCVD_IN_SORBS autolearn=spam version=2.63
  X-Spam-Level: **************

(from another message in my rejected folder).

I then use a separate tool, in my case procmail, to sort ALL my mail out into folders (which might be my inbox or might be Spam or might be /dev/null). No bounces are ever generated, no spammer gets any hint that my address is or isn't a good address, and I can filter on the basis of hits or spam level under MY control by simply changing my procmail entry.

I can either keep the rejects (I actually do save them for a week in a rotated file) and go through them visually very quickly before deleting them, or discard them altogether. I generally don't go through them, but others I know do, often people who set their personal threshold for spam at 3 or 4 and so get a smaller fraction of spam delivered into their mail spool but run a higher risk of accidentally rejecting messages that aren't spam. I carry on a fair bit of correspondence with hardware vendors and textbook vendors that can easily get a few spam points (but isn't spam), so I keep my threshold fairly high.

I still have reservations about attempting to apply this or any other content-based filtering at the SMTP level unless it is looking for highly specific signatures (things like viruses or messages originating from blacklisted networks that one can identify with very few false positives). I also have reservations about decisions like the spam level that triggers a reject being in somebody's hands other than my own. A systems administrator who got a lot of complaints about spam within a LAN might set it down to 3 or 4 to lower the fraction that makes it through for everybody, not realizing (or perhaps not caring) that it might make mail conversations with vendors very difficult.

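The header-based sorting described above can be illustrated with a small sketch. It is written in Python rather than as a procmail recipe, and it is not the recipe actually in use here: the function names, the folder names, and the `discard_above` parameter are all hypothetical. It only reads the score that SpamAssassin has already written into X-Spam-Status and picks a destination folder under the recipient's own threshold; it never rejects or bounces anything.

```python
import re
from email import message_from_string

# Matches the numeric score in an X-Spam-Status header, e.g. "hits=14.7".
# "score=" is accepted too, since later SpamAssassin releases use that key.
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the score recorded in X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def choose_folder(raw_message, threshold=5.0, discard_above=None):
    """Pick a delivery folder (hypothetical names); no bounce is generated."""
    score = spam_score(raw_message)
    if score is None or score < threshold:
        return "inbox"
    if discard_above is not None and score >= discard_above:
        return "/dev/null"
    return "Spam"


# Abbreviated versions of the two header sets quoted above.
HAM = (
    "Subject: How Not To Filter Spam\n"
    "X-Spam-Status: No, hits=-4.6 required=5.0 tests=AWL,BAYES_00,FOR_FREE\n"
    "\tautolearn=no version=2.63\n"
    "\n"
    "body\n"
)
SPAM = (
    "Subject: Learn about this investment\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, hits=14.7 required=5.0 tests=NOT_ADVISOR\n"
    "\tautolearn=spam version=2.63\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for name, raw in (("ham", HAM), ("spam", SPAM)):
        print(name, spam_score(raw), choose_folder(raw))
```

Lowering `threshold` to 3 or 4 in such a sketch files more borderline mail into Spam, which is the tradeoff between less delivered spam and more misfiled legitimate mail described above; the point is that the choice stays with the recipient.
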
If such a rejection occurred during the original transaction and generated a bounce -- well, that's the kind of thing we see above, a cure that can easily be worse than the disease, MUCH worse if the spam is being sent from a breakout address or bounced through many hops to get there with a forged or irrelevant return address.

If I understand what you are saying, perhaps there is a way to "do it correctly" -- reject the spam at the original SMTP transaction but with a message that goes back to the original sender (only), in spite of the fact that both the From and Return-Path header entries might well be forged and the message relayed through one or more open relays. I don't see how, but I admit that I may well be ignorant and there may be a way.

Either way, let me make it very clear. I do not like any sort of content filtering that generates automated bounce messages that may or may not be directed to the original senders. In the case of viruses especially (which ALWAYS have forged headers) the bounce message ALWAYS goes to an innocent party and acts as a virus amplifier. I use Linux -- I CAN'T get most current viruses -- but I receive ten or so messages every day telling me that some message that "I" sent contained a virus. Sometimes the bounce messages THEMSELVES are social engineering wrappers designed to transmit the virus -- bounce messages are so ubiquitous that viruses routinely use them (with an appropriately forged header) to try to transmit themselves. Spam bounces are in my experience no less braindead and are a significant source of "network abuse" in their own right.

As was recently pointed out on this list, the automatically generated bounce messages are also a vehicle for DDoS attacks -- a virus that forged the SAME return header as it tried to mail itself all over could arrange for a flood of bounce messages to be directed to some target address.

The only bounce messages I like to see are ones that tell me that the addressee is unknown (I mistyped an address), the message could not be delivered (because it was too large), or the message has been held up in a spool en route (a destination host is down somewhere). Having ANY message I send identified as spam, with a bounce or not, is very annoying.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx