Re: How Not To Filter Spam

"Robert G. Brown" <rgb@xxxxxxxxxxxx> · Wed, 18 Feb 2004 16:33:39 -0500 (EST)

On Tue, 17 Feb 2004, Vernon Schryver wrote:

> The enclosed example of how not to filter spam is offered for those
> who might want to preemptively add accuspam.com or downloadfast.com
> to their blacklists.
> 
> It is also a classic example of what is wrong with the MUA filtering
> tactics Robert Brown advocates.

Vernon, I think you misunderstand what I was saying in my previous note
-- the problem was likely mine as I may not have been very clear.  If
so, I apologize.  I actually agree with much of what you say (and
thought that was clear on the basis of my earlier posts on the subject).

I was objecting to precisely this kind of filtering, and almost
forwarded one of these bounces to the list myself.  You might recall
that I predicted that my message to the IETF list would generate bounces
just like this because it contained a few key phrases; obviously your
reply contained the same phrases.  In fact, it was in objection to this
sort of automated bounce generation for both spam and viruses
(especially the latter) that I joined this list and have been
participating in this discussion.

I also think you misunderstand SpamAssassin, at least as it is used
here.  We as a department don't "filter" anybody's email at all, in the
sense of rejecting mail before its addressee has had the opportunity to
see it.  Duke as an institution does filter some incoming mail (not
mine) for specific viruses (not spam), as viruses have comparatively
unambiguous signatures.

In the department, where we do USE SpamAssassin, no bounce messages are
generated except when mail fails for one of the standard reasons
unrelated to filtering of any sort.  SA is set up so that it passively
scores each piece of mail as it arrives and records the result in fields
added to the message header.  In fact, the message to which I am
replying has:

Subject: How Not To Filter Spam
References:
    <Pine.LNX.4.44.0402171555160.20733-100000@xxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-ietf@xxxxxxxx
Precedence: bulk
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on mail
X-Spam-Status: No, hits=-4.6 required=5.0 tests=AWL,BAYES_00,FOR_FREE 
        autolearn=no version=2.63
X-Spam-Level: 

SpamAssassin is pretty smart: it correctly identified this as non-spam
(a negative total score) with a very high level of confidence.
Empirically, VERY few messages that score +5 or higher are not spam,
although a few percent of the spam messages that SA sees end up with
scores between 3 and 5 and get through.  SA is smart, but it cannot
compare with the human eye.
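To make those header fields concrete, here is a minimal Python sketch
that pulls the verdict, score, threshold, and rule names out of an
X-Spam-Status value.  It assumes the SpamAssassin 2.63 layout shown above
("Yes|No, hits=... required=... tests=... autolearn=... version=...");
other versions may format the header differently, and parse_spam_status
is a hypothetical helper for illustration, not part of SpamAssassin.

```python
import re


def parse_spam_status(value: str) -> dict:
    """Split a SpamAssassin 2.x-style X-Spam-Status value into its parts.

    Assumed layout (as in the headers quoted in this message):
    "Yes|No, hits=N required=N tests=A,B,... autolearn=... version=..."
    """
    verdict = value.split(",", 1)[0].strip()
    hits = float(re.search(r"hits=(-?\d+(?:\.\d+)?)", value).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", value).group(1))
    # The tests list may be folded across header lines, so take everything
    # up to the next keyword (or the end of the value) and drop whitespace.
    m = re.search(r"tests=(.*?)(?:\s+(?:autolearn|version)=|\s*$)", value, re.S)
    tests = [t.strip() for t in m.group(1).split(",") if t.strip()] if m else []
    return {"spam": verdict == "Yes", "hits": hits,
            "required": required, "tests": tests}


if __name__ == "__main__":
    ham = ("No, hits=-4.6 required=5.0 tests=AWL,BAYES_00,FOR_FREE\n"
           "        autolearn=no version=2.63")
    status = parse_spam_status(ham)
    print(status["spam"], status["hits"], status["tests"])
    # False -4.6 ['AWL', 'BAYES_00', 'FOR_FREE']
```

Note that the score is only information: nothing in this step accepts,
rejects, or bounces the message.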

When it DOES identify a message as spam, it STILL does nothing but
annotate it.  It does not act as a mail accept/reject mechanism per se;
it just adds headers such as:

Subject: Learn about this investment and get the results you are seeking tgm
Sender: ylxArnold J <ijci@xxxxxxxxxxx>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Date: Sun, 15 Feb 2004 00:25:53 -0800
X-Mailer: Microsoft Outlook Build 10.0.2627
Message-Id: <20040215082535.5A9E2A77D1@xxxxxxxxxxxxxxxxx>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on mail
X-Spam-Report: 
        *  2.7 NOT_ADVISOR BODY: Not registered investment advisor
        *  4.1 FORGED_RCVD_NET_HELO Host HELO'd using the wrong IP network
        *  3.0 MSGID_FROM_MTA_SHORT Message-Id was added by a relay
        *  0.1 RCVD_IN_SORBS RBL: SORBS: sender is listed in SORBS
        *      [24.99.103.83 listed in dnsbl.sorbs.net]
        *  0.7 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
        *      [<http://dsbl.org/listing?ip=24.99.103.83>]
        *  1.5 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
        *      [Blocked - see <http://www.spamcop.net/bl.shtml?24.99.103.83>]
        *  2.6 RCVD_IN_DYNABLOCK RBL: Sent directly from dynamic IP address
        *      [24.99.103.83 listed in dnsbl.sorbs.net]
X-Spam-Status: Yes, hits=14.7 required=5.0 tests=FORGED_RCVD_NET_HELO,
        MSGID_FROM_MTA_SHORT,NOT_ADVISOR,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,
        RCVD_IN_DYNABLOCK,RCVD_IN_SORBS autolearn=spam version=2.63
X-Spam-Level: **************

(from another message in my rejected folder).  I then use a separate
tool, in my case procmail, to sort ALL my mail into folders (which might
be my inbox, a Spam folder, or /dev/null).  No bounces are ever
generated, no spammer gets any hint as to whether my address is good,
and I can filter on hits or spam level under MY control by simply
changing my procmail entry.  I can either keep the rejects, go through
them visually very quickly, and delete them, or discard them
altogether.  I actually save mine for a week in a rotated file, though I
generally don't go through them.  Others I know do review theirs; these
are often people who set their personal spam threshold at 3 or 4, so
that a smaller fraction of spam is delivered into their mail spool at a
higher risk of accidentally rejecting messages that aren't spam.  I do a
fair bit of discussion with hardware vendors and textbook vendors whose
mail can easily pick up a few spam points without being spam, so I keep
my threshold fairly high.
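The sorting step can be sketched in Python as a stand-in for a procmail
recipe (this is an illustration, not the recipe actually used here).  It
assumes SpamAssassin writes one "*" to X-Spam-Level per whole point of
score; the folder names, the choose_folder helper, and the threshold and
discard_at parameters are all hypothetical.  The point it shows is that
each message is only filed, never bounced, and the cutoff is a per-user
setting.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical folder names standing in for procmail destinations.
INBOX = "inbox"
SPAM = "Spam"


def spam_level(msg: Message) -> int:
    """Count '*' characters in X-Spam-Level (assumed: one per whole point)."""
    return (msg.get("X-Spam-Level") or "").count("*")


def choose_folder(raw: str, threshold: int = 5,
                  discard_at: Optional[int] = None) -> Optional[str]:
    """Pick a destination for one message; nothing is ever bounced.

    threshold:  per-user cutoff at or above which mail is filed as SPAM.
    discard_at: optional higher cutoff at which mail is dropped outright
                (the /dev/null case), signalled by returning None.
    """
    level = spam_level(message_from_string(raw))
    if discard_at is not None and level >= discard_at:
        return None
    return SPAM if level >= threshold else INBOX


if __name__ == "__main__":
    ham = "Subject: hi\nX-Spam-Level: \n\nbody\n"
    spam = "Subject: investment\nX-Spam-Level: " + "*" * 14 + "\n\nbody\n"
    print(choose_folder(ham))                               # inbox
    print(choose_folder(spam))                              # Spam
    print(choose_folder(spam, threshold=3, discard_at=10))  # None
```

Lowering threshold from 5 to 3 is the "more aggressive" personal setting
described above: less spam in the inbox, more legitimate vendor mail at
risk of landing in the Spam folder.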

I still have reservations about applying this or any other
content-based filtering at the SMTP level, unless it is looking for
highly specific signatures (things like viruses, or messages originating
from blacklisted networks, that one can identify with very few false
positives).  I also have reservations about the choice of the spam level
that triggers a reject being in somebody's hands other than my own.  A
systems administrator who got a lot of complaints about spam within a
LAN might lower the threshold to 3 or 4 to reduce the fraction that gets
through for everybody, not realizing (or perhaps not caring) that it
might make mail conversations with vendors very difficult.

If that rejection occurred during the original transaction and generated
a bounce, well, that's the kind of thing we see above: a cure that can
easily be worse than the disease, MUCH worse if the spam is being sent
from a breakout address or relayed through many hops to get there with a
forged or irrelevant return address.

If I understand what you are saying, perhaps there is a way to "do it
correctly": reject the spam during the original SMTP transaction, but
with a message that goes back to the original sender (only), in spite of
the fact that both the From and Return-Path headers might well be forged
and the message relayed through one or more open relays.  I don't see
how, but I admit that I may well be ignorant and there may be a way.

Either way, let me make it very clear.  I do not like any sort of
content filtering that generates automated bounce messages that may or
may not be directed to the original senders.  In the case of viruses
especially (which ALWAYS have forged headers), the bounce message ALWAYS
goes to an innocent third party and acts as a virus amplifier.  I use
Linux, so I CAN'T get most current viruses, but I receive ten or so
messages every day telling me that some message that "I" sent contained
a virus.  Sometimes the bounce messages THEMSELVES are social-engineering
wrappers designed to transmit the virus: bounce messages are so
ubiquitous that viruses routinely imitate them (with an appropriately
forged header) to try to transmit themselves.  In my experience spam
bounces are no less braindead, and they are a significant source of
"network abuse" in their own right.

As was recently pointed out on this list, automatically generated bounce
messages are also a vehicle for DDoS attacks: a virus that forged the
SAME return address as it mailed itself all over could arrange for a
flood of bounce messages to be directed at some target address.  The
only bounce messages I like to see are ones that tell me that the
addressee is unknown (I mistyped an address), that the message could not
be delivered (because it was too large), or that the message has been
held up in a spool en route (a destination host is down somewhere).
Having ANY message I send identified as spam, with a bounce or not, is
very annoying.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx