On Tue, 17 Feb 2004, Vernon Schryver wrote:

> > (Silently discarding _is_ a bad idea, when done by the SMTP server
> > itself. IMHO, it's better to mark for later discard -- which actually
> > could be done in such a way as to mark only for those recipients who
> > requested the more restrictive filtering.)

Or, mark for later accept/reject decisioning AFTER the SMTP server per
se, in the filter pipeline between the server and the mail spool of the
addressee. SpamAssassin already does the right thing here (this is
exactly what it does).

> A better position is that everything should be logged, particularly
> including discarded mail, and in that case, enough of bodies to allow
> targets to identify senders and the nature of the discarded messages.
> Of course, one should assume users won't normally look at those logs.
> Spam you read is not filtered, but at most categorized and stigmatized.

Logging a message you reject is nearly a waste of time. In order to
recover the message (as you note, nobody ever looks at the logs, which
are VERY LARGE for a busy mailer and beyond human capacity to scan), an
out-of-band message is really required to tell you that there is a
message to be recovered -- as in somebody calling you up and asking why
you haven't paid your e-bills (rejected as spam). In most cases
completing the transaction will require a retransmission plus
out-of-band communication, because what you have done is make email an
unreliable messaging service for valuable, wanted messages.

This is where, and why, I take issue with filtering and discarding at
the level of the SMTP server, unless the accept/reject decision can be
made with 100% precision (no false positives, no false negatives -- and
it may not be good even then, because MY idea of the correct basis for
the decision may not be the same as YOURS).
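The mark-now, decide-later pipeline described above can be sketched in a few lines of Python. This is a minimal illustration only -- the header name, scores, and thresholds are hypothetical stand-ins, not SpamAssassin's actual mechanics:

```python
from email.message import EmailMessage

def mark_message(msg: EmailMessage, score: float) -> EmailMessage:
    """At delivery time, only ANNOTATE the message with a score --
    never discard it here.  (Header name is hypothetical.)"""
    msg["X-Spam-Score"] = f"{score:.1f}"
    return msg

def user_filter(msg: EmailMessage, threshold: float) -> str:
    """Later, in the per-recipient pipeline, the USER's own threshold
    decides whether the message reaches the inbox or a spam folder."""
    score = float(msg.get("X-Spam-Score", "0"))
    return "spam-folder" if score >= threshold else "inbox"

msg = mark_message(EmailMessage(), score=7.2)
print(user_filter(msg, threshold=5.0))   # strict recipient: filed as spam
print(user_filter(msg, threshold=10.0))  # lenient recipient: delivered
```

The key property is that the SMTP server itself never makes the accept/reject decision; the same marked message can be filed differently for recipients who requested different levels of filtering.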
Imagine a networking transport protocol such as TCP that discarded
packets not based on their header information (according to any
protocol-level criterion you like) but on their CONTENT, using a
moderately arcane ruleset designed to identify high-level human patterns
such as spam, applied to the content in a complex and highly
multivariate decision tree. Hmmm, pretty ugly for a TRANSPORT LAYER.

It's not that filtering based on non-header-linked aspects of content is
or isn't a good idea in some cases. It is that it has no business being
in the specification of TCP. TCP is about reliable delivery and
sequencing of packets. As a protocol, it has nothing to do with the
content of those packets outside of the information contained in its
headers. Indeed, rejecting packets based on any mechanical ruleset
applied to packet contents will almost certainly REDUCE the reliability
of the network, because (for example) an encrypted document might by
pure chance contain a byte sequence like SEX that caused it to be
rejected even though its actual contents were quotes from the stock
market. One doesn't write filters looking for buffer overwrite attacks
into the TCP stack in the kernel -- one fixes the application.

For nearly all filtering programs, it is too easy to create a message
that is filtered but shouldn't be. Even rejecting viruses on the basis
of their signatures makes it difficult to SEND a virus you've just
received to an email drop where somebody can identify it. As we've seen,
a naive application of rules makes it impossible for this
most-certainly-not-spam discussion to penetrate various "protected"
sites. It is often similarly easy to craft a message that should be
filtered but isn't, because it slips through your rules. But we've had
this discussion; the point is about reliability. SMTP was designed to
permit reasonably RELIABLE (simple) transport of addressed mail on top
of a TCP connection.
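The encrypted-document false positive above isn't hypothetical bad luck; it is a near certainty at scale. A quick back-of-envelope sketch (the filter and payload here are toy illustrations, not any real system):

```python
def naive_content_filter(payload: bytes) -> bool:
    """Reject any packet whose raw bytes contain a 'bad word'.
    A transport layer doing this drops innocent traffic."""
    return b"SEX" in payload

# An encrypted or compressed stream is effectively random bytes, so any
# given 3-byte sequence appears at each position with probability
# 1/256**3.  Over a 100 MB transfer you therefore EXPECT several hits:
expected_hits = (100_000_000 - 2) / 256**3
print(f"{expected_hits:.1f} expected false positives")  # ~6.0

# A ciphertext whose plaintext is innocent stock quotes can still
# happen to contain the forbidden bytes (0x53 0x45 0x58 == "SEX"):
ciphertext = bytes([0x10, 0x53, 0x45, 0x58, 0x99])
print(naive_content_filter(ciphertext))  # True -> packet dropped
```

So a content rule that looks harmless on English text silently degrades the reliability of every opaque payload the transport carries.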
In most cases, ethernet forms one of the lowest level protocols that the
mail message is wrapped in (ethernet header), then IP (IP header), then
TCP (with its header). SMTP pays relatively little attention to the
message headers at the transport and network protocol levels, as it
doesn't matter much which ethernet address was responsible for the last
packet hop (likely a router/gateway, and the same for all packets and
messages). It doesn't care MUCH about IP or TCP, except that the packets
have to be correctly wrapped according to the protocols and addressed to
SMTP listening on the appropriate port.

SMTP does care about packet content, but only to a certain level. Part
of the packet content it cares about is the negotiation phase, where it
manages a connection, executing "commands" based on content in a very
structured way designed to communicate key information recursively along
the mail message's delivery trajectory, as it builds its own mail header
containing the addressee, the nominal sender, and the delivery route
complete with timestamps. It is designed to be quite trusting, hence
easy to spoof, at least for mail to someone who doesn't read the rest of
the header.

Part of the packet content it views as "data" -- a message to be
reliably delivered to a uniquely specified spool file, encapsulated
within its MAIL header information (the TCP, IP and ethernet level
header information having been discarded). There a user, or a program,
may or may not read it and take actions based on its content.

It seems to me to be highly unacceptable to attempt to insert
content-based accept/reject decisioning at this PROTOCOL level in the
delivery process. It also seems to be highly risky and possibly legally
actionable. I expect, in good faith, that email addressed to me will be
delivered if it is deliverable. Otherwise I cannot rely on email as a
reliable transport mechanism for important messages.
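The structured "commands" of the negotiation phase mentioned above are just short text lines; a sketch of the client side of the envelope (hostnames and addresses here are made-up examples):

```python
def smtp_envelope(helo_host, mail_from, rcpt_to):
    """Build the client-side command sequence of an SMTP envelope.
    Note that MAIL FROM is accepted on trust -- the protocol's
    trusting design is exactly what makes spoofing easy."""
    cmds = [f"HELO {helo_host}",
            f"MAIL FROM:<{mail_from}>"]
    cmds += [f"RCPT TO:<{rcpt}>" for rcpt in rcpt_to]
    cmds.append("DATA")  # everything after this is opaque "data"
    return cmds

for line in smtp_envelope("mx.example.org", "alice@example.org",
                          ["rgb@phy.duke.edu"]):
    print(line)
```

Each relay in the delivery trajectory runs this same small dialogue and prepends its own timestamped Received line, which is how the route ends up recorded in the mail header.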
Filtering it "for" me according to ANY CONTENT-BASED RULESET risks
discarding at least some messages that are not correctly classified when
they are rejected. Important messages can be lost. Bad things can
result. Who is responsible when this occurs? Who do I get to sue?

Imagine the post office (the real one) opening your mail and examining
its content -- for most of us this alone would be a nightmare and an
invasion of privacy -- but suppose it were done with automated,
anonymous, "safe" machines, to eliminate advertisement mailings (only)
and pass everything else. One day it rejects and shreds a sweepstakes
check (a real one), thinking it is one of the many fake checks sent out
by loan companies. Another day it shreds a warning from your bank that a
loan payment is due, because it too looks like an advertisement. Even
humans make this sort of (sometimes very expensive) mistake, but at
least if you make it, it is "your fault". How would you feel, and react,
if the mistake were utterly beyond your control? How long would it be
before banks and other businesses rejected the post office as a reliable
transport agency?

It is perfectly reasonable for you to add content filters that YOU
control ABOVE this transport layer. If you want to hire a secretary to
open all of your mail for you, sort it, and reject all the
advertisements, you can. If the secretary makes a mistake and throws
away a megamillion dollar contract offer that you subsequently lose, you
also bear the responsibility, or at least can direct your anger at
something you control and take steps to ensure that similar things don't
happen again.

Now, all that it would take to control this end stage filtering and make
it much more reliable would be a federal law mandating that advertising
communications be sent in envelopes clearly labelled as such. No more
sending out loan offers in envelopes that strangely resemble official
government communications unless they were clearly labelled
"advertisement".
No more writing the address in by hand to convince you that you are
opening personal mail. If somebody violated the law, you would have the
envelope and offer in hand and could recover damages in small claims
court with very little effort. Otherwise, you could safely discard all
advertising.

Note that this doesn't really help someone forced to do the final
accept/reject step themselves by hand. They still have to look at each
envelope for the mandatory classification mark. Advertisers would of
course try to exploit this by making the envelopes themselves a major
part of their messages -- "open ME". It would make it fairly simple to
institute automated blocks, though, both for human secretaries and real
mail and for e.g. procmail and email.

With all that said, there are tools that ALREADY provide the kind of
content level filtering mentioned above. The better ones do not
themselves discard or bounce any mail to any user. They simply SCORE THE
CONTENT with regard to its likelihood of being spam (or a virus) on the
basis of a whole battery of tests. Scores that exceed a given threshold
can easily and automatically be rejected, or binned for a second stage
pass by humans later looking for lost checks and bills.

The USER (the one that ultimately knows the value TO THEM of a lost
message) can set the threshold to whatever level they are comfortable
with. Some will prefer to play it loose and never risk losing a message,
even though it means that a lot of spam gets through. Others will reject
on a fairly low threshold, not caring that they blindly miss out on
discussions of spam on ietf.org because those discussions inevitably
contain phrases like "Buy Viagra From Us Today" (oops, lost one whole
block of recipients), or "Have the Wildest Sex Ever" (now I've lost all
the elementary and high schools in the country). It is THEIR DECISION,
and they can pay the expectation cost "penalty" in lost messages vs.
lots of spam at whatever level they select.
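The "whole battery of tests" scoring scheme can be sketched concretely. The rules and weights below are invented for illustration (they are not SpamAssassin's rules), but they show exactly how a discussion OF spam trips the same wires as spam itself:

```python
# Hypothetical rule battery: each rule that fires adds its weight.
RULES = [
    (lambda body: "viagra" in body.lower(), 3.5),
    (lambda body: "wildest sex" in body.lower(), 3.0),
    (lambda body: body.isupper(), 1.5),          # ALL-CAPS shouting
    (lambda body: "unsubscribe" in body.lower(), 1.0),
]

def score(body: str) -> float:
    """Sum the weights of all rules that match.  The score is only a
    rating; the accept/reject decision belongs to the recipient."""
    return sum(weight for rule, weight in RULES if rule(body))

# The ietf.org false positive from the text: a most-certainly-not-spam
# message quoting a spam phrase still accumulates a real score.
discussion = "As we saw, spam often says Buy Viagra From Us Today."
print(score(discussion))  # a low-threshold recipient loses this message
```

A recipient who sets their threshold below this score blindly loses the discussion; one who sets it higher keeps it and tolerates more spam. Same scores, different per-user penalties -- which is the whole point of leaving the decision at the user level.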
This doesn't require protocol level modifications of SMTP. In fact, it
is generally desirable not to alter the SMTP MTA except in VERY
carefully thought out ways, although it is fine to graft in content
rating/filtering systems after reliable delivery to the receiving host
is complete but before the message is put in a user's mailbox (with the
score in a special header where it can be used to drive user-level
decisions). Just as SpamAssassin actually does now.

I repeat, I see little for the IETF to do about spam at the protocol
level, although it could be a powerful force at the political level,
communicating to our lawmakers the real costs of spam and urging them to
adopt stringent legal controls and penalties. We could help draft those
legal controls and penalties. For example, make it illegal to collect
and resell lists of email addresses for commercial purposes. Require all
advertisements to be clearly labelled as advertisements in a form that
permits them to be automatically and reliably filtered. One thing that
we COULD do is create a new mail header line marking mass produced
advertising mail as such, and match that up with laws requiring all
legitimate corporate clients to use it (with hefty penalties for those
that don't).

This is not the case for bidirectional encryption of email content.
There it won't happen unless and until the IETF works out a practical
way to make it work at the protocol level, since clearly ALL MTAs have
to be able to manage the encryption. At this point I see no practical
way to require or enforce point to point encryption of all mail at the
user level without a nearly complete reengineering of mail transport
(basically replacing SMTP altogether). I CAN imagine all hosts having
their own RSA public and private keys, and having all the public keys be
part of their domain nameservice registration information and hence
automatically available to "everybody".
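The shape of that keys-in-the-nameservice idea can be sketched, heavily hedged: the "DNS" below is just a dict, and since real public-key cryptography isn't in the Python standard library, a toy XOR stand-in marks where the RSA operations would go. Nothing here is a real key service or a real cipher:

```python
from itertools import cycle

# Toy "nameservice": hostname -> key bytes.  In the imagined scheme an
# MTA would fetch the destination host's RSA public key the same way it
# already fetches an MX record.
TOY_DNS = {"phy.duke.edu": b"not-a-real-key"}

def encrypt_for_host(host: str, message: bytes) -> bytes:
    """Encrypt mail for `host` using the key published in its
    nameservice record.  XOR here is a placeholder for RSA: it is NOT
    secure, but it shows the lookup-then-encrypt flow."""
    key = TOY_DNS[host]
    return bytes(b ^ k for b, k in zip(message, cycle(key)))

wire = encrypt_for_host("phy.duke.edu", b"important mail")
print(wire != b"important mail")                          # ciphertext differs
print(encrypt_for_host("phy.duke.edu", wire))             # XOR is symmetric
```

The hard parts the text goes on to identify -- scaling the nameservice to billions of entities and protecting the private halves of the keys -- are exactly the parts this sketch assumes away.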
I can imagine both tools and service providers that could generate
suitable key pairs and manage distribution of the pairs onto registered
systems, either for a fee or as a part of routine systems management (as
we already do now, for the most part, for ssh). I simply canNOT imagine
that process extending down to the level of private users at this time.

I don't believe that there is yet a suitable vehicle for the required
nameservice that would scale to billions of registered entities and many
trillions of bytes of served data (as good keys aren't small) and that
has at least the robustness of the existing DNS. It is worth imagining
such a service, but we are not terribly close, I think, to being able to
engineer it at this point. It also has interesting social costs and
risks that I suspect would significantly affect the engineering, as
obviously protecting one's private key then becomes an "interesting"
problem all by itself. A well managed computer is already a bit of a
fortress, with facilities that permit protecting data. Users are not
fortresses at all (quite the opposite) and if anything are terribly lax
with important data.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb@xxxxxxxxxxxx