Re: Proposal to define a simple architecture to differentiate legitimate bulk email from Spam (UBE)

Shelby Moore <coolpage@earthlink.net> · Sun, 07 Sep 2003 14:02:30 +0800

IMO, this fork of the debate (whether Hotmail will implement a specific feature) is fairly irrelevant to the main point of the proposal (it is the 80 in the 80/20 rule), so let's try to wrap it up in one or two more rounds at most, okay?

>> Interestingly note that Hotmail makes you pay to POP *FROM* hotmail, but no
>> charge to POP from other accounts *TO* Hotmail.  Does that give you any hint
>> about their business model??
>
>Yes.  It's *NOT* a business model where they want to be polling a dozen servers
>on a regular basis for each of their customers for mail that may or may not be
>there, and for the average mailing list, probably is not there at any given
>poll.

Not any more than they want to be POPing any email at all.  Nobody wants to do any work they do not have to do.  But if there is an advantage for them, or a profit to be made, they do it.  If they do not, someone else will, and then they lose market share.

They used to not POP email at all (do you remember or did you know that?).  Then they discovered they were missing a big market of eyeballs.

>  They want eyeballs, and the last thing they want to do is expend more
>effort than needed to get eyeballs.

No disrespect intended, but that sentence is illogical.  You want something and you don't want to do something that gives you what you want.

I guess you mean that they would not agree to add effort just to retain existing eyeballs.  Again I disagree.  I think they will do whatever they have to in order to retain market share, as long as the cost doesn't kill more profit than it retains.

> Sure - they can even optimize the 'POP the
>list' check by only doing it once for all the subscribers - but they're still
>hitting each server for each list on a several-times a day basis.  And under
>the current scheme, they can just *catch* one SMTP transaction with all the
>RCPT TO's piggybacked *when there's actual mail*.  So they'd have to work a lot
>harder under your scheme.

POPing once (one list mailing) versus processing one email with a zillion RCPT TOs (one list mailing) is not a very big cost difference.  One might cost slightly less than the other, and we really can't say which, but it is irrelevant because the difference is insignificant.

Actually, it is more likely that when they POP they will get several messages at once, which costs less than catching several separate SMTP emails.

Also, they know a priori which receivers map to which POP source, and that mapping can be optimized over time, versus having to build a new mapping table in real time every time they process an SMTP message with its RCPT TOs.
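
To make the comparison concrete, here is a minimal sketch in Python.  It is only an illustration under my own assumptions, not anything Hotmail or any other provider actually runs: the SUBSCRIPTIONS table, the fetch callback, and both function names are hypothetical.  It shows that both paths do the same per-subscriber fan-out, and that the poll path uses a mapping stored ahead of time while the SMTP path reads the recipients from each transaction.

```python
from collections import defaultdict

# Hypothetical stored mapping: remote list -> local subscribers.
# In the pull model this is known before any mail arrives.
SUBSCRIPTIONS = {
    "list-a@example.org": ["alice", "bob", "carol"],
    "list-b@example.org": ["bob"],
}


def deliver_by_poll(subscriptions, fetch):
    """Poll each list once, then fan out using the stored mapping.

    `fetch` is an assumed callback that returns the (possibly empty)
    batch of new messages for one list address.
    """
    mailboxes = defaultdict(list)
    polls = 0
    for list_addr, subscribers in subscriptions.items():
        polls += 1
        messages = fetch(list_addr)  # one poll may return several messages
        for user in subscribers:
            mailboxes[user].extend(messages)
    return dict(mailboxes), polls


def deliver_by_smtp(transactions):
    """Handle pushed mail: each transaction carries its own RCPT TO list."""
    mailboxes = defaultdict(list)
    for message, rcpt_to in transactions:
        for user in rcpt_to:  # mapping is read from every transaction
            mailboxes[user].append(message)
    return dict(mailboxes), len(transactions)
```

With two messages waiting on list-a, the poll path makes one connection to that list and delivers both, while the SMTP path handles two transactions.  The sketch says nothing about real-world cost; it only shows where the work sits in each model.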

>And let's *THINK* for a moment here - what is your proposal *REALLY* going to
>change?  We already have many estimates that 50% or so of all e-mail is spam.
>Let's take that as a given, and let's make the rash assumption that the rest is
>25% mailing list traffic and 25% person-to-person.

It would be more interesting to know what the real stats are on the other 50%, because I doubt that 25% is legitimate bulk email.  It seems that you live in a different (mailing list centric) world than I and most "normal" people live in.  I join mailing lists for a short time to get something done, then I leave asap.  Most of the people I know, and the many thousands of customers I come into contact with, seem not even to know how to use a mailing list.

With 500 million people on the internet, I would venture that 80% don't even know what a mailing list is.  They may use Yahoo Personals, and not even realize it is a mailing list.  Since the email is being directly deposited into the Yahoo account, they have no clue.

Anyway, let's follow your line of debate...

>So what you want to do is take the 25% of the list traffic that works just fine
>on the current infrastructure,

No, it doesn't work fine.  My gf complained that she couldn't find her Yahoo Personals email amongst the 500 spams she gets per day.  Of course that makes me happy, but that is beside the point :)

> and is usually quite easily whitelistable via a
>number of different methods -

Whitelisting can be subverted by spammers:

http://www.cnn.com/2003/TECH/internet/09/01/spam.chainletter/index.html

"...Herrick, however, admits that the practice could be a good way to bypass e-mail filters which block messages from senders who are not known to the recipient. Spammers could use chain letters to discover the addresses of people with whom you frequently communicate. Spam purporting to be from someone in your address book would sneak by filters. 
"If I were a spammer, I'd be working very hard to perfect this technique," he said..."

> and move it to something totally different.
>And what you're left with is a 2-1 mix of spam and personal mail that you
>yourself admit things like the DCC and spam filters are unable to perfectly
>distinguish.

The whole point of the change is to enable elimination of the spam, which cannot currently be done.

See my response to John C Klensin, regarding "chicken and egg" and the example benefits to attacking spam:

http://www1.ietf.org/mail-archive/ietf/Current/msg22050.html

>Having exiled the mailing list traffic,  we would then be able to work on
>separating the spam from non-spam - but as you already noted yourself, we don't
>know how to do that yet.

Yes, we probably do.  Just because the DCC cannot measure bulk email reliably doesn't mean Hosts, ISPs, and other software cannot.  BrightMail already does (just sign up for an Earthlink account and try really hard to get some spam), and I will probably be demonstrating something soon as well.

>  And getting rid of the mailing list traffic doesn't
>in fact gain us anything at all, since everybody who filters list traffic into
>separate folders for each list knows that isn't the problem - it's the
>unfiltered stuff that's left in the inbox.

You are missing the point, which is that until you can say that all bulk email is spam, you can't target spam.  Otherwise, how could ISPs, Hosts, legislators, and judges know the difference between legitimate email and spam?  Again see the targeting benefits:

http://www1.ietf.org/mail-archive/ietf/Current/msg22050.html

Worse, as it stands now, mailing list traffic often gets misidentified as spam unless it comes from a well-established list.

>I'll note in passing that the two highest SpamAssassin scores I've ever seen
>were both on legitimate postings to mailing lists -  both were humor pieces
>about spam....

I already wrote publicly in 2002 that Bayesian and other content-filtering methods cause more problems than they solve.

>Quite frankly, given that at least half the spam I get is already in obvious
>violation of at least one law (pick one - securities fraud, advance-fee scams,
>wire fraud, bogus pharmeceuticals, or hijacking a proxy to send the mail), I
>severely doubt that anything the IETF does in regards to standards won't make a
>difference. The spammers often don't even bother following RFC822 - why should
>they follow your scheme?

Again you are missing the whole point.

It has nothing to do with what spammers will or will not do.  It has to do with what Hosts, ISPs, etc. are currently prevented from doing.  Since they cannot determine what is spam, they cannot enforce any law.  It is not practical to block email based on a wide range of hard-to-prove laws.  There would be too much liability for the enforcer if they did not win the criminal case.  Whereas if you have a simple, clear-cut metric as in my proposal, then ISPs, Hosts, etc. can take action, and will take action, because spam is one of their major costs.

>The *only* two ways to get rid of spam both involve making it non-profitable.
>
>The first is lowering the generated income.

I agree.  I have written a thesis on this entitled "Fragile (yes I think so!) Economics of Spam".

>  Given that recently, somebody
>hacked the site of a "fertilizer for your body part" scam, and found a list of
>6,000 people who had paid $50 a bottle, I have to sadly conclude that Korbluth
>and Barnum were both correct, there's one born every minute and the rate is
>increasing.  So there's no joy to be found there.

I've read the theories, and you have to realize that spammers' margins are very thin.  It won't take much to topple the boat.  The problem is that the current architecture is not yet adequate to increase their costs significantly.  That is why I made this proposal.

>The second is raising the cost to the spammer.

Agreed 100%.   See above.

>  Personally, I like the idea of
>taking up a collection among the ISPs and other providers, and hiring some good
>ethnic muscle (there's competition in the field, a number of experienced and
>ruthless groups are available).  I'm sure the spam problem would change
>drastically if the spammer was seriously having to balance the mentioned $300K
>for bogus enhancement pills against having their kneecaps broken by one group
>or worse by one of the other groups...

That is not an effective deterrent, as evidenced by the drug war and crime in general.  You actually have to make it more expensive to send than to receive.  That is what my proposal is all about.

>
>Pity that will never work though.  At least not officially (although one
>infamous New Zealander apparently retired recently...)

:-)