Re: Proposal to define a simple architecture to differentiate legitimate bulk email from Spam (UBE)

Dean Anderson <dean@av8.com> · Mon, 8 Sep 2003 16:49:20 -0400 (EDT)

On Tue, 9 Sep 2003, Shelby Moore wrote:
>
> >> > Information theory says that such things are impossible.  One can not
> >> > construct a spam-free protocol because this is the same problem as
> >> > constructing a system free of covert channels, which information theory
> >> > says is impossible.
>
>
> But information theory also says you can optimize signal-to-noise ratio,
> but only if you know what the characteristics of your signal are.

It actually doesn't say that precisely. It says that you can transmit a
signal with an arbitrarily low error rate at any rate below the channel
capacity.
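
For reference, the statement can be written out as follows. The second
formula is the Shannon-Hartley capacity, which I am using only as an
illustration and which assumes a band-limited channel with additive white
Gaussian noise. The symbols R (code rate), P_e (error probability), B
(bandwidth) and S/N (signal-to-noise power ratio) are my own labels, not
anything from the earlier messages.

```latex
% Noisy-channel coding theorem: below capacity C, the error probability
% can be made as small as desired by a suitable choice of code.
R < C \;\Longrightarrow\; \forall \epsilon > 0,\ \exists
  \text{ a code of rate } R \text{ with } P_e < \epsilon

% Shannon-Hartley capacity (assumes a band-limited AWGN channel).
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

Note that S/N appears here as a given property of the channel that bounds
the rate. The theorem does not itself describe a procedure for improving it.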

The concrete task of altering the signal-to-noise ratio is accomplished by
enhancing the signal with a harmonic oscillator, so that it is stronger
than the noise.  This is then described as a set of differential equations
that can be optimized with variational methods.  The limits of this
process are indicated by information theory, the Nyquist theorem, etc.
If the channel isn't described by a Fourier series, then the differential
equations may not be solvable, and it may be impossible to optimize its
signal-to-noise ratio. (Well, there are other mathematical methods, but
you get the point.) You are borrowing the concepts by metaphor, but the
concrete methods don't transfer well.

My point is not to discourage you from trying to stop spam, but to focus
your attention on detection, rather than protocol alteration.  It is
impossible to alter the protocol in any way that will force the spammer to
identify themselves a priori as a spammer.

You could ask for spammers to cooperatively self-mark their messages.
But this hasn't been terribly productive.  It is also pointless to ask for
cooperative identification of non-spammers and identify spammers as those
not in the set of non-spammers. It may be too strong to say this is
pointless, as qsecretary and things that do this are fairly popular.
However, they don't quite solve the general case of things that send mail
but don't receive mail. And of course such schemes are fooled by disposable
accounts, which can respond until they are shut down.  If everyone used
qsecretary, spammers would simply alter their software to send responses.
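
To make the two failure modes concrete, here is a minimal sketch of a
generic challenge-response filter. It is not qsecretary's actual
implementation; the class, method names and addresses are all hypothetical
and chosen only for illustration.

```python
import secrets


class ChallengeResponseFilter:
    """Hypothetical filter: hold mail from unknown senders until they
    answer a challenge, then whitelist them."""

    def __init__(self):
        self.whitelist = set()
        self.pending = {}   # challenge token -> (sender, held message)
        self.inbox = []

    def receive(self, sender, body):
        """Deliver mail from known senders; otherwise hold the message and
        return a challenge token (which would be mailed back to sender)."""
        if sender in self.whitelist:
            self.inbox.append((sender, body))
            return None
        token = secrets.token_hex(8)
        self.pending[token] = (sender, body)
        return token

    def confirm(self, token):
        """Called when a reply carrying the token arrives."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False
        sender, body = entry
        self.whitelist.add(sender)
        self.inbox.append((sender, body))
        return True


def run_demo():
    f = ChallengeResponseFilter()
    # A legitimate send-only source never reads the challenge, so its
    # message stays held indefinitely.
    f.receive("noreply@shop.example", "Your order has shipped")
    # A spammer using a disposable account that auto-replies gets through.
    token = f.receive("throwaway123@freemail.example", "Buy now")
    f.confirm(token)
    return f
```

In this sketch the legitimate send-only notice ends up stuck in `pending`
while the auto-responding throwaway address lands in `inbox`, which is the
opposite of what the scheme is meant to achieve.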

So given a set of unmarked messages, some spam, some not-spam, the task is
to have a program mark them in the same way that a human would if a human
were reading the messages. Since humans have different definitions of
spam, it would be useful if the program could accept different definitions
as well.  This is the realm of content analysis.
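
As a rough sketch of what "accepting different definitions" could look
like, here is a tiny word-count (naive Bayes) filter where each user trains
their own instance on their own labels. The technique, the class name and
the sample messages are my assumptions for illustration, not a description
of any particular product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class UserSpamFilter:
    """Hypothetical per-user multinomial naive Bayes filter."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_msgs = sum(self.message_counts.values())
        # Class prior and word likelihoods, both with add-one smoothing.
        score = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
        for tok in tokens:
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def classify(self, text):
        tokens = tokenize(text)
        spam = self._log_score(tokens, "spam")
        ham = self._log_score(tokens, "ham")
        return "spam" if spam > ham else "ham"


# Two users with different definitions of spam.
alice = UserSpamFilter()   # treats bulk newsletters as spam
alice.train("weekly newsletter special offers from the store", "spam")
alice.train("meeting notes for the project tomorrow", "ham")

bob = UserSpamFilter()     # wants newsletters, objects only to pill ads
bob.train("weekly newsletter special offers from the store", "ham")
bob.train("cheap pills buy now no prescription", "spam")

message = "store newsletter with special offers"
alice_verdict = alice.classify(message)
bob_verdict = bob.classify(message)
```

The same message is marked differently for the two users because the
definition lives in each user's training data rather than in the protocol.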

> Thus my whole motivation for an unambiguous definition (spam == all bulk
> email) along the channel and not just a definition at the end points
> (UBE).

You may need a precise definition before you can begin implementation
(just like you need a definition of voltage, current, etc to begin
building a transmitter), but you do not need a precise definition to talk
about the theoretical aspects.  Spam could be defined as UCE, CE, UBE, or
BE.  I also have a more complete and detailed taxonomy of spam:

There are 3 types of email that we generally call spam:

Type 1: Bona fide messaging with a real commercial or non-profit (i.e.
political) purpose.  This also includes contraband (e.g. drugs) which is
illegal in some or all jurisdictions, so long as they intend to deliver the
illegal goods. This covers both solicited and unsolicited email, though
it may be useful to distinguish solicited as Type 1A and unsolicited as
Type 1B.

Type 2: Bona fide fraudulent activity. Someone is really trying to get your
money, but has no intention of honoring their obligations under the purchase
contract.  This includes bona fide attempts at identity theft.  This is
already criminalized by mail fraud and wire fraud, and other laws
concerning fraud.

Type 3: Annoyance activity. This has no bona fide intention of getting
money or even personal information, even though at a casual glance it may
appear so.  Type 3 is broken into 2 subtypes:

Type 3A is a relatively harmless disgruntled person, who is not terribly
sophisticated in their abuse, or in hiding their tracks. This type can be
handled by warnings or account termination. Besides spam, this type is
also involved in small DoS attacks and other unsophisticated abuse.

Type 3B is frequently a career criminal using viruses and rooted machines
to conduct annoyance, which is frequently just another type of DoS attack,
but targeted perhaps at an email address, or perhaps at a domain. This
type of attacker has often already broken into many computers, often
hundreds, illegally. This type cannot be dealt with effectively by ISPs,
because they are reasonably adept at partially hiding their tracks by
crossing organizational boundaries. Usually, the ISP only detects the
infected computer, but does not identify or catch the cracker.