Re: Proposal to define a simple architecture to differentiate legitimate bulk email from Spam (UBE)


 



After this issue, I am probably moving the thread to IRTF (as suggested) if possible (but probably after taking a break to do some other work).


>> >> > Information theory says that such things are impossible.  One can not
>> >> > construct a spam-free protocol because this is the same problem as
>> >> > constructing a system free of covert channels, which information theory
>> >> > says is impossible.
>>
>>
>> But information theory also says you can optimize signal-to-noise ratio,
>> but only if you know what the characteristics of your signal are.
>
>It actually doesn't say that precisely. It says that you can transmit a
>signal with an arbitrarily low error rate at a speed below the channel
>capacity.
>
>The concrete task of altering the signal to noise ratio is accomplished by
>enhancing the signal with a harmonic oscillator, so that it is stronger
>than the noise.


Agreed.

And thus on a "conceptual level", you have to have some idea about the signal characteristics in order to enhance it.

Actually if I remember correctly, your example is how it applies to periodic signals.  The general case is more abstract.
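
To make "channel capacity" concrete, here is a minimal sketch of the Shannon-Hartley formula, C = B * log2(1 + S/N). It assumes a band-limited channel with additive white Gaussian noise, which is a narrower case than the general theorem quoted above; the function name and parameters are illustrative only.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Capacity in bits per second of an AWGN channel (Shannon-Hartley).

    snr_linear is the signal-to-noise power ratio as a plain ratio, not in dB.
    """
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Hypothetical example values: a 3 kHz channel at a power ratio of 1000.
    print(shannon_capacity(3000.0, 1000.0))
```

Below that rate, coding can push the error rate arbitrarily low; the formula says nothing about how to find such a code, which matches the point that knowing the signal characteristics is a separate problem.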


>  This is then described as a set of differential equations
>that can be optimized with Variational methods.  The limits of this
>process are indicated by information theory, the Nyquist theorem, etc.


Add Shannon entropy, chaos, etc...


>If the channel isn't described by a Fourier series, then the differential
>equations may not be solvable, and it may be impossible to optimize its
>signal to noise ratio. (Well, there are other mathematical methods, but
>you get the point.)


Yes, that is what I meant by saying the general case is more abstract; I was speaking on a "conceptual" or abstract level.


> You are borrowing the concepts by metaphor, but the
>concrete methods don't transfer well.


I was only using it to say that we must define the signal as it appears in the channel before we can do any research on it in the channel.

Because spam is currently defined as UBE (instead of my proposed *BE), you can only model the signal at the end point.  Given that this means in the receiver's subjective mind, that is not all that useful for research, unless you want to get into a very fuzzy science such as psychology.  If you want to make a point about practicality, then that is a very strong one!


>My point is not to discourage you from trying to stop spam,


You are one of only 3 people so far at IETF who have said that to me.  The rest who have commented have tried to discourage me.  So thank you.


> but to focus
>your attention on detection, rather than protocol alteration.  It is
>impossible to alter the protocol in any way that will force the spammer to
>identify themselves a-priori as a spammer.


Disagree strongly.  The first benefit is that once you define spam == *BE (instead of UBE), it is easier to model spam and do research on it, because you can model it at any node in the channel, not only at the receiver end point.  That was my whole point about "enforcers".

However, there is a problem.  Some *BE is solicited.  Which is why I proposed moving the solicited *BE to another channel ("pull").

Your point is that it is futile to define a protocol that will separate the solicited from the unsolicited, because spammers will always be able to subvert the protocol.  And you seem to say that there are thus no benefits to detection.  I strongly disagree.  There are two aspects to my response:

1. Spam coming thru the alternate "pull" channel can be modeled differently than spam defined as *BE.  This separation of models provides benefits over trying to model spam as UBE in the receiver's mind (end point).  Another person in this thread has provided one specific example, which is that the "pull" delay gives a whole new dynamic to detection.  Also, I have pointed out that the membership quality of the solicited channel gives it unique modeling advantages.

2. Spam coming thru the existing channel can then be modeled as *BE at any node of the channel, instead of as UBE.  Some nodes have a much better model of spam under this definition than the one at the end point.  For example, ISPs can see a lot more abuse data in real time than a single receiver, or than the current, inherently clumsier attempts to group or poll receivers.
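
As an illustration of point 2, the sketch below shows how an intermediate node could recognize bulk email (*BE) purely from what it sees in transit, with no judgment about whether anyone solicited it. Everything here is a hypothetical assumption for illustration: the `NodeBulkCounter` class, the crude whitespace-and-case fingerprint, and the threshold of 3 recipients are not part of the proposal.

```python
import hashlib
from collections import defaultdict

BULK_THRESHOLD = 3  # hypothetical cutoff; a real node would have to tune this


def fingerprint(body: str) -> str:
    """Hash a whitespace- and case-normalized body (a deliberately crude choice)."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class NodeBulkCounter:
    """Tracks, per message fingerprint, the distinct recipients seen at this node."""

    def __init__(self, threshold: int = BULK_THRESHOLD) -> None:
        self.threshold = threshold
        self.recipients = defaultdict(set)

    def observe(self, body: str, recipient: str) -> bool:
        """Record one delivery; return True once the message counts as bulk."""
        seen = self.recipients[fingerprint(body)]
        seen.add(recipient)
        return len(seen) >= self.threshold


if __name__ == "__main__":
    node = NodeBulkCounter()
    for rcpt in ("a@example.org", "b@example.org", "c@example.org"):
        print(rcpt, node.observe("Buy now", rcpt))
```

A single receiver only ever sees one copy, so this count is unavailable at the end point; that asymmetry is the advantage the node has. Exact-match hashing is trivially evaded by varying the body, so a real detector would need a more robust similarity measure.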

Hopefully that will set the record straight that I am thinking about spam in new conceptual ways...and not rehashing as others have claimed...


>You could ask for spammers to cooperatively self-mark their messages.
>But this hasn't been terribly productive.


Obviously I am not asking for that or anything like that.  See above.


>  It is also pointless to ask for
>cooperative identification of non-spammers and identify spammers as those
>not in the set of non-spammers.


I am also not asking for this, and it is instructive to understand how I am not.

I am only making a definition, so that one can model under the benefits of that definition.  What people actually do is a different matter, but as I pointed out previously in this thread, once you model spam the way I have proposed, then solicited *BE will have a distinct advantage to adopt the model.  And as I point out above, it doesn't matter what spammers do, because the improved model is helpful for advancing detection in both cases.

And my other point has been that when a channel gets so saturated with noise that you can no longer find the original signal reliably (as you say above, the S/N ratio will depend on Nyquist, which is a very crucial point), then solicited *BE and receivers are going to need a different model, else information transmission will no longer occur reliably.


>So given a set of unmarked messages, some spam, some not-spam, the task is
>to have a program mark them in the same way that a human would if a human
>were reading the messages. Since humans have different definitions of
>spam, it would be useful if the program could accept different definitions
>as well.  This is the realm of content analysis.


You see this is the crux of the whole stagnation of anti-spam in my view.  Content has nothing to do with what makes spam annoying.  It is the S/N factor, i.e. that it only gets a 0.005% response rate.

I am trying to shift the whole paradigm from thinking about psychology (which will always give a fuzzy result) to thinking about and modeling the noise factor.

It is a profound paradigm shift that gets you closer to a more robust solution for detection.
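
To put numbers on the S/N framing, the sketch below treats each message that draws a response as "signal" and each one that does not as "noise". That mapping is an assumption made for illustration, as are the function names; only the 0.005% figure and the 5% comparison come from this thread.

```python
import math


def noise_per_response(response_rate: float) -> float:
    """Messages that go unanswered for each message that gets a response."""
    if not 0.0 < response_rate < 1.0:
        raise ValueError("response_rate must be strictly between 0 and 1")
    return (1.0 - response_rate) / response_rate


def snr_db(response_rate: float) -> float:
    """Ratio of answered to unanswered messages, expressed in decibels."""
    return -10.0 * math.log10(noise_per_response(response_rate))


if __name__ == "__main__":
    for rate in (0.00005, 0.05):  # 0.005% (from the text) and 5%
        print(f"{rate:.3%}: {noise_per_response(rate):,.0f} unanswered per response, "
              f"{snr_db(rate):.1f} dB")
```

Under this assumption, a 0.005% response rate means roughly 20,000 unanswered messages for every response, while a 5% rate means 19. The measure depends only on counts, not on what any message says.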


>> Thus my whole motivation for an unambiguous definition (spam == all bulk
>> email) along the channel and not just a definition at the end points
>> (UBE).
>
>You may need a precise definition before you can begin implementation
>(just like you need a definition of voltage, current, etc to begin
>building a transmitter),


Exactly.  You need a definition before you can model.


> but you do not need a precise definition to talk
>about the theoretical aspects.


Yes you do.


>  Spam could be defined as UCE, CE, UBE, or
>BE.  I have also a more complete and detailed taxonomy of spam:


Those are all definitions.



>There are 3 types of email that we generally call spam:


This is going down the psychology line of modeling, which I am trying to shift the paradigm away from, because it is not very well correlated to what makes spam a problem.  If spam had a 5% response rate, it would no longer be a problem.  Modeling the psychology is something other people are working on already.

[snip]

Thanks,
Shelby Moore
http://AntiViotic.com


