Re: several messages

"Chris Lewis" <clewis@xxxxxxxxxx> · Wed, 12 Nov 2008 12:11:57 -0500

<redundant CC list snipped>

I wouldn't ordinarily reply to this, but Dean makes a number of
plausible pronouncements which simply aren't borne out in reality.

I'm using this as an educational opportunity for those with insufficient
experience in the field to make an informed judgement.

Dean Anderson wrote:
> I suggest people look at this document:
> 
> http://tools.ietf.org/html/draft-church-dnsbl-harmful-01

That expired over two years ago, and rightfully so.

It's filled with a great number of inaccuracies, including statistics
that don't even remotely resemble the effectiveness and false positive
rates seen by sites with typical mail flows.  Not to mention assertions
that are entirely at odds with the experience of virtually all operators
of medium to very large environments.

For example, the statistics for "BL3" (CBL) show an effectiveness rate
of 45% and an FP rate (against desired email) of over 2% in a very small
sample set.  If the CBL FP rate were even as high as 0.01%, we wouldn't
touch it.  Our production email flow handles as many emails as his entire
test every few minutes, and the traps peak at that many emails per second.

[I'm picking on the CBL because I've studied it for a long time.  I'm
studying it because it's doing an amazingly good job on our mail flow,
and our experiences with it seem to be closely reflected by most other
sites, including some of the very largest email infrastructures that exist.]

Our effectiveness rate at catching _all_ spam exceeds 75% from the CBL
alone, with a false positive rate in the 5-6 per million range.  On the
trap, it's above 90%.  The CBL has one of the lowest FP rates of any of
the techniques (DNSBL or otherwise) in our arsenal, and far and away the
highest effectiveness rate.  The non-DNSBL heuristics tend to have worse
FPs.

In our production environment (300,000 valid emails per day), we have
perhaps one false positive due to the CBL per day, and in virtually all
cases the CBL is correct - we can see the spewage of spam and viruses
from that IP in our own quarantine, it just happens to have one or two
valid emails mixed in too.  Which we fix (assist with eradicating the
infection problem, and forward the intercepted valid email) and move on
- no harm (no lost email, at worst delayed), and only good (fixed
infection, less junk) accrues from our blocks.
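
To put those FP figures side by side, here is a rough back-of-envelope
sketch.  The 300,000/day volume, the draft's "over 2%", the 0.01% cutoff
and "perhaps one per day" all come from the text above; the function
names are made up for illustration, and it assumes the rates apply to
valid email only.

```python
# Back-of-envelope false positive arithmetic using figures quoted in this message.
VALID_PER_DAY = 300_000  # valid emails per day in production


def fps_per_day(fp_rate: float, valid_per_day: int = VALID_PER_DAY) -> float:
    """Expected false positives per day for an FP rate given as a fraction."""
    return fp_rate * valid_per_day


def per_million(fp_count: float, volume: int) -> float:
    """Convert a false positive count over some volume into a per-million rate."""
    return fp_count / volume * 1_000_000


draft_claim = fps_per_day(0.02)           # the draft's "over 2%" for BL3
cutoff = fps_per_day(0.0001)              # the 0.01% level mentioned above
observed = per_million(1, VALID_PER_DAY)  # "perhaps one false positive per day"

print(f"draft's 2% claim : about {draft_claim:.0f} FPs/day")
print(f"0.01% cutoff     : about {cutoff:.0f} FPs/day")
print(f"observed         : about {observed:.1f} per million")
```

At the draft's rate we would be fielding thousands of false positives a
day instead of roughly one, which is the gap being described here.  One
per day on this volume works out to a few per million, the same order as
the 5-6 per million figure.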

> Maybe you should do some math to show the response time of a DNSBL
> compared to the re-ip time.

Maybe I could, but I don't need to, because over the past 5 years our
experience with the CBL (as just one example) indicates that any
potential for very short re-ip times versus detection->publication time
isn't a significant factor, nor is it likely to become one any time
soon.  Hence the calculations are moot.

>> Many ISPs force DHCP IP-affinity significantly, and it's kinda hard
>> for most BOTs to force their cable modem/access router/whatever (which
>> is the real holder of the DHCP address) to refresh.
> 
> First, 
> 
> Second,
> 
> Third,

Our experience indicates that none of them are a significant factor in
CBL effectiveness over the past 5 years, and there's no indication that
they will become much more significant any time soon.

> I'm sure some IPs do stay static for much longer, particularly when the
> machine infected has a static ip address in a hosting facility.
> 
> But your premise is limited to residential carriers offering cable or
> dsl.  What goes against your premise is the efforts by residential
> carriers to disrupt server activities by keeping customers from having a
> static ip address. Rotating the address faster naturally disrupts P2P
> services like bit-torrent, etc.  So, I'm kind of puzzled that you don't
> see this.

Actually, that won't disrupt them, unless the forced re-ip rate is
high enough to interfere with "normal" desired traffic (like web
surfing or email).  Most such tools (like bit-torrent) are inherently
immune to re-ip'ing, except insofar as a re-ip might break a transfer
mid-stream.  That breaks email and web transfers too, and an ISP's
customers would get pretty mad about that.

Re-ip'ing would only be a real barrier to those protocols that rely on
being able to "call into a reasonably fixed IP" of, say, a torrent leaf
node instead of "call out to an advertised muster point".  We have 100%
inbound blocking of all connection attempts on all protocols.  Yet,
bit-torrent works here.  In other words, re-ip wouldn't stop torrent
except by breaking connections mid-stream, which causes destructive
interference to the very activity the ISP is contracted to supply to
their customers.

Residential carriers have a strong incentive to keep the same device on
the same IP as long as possible, even across reboots/reconnects.
Allowing BOTs to re-ip very quickly lands their entire pool in DNSBLs
like the CBL at best (at worst, in manual whole-range listings applied by
local mail admins, which probably never get reviewed).  It also greatly
complicates analysis and remediation: how do you correlate an IP that's
changing every few minutes against an external report with a timestamp
of unknown accuracy (if there's a usable timestamp at all)?

You can "see" a few places where rapid re-ip is likely being used (eg:
/24s with CBL detection densities exceeding 50% and sometimes hitting
100%), but they're all listed, no spam is getting through, and there's
no legitimate email observable in the stream in any case, because the
valid email is going by an ISP-provided relay.
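
For anyone wanting to look for such ranges in their own data, a minimal
sketch of the per-/24 density calculation follows.  It assumes you
already have a flat list of listed IPv4 addresses from somewhere; the
function names and the sample addresses (documentation ranges) are
illustrative only, and the 50% threshold is the one mentioned above.

```python
import ipaddress
from collections import Counter


def density_by_slash24(listed_ips):
    """Map each /24 to the fraction of its 256 addresses that are listed."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/24", strict=False) for ip in set(listed_ips)
    )
    return {net: n / 256 for net, n in counts.items()}


def suspect_blocks(listed_ips, threshold=0.5):
    """Return the /24s whose listing density exceeds the threshold."""
    densities = density_by_slash24(listed_ips)
    return sorted(net for net, d in densities.items() if d > threshold)


# Illustrative data: 200 listed addresses in one /24, 2 in another.
listed = [f"192.0.2.{i}" for i in range(200)] + ["198.51.100.7", "198.51.100.9"]

for net in suspect_blocks(listed):
    print(net, f"{density_by_slash24(listed)[net]:.0%}")
```

A high density doesn't prove rapid re-ip on its own (a badly infected
static range would look the same), so treat it as a pointer for further
digging and not as a conclusion.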

The CBL benefits from a very important fact: virtually all IPs it
detects were never intended or used to send email directly to the
destination in the first place.  The holders of those IPs send their
valid email through ISP-provided relays.  Those who get listed generally
don't notice they're listed, because it doesn't affect them at all.

> What is this magic that detects the bot type from the contents of an
> email message? A broken header perhaps? What do you do if the bot
> doesn't send any obvious identifying information in its messages?

Other mechanisms, especially the CBL, tend to catch those.  _All_
usefully effective anti-spam implementations use multiple techniques.
They have to, because no technique is sufficiently effective on its own
to get to the effectiveness levels our users/customers demand.

> In
> general, one has to depend on spam-traps, then updating DNS zones, and
> then distributing that information, waiting for ttls to expire on
> previous queries.  Even if you pick up on a broken header or some such,
> it still takes time to distribute the updates. 

It does all that.  But as I have demonstrated, the bad email that gets
through because of this delay is a very small percentage (typically a
few percent) of what was correctly caught by the DNSBL.
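
For those who haven't looked at the mechanics, this is roughly what a
conventional IPv4 DNSBL check looks like from the querying side: reverse
the octets, append the list's zone, and look up an A record.  It's a
sketch only.  The zone name below is a placeholder and not any real
list's zone, and the function names are mine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves, i.e. the IP has an entry in the zone.

    Caveat: any resolution failure (including a timeout) is treated as
    "not listed" here; a production check would distinguish the two.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# Placeholder zone, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

The answer (or the negative answer) is cached by your resolver for the
record's TTL, which is the propagation delay being discussed.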

> Efforts at text matching are easily foiled.

Who said anything about text matching?

For the most part the detection heuristics stay static for far longer
than is necessary to achieve reasonable effectiveness.  If, say, the CBL
misses about an hour's worth of a BOT's first emissions from an infected
IP (that seems about right in terms of latencies), it's not missing the
next few days that most BOTs appear to emit at a minimum.
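
As a rough illustration of why that hour doesn't matter much, here is
the arithmetic as a sketch.  It assumes a constant sending rate over the
whole emission period, which is a simplification of mine; the one hour
miss is from the paragraph above, and the 2/3/5 day figures are just
sample values for "a few days".

```python
def leak_ratio(latency_hours: float, blocked_hours: float) -> float:
    """Mail that slips through before listing, relative to mail blocked after it.

    Assumes the BOT sends at a constant rate for the whole period.
    """
    if blocked_hours <= 0:
        raise ValueError("blocked_hours must be positive")
    return latency_hours / blocked_hours


for days in (2, 3, 5):
    ratio = leak_ratio(1, days * 24)
    print(f"1 hour missed, {days} days blocked: {ratio:.1%} leaked relative to caught")
```

With those sample values the leak comes out at around one to two percent
of what was caught.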

The realtime heuristics themselves generally stay stable for months at a
time.

> But unavoidably, spam from bots should go on for
> quite a while before the DNSBL system can react. Sometime after the
> system reacts, Bot spam should be stopped for a short time, until IP
> addresses change, and then resume, repeating the cycle. And the number
> of bots generally grows, meaning there is an exponential factor that
> works against your effectiveness at blocking.

Our experience is entirely counter to that theory, and there's no
indication the situation is going to change any time soon.

Furthermore, there are DNSBLs that make that theory, even _if_ it were
borne out by any level of empirical proof, entirely irrelevant.

The PBL is not a reactive DNSBL, and is just about as effective as the
CBL on much the same things as the CBL is.  I have seen fewer than 5
false positives in total since the day the PBL was first published (and
that's against 300,000 emails per day).  And those, like the CBL's, were
easily self-remedied.

The PBL isn't a replacement for the CBL, nor vice versa.  Each catches
some things that the other doesn't.  The CBL even publishes some metrics
about that.

> Still, all-in-all, 70%
> sounds much more reasonable than the 99% others sometimes state.

I think you misunderstand the statistic.  That 70% is the percentage of
CBL hits where our detectors can uniquely identify the specific BOT at
fault.  The CBL's heuristics and cross-section are obviously broader than
ours, which accounts for the remaining 30%.  It is only a poor indication
of how much of total spam is from BOTs - probably above 70%, but that's as
far as that goes.  But, I'll agree, 99% is too high for that number.
The real number is likely at the upper end of 80-95%.

> But as
> DHCP lease times go down, so will your percentage.  It will ultimately
> reach very low numbers; it's just a matter of time.

So we've gone from "are very low numbers" (which I've demonstrated they
aren't) to "ultimately reach low numbers".  What next?

The CBL's effectiveness and accuracy have been increasing over the 5
years it's been in operation, not declining.  There is no empirical
indication that this is changing while exploited machines (including
BOTs) remain a significant factor in spam.

There are other scenarios, more likely than "re-ip", in which CBL-like
DNSBLs might start losing effectiveness as they are used today.  Some
have been seen in the wild over the past year or two, but as yet they
aren't a significant percentage either.

> Those who are still
> blocking a high percentage effectiveness have to know the bots in
> advance.

Which is why people tend to use the PBL too.