
Re: Poll: Should mhonarc.org mail archives hide mail addresses



On January 1, 2004 at 23:26, Chuq Von Rospach wrote:

> > The problem with this approach is that it won't work with text-based
> > browsers.  Accessibility is something I try to maintain,
> 
> Sure it will. Jeffrey Zeldman has a lot of useful information on how to 
> be accessible and compliant by degrading gracefully. you can start 
> here: http://www.happycog.com/lectures/access/ to get a first cut on 
> this. The idea is to build things that use XHTML/CSS such that if 
> certain features aren't supported by a browser, the site does the 
> "right thing" instead of simply breaking, and does it without building 
> multiple versions with browser sniffing. And accessible means more than 
> sight-limited, it means alternative browsing tools, like my phone's 
> mini-browser, and search engines like google.

You are straying.  The CSS/XHTML approach was brought up as
a way to implement dynamic obfuscation of addresses.  And as
I have noted, and as you emphasize, obfuscation is of limited
value because spammers will adapt.

> So accessibility is good. CSS/XHTML is good. and since mHonarc gets 
> used in so many sites where people have to skin an interface onto it, I 
> think moving to those models is a great idea (and basically a 
> no-brainer), once you get past a bunch of the myths about those tools.

MHonArc is neutral about CSS/XHTML since a user can customize the
layout as they see fit.  I think talking about CSS/XHTML is off-topic
unless someone provides a case of how it can be used to deal with
the harvesting problem.

> > I first thought of using libgd to have address changed into CGI
> > links that generate an image on the fly with showing email address.
> > I.e. Harvesters would have to use OCR to get the address.
> 
> and there's evidence that some harvesters are experimenting in that 
> direction. After all, it's only CPU time, and they're infinitely 
> patient. Even if they only get a 10-15% hit rate on OCR conversions, 
> that merely means they have to hit the site 10 times to get everything. 
> That was the ultimate failure of the slashdot "random" obfuscation 
> tool: spammers didn't have to break all of them, just enough of them to 
> get useful data, and then cycle through the site enough times to get 
> around the versions they didn't crack. took about a week.

Actually, what some are doing is using people on the Net to do the
work.  I.e., they post the image to a porn site and require visitors
to solve it before entering.  This is what some are doing to
auto-create Yahoo, Hotmail, and similar types of accounts for sending
out spam.

Now, there is always a cost-benefit ratio.  With respect to account
creation, the benefits outweigh the cost.  But doing it for each email
address may not be worth it, especially if the graphics include
techniques that OCR systems cannot deal with.
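
To put rough numbers on the repeated-pass argument quoted above, here
is a small sketch.  It assumes each pass over the site cracks a given
address independently with probability p, which only holds if the
obfuscation is re-randomized on every request.  The function names are
made up for illustration.

```python
def coverage_after_passes(p: float, passes: int) -> float:
    """Chance that a given address has been recovered at least once."""
    return 1.0 - (1.0 - p) ** passes


def passes_needed(p: float, target: float) -> int:
    """Smallest number of passes that reaches the target coverage."""
    passes = 0
    while coverage_after_passes(p, passes) < target:
        passes += 1
    return passes


if __name__ == "__main__":
    # 10% and 15% are the per-pass hit rates mentioned in the thread.
    for p in (0.10, 0.15):
        print(f"p={p:.2f}: 10 passes -> {coverage_after_passes(p, 10):.0%}, "
              f"95% needs {passes_needed(p, 0.95)} passes")
```

Under that assumption, ten passes recover roughly 65% to 80% of the
addresses rather than all of them, and about 19 to 29 passes are needed
for 95%.  That is still cheap for a patient harvester, which is the
point of the quoted argument.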

> > Another alternative is to remove linking of addresses, and then
> > use an obfuscation technique like:
> >
> >   earl<!--
> >   -->&#64;<!--
> >   -->example.com
> >
> > This way the address renders like "earl@example.com" (and can be
> > copy-n-pasted by readers to their MUA), but a harvester may not
> > catch it.  Of course, a smart harvester that expands entity references
> > and deletes comment declarations would.
> 
> be very wary of "fixes" that merely make the problem more difficult. As 
> soon as they have a financial incentive to crack them, they'll be 
> cracked. you're basically looking to try to implement the "I don't have 
> to outrun the bear, I just have to outrun you" solution, meaning you 
> make it tough enough to crack that they go harvest someone else's site.

I already noted the problems with obfuscation, including for the
technique above.
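
As a minimal sketch of the comment-and-entity technique quoted above,
and of why it only stops a naive harvester, consider the following
Python.  The function names and the address pattern are made up for
illustration; nothing here is part of MHonArc.

```python
import html
import re


def obfuscate(address: str) -> str:
    """Split an address around '@' using HTML comments and an entity."""
    local, domain = address.split("@", 1)
    return f"{local}<!--\n-->&#64;<!--\n-->{domain}"


def normalize(markup: str) -> str:
    """What a smarter harvester would do: drop comments, expand entities."""
    without_comments = re.sub(r"<!--.*?-->", "", markup, flags=re.DOTALL)
    return html.unescape(without_comments)


# A deliberately simple address pattern, standing in for a naive harvester.
NAIVE_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

if __name__ == "__main__":
    page = obfuscate("earl@example.com")
    print(NAIVE_PATTERN.findall(page))             # scan of the raw markup
    print(NAIVE_PATTERN.findall(normalize(page)))  # scan after normalizing
```

The raw markup contains no literal "@", so the simple pattern finds
nothing.  After two cheap normalization steps the address is back in
the clear.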

One can look at the obfuscation model as similar to deterring
crime.  For example, a professional car thief can steal any car,
but if you make your car more time-consuming to steal, they will
go elsewhere, where the cost is lower.  Also, with certain measures,
you deter amateur thieves.

Obfuscation works on a similar principle.  Of course, if you
become a worthy target, a spammer may take the time to break
any obfuscation technique (the Slashdot story you provided is
a good example).

> In the case of mHonarc especially, that's a bad design choice. Since so 
> many sites use mHonarc, any change you make to mHonarc will be a focus 
> of the spammers to crack. mHonarc doesn't have the option of making it 
> tough enough for the spammers to go elsewhere. So you risk putting 
> energy into things that won't fix the problem long (if at all), and 
> worse, might create a false sense of security for developers and users 
> of the tools.

Actually, MHonArc allows you to completely hide addresses if you want.
However, there is one small gap that does expose author addresses,
so I personally could create a robot to harvest all author addresses from
an archive despite any resource settings by the archive maintainer.

I hope to fix this gap in a future release (once the Savannah
folks fix some issues with CVS).

> My suggestion: don't get involved in any "solution" that merely makes 
> it "harder" or "causes more work", because they only solve things as 
> long as the spammers don't feel it's worth it. and if you get into an 
> arms race with them, you'll lose.

I already stated something similar to this.  Ideally, there is a
solution that does not use obfuscation but allows a human to determine
addresses.  Hence, the image idea.  But even that is technically a form
of obfuscation with respect to a computer.  Still, when dealing with
computers, some forms of obfuscation may be sufficient if you start
getting to Turing-test-level obfuscation.

> >  Mail-archive.com
> > uses a POST form to obfuscate addresses, but it is straightforward
> > to customize a harvester to defeat it.
> 
> anything with a large enough data-set to warrant the spammer's 
> attention will get it. mHonarc, sort of by definition, will be high on 
> their lists.

Technically, not MHonArc, but list archives.

> Obfuscation is a waste of energy. It works only as long as the spammers 
> don't bother worrying about it. Graphic representations are 
> non-accessible, crackable (via OCR) and not easily used by end-users, 

I'm not talking about end-users.  I am talking about the mhonarc.org
list archives, and only those archives.  Now, users may learn some
things from this discussion about what they may want to do with their
own archives, but that is it.

The only thing relevant to MHonArc is that it allows users to
apply whatever solutions they want.

> I think a "guest" has no demand on access to sensitive data. I don't 
> allow "guests" open access to private mail lists, for instance, and I 
> see no reason why they should assume they should have access to it.

The mhonarc.org lists are not private lists.  MHonArc is an open
source project, and all the lists are intended to be as open as
possible.

--ewh

