Re: [PATCH v5] gitweb: redacted e-mail addresses feature.

> Georgios Kontaxis via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote:
>> Gitweb extracts content from the Git log and makes it accessible
>> over HTTP. As a result, e-mail addresses found in commits are
>> exposed to web crawlers and they may not respect robots.txt.
>> This can result in unsolicited messages.
>
>> Introduce an 'email-privacy' feature which redacts e-mail addresses
>> from the generated HTML content
>
> A general reply to the topic: have you considered munging
> addresses in a way that is still human readable, but obviously
> obfuscated?
>
> On some other project, I settled on HTML "&#8226;" as a replacement
> for '.' for admins who enable that option.  The $USER@$NO_DOT
> remains as-is for easy identification+recognition of hosts.
>
Thanks for the suggestion.

People have been trying to hinder address harvesting for a while now.
Replacing '@' with "at" and '.' with "dot", adding spaces, etc.
was pretty common at some point, and may still be.
I would expect crawlers to have caught up, and that includes
all sorts of character encodings and Unicode look-alike substitutions.

At the end of the day, we are looking for something that's easy for humans
to read but hard for scripts to parse as an e-mail address
(and that scripts cannot learn to parse with one additional regex).
I'm not aware of anything like that. (I know of CAPTCHAs, etc.)
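
To illustrate the "additional regex" point, here is a minimal sketch in
Python. It is not part of the patch or of gitweb; the names munge(),
unmunge() and MUNGED are hypothetical, and the pattern is deliberately
simplistic rather than a full e-mail grammar. It applies the "&#8226;"
substitution suggested above and then shows how a harvester could
reverse it:

```python
import html
import re

# HTML entity for U+2022 BULLET, as suggested in the quoted message.
BULLET_ENTITY = "&#8226;"

def munge(addr):
    """Replace every '.' in the host part with the bullet entity."""
    user, sep, host = addr.partition("@")
    if not sep:
        return addr  # no '@', so leave the text untouched
    return user + "@" + host.replace(".", BULLET_ENTITY)

# One extra pattern a harvester might add: user@label, followed by one
# or more labels joined by the entity or by the literal bullet character.
MUNGED = re.compile(r"[\w.+-]+@[\w-]+(?:(?:&#8226;|\u2022)[\w-]+)+")

def unmunge(text):
    """Return the munged addresses found in text, with dots restored."""
    return [
        html.unescape(match).replace("\u2022", ".")
        for match in MUNGED.findall(text)
    ]

munged = munge("homograph@80x24.org")
print(munged)
print(unmunge("Author: " + munged))
```

Any fixed, publicly known substitution can be undone in roughly this
way, which is why I doubt it would slow down a determined crawler for
long.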

> I also considered Unicode homographs which can look identical
> to replacement characters, too; but rejected that idea since
> it would cause grief for legitimate users who would not notice
> the homograph when pasting into their mail client.
>
> Anyways, here's the list of candidates I tried:
>
> homograph∂80x24.org
> homograph@80x24ͺorg
> homograph@80x24·org
> homograph@80x24•org
> homograph@80x24.org
> homograph﹫80x24.org
>
> https://en.wikipedia.org/wiki/Ano_Teleia#Similar_symbols
> https://en.wikipedia.org/wiki/Enclosed_A
>
> homographⒶ80x24.org
> homograph@80x24 org
> homograph@80x24․org
> homograph@80x24ꓸorg
>




