> Georgios Kontaxis via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote:
>> Gitweb extracts content from the Git log and makes it accessible
>> over HTTP. As a result, e-mail addresses found in commits are
>> exposed to web crawlers and they may not respect robots.txt.
>> This can result in unsolicited messages.
>
>> Introduce an 'email-privacy' feature which redacts e-mail addresses
>> from the generated HTML content
>
> A general reply to the topic: have you considered munging
> addresses in a way that is still human readable, but obviously
> obfuscated?
>
> On some other project, I settled on HTML "•" as a replacement
> for '.' for admins who enable that option. The $USER@$NO_DOT
> remains as-is for easy identification+recognition of hosts.
>

Thanks for the suggestion.

People have been trying to hinder address harvesting for a while now.
Replacing '@' with "at", the dot with "dot", adding spaces, etc. was
pretty common at some point. May still be.
I would expect crawlers to have caught up and this includes all sorts
of character encodings and unicode look-alike substitutions.

At the end of the day we are looking for something that's easy for
humans to read but hard for scripts to parse as an e-mail address.
(And that scripts cannot learn through an additional regex)
I'm not aware of anything like that. (I know CAPTCHAs, etc.)

> I also considered Unicode homographs which can look identical
> to replacement characters, too; but rejected that idea since
> it would cause grief for legitimate users who would not notice
> the homograph when pasting into their mail client.
>
> Anyways, here's the list of candidates I tried:
>
> homograph∂80x24.org
> homograph@80x24ͺorg
> homograph@80x24·org
> homograph@80x24•org
> homograph@80x24.org
> homograph﹫80x24.org
>
> https://en.wikipedia.org/wiki/Ano_Teleia#Similar_symbols
> https://en.wikipedia.org/wiki/Enclosed_A
>
> homographⒶ80x24.org
> homograph@80x24 org
> homograph@80x24․org
> homograph@80x24ꓸorg
>
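
For illustration, the quoted idea of replacing '.' with a bullet while
leaving $USER@$NO_DOT alone might look roughly like the sketch below.
It is written in Python only to show the idea and is not gitweb code.
The helper name munge_email, the regex, the use of "&#8226;" as the
HTML entity for "•", and the choice to munge only the host part are
all assumptions made for this example.

```python
import html
import re

# Hypothetical pattern: a local part, '@', and a host containing at
# least one dot. Addresses without a dot in the host never match, so
# they are left as-is.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)

BULLET = "&#8226;"  # assumed entity for U+2022 BULLET


def munge_email(text):
    """Escape text for HTML, then replace dots in e-mail host parts."""
    escaped = html.escape(text)

    def repl(match):
        user, host = match.group(1), match.group(2)
        # Keep the local part readable; only the host's dots change.
        return user + "@" + host.replace(".", BULLET)

    return EMAIL_RE.sub(repl, escaped)


print(munge_email("Jane <jane.doe@example.org>"))
print(munge_email("root@localhost"))
```

As noted above, a crawler can undo this with one extra regex, so it
only raises the bar slightly compared to full redaction.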