> "Georgios Kontaxis" <geko1702+commits@xxxxxxxxx> writes:
>
>>> ... so I doubt
>>> the wisdom of munging the patch part at all.
>>>
>>> I may be sympathetic to the cause of the patch, but, I do not agree
>>> with its execution in this iteration of the patch.
>>>
>> I see your point.
>>
>> It seems hiding e-mail addresses should be limited to the commit
>> message, i.e., stop at the "---" line.
>
> I doubt it makes sense to redact anything in the 'patch' view at
> all, actually. What kind of URL does the crawler need to formulate
> and what pieces of information (like commit object names or branch
> names) does it need to fill in the URL to get a series of patches
> out of gitweb? As long as it takes more effort than running "git
> clone" against the repository, the crawler would not have much
> incentive to crawl and harvest addresses from the 'patch' pages, and
> even in the log message part, the downsides of butchering the
> payload would outweigh the "privacy benefit", I would have to say.
>

No effort at all, I would say. E.g., somehow the web crawler gets to
git.kernel.org. It then follows every link, eventually arriving at a
commitdiff page. From there it again follows every link, which
includes the URL for the patch output. See how "wget --mirror"
behaves, for instance.

Just to clarify, my goal is not to stop someone who wants to extract
e-mail addresses from git.kernel.org specifically. They can just
"git clone" the repositories and grep through the logs. My goal is to
stop generic crawlers (pretty much "wget --mirror | grep" scripts)
from making their way to the logs.

> Quite honestly, if a site claims to offer a 'patch' download UI but
> returns corrupt data back, I would say it is much worse than not
> offering the service at all. Perhaps disabling the 'patch' feature
> in repositories that enable 'privacy' feature may be a much better
> approach.
>

Good point. I think I'll try that.