[Forgot to hit Reply-to-all instead of Reply. Andrew, I'm sorry for duplicate email] On Sat, Mar 29, 2014 at 2:53 PM, Andrew Keller <andrew@xxxxxxxxxxxxxx> wrote: > > When formatting a diff header line, be sure to escape the raw output from git > for use as HTML. This ensures that when "problem characters" (&, <, >, ?, etc.) > exist in filenames, gitweb displays them as text, rather than letting the > browser interpret them as HTML. Actually gitweb tries to do the right thing, and in most cases it does HTML escaping correctly. The problem is only with *binary* files (this fact should IMHO have been mentioned in commit message). This issue is caused by two problems / errors. First, gitweb misclassify "Binary files a/foo and b/bar differ" as diff header, while it is untypical but it is diff contents. Second, gitweb doesn't HTML-escape unknown diff headers, assuming that it knows about all possible types. I have had those changes in my git repository, but I do not know if I have pushed it before the PC went down, and if I have it in backup. > > Reported-by: Dongsheng Song <dongsheng.song@xxxxxxxxx> > Signed-off-by: Andrew Keller <andrew@xxxxxxxxxxxxxx> Thank you for your work. > --- > Steps to reproduce: > > 1) Create a repository that contains a commit that adds a file: > * with an ampersand in the filename > * with binary contents > 2) Make the repository visible to gitweb > 3) In gitweb, navigate to the page that shows the diff for the commit > that added the file > > Behavior without patch: Page contains unescaped '&' instead of '&' > Page stops rendering when it gets to one of > the instances of the filename (in the diff header, to be specific), and > a light-red box appears a the top of the page, saying something like > this: > > This page contains the following errors: > > error on line 67 at column 22: xmlParseEntityRef: no name > > Below is a rendering of the page up to the first error. > > > (This particular error is what you get with an unescaped ampersand) This is caused by the fact that some browsers use strict XML validation mode when receiving 'application/xml+xhtml' contents, and not 'text/html'. So one might not notice it... > Behavior with patch: You see the ampersand in the file name, and the > page renders as one would expect. A question: it does not escapes HTML twice, i.e. you don't see '&' in rendered output ('&amp;' in page source)? > Other notes: > > Several helper methods exist for escaping HTML, and I was unsure > about which one to use. Although this patch fixes the problem, > it is possible that it may be more correct to use, for example, the > 'esc_html' helper method instead of interacting with $cgi directly. It is preferred to use esc_html instead of $cgi->escapeHTML. Even better use esc_path to escape pathnames. > The first hunk in the diff seems to be all that's required to experience > the good behavior, however upon inspecting the code, it seems > prudent to also include the second one. It's a nearby section of code, > doing similar work, and is called from the same function as the > former, and with similar arguments. I wonder if it would not interfere with processing of diff contents by gitweb, for example adding links to pre-image and post-image version of a file. > gitweb/gitweb.perl | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 79057b7..6c559f8 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2223,6 +2223,8 @@ sub format_git_diff_header_line { > my $diffinfo = shift; > my ($from, $to) = @_; > > + $line = $cgi->escapeHTML($line); > + > if ($diffinfo->{'nparents'}) { > # combined diff > $line =~ s!^(diff (.*?) )"?.*$!$1!; > @@ -2259,6 +2261,8 @@ sub format_extended_diff_header_line { > my $diffinfo = shift; > my ($from, $to) = @_; > > + $line = $cgi->escapeHTML($line); > + > # match <path> > if ($line =~ s!^((copy|rename) from ).*$!$1! && $from->{'href'}) { > $line .= $cgi->a({-href=>$from->{'href'}, -class=>"path"}, I'd have to examine code flow to check where it lands... -- Jakub Narębski -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html