On Tue, May 27, 2014 at 04:22:42PM +0200, Jakub Narębski wrote:
> On 2014-05-15 21:28, Jakub Narębski wrote:
> > On Thu, May 15, 2014 at 8:48 PM, Michael Wagner <accounts@xxxxxxxxxxx> wrote:
> >> On Thu, May 15, 2014 at 10:04:24AM +0100, Peter Krefting wrote:
> >>> Michael Wagner:
> >
> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> >> index a9f57d6..f1414e1 100755
> >> --- a/gitweb/gitweb.perl
> >> +++ b/gitweb/gitweb.perl
> >> @@ -7138,7 +7138,7 @@ sub git_tree {
> >>      my @entries = ();
> >>      {
> >>          local $/ = "\0";
> >> -        open my $fd, "-|", git_cmd(), "ls-tree", '-z',
> >> +        open my $fd, "-|encoding(UTF-8)", git_cmd(), "ls-tree", '-z',
> >>              ($show_sizes ? '-l' : ()), @extra_options, $hash
> >>              or die_error(500, "Open git-ls-tree failed");
> >
> > Or put
> >
> >     binmode $fd, ':utf8';
> >
> > like in the rest of the code.
> >
> >>          @entries = map { chomp; $_ } <$fd>;
> >>
> >
> > Even better solution would be to use
> >
> >     use open IN => ':encoding(utf-8)';
> >
> > at the beginning of gitweb.perl, once and for all.
>
> Or harden esc_param / esc_path_info the same way esc_html
> is hardened against missing ':utf8' flag.
>
> -- >8 --
> Subject: [PATCH] gitweb: Harden UTF-8 handling in generated links
>
> esc_html() ensures that its input is properly UTF-8 encoded and marked
> as UTF-8 with to_utf8().  Make esc_param() (used for query parameters
> in generated URLs), esc_path_info() (for escaping path_info
> components) and esc_url() use it too.
>
> This hardens gitweb against errors in UTF-8 handling; because
> to_utf8() is idempotent it won't change correct output.
>
> Reported-by: Michael Wagner <accounts@xxxxxxxxxxx>
> Signed-off-by: Jakub Narębski <jnareb@xxxxxxxxx>
> ---
>  gitweb/gitweb.perl |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index a9f57d6..77e1312 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1548,8 +1548,11 @@ sub to_utf8 {
>  sub esc_param {
>      my $str = shift;
>      return undef unless defined $str;
> +
> +    $str = to_utf8($str);
>      $str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
>      $str =~ s/ /\+/g;
> +
>      return $str;
>  }
>
> @@ -1558,6 +1561,7 @@ sub esc_path_info {
>      my $str = shift;
>      return undef unless defined $str;
>
> +    $str = to_utf8($str);
>      # path_info doesn't treat '+' as space (specially), but '?' must be escaped
>      $str =~ s/([^A-Za-z0-9\-_.~();\/;:@&= +]+)/CGI::escape($1)/eg;
>
> @@ -1568,8 +1572,11 @@ sub esc_path_info {
>  sub esc_url {
>      my $str = shift;
>      return undef unless defined $str;
> +
> +    $str = to_utf8($str);
>      $str =~ s/([^A-Za-z0-9\-_.~();\/;?:@&= ]+)/CGI::escape($1)/eg;
>      $str =~ s/ /\+/g;
> +
>      return $str;
>  }
>
> --
> 1.7.1
>

While trying to view a "blob_plain" of "Gütekritierien.txt", a 404 error
occurred. "git_get_hash_by_path" tries to resolve the hash with the wrong
filename (git ls-tree -z HEAD -- Gütekriterien.txt) and fails.

The filename needs the correct encoding. Something like this is probably
needed for all filenames and should be done at a prior stage:

---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 77e1312..e4a50e7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4725,7 +4725,7 @@ sub git_print_tree_entry {
         }
         print " | " .
             $cgi->a({-href => href(action=>"blob_plain", hash_base=>$hash_base,
-                                   file_name=>"$basedir$t->{'name'}")},
+                                   file_name=>"$basedir" . to_utf8($t->{'name'}))},
                     "raw");
         print "</td>\n";
--
1.7.1
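
For anyone who wants to see the effect of decoding before escaping
without a gitweb setup, here is a minimal standalone sketch. It is not
the gitweb code: to_utf8_sketch(), pct_encode() and esc_param_sketch()
are hypothetical stand-ins, the Latin-1 fallback is an assumption, and
the percent-encoding is done by hand rather than with CGI::escape(). It
only illustrates that, once the input is decoded first, raw UTF-8 bytes
and an already decoded string end up as the same escaped parameter.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for gitweb's to_utf8(): leave already decoded
# strings alone, decode well-formed UTF-8 bytes in place, and fall back
# to Latin-1 (an assumption of this sketch) for anything else.
sub to_utf8_sketch {
    my $str = shift;
    return undef unless defined $str;
    return $str if utf8::is_utf8($str) || utf8::decode($str);
    return decode('latin1', $str, Encode::FB_DEFAULT);
}

# Percent-encode every byte of the UTF-8 encoding of a character string.
sub pct_encode {
    my $chars = shift;
    return join '', map { sprintf('%%%02X', ord($_)) }
                    split //, encode('UTF-8', $chars);
}

# Hypothetical escaper using the same safe set as esc_param() in the
# patch quoted above; the input is decoded before it is escaped.
sub esc_param_sketch {
    my $str = to_utf8_sketch(shift);
    return undef unless defined $str;
    $str =~ s{([^A-Za-z0-9\-_.~()/:@ ]+)}{pct_encode($1)}eg;
    $str =~ s/ /+/g;
    return $str;
}

# Raw UTF-8 bytes, as they would arrive from a pipe opened without an
# encoding layer.
print esc_param_sketch("G\xC3\xBCte kriterien.txt"), "\n";
# prints G%C3%BCte+kriterien.txt
```

Calling to_utf8_sketch() a second time on its own result returns the
string unchanged, which is the idempotence the commit message relies on.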