On Thu, May 15, 2014 at 8:48 PM, Michael Wagner <accounts@xxxxxxxxxxx> wrote:
> On Thu, May 15, 2014 at 10:04:24AM +0100, Peter Krefting wrote:
>> Michael Wagner:
>>
>>>Decoding the UTF-8 encoded file name (again with an additional print
>>>statement):
>>>
>>>$ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi
>>>
>>>work/Gütekriterien.txt
>>>Content-disposition: inline; filename="work/Gütekriterien.txt"
>>
>> You should fix the code path that created that URI, though, as it is not
>> what you expected.
>>
>> %C3%83 decodes to U+00C3 Latin Capital Letter A With Tilde
>> %C2%BC decodes to U+00BC Vulgar Fraction One Quarter
>>
>> The proper UTF-8 encoding for ü (U+00FC) is, as you can probably guess from
>> looking at which two characters the sequence above yielded, C3 BC, which in
>> a URI is represented as %C3%BC.
>>
>> Your QUERY_STRING should thus be
>>
>> p=notes.git;a=blob_plain;f=work/G%C3%BCtekriterien.txt;hb=HEAD
>>
>> which probably works as expected.
>>
>> What is happening is that whatever is generating the URI is UTF-8-encoding
>> the string twice (i.e., it generates a string with the proper C3 BC in it,
>> and then interprets it as iso-8859-1 data and runs that through a UTF-8
>> encoder again, yielding the C3 83 C2 BC sequence you see above).
>
> The subroutine "git_tree" generates the tree view. It stores the output
> of "git ls-tree -z ..." in an array named "@entries". Printing the content
> of this array yields the following result:
>
> 100644 blob 6419cd06a9461c38d4f94d9705d97eaaa887156a 520 Gütekriterien.txt
>
> This leads to the "doubled" encoding. Declaring the encoding in the call
> to open yields the following result:
>
> 100644 blob 6419cd06a9461c38d4f94d9705d97eaaa887156a 520 Gütekriterien.txt

Good catch.

Writing a test for this would not be easy, and would require some HTML
parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery, ...
or low-level HTML::TreeBuilder, or another low-level parser).

> ---
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index a9f57d6..f1414e1 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -7138,7 +7138,7 @@ sub git_tree {
>         my @entries = ();
>         {
>                 local $/ = "\0";
> -               open my $fd, "-|", git_cmd(), "ls-tree", '-z',
> +               open my $fd, "-|encoding(UTF-8)", git_cmd(), "ls-tree", '-z',
>                         ($show_sizes ? '-l' : ()), @extra_options, $hash
>                         or die_error(500, "Open git-ls-tree failed");

Or put

    binmode $fd, ':utf8';

like in the rest of the code.

>                 @entries = map { chomp; $_ } <$fd>;
>

An even better solution would be to use

    use open IN => ':encoding(utf-8)';

at the beginning of gitweb.perl, once and for all. Unfortunately, the
output equivalent requires creating a Perl module for gitweb, to be
able to use

    use open OUT => ':encoding(utf-8-with-fallback)';

-- 
Jakub Narebski
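
For illustration, the double encoding discussed in this thread can be
reproduced in a few lines of Perl. This is only a minimal sketch and not
gitweb code: the variable names, the hex_bytes helper, and the in-memory
filehandle standing in for the "git ls-tree" pipe are assumptions made
for the example. The layer is written in the colon form documented in
perlfunc, '<:encoding(UTF-8)'.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# Raw bytes as they would arrive from git for "ü": UTF-8 C3 BC.
my $bytes = "G\xC3\xBCtekriterien.txt";

# Without decoding, Perl treats each byte as one Latin-1 character,
# so encoding for output turns C3 BC into C3 83 C2 BC.
my $double = encode('UTF-8', $bytes);

# Decoding on input yields the single character U+00FC; encoding
# that for output gives back the original C3 BC.
my $chars  = decode('UTF-8', $bytes);
my $single = encode('UTF-8', $chars);

# The same decoding applied as an I/O layer, here on an in-memory
# handle that stands in for the pipe opened in git_tree.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;

print hex_bytes(substr($double, 1, 4)), "\n";   # C3 83 C2 BC
print hex_bytes(substr($single, 1, 2)), "\n";   # C3 BC
print "layer matches decode: ", ($line eq $chars ? "yes" : "no"), "\n";
```

The first line of output shows the doubled sequence from the broken URI,
the second the expected C3 BC, and the third confirms that reading
through the encoding layer gives the same string as an explicit decode.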