On Wed, Jun 04, 2014 at 08:47:54PM +0200, Jakub Narębski wrote:
> Michael Wagner wrote:
> > On Tue, May 27, 2014 at 04:22:42PM +0200, Jakub Narębski wrote:
> >
> >> Subject: [PATCH] gitweb: Harden UTF-8 handling in generated links
> >>
> >> esc_html() ensures that its input is properly UTF-8 encoded and marked
> >> as UTF-8 with to_utf8(). Make esc_param() (used for query parameters
> >> in generated URLs), esc_path_info() (for escaping path_info
> >> components) and esc_url() use it too.
> >>
> >> This hardens gitweb against errors in UTF-8 handling; because
> >> to_utf8() is idempotent it won't change correct output.
> [...]
> >>  sub esc_param {
> >>  	my $str = shift;
> >>  	return undef unless defined $str;
> >> +
> >> +	$str = to_utf8($str);
> >>  	$str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
> >>  	$str =~ s/ /\+/g;
> >> +
> >>  	return $str;
> >>  }
>
> > While trying to view a "blob_plain" of "Gütekriterien.txt", a 404 error
> > occurred. "git_get_hash_by_path" tries to resolve the hash with the wrong
> > filename (git ls-tree -z HEAD -- Gütekriterien.txt) and fails.
> >
> > The filename needs the correct encoding. Something like this is probably
> > needed for all filenames and should be done at a prior stage:
>
> True.
>
> First, I wonder why the tests I did for this situation didn't
> show any errors even before the "harden href()" patch. What
> is different in your config that you see those errors?
>

Nothing special. It is reproducible with git 1.9.3 (Fedora 20), git
instaweb (lighttpd) and LANG=de_DE.UTF-8.
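For illustration, here is a minimal Perl sketch of the two points discussed in this thread. The first is that a to_utf8()-style helper is idempotent, so calling it in esc_param() does not change correct output. The second is that the decoded filename has to be turned back into UTF-8 octets before it is handed to "git ls-tree". to_utf8() below is only modeled on gitweb's helper of that name, and the Latin-1 fallback is an assumption. path_arg_for_git() and esc_param_sketch() are hypothetical names, not gitweb functions. The sketch also assumes the file was committed with a UTF-8 encoded name. esc_param_sketch() percent-encodes the octets itself instead of calling CGI::escape, so it runs without the CGI module.

```perl
#!/usr/bin/perl
# Sketch only: to_utf8() is modeled on gitweb's helper of that name;
# path_arg_for_git() and esc_param_sketch() are hypothetical helpers,
# and the Latin-1 fallback encoding is an assumption.
use strict;
use warnings;
use Encode qw(decode encode_utf8);

my $fallback_encoding = 'latin1';    # assumed fallback for non-UTF-8 input

# Return a Perl character string. Valid UTF-8 octets are decoded;
# anything else is decoded as $fallback_encoding. Input that is already
# a character string comes back unchanged, so repeated calls are safe.
sub to_utf8 {
	my $str = shift;
	return undef unless defined $str;
	if (utf8::is_utf8($str) || utf8::decode($str)) {
		return $str;
	}
	return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}

# Hypothetical: git compares the path argument byte-for-byte with the
# tree entries, so pass the name as UTF-8 octets (assuming it was
# committed that way), not as Latin-1 bytes.
sub path_arg_for_git {
	my $path = to_utf8(shift);
	return undef unless defined $path;
	return encode_utf8($path);
}

# Hypothetical esc_param() variant: percent-encode each UTF-8 octet
# outside the allowed set, then turn spaces into '+'.
sub esc_param_sketch {
	my $str = to_utf8(shift);
	return undef unless defined $str;
	my $bytes = encode_utf8($str);
	$bytes =~ s/([^A-Za-z0-9\-_.~()\/:@ ])/sprintf('%%%02X', ord $1)/eg;
	$bytes =~ tr/ /+/;
	return $bytes;
}

my $raw   = "G\xc3\xbctekriterien.txt";    # UTF-8 octets, e.g. from a URL
my $chars = to_utf8($raw);

print length($raw), " bytes -> ", length($chars), " characters\n";
print "to_utf8 is idempotent\n" if to_utf8($chars) eq $chars;
print "path argument: ",
	(path_arg_for_git($chars) eq $raw ? "UTF-8 octets" : "mismatch"), "\n";
print "query param: ", esc_param_sketch($raw), "\n";
```

Run as-is, it should report 18 bytes vs. 17 characters and print G%C3%BCtekriterien.txt for the query parameter. Passing the Latin-1 form (G\xfctekriterien.txt) to git instead would not match a UTF-8 tree entry, which is one way a lookup like git_get_hash_by_path could end in a 404.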