Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Gitweb tries hard to properly process UTF-8 data, by marking output
> from git commands and contents of files as UTF-8 with the to_utf8()
> subroutine. This ensures that gitweb prints UTF-8 correctly,
> e.g. in 'log' and 'commit' views.
>
> Unfortunately it misses another source of potentially Unicode input,
> namely query parameters. The result is that one cannot search for a
> string containing characters outside US-ASCII. For example searching
> for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L
> WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82
> bytes in UTF-8 and percent-encoded as %C5%82) results in the following
> incorrect data in the search field
>
>   MichaÅ Kiedrowicz
>
> This is caused by CGI by default treating '0xc5 0x82' bytes as two
> characters in Perl legacy encoding latin-1 (iso-8859-1), because the 's'
> query parameter is not processed explicitly as a UTF-8 encoded string.
>
> The solution used here follows the "Using Unicode in a Perl CGI script"
> article on http://www.lemoda.net/cgi/perl-unicode/index.html:
>
>   use CGI;
>   use Encode 'decode_utf8';
>   my $cgi = CGI->new;
>   my $value = $cgi->param('input');
>   $value = decode_utf8($value);
>
> This is done when filling the %input_params hash; this required moving
> from explicit $cgi->param(<label>) to $input_params{<name>} in a few
> places.

I'm sorry, but this doesn't work for me. I would be happy to help if
you have any questions about it.

> An alternate solution would be to simply use the '-utf8' pragma (via
> "use CGI '-utf8';"), but according to CGI.pm documentation it may
> cause problems with POST requests containing binary files... and
> it doesn't work with old CGI.pm version 3.10 from Perl v5.8.6.
>
> Noticed-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx>
> Signed-off-by: Jakub Narębski <jnareb@xxxxxxxxx>
> ---
>  gitweb/gitweb.perl |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9cf7e71..55b2c24 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -52,7 +52,7 @@ sub evaluate_uri {
>  	# as base URL.
>  	# Therefore, if we needed to strip PATH_INFO, then we know that we have
>  	# to build the base URL ourselves:
> -	our $path_info = $ENV{"PATH_INFO"};
> +	our $path_info = decode_utf8($ENV{"PATH_INFO"});
>  	if ($path_info) {
>  		if ($my_url =~ s,\Q$path_info\E$,, &&
>  		    $my_uri =~ s,\Q$path_info\E$,, &&
> @@ -816,9 +816,9 @@ sub evaluate_query_params {
>
>  	while (my ($name, $symbol) = each %cgi_param_mapping) {
>  		if ($symbol eq 'opt') {
> -			$input_params{$name} = [ $cgi->param($symbol) ];
> +			$input_params{$name} = [ map { decode_utf8($_) } $cgi->param($symbol) ];
>  		} else {
> -			$input_params{$name} = $cgi->param($symbol);
> +			$input_params{$name} = decode_utf8($cgi->param($symbol));
>  		}
>  	}
>  }
> @@ -2767,7 +2767,7 @@ sub git_populate_project_tagcloud {
>  	}
>
>  	my $cloud;
> -	my $matched = $cgi->param('by_tag');
> +	my $matched = $input_params{'ctag'};
>  	if (eval { require HTML::TagCloud; 1; }) {
>  		$cloud = HTML::TagCloud->new;
>  		foreach my $ctag (sort keys %ctags_lc) {
> @@ -5282,7 +5282,7 @@ sub git_project_list_body {
>
>  	my $check_forks = gitweb_check_feature('forks');
>  	my $show_ctags = gitweb_check_feature('ctags');
> -	my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef;
> +	my $tagfilter = $show_ctags ? $input_params{'ctag'} : undef;
>  	$check_forks = undef
>  		if ($tagfilter || $searchtext);
>
> @@ -6197,7 +6197,7 @@ sub git_tag {
>
>  sub git_blame_common {
>  	my $format = shift || 'porcelain';
> -	if ($format eq 'porcelain' && $cgi->param('js')) {
> +	if ($format eq 'porcelain' && $input_params{'javascript'}) {
>  		$format = 'incremental';
>  		$action = 'blame_incremental'; # for page title etc
>  	}
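
For anyone trying to reproduce this outside gitweb, the byte-vs-character
mismatch described in the quoted commit message can be checked with a
standalone script. This is only a minimal sketch: percent_decode() is a
hypothetical helper written for this illustration (it is not part of gitweb
or CGI.pm), and it assumes the query value arrives as raw percent-encoded
UTF-8 bytes, as in the 'ł' example above.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper (not from gitweb or CGI.pm): undo percent-encoding,
# producing the raw byte string a CGI layer would see before any decoding.
sub percent_decode {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

# 'ł' (U+0142) is the two bytes 0xc5 0x82 in UTF-8.
my $raw  = percent_decode('Micha%C5%82');  # byte string: 'ł' counts as 2
my $text = decode_utf8($raw);              # character string: 'ł' counts as 1

printf "raw length:     %d\n", length($raw);               # prints 7
printf "decoded length: %d\n", length($text);              # prints 6
printf "last codepoint: U+%04X\n", ord(substr($text, -1)); # prints U+0142
```

Without the decode_utf8() step the value stays a 7-element byte string, and
printing it as latin-1 is what produces the "MichaÅ" garbage in the search
field.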