Gitweb tries hard to properly process UTF-8 data, by marking output from git commands and contents of files as UTF-8 with to_utf8() subroutine. This ensures that gitweb would print correctly UTF-8 e.g. in 'log' and 'commit' views. Unfortunately it misses another source of potentially Unicode input, namely query parameters. The result is that one cannot search for a string containing characters outside US-ASCII. For example searching for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82 bytes in UTF-8 and percent-encoded as %C5%81) result in the following incorrect data in search field MichaÅ Kiedrowicz This is caused by CGI by default treating '0xc5 0x82' bytes as two characters in Perl legacy encoding latin-1 (iso-8859-1), because 's' query parameter is not processed explicitly as UTF-8 encoded string. The solution used here follows "Using Unicode in a Perl CGI script" article on http://www.lemoda.net/cgi/perl-unicode/index.html: use CGI; use Encode 'decode_utf8; my $value = params('input'); $value = decode_utf8($value); This is done when filling %input_params hash; this required to move from explicit $cgi->param(<label>) to $input_params{<name>} in a few places. Alternate solution would be to simply use the '-utf8' pragma (via "use CGI '-utf8';"), but according to CGI.pm documentation it may cause problems with POST requests containing binary files... and it doesn't work with old CGI.pm version 3.10 from Perl v5.8.6. Noticed-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> Signed-off-by: Jakub Narębski <jnareb@xxxxxxxxx> --- gitweb/gitweb.perl | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 9cf7e71..55b2c24 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -52,7 +52,7 @@ sub evaluate_uri { # as base URL. # Therefore, if we needed to strip PATH_INFO, then we know that we have # to build the base URL ourselves: - our $path_info = $ENV{"PATH_INFO"}; + our $path_info = decode_utf8($ENV{"PATH_INFO"}); if ($path_info) { if ($my_url =~ s,\Q$path_info\E$,, && $my_uri =~ s,\Q$path_info\E$,, && @@ -816,9 +816,9 @@ sub evaluate_query_params { while (my ($name, $symbol) = each %cgi_param_mapping) { if ($symbol eq 'opt') { - $input_params{$name} = [ $cgi->param($symbol) ]; + $input_params{$name} = [ map { decode_utf8($_) } $cgi->param($symbol) ]; } else { - $input_params{$name} = $cgi->param($symbol); + $input_params{$name} = decode_utf8($cgi->param($symbol)); } } } @@ -2767,7 +2767,7 @@ sub git_populate_project_tagcloud { } my $cloud; - my $matched = $cgi->param('by_tag'); + my $matched = $input_params{'ctag'}; if (eval { require HTML::TagCloud; 1; }) { $cloud = HTML::TagCloud->new; foreach my $ctag (sort keys %ctags_lc) { @@ -5282,7 +5282,7 @@ sub git_project_list_body { my $check_forks = gitweb_check_feature('forks'); my $show_ctags = gitweb_check_feature('ctags'); - my $tagfilter = $show_ctags ? $cgi->param('by_tag') : undef; + my $tagfilter = $show_ctags ? $input_params{'ctag'} : undef; $check_forks = undef if ($tagfilter || $searchtext); @@ -6197,7 +6197,7 @@ sub git_tag { sub git_blame_common { my $format = shift || 'porcelain'; - if ($format eq 'porcelain' && $cgi->param('js')) { + if ($format eq 'porcelain' && $input_params{'javascript'}) { $format = 'incremental'; $action = 'blame_incremental'; # for page title etc } -- 1.7.6 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html