On Thu, 2 Jan 2012, Michał Kiedrowicz wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>
> > Gitweb tries hard to process UTF-8 data properly, by marking output
> > from git commands and contents of files as UTF-8 with the to_utf8()
> > subroutine. This ensures that gitweb prints UTF-8 correctly,
> > e.g. in 'log' and 'commit' views.
> >
> > Unfortunately it misses another source of potentially Unicode input,
> > namely query parameters. The result is that one cannot search for a
> > string containing characters outside US-ASCII. For example, searching
> > for "Michał Kiedrowicz" (containing the letter 'ł', LATIN SMALL LETTER L
> > WITH STROKE, Unicode code point U+0142, represented by the bytes
> > 0xc5 0x82 in UTF-8 and percent-encoded as %C5%82) results in the
> > following incorrect data in the search field:
> >
> >   MichaÅ Kiedrowicz
> >
> > This is caused by CGI by default treating the '0xc5 0x82' bytes as two
> > characters in Perl's legacy encoding latin-1 (iso-8859-1), because the
> > 's' query parameter is not explicitly processed as a UTF-8 encoded
> > string.
> >
> > The solution used here follows the "Using Unicode in a Perl CGI script"
> > article at http://www.lemoda.net/cgi/perl-unicode/index.html:
> >
> >   use CGI qw(:standard);
> >   use Encode 'decode_utf8';
> >   my $value = param('input');
> >   $value = decode_utf8($value);
> >
> > This is done when filling the %input_params hash; this required moving
> > from explicit $cgi->param(<label>) to $input_params{<name>} in a few
> > places.
>
> I'm sorry but this doesn't work for me. I would be happy to help if you
> have some questions about it.

Strange. http://www.lemoda.net/cgi/perl-unicode/index.html says that
those two approaches should be equivalent. The -utf8 pragma version
doesn't work for me at all, while this one works in that it finds what
it is supposed to, but shows garbage in the search form. Will investigate.
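The latin-1 misreading described in the quoted commit message can be reproduced with the core Encode module alone. This is a minimal sketch, assuming the query value arrives as the raw UTF-8 bytes of "Michał" (the variable names are illustrative only). It shows the two-character misreading, what decode_utf8() does to it, and, assuming the page is then emitted as UTF-8, the four bytes C3 85 C2 82 that a browser renders as "Å" followed by an invisible C1 control character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Raw bytes as CGI.pm hands them back without decoding: "Michał" in UTF-8.
my $raw = "Micha\xc5\x82";

# Undecoded, Perl treats each byte as one latin-1 character: 'Å' and U+0082.
printf "raw:     %d chars, last two: U+%04X U+%04X\n",
    length($raw), ord(substr($raw, -2, 1)), ord(substr($raw, -1, 1));

# Encoding that string as UTF-8 on output double-encodes it:
# U+00C5 -> C3 85 and U+0082 -> C2 82, displayed as "Å" plus a control char.
my $on_wire = encode_utf8($raw);
printf "output:  %s\n", join ' ', map { sprintf '%02X', ord } split //, substr($on_wire, 5);

# decode_utf8() turns the two bytes into the single character U+0142.
my $decoded = decode_utf8($raw);
printf "decoded: %d chars, last: U+%04X\n",
    length($decoded), ord(substr($decoded, -1, 1));
```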
> > An alternative solution would be to simply use the '-utf8' pragma (via
> > "use CGI '-utf8';"), but according to the CGI.pm documentation it may
> > cause problems with POST requests containing binary files... and
> > it doesn't work with the old CGI.pm version 3.10 from Perl v5.8.6.
[...]
> > @@ -816,9 +816,9 @@ sub evaluate_query_params {
> >
> >  	while (my ($name, $symbol) = each %cgi_param_mapping) {
> >  		if ($symbol eq 'opt') {
> > -			$input_params{$name} = [ $cgi->param($symbol) ];
> > +			$input_params{$name} = [ map { decode_utf8($_) } $cgi->param($symbol) ];
> >  		} else {
> > -			$input_params{$name} = $cgi->param($symbol);
> > +			$input_params{$name} = decode_utf8($cgi->param($symbol));
> >  		}
> >  	}
> >  }

-- 
Jakub Narebski
Poland
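The quoted hunk decodes every value of the multi-valued 'opt' parameter and the single value of every other parameter. The sketch below mirrors that loop outside gitweb; it is an illustration under stated assumptions, not gitweb code. The param() sub and %raw_query are hypothetical stand-ins for CGI.pm (so the sketch runs without it), and %cgi_param_mapping here is a hypothetical three-entry subset. Because decode_utf8() has a ($;$) prototype, the non-'opt' branch calls param() in scalar context, and a parameter absent from the query stays undef rather than becoming an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical query: raw UTF-8 byte strings, as CGI.pm returns them undecoded.
my %raw_query = (
    s   => ["Micha\xc5\x82 Kiedrowicz"],
    opt => ["--no-merges", "\xc5\x82"],
);

# Hypothetical stand-in for CGI.pm's param(): all values in list context,
# the first value in scalar context.
sub param {
    my ($name) = @_;
    my @values = @{ $raw_query{$name} || [] };
    return wantarray ? @values : $values[0];
}

# Hypothetical subset of gitweb's %cgi_param_mapping.
my %cgi_param_mapping = (searchtext => 's', extra_options => 'opt', hash => 'h');

my %input_params;
while (my ($name, $symbol) = each %cgi_param_mapping) {
    if ($symbol eq 'opt') {
        # Multi-valued parameter: decode each value separately.
        $input_params{$name} = [ map { decode_utf8($_) } param($symbol) ];
    } else {
        # Prototype forces scalar context; a missing 'h' decodes to undef.
        $input_params{$name} = decode_utf8(param($symbol));
    }
}

printf "searchtext: %d chars\n", length $input_params{searchtext};
printf "hash defined: %s\n", defined $input_params{hash} ? "yes" : "no";
```

Decoding at the single point where %input_params is filled keeps the rest of gitweb working with character strings, which is why the patch moves callers from $cgi->param() to %input_params.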