On Thu, 2 Feb 2012, Jakub Narebski wrote:
> On Thu, 2 Feb 2012, Michał Kiedrowicz wrote:
> > Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> >
> > > Gitweb tries hard to properly process UTF-8 data, by marking output
> > > from git commands and contents of files as UTF-8 with to_utf8()
> > > subroutine.  This ensures that gitweb prints UTF-8 correctly,
> > > e.g. in 'log' and 'commit' views.
> > >
> > > Unfortunately it misses another source of potentially Unicode input,
> > > namely query parameters.  The result is that one cannot search for a
> > > string containing characters outside US-ASCII.  For example searching
> > > for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L
> > > WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82
> > > bytes in UTF-8 and percent-encoded as %C5%82) results in the following
> > > incorrect data in search field
> > >
> > >     MichaÅ Kiedrowicz
> > >
> > > This is caused by CGI by default treating '0xc5 0x82' bytes as two
> > > characters in Perl legacy encoding latin-1 (iso-8859-1), because 's'
> > > query parameter is not processed explicitly as UTF-8 encoded string.
> > >
> > > The solution used here follows "Using Unicode in a Perl CGI script"
> > > article on http://www.lemoda.net/cgi/perl-unicode/index.html:
> > >
> > >     use CGI;
> > >     use Encode 'decode_utf8';
> > >     my $cgi = CGI->new;
> > >     my $value = $cgi->param('input');
> > >     $value = decode_utf8($value);
> > >
> > > This is done when filling %input_params hash; this required moving
> > > from explicit $cgi->param(<label>) to $input_params{<name>} in a few
> > > places.
> >
> > I'm sorry but this doesn't work for me.  I would be happy to help if you
> > have some questions about it.
>
> Strange.  http://www.lemoda.net/cgi/perl-unicode/index.html says that
> those two approaches should be equivalent.  The -utf8 pragma version
> doesn't work for me at all, while this one works in that it finds what
> it is supposed to, but shows garbage in search form.
> Is that what you mean by "this doesn't work for me", i.e. working
> search, garbage in search field?
>
> Will investigate.

Damn.  If we use

    $cgi->textfield(-name => "s", -value => $searchtext)

like in gitweb, CGI.pm reads $cgi->param("s") by itself, without
decoding it.  To skip this we need to pass -force=>1 or -override=>1
(i.e. further changes to gitweb).

The -utf8 pragma works with more modern CGI.pm, but not with 3.10.

-- 
Jakub Narebski
Poland
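
For illustration only, here is a minimal sketch of the two effects
discussed above: the raw bytes of 's' counting as two latin-1
characters until decoded, and textfield() preferring CGI.pm's own
stored value unless -override is passed.  The hard-coded query string
and the variable names ($raw, $raw_len, $decoded_len, $last_cp) are
assumptions made for the example, not gitweb code, and the exact HTML
printed depends on the installed CGI.pm version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);

binmode(STDOUT, ':encoding(UTF-8)');

# Bytes a CGI script receives for the query string "s=Micha%C5%82"
# when nothing has decoded them yet (hard-coded here as an assumption).
my $raw = "Micha\xC5\x82";

# Undecoded, each byte counts as one latin-1 character.
my $raw_len = length($raw);

# Decoded, 0xC5 0x82 collapses into the single character U+0142.
my $searchtext  = decode_utf8($raw);
my $decoded_len = length($searchtext);
my $last_cp     = sprintf("U+%04X", ord(substr($searchtext, -1)));

print "raw length: $raw_len, decoded length: $decoded_len, last: $last_cp\n";

# CGI.pm is not bundled with Perl 5.22 and later, so load it at run time
# and skip the form demo when it is not installed.
if (eval { require CGI; 1 }) {
    my $cgi = CGI->new('s=Micha%C5%82');

    # Sticky default: CGI.pm prefers its own stored (undecoded) "s".
    print $cgi->textfield(-name => 's', -value => $searchtext), "\n";

    # -override => 1 makes CGI.pm use the decoded -value passed in.
    print $cgi->textfield(-name => 's', -value => $searchtext,
                          -override => 1), "\n";
}
```

The first line printed should read "raw length: 7, decoded length: 6,
last: U+0142".  Comparing the two textfield() lines in a UTF-8 terminal
shows whether the installed CGI.pm needs -override (or -force) to keep
the decoded value.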