Jakub Narebski <jnareb@xxxxxxxxx> wrote:

> Gitweb tries hard to properly process UTF-8 data, by marking output
> from git commands and contents of files as UTF-8 with to_utf8()
> subroutine. This ensures that gitweb would print correctly UTF-8
> e.g. in 'log' and 'commit' views.
>
> Unfortunately it misses another source of potentially Unicode input,
> namely query parameters. The result is that one cannot search for a
> string containing characters outside US-ASCII. For example searching
> for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L
> WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82
> bytes in UTF-8 and percent-encoded as %C5%82) results in the following
> incorrect data in search field
>
>   MichaÅ Kiedrowicz
>
> This is caused by CGI by default treating '0xc5 0x82' bytes as two
> characters in Perl legacy encoding latin-1 (iso-8859-1), because 's'
> query parameter is not processed explicitly as UTF-8 encoded string.
>
> According to "Using Unicode in a Perl CGI script" article on
> http://www.lemoda.net/cgi/perl-unicode/index.html the simplest
> solution is to just import '-utf8' pragma for CGI module:
>
>   use CGI '-utf8';
>   my $value = param('input');
>
> According to CGI module documentation, the '-utf8' pragma may cause
> problems with POST requests containing binary files... but gitweb
> currently does not use POST requests at all, so this should not be a
> problem now.

This was exactly my feeling when I sent this patch :).

> Alternate solution would be to explicitly decode query parameters when
> storing them in %input_params (and perhaps also path_info).
>
> [jn: reworded / rewritten commit message]
>
> Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx>

Thanks, I forgot about that.
> Signed-off-by: Jakub Narębski <jnareb@xxxxxxxxx>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9cf7e71..a7441ef 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -10,7 +10,7 @@
>  use 5.008;
>  use strict;
>  use warnings;
> -use CGI qw(:standard :escapeHTML -nosticky);
> +use CGI qw(:standard :escapeHTML -nosticky -utf8);
>  use CGI::Util qw(unescape);
>  use CGI::Carp qw(fatalsToBrowser set_message);
>  use Encode;
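
For illustration only, here is a minimal sketch of the byte-versus-character
difference the commit message describes. It does not use CGI or any gitweb
code: percent_decode() is a hypothetical helper written for this example, and
the query value is the one from the commit message. It only assumes the core
Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not a gitweb subroutine): turn a percent-encoded
# query value into the raw bytes the browser sent.
sub percent_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # '+' means space in a query string
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> one byte
    return $s;
}

my $bytes = percent_decode('Micha%C5%82+Kiedrowicz');

# Left undecoded, Perl sees one latin-1 character per byte, so the
# bytes 0xc5 0x82 count as two separate characters.
my $as_latin1 = $bytes;

# Decoded as UTF-8, the same two bytes become the single character U+0142.
my $as_utf8 = decode('UTF-8', $bytes);

printf "undecoded: %d characters\n", length($as_latin1);
printf "decoded:   %d characters, 6th is U+%04X\n",
    length($as_utf8), ord(substr($as_utf8, 5, 1));
```

The undecoded string is 18 characters long and the decoded one is 17, with
U+0142 in the sixth position. Explicit decoding like this is roughly what the
"alternate solution" mentioned in the commit message would have to do for each
parameter; the '-utf8' pragma asks CGI to do it instead.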