Gitweb tries hard to properly process UTF-8 data by marking output from
git commands and contents of files as UTF-8 with the to_utf8()
subroutine. This ensures that gitweb prints UTF-8 correctly, e.g. in
the 'log' and 'commit' views.

Unfortunately it misses another source of potentially Unicode input,
namely query parameters. The result is that one cannot search for a
string containing characters outside US-ASCII. For example, searching
for "Michał Kiedrowicz" (containing the letter 'ł' - LATIN SMALL LETTER
L WITH STROKE, Unicode codepoint U+0142, represented by the bytes
0xc5 0x82 in UTF-8 and percent-encoded as %C5%82) results in the
following incorrect data in the search field:

	MichaÅ Kiedrowicz

This is caused by CGI by default treating the bytes 0xc5 0x82 as two
characters in Perl's legacy encoding latin-1 (iso-8859-1), because the
's' query parameter is not processed explicitly as a UTF-8 encoded
string.

According to the "Using Unicode in a Perl CGI script" article at

	http://www.lemoda.net/cgi/perl-unicode/index.html

the simplest solution is to just import the '-utf8' pragma for the CGI
module:

	use CGI '-utf8';
	my $value = param('input');

According to the CGI module documentation, the '-utf8' pragma may cause
problems with POST requests containing binary files... but gitweb
currently does not use POST requests at all, so this should not be a
problem for now.

An alternate solution would be to explicitly decode query parameters
when storing them in %input_params (and perhaps also path_info).
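A minimal sketch of the explicit-decoding alternative, to illustrate
the byte-versus-character difference described above. It is not gitweb
code: percent_decode() is a hypothetical helper written only for this
example (gitweb itself relies on CGI and CGI::Util for unescaping), and
only the core Encode module is assumed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of gitweb: turn a percent-encoded
# query value into a string of raw bytes.
sub percent_decode {
	my ($str) = @_;
	$str =~ s/\+/ /g;                             # '+' encodes a space
	$str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX -> one byte
	return $str;
}

my $bytes = percent_decode('Micha%C5%82+Kiedrowicz');

# Left undecoded, Perl sees 0xc5 and 0x82 as two latin-1 characters.
# Decoding explicitly turns them into the single character U+0142.
my $chars = decode('UTF-8', $bytes);

printf "undecoded length: %d, decoded length: %d\n",
	length($bytes), length($chars);
printf "character at offset 5: U+%04X\n", ord(substr($chars, 5, 1));
```

The undecoded string is one element longer than the decoded one,
because 'ł' occupies two bytes in UTF-8.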
[jn: reworded / rewritten commit message]

Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx>
Signed-off-by: Jakub Narębski <jnareb@xxxxxxxxx>
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9cf7e71..a7441ef 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -10,7 +10,7 @@
 use 5.008;
 use strict;
 use warnings;
-use CGI qw(:standard :escapeHTML -nosticky);
+use CGI qw(:standard :escapeHTML -nosticky -utf8);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
 use Encode;
-- 
1.7.6