Re: [PATCH/rfc] gitweb: open files (e.g. indextext.html) in utf8 mode

Gerrit Pape <pape@xxxxxxxxxxx> writes:

> From: =?utf-8?q?Recai=20Okta=C5=9F?= <roktas@xxxxxxxxxx>

You don't need to use quoted-printable in the 'From:' header embedded in
the mail body.  It should probably read

  From: "Recai Oktaş" <roktas@xxxxxxxxxx>
 
(provided that you can use utf-8 in email).

> gitweb used to use utf8 only in stdout.  As a result, included files
> like indextext.html appeared garbled if they contain utf8 characters.
> Now utf8 is also used when reading files.

It would read better as:

  Gitweb used to use utf8 mode only on STDOUT (actually the ":utf8" output
  layer), relying on to_utf8(...) to convert input data from utf8
  to Perl's internal form.  As a result, included files such as $home_text
  (indextext.html in the default build configuration) or a repository's
  README.html appeared garbled if they contained UTF-8 characters.

  Now utf8 mode is used for all open invocations, including when reading
  files.

> The patch was submitted through
>  http://bugs.debian.org/487465
> 

You should probably have here

  Reported-by: Recai Oktaş <roktas@xxxxxxxxxx>

> Signed-off-by: Gerrit Pape <pape@xxxxxxxxxxx>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 90cd99b..96cb4e0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -16,7 +16,7 @@ use Encode;
>  use Fcntl ':mode';
>  use File::Find qw();
>  use File::Basename qw(basename);
> -binmode STDOUT, ':utf8';
> +use open qw(:std :utf8);
>  
>  BEGIN {
>  	CGI->compile() if $ENV{'MOD_PERL'};

It would be wonderful if such a simple solution worked.  We would then
be able to remove the to_utf8() subroutine and stop worrying that we
forgot to convert some string to Perl's internal encoding, which could
result in a wide (non US-ASCII) UTF-8 character being cut in half.
(On the other hand, we would lose $fallback_encoding.)

Unfortunately there are two problems (or rather a problem and a half)
with this approach.
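
To illustrate the "cut in half" concern, here is a minimal sketch (not
taken from gitweb; the variable names are made up for this example) that
truncates the same name once as raw bytes and once after decoding:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Oktaş" as it arrives from git: raw UTF-8 bytes, not yet decoded.
my $bytes = "Okta\xC5\x9F";

# The same name converted to Perl's internal character form.
my $chars = decode('UTF-8', $bytes);

# Byte semantics: cutting after 5 units splits the two-byte "ş".
my $cut_bytes = substr($bytes, 0, 5);

# Character semantics: cutting after 5 units keeps "ş" whole.
my $cut_chars = substr($chars, 0, 5);

printf "bytes: length %d, cut ends in 0x%02X\n",
    length($bytes), ord(substr($cut_bytes, -1));
printf "chars: length %d, cut ends in U+%04X\n",
    length($chars), ord(substr($cut_chars, -1));
```

Any place that shortens a string before it has been decoded (for
example when abbreviating a long title) would be exposed to the first
case.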


The first is that with this patch gitweb no longer passes the gitweb
test t/t9500-gitweb-standalone-no-errors.sh (this is with perl v5.8.6):

*   ok 63: encode(commit): utf8
*   ok 64: encode(commit): iso-8859-1
*   ok 65: encode(log): utf-8 and iso-8859-1
[...]
* FAIL 71: URL: no project URLs, no base URL
        gitweb_run "p=.git;a=summary"
[Wed Jul  2 13:10:15 2008] gitweb.perl: utf8 "\xC4" does not map to Unicode \
at /path/to/git/t/trash directory/../../gitweb/gitweb.perl line 2298, \
<$fd> line 1.
[Wed Jul  2 13:10:15 2008] gitweb.perl: Malformed UTF-8 character \
(unexpected end of string) at [...]/gitweb/gitweb.perl line 2303, \
<$fd> line 1.

The code in question (lines 2298 and 2303 are marked) is

	open my $fd, '-|', git_cmd(), 'for-each-ref',
		($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
		'--format=%(objectname) %(refname) %(subject)%00%(committer)',
		'refs/heads'
		or return;
2298:	while (my $line = <$fd>) {
		my %ref_item;

		chomp $line;
		my ($refinfo, $committerinfo) = split(/\0/, $line);
2303:		my ($hash, $name, $title) = split(' ', $refinfo, 3);
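
For comparison, here is a rough sketch of the alternative that the
failing test suggests: keep reading the pipe as bytes and decode each
field, falling back to another encoding when the data is not valid
UTF-8.  It is not gitweb's actual code; bytes_to_text() and
parse_ref_line() are hypothetical names, and 'latin1' is only an
assumed value for $fallback_encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $fallback_encoding = 'latin1';    # assumed value, for illustration

# Hypothetical helper: strict UTF-8 decode, else the fallback encoding.
sub bytes_to_text {
    my ($bytes) = @_;
    return undef unless defined $bytes;
    my $text = eval {
        # FB_CROAK makes invalid input die; LEAVE_SRC keeps $bytes intact.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $text ? $text : decode($fallback_encoding, $bytes);
}

# Hypothetical helper: split one raw for-each-ref line, then decode.
# With "use open" in effect the handle would first need
# binmode($fd, ':raw') so that the line still arrives as bytes.
sub parse_ref_line {
    my ($line) = @_;
    chomp $line;
    my ($refinfo, $committerinfo) = split(/\0/, $line);
    my ($hash, $name, $title) = split(' ', $refinfo, 3);
    return {
        id        => $hash,
        name      => bytes_to_text($name),
        title     => bytes_to_text($title),
        committer => bytes_to_text($committerinfo),
    };
}

# One subject in UTF-8, one in ISO-8859-1 (a lone \xC4 byte).
my $utf8_ref   = parse_ref_line("1234abcd refs/heads/master Okta\xC5\x9F\0A U Thor\n");
my $latin1_ref = parse_ref_line("5678ef01 refs/heads/topic \xC4nderung\0A U Thor\n");
```

Splitting before decoding means a byte sequence that is not valid UTF-8
only affects the field it occurs in, instead of producing a warning for
the whole line at read time.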


Second, what is the minimal Perl version and Perl configuration
(installed modules) that supports "use open qw(:std :utf8);"?  We do
have some minimal requirements for gitweb, and it would be nice if we
didn't add to them.  But we already require PerlIO, so it probably
doesn't matter.
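
One way to answer that empirically on a given installation is a small
probe like the one below.  This is only a sketch: open_pragma_usable()
is a made-up name, and the check assumes that a PerlIO-enabled build
reports useperlio as 'define' in %Config.

```perl
use strict;
use warnings;
use Config;

# Hypothetical probe: true if this perl was built with PerlIO and can
# load the pragma with the same arguments the patch uses.
sub open_pragma_usable {
    return 0
        unless defined $Config{useperlio} && $Config{useperlio} eq 'define';
    # String eval so that a failure to compile is caught, not fatal.
    # Side effect: ':std' also changes the layers on STDIN/STDOUT/STDERR.
    my $ok = eval q{ use open qw(:std :utf8); 1 };
    return $ok ? 1 : 0;
}

print open_pragma_usable() ? "supported\n" : "not supported\n";
```

Running it with each perl that gitweb is expected to work with would
show whether the patch raises the requirements.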

-- 
Jakub Narebski
Poland
ShadeHawk on #git