Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names


On Thu, May 15, 2014 at 10:04:24AM +0100, Peter Krefting wrote:
> Michael Wagner:
> 
> >Decoding the UTF-8 encoded file name (again with an additional print
> >statement):
> >
> >$ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi
> >
> >work/Gütekriterien.txt
> >Content-disposition: inline; filename="work/Gütekriterien.txt"
> 
> You should fix the code path that created that URI, though, as it is not
> what you expected.
> 
> %C3%83 decodes to U+00C3 Latin Capital Letter A With Tilde
> %C2%BC decodes to U+00BC Vulgar Fraction One Quarter
> 
> The proper UTF-8 encoding for ü (U+00FC) is, as you can probably guess from
> looking at which two characters the sequence above yielded, C3 BC, which in
> a URI is represented as %C3%BC.
> 
> Your QUERY_STRING should thus be
> 
>   p=notes.git;a=blob_plain;f=work/G%C3%BCtekriterien.txt;hb=HEAD
> 
> which probably works as expected.

Obviously, you are right, thanks.

> 
> What is happening is that whatever is generating the URI is UTF-8-encoding
> the string twice (i.e., it generates a string with the proper C3 BC in it,
> and then interprets it as iso-8859-1 data and runs that through a UTF-8
> encoder again, yielding the C3 83 C2 BC sequence you see above).
> 
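Peter's diagnosis can be reproduced in a few lines of Perl with the core Encode module. This is a minimal sketch, not gitweb code; the helper uri_escape_bytes is a hypothetical name introduced here for illustration. It encodes ü once, then takes those bytes for ISO-8859-1 text and encodes them again, and prints both results percent-escaped as they would appear in a query string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: percent-escape every byte, as in a URI query string
sub uri_escape_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf('%%%02X', ord($_)) } split //, $bytes;
}

my $char = "\x{00FC}";    # ü, U+00FC LATIN SMALL LETTER U WITH DIAERESIS

# Correct: a single UTF-8 encoding gives the two bytes C3 BC
my $once = encode('UTF-8', $char);

# Bug: those bytes are taken for ISO-8859-1 text (one character per byte,
# i.e. U+00C3 and U+00BC) and run through the UTF-8 encoder a second time
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

print uri_escape_bytes($once), "\n";     # %C3%BC
print uri_escape_bytes($twice), "\n";    # %C3%83%C2%BC
```

The second line matches the f= parameter in Michael's original QUERY_STRING, and the first matches the corrected one.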

The subroutine git_tree generates the tree view. It stores the output
of "git ls-tree -z ..." in an array named @entries. Printing the contents
of this array yields the following:

100644 blob 6419cd06a9461c38d4f94d9705d97eaaa887156a     520 Gütekriterien.txt

These entries are still undecoded UTF-8 bytes, and this is where the
"doubled" encoding comes from. Declaring the encoding in the call to open
(see the patch below) yields the following instead:

100644 blob 6419cd06a9461c38d4f94d9705d97eaaa887156a     520 Gütekriterien.txt

---

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index a9f57d6..f1414e1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -7138,7 +7138,7 @@ sub git_tree {
        my @entries = ();
        {
                local $/ = "\0";
-               open my $fd, "-|", git_cmd(), "ls-tree", '-z',
+               open my $fd, "-|encoding(UTF-8)", git_cmd(), "ls-tree", '-z',
                        ($show_sizes ? '-l' : ()), @extra_options, $hash
                        or die_error(500, "Open git-ls-tree failed");
                @entries = map { chomp; $_ } <$fd>;

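The effect of the patch can be sketched the same way. This is a hedged illustration, not gitweb code: read_entries is a hypothetical helper, and an in-memory filehandle stands in for the pipe from "git ls-tree -z -l". It also assumes, as Peter's explanation implies, that gitweb's output handle encodes every character it prints as UTF-8 (modelled here with Encode::encode). The sketch spells the layer with a leading colon, ':encoding(UTF-8)', the form used in the perlfunc open examples.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for `git ls-tree -z -l` output: one NUL-terminated
# record whose path git writes as raw UTF-8 bytes.
my $raw = "100644 blob 6419cd06a9461c38d4f94d9705d97eaaa887156a     520\t"
        . encode('UTF-8', "G\x{00FC}tekriterien.txt") . "\0";

# Hypothetical helper: read NUL-terminated records the way git_tree does,
# optionally through an I/O layer.
sub read_entries {
    my ($layer) = @_;
    local $/ = "\0";
    open my $fd, "<$layer", \$raw or die "open failed: $!";
    my @entries = map { chomp; $_ } <$fd>;
    close $fd;
    return @entries;
}

my ($undecoded) = read_entries('');                   # path holds bytes C3 BC
my ($decoded)   = read_entries(':encoding(UTF-8)');   # path holds U+00FC

# A UTF-8 output layer encodes each character once, so the undecoded
# bytes (seen as U+00C3 U+00BC) come out encoded a second time.
my $out_undecoded = encode('UTF-8', $undecoded);
my $out_decoded   = encode('UTF-8', $decoded);

print index($out_undecoded, "G\xC3\x83\xC2\xBCt") >= 0 ? "double-encoded\n" : "?\n";
print index($out_decoded,   "G\xC3\xBCt")         >= 0 ? "encoded once\n"   : "?\n";
```

Decoding at the point where the bytes enter Perl keeps every later string operation on characters, so only the output layer does any encoding.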
> -- 
> \\// Peter - http://www.softwolves.pp.se/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html