Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

Michael Wagner:

Decoding the UTF-8 encoded file name (again with an additional print
statement):

$ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi

work/Gütekriterien.txt
Content-disposition: inline; filename="work/Gütekriterien.txt"

You should fix the code path that created that URI, though, as it is not what you expected.

%C3%83 decodes to U+00C3 Latin Capital Letter A With Tilde
%C2%BC decodes to U+00BC Vulgar Fraction One Quarter
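
To check the decoding above yourself, here is a minimal sketch in Python (not gitweb's own code, which is Perl). It percent-decodes the file name part of the query string and looks up the two characters after the "G"; the variable names are just for illustration.

```python
from urllib.parse import unquote_to_bytes
import unicodedata

# File name component taken from the QUERY_STRING in the quoted message.
escaped = "G%C3%83%C2%BCtekriterien.txt"

# Undo the percent-escapes to get raw bytes, then decode them as UTF-8.
raw = unquote_to_bytes(escaped)
decoded = raw.decode("utf-8")

# Look up the two characters that follow the leading "G".
names = [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in decoded[1:3]]

for code_point, name in names:
    print(code_point, name)
```

This should list U+00C3 and U+00BC, i.e. two characters where a single ü was intended.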

The proper UTF-8 encoding for ü (U+00FC) is C3 BC, as you can probably guess from the two characters the sequence above yielded. In a URI that is represented as %C3%BC.

Your QUERY_STRING should thus be

  p=notes.git;a=blob_plain;f=work/G%C3%BCtekriterien.txt;hb=HEAD

which probably works as expected.
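
As a rough illustration of how the corrected value comes about, the following Python sketch percent-encodes the path exactly once. It assumes the file name is available as a Unicode string; it only shows the expected escaping and says nothing about how gitweb builds its links.

```python
from urllib.parse import quote

# The file name as a Unicode string; \u00fc is ü (U+00FC).
path = "work/G\u00fctekriterien.txt"

# quote() encodes to UTF-8 once and percent-escapes the non-ASCII bytes,
# leaving "/" untouched.
escaped = quote(path, safe="/")

# Assemble the query string in the same shape as the one shown above.
query_string = f"p=notes.git;a=blob_plain;f={escaped};hb=HEAD"
print(query_string)
```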

What is happening is that whatever is generating the URI is UTF-8-encoding the string twice (i.e., it generates a string with the proper C3 BC in it, and then interprets it as ISO-8859-1 data and runs that through a UTF-8 encoder again, yielding the C3 83 C2 BC sequence you see above).
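
The double encoding can be reproduced in a few lines of Python. This is only a sketch of the mechanism, assuming the intermediate step really is an ISO-8859-1 misread as guessed above; it is not the code path in whatever generated your URI.

```python
from urllib.parse import quote

name = "G\u00fctekriterien.txt"          # \u00fc is ü (U+00FC)

# First (correct) encoding: ü becomes the two bytes C3 BC.
once = name.encode("utf-8")

# The mistake: treat those bytes as ISO-8859-1 text, so C3 and BC
# turn into the two characters U+00C3 and U+00BC ...
misread = once.decode("iso-8859-1")

# ... and encode to UTF-8 a second time, giving C3 83 C2 BC.
twice = misread.encode("utf-8")

# Percent-escaping the doubly encoded bytes gives the broken URI part.
broken = quote(twice)

# Reversing the two extra steps recovers the original name.
repaired = twice.decode("utf-8").encode("iso-8859-1").decode("utf-8")

print(broken)
print(repaired == name)
```

Reversing the extra steps like this is only a way to confirm the diagnosis; the real fix is still to encode once where the URI is generated.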

--
\\// Peter - http://www.softwolves.pp.se/



