Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8

Mark Junker wrote:
Junio C Hamano wrote:

I do not know how Macintosh libc implements "struct dirent", but
this approach does not work in general.

IMHO this approach does not need to work in general, because it is a fix for MacOSX systems only. I also use d_namlen, which might not be available on other systems, but on MacOSX this works as expected.

yet you can obtain a path component longer than 256 bytes.
Apparently the library allocates longer d_name[] field than what
is shown to the user.

This is not a problem either, because on MacOSX we get decomposed UTF8 and we always convert it to composed UTF8. This means that the string returned from reencode_string will always be smaller than the original filename that had to be reencoded.


That's not true! There are strings which get longer when a composing normalization is applied. Please see section 3.3 of Unicode Technical Report 36:

	http://www.unicode.org/reports/tr36/

> People assume that NFC always composes, and thus is the same or
> shorter length than the original source. However, some characters
> decompose in NFC.

(NFC = Normalization Form C, i.e. canonical composition.)

U+1D160 MUSICAL SYMBOL EIGHTH NOTE is given as an example with a 3x expansion factor when encoded in UTF-8 (I don't know what it expands to; seems odd to me.)
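
For anyone who wants to check the numbers, here is a rough sketch using Python's standard unicodedata module. It is not part of the patch, and the helper name nfc_utf8_growth is made up for illustration. It compares UTF-8 byte lengths before and after NFC for the usual decomposed-umlaut case and for the TR36 example:

```python
import unicodedata

def nfc_utf8_growth(s):
    """Return (bytes before, bytes after) NFC normalization, both as UTF-8."""
    before = len(s.encode("utf-8"))
    after = len(unicodedata.normalize("NFC", s).encode("utf-8"))
    return before, after

# Common case: decomposed "a" + COMBINING DIAERESIS composes to U+00E4.
shrinks = nfc_utf8_growth("a\u0308")

# TR36 example: U+1D160 has a canonical decomposition but is excluded
# from composition, so NFC leaves it in decomposed form.
grows = nfc_utf8_growth("\U0001D160")

print("a + U+0308:", shrinks)
print("U+1D160:   ", grows)
```

The first pair shrinks and the second grows, so a buffer sized to the original name is not guaranteed to hold the NFC result for arbitrary input.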

	-hpa