Re: [PATCH] Use FIX_UTF8_MAC to enable conversion from UTF8-MAC to UTF8

"H. Peter Anvin" <hpa@xxxxxxxxx> · Mon, 21 Jan 2008 20:08:46 -0800

Mark Junker wrote:
Junio C Hamano wrote:

I do not know how Macintosh libc implements "struct dirent", but
this approach does not work in general.

IMHO there is no need for this approach to work in general because this
is a fix for MacOSX systems only. I also use d_namlen, which might not be
available on other systems. But on MacOSX this works as expected.

yet you can obtain a path component longer than 256 bytes.
Apparently the library allocates longer d_name[] field than what
is shown to the user.

This is not a problem either because on MacOSX we get decomposed UTF8 
and we always convert to composed UTF8. This means that the string 
returned from reencode_string will always be smaller than the original 
filename that had to be reencoded.

That's not true!  There are strings which get longer when a composing
normalization is applied.  Please see section 3.3 of Unicode Technical
Report 36:

	http://www.unicode.org/reports/tr36/

> People assume that NFC always composes, and thus is the same or
> shorter length than the original source. However, some characters
> decompose in NFC.

(NFC = Normalization Form C, i.e. canonical composition.)

U+1D160 MUSICAL SYMBOL EIGHTH NOTE is given as an example with a 3x
expansion factor when encoded in UTF-8 (I don't know what it expands to;
seems odd to me.)
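
For anyone who wants to check the expansion, here is a minimal sketch in
Python using the standard unicodedata module. The helper name
nfc_utf8_growth is made up for illustration and is not part of the patch
or of git; it only compares UTF-8 byte lengths before and after NFC.

```python
import unicodedata


def nfc_utf8_growth(text):
    """Return (bytes_before, bytes_after) in UTF-8 for NFC normalization.

    Hypothetical helper, for illustration only.
    """
    before = len(text.encode("utf-8"))
    after = len(unicodedata.normalize("NFC", text).encode("utf-8"))
    return before, after


# Decomposed "e" + COMBINING ACUTE ACCENT composes and shrinks.
print(nfc_utf8_growth("e\u0301"))        # (3, 2)

# U+1D160 is excluded from composition, so NFC leaves it decomposed.
eighth_note = "\U0001D160"
print(nfc_utf8_growth(eighth_note))      # (4, 12)
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", eighth_note)])
```

So a buffer sized for the original name is not guaranteed to hold the
NFC result; the converted length has to be checked rather than assumed.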

	-hpa