Re: git on MacOSX and files with decomposed utf-8 file names

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 19 Jan 2008 23:26:23 -0800 (PST)

On Sun, 20 Jan 2008, Mike Hommey wrote:
> 
> That said, the locale doesn't necessarily express the language in which
> the document is written.

.. and quite commonly, there are multiple languages per document.

The good news is that sorting is almost never relevant or done over 
general documents. You sort almost only well-behaved data, and quite often 
the exact order is less than important: and when it is, you have very 
specific rules (which probably seldom have anything what-so-ever to do 
with general unicode ;).
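
As a rough illustration of the point about sorting rules, here is a minimal Python sketch. The name list and the `sort_key` helper are hypothetical, and the "ignore case and accents" rule is just one possible application-specific choice, not anything mandated by Unicode or by a locale.

```python
import unicodedata

# Hypothetical sample data: short, well-behaved names.
names = ["Zoe", "\u00c9mile", "adam", "Bob"]

# Plain code-point order: uppercase ASCII first, then lowercase, then accents.
by_code_point = sorted(names)

def sort_key(s):
    """One possible application rule: ignore case and combining accents."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

by_app_rule = sorted(names, key=sort_key)

print(by_code_point)
print(by_app_rule)
```

The two orders differ, and neither is "the" Unicode order: the second only reflects the rule the application happened to pick.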

> It's easy enough to read documents that are not
> written in your native language on the net. That's already what we are both
> doing right now. Fortunately, HTTP and HTML have ways to indicate the
> language in which a document is written, but that leaves out plain
> mail, for instance. 

Well, Unicode already handles the "reading" part, just not the sorting.

> That said, the "decomposed" version of UTF-8 has nice side effects on
> OSX, with UTF-8 encoded RockRidge ISO-9660 volumes (with or without
> Joliet ; OSX will use RockRidge by default when it's there), for instance.

I think Unicode in general (and UTF-8 in particular) is a great thing. I 
do not argue against Unicode at all.  It's what I use myself.

The thing I argue against is that they force normalization (and then, as a 
secondary complaint, their insane choice of target format).

Linux is generally UTF-8 too, and does all of this much better. No forced 
normalization, and it uses UTF-8 everywhere as the encoding model. Joliet 
and RR work beautifully.

(I don't think RR is NFD, btw. It's the standard microsoft UTF-16 without 
normalization, afaik. I think you can happily generate a Rock Ridge disk 
that has two _different_ filenames that OS X cannot tell apart, but that 
both Linux and Windows can see properly)
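
A small Python sketch of that last point, under stated assumptions: the file name is made up, and plain NFD from the standard `unicodedata` module stands in for whatever normalization OS X applies (its filesystem is said to use a variant of NFD, so treat this as an approximation).

```python
import unicodedata

# Two spellings of the same visible name (hypothetical file name).
composed = "\u00e9.txt"      # U+00E9, precomposed form
decomposed = "e\u0301.txt"   # "e" followed by U+0301 combining acute accent

# As UTF-8 byte strings the names differ, so a filesystem that stores
# names without normalizing them can hold both side by side.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
distinct_as_bytes = composed_bytes != decomposed_bytes

# Under forced normalization to NFD the two names collapse into one.
same_after_nfd = (unicodedata.normalize("NFD", composed)
                  == unicodedata.normalize("NFD", decomposed))

print(distinct_as_bytes, same_after_nfd)
```

Both values come out true: the names are distinct on disk but indistinguishable once normalization is forced.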

		Linus
