On Sat, Jan 19, 2008 at 09:45:40PM -0800, Linus Torvalds wrote:
>
>
> On Sun, 20 Jan 2008, Mike Hommey wrote:
> >
> > But there is no way to know whether 'ä' in a document is the Finnish 'ä'
> > or a 'ä' from, say, German, that sorts after 'a'.
>
> ... without knowing the locale. Correct.
>
> That's why sorting is locale-dependent, even in Unicode. And why you
> should always sort using the *combined* character, not think that you can
> sort by decomposed sequence.

That said, the locale doesn't necessarily express the language in which
the document is written. It's easy enough to read documents on the net
that are not written in your native language. That's already what we are
both doing right now. Fortunately, HTTP and HTML have ways to indicate the
language in which a document is written, but that leaves out plain mail,
for instance.

Also, the "decomposed" version of UTF-8 has nice side effects on OSX,
with UTF-8 encoded RockRidge ISO-9660 volumes (with or without Joliet; OSX
will use RockRidge by default when it's there), for instance.

Mike
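
As a rough illustration of why sorting on the decomposed sequence gives a
different order than sorting on the combined character, here is a minimal
Python sketch. The word list is made up, and plain code-point comparison
stands in for a real collation, so neither result is actually
locale-aware; it only shows that the two forms of the same text do not
sort alike.

```python
import unicodedata


def nfc(s):
    """Return s in composed form (e.g. a single U+00E4 for 'ä')."""
    return unicodedata.normalize("NFC", s)


def nfd(s):
    """Return s in decomposed form (e.g. 'a' followed by U+0308)."""
    return unicodedata.normalize("NFD", s)


# Hypothetical sample words; "\u00e4pple" starts with a composed 'ä'.
words = ["zebra", "\u00e4pple", "apple", "b"]

# Code-point order on composed text: U+00E4 is greater than 'z',
# so the 'ä' word lands at the very end.
by_composed = sorted(words, key=nfc)

# Code-point order on decomposed text: the word now starts with a plain
# 'a', so it lands among the 'a' words, before "b".
by_decomposed = sorted(words, key=nfd)

print(by_composed)
print(by_decomposed)
```

Getting either order right for a given language needs a collation that
knows the locale (for instance locale.strxfrm in Python, assuming the
locale is installed on the system), which is exactly the information a
plain mail does not carry.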