Linus Torvalds:

> But that's exactly the case he gave - 'ä' vs 'a¨' are exactly that:
> different strings (not even characters: the second is actually a
> multi-character) that just look the same.

But they are not different strings; they are canonically equivalent
as far as Unicode is concerned. They're even supposed to map to the
same glyph (if the font has an "ä", it should display it in both
cases; if it only has an "a" and a combining diaeresis, it should
make one up). You cannot do a binary comparison of the text to see
whether two strings are equivalent.

> You try to twist the argument by just claiming that they are the same
> "character". They aren't, unless you *define* character to be the
> same as "glyph".

Whereas you are confusing characters and code points. "ä" and "a¨"
use different code points, but they encode the same character, and
from the user's perspective it is the *character* that is interesting
(although he might confuse it with the glyph).

> I don't know how NTFS works (I know it is Unicode-aware, and I think
> it encodes filenames in UCS-2 or possibly UTF-16,

Actually, NTFS is a bit broken. It sees file names as strings of
16-bit words. It doesn't check that they are valid UTF-16, or even
valid UCS-2; it allows almost anything.

Apple made Mac OS X handle filenames properly by treating file names
as strings of characters, not code points, so they use a canonical
form for all characters (personally, I would have preferred the
pre-composed form, though).

-- 
\\// Peter - http://www.softwolves.pp.se/
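
To make the comparison point concrete, here is a minimal sketch using
Python's standard unicodedata module (any language with a Unicode
normalization library would show the same thing):

    import unicodedata

    precomposed = "\u00e4"    # 'ä' as a single code point (U+00E4)
    decomposed  = "a\u0308"   # 'a' followed by COMBINING DIAERESIS (U+0308)

    # A plain binary/code-point comparison says they differ...
    print(precomposed == decomposed)        # False

    # ...but after normalizing to a common canonical form they compare
    # equal: NFC maps both to the pre-composed form, NFD to the
    # decomposed one.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)    # True

Any file system (or tool) that wants to treat these two spellings as
the same name has to normalize to one canonical form before comparing,
which is what Mac OS X does (using the decomposed form).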