On Fri, Jan 18, 2008 at 11:19:21AM +0100, Peter Karlsson wrote:
> Linus Torvalds:
>
> > But that's exactly the case he gave - 'ä' vs 'a¨' are exactly that:
> > different strings (not even characters: the second is actually a
> > multi-character) that just look the same.
>
> But they are not different strings, they are canonically equivalent as
> far as Unicode is concerned.

They are canonically equivalent, but they are different sequences of
characters as far as Unicode is concerned. In one case we have one
character; in the other, we have two characters that are canonically
equivalent to the first one.

> They're even supposed to map to the same
> glyph (if the font has an "ä", it should display it in both cases, if
> it has an "a" and a combining diaeresis, it should make up one).

By definition, sequences of characters that are canonically equivalent
are both visually and functionally equivalent...

> You cannot do a binary comparison of text to see if two strings are
> equivalent.

Of course, you can't. Who argues otherwise?

> > You try to twist the argument by just claiming that they are the same
> > "character". They aren't, unless you *define* character to be the
> > same as "glyph".
>
> Whereas you are confusing characters and code points.

I am afraid it is you who confuses "characters" with "abstract
characters". There is no place in the standard saying that "characters"
are "abstract characters" only. On the contrary, the term "characters"
is used to refer to non-abstract characters.

Dmitry
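
The distinction argued above (two different sequences of characters that are nonetheless canonically equivalent) can be sketched in a few lines. This is only an illustration, assuming Python's standard `unicodedata` module; the helper name `canonically_equivalent` is hypothetical, and U+0308 (COMBINING DIAERESIS) is assumed to be the combining mark meant in the thread.

```python
import unicodedata

# Precomposed form: a single code point, U+00E4.
precomposed = "\u00e4"
# Decomposed form: 'a' followed by U+0308 COMBINING DIAERESIS (assumed here).
decomposed = "a\u0308"

# A plain comparison looks at the code point sequences, which differ.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))


def canonically_equivalent(s, t):
    """Hypothetical helper: compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


# After normalization, the two sequences compare equal.
print(canonically_equivalent(precomposed, decomposed))
```

Normalizing both sides to NFC instead of NFD would give the same answer for this pair; which form to use is a design choice and not something the thread settles.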