On Fri, Jan 18, 2008 at 06:01:13AM +0800, JM Ibanez wrote:
>
> With the exception of Unicode.

Nice exception...

> If you check the standard,

The standard of what? Could you provide the exact reference?

> two Unicode
> codepoints (i.e. the numeric value that gets stored on disk)

Does the standard say anything about disk storage?

> *can* map to the same character,

So what?

> hence they are the same.

Non sequitur.

> They don't just look the
> same, they are the same character

Because?

> -- even if the codepoints are
> different (i.e. precomposed vs. decomposed characters).

And where exactly does the standard say so?

> In fact, part of
> the Unicode standard deals with that. (Technically, Unicode calls it
> equivalence, but what the hey).

So they are not the same after all? It is just that you don't care about
what it actually says, right?

How about this: Unicode provides a unique number for every character.
So, if the numbers are not the same, then by the definition of the Unicode
standard those characters are different.

>
> In other words, Unicode treats e.g. both U+0065 and U+00E9 as
> fundamentally the same character.

There is no notion of "fundamentally the same character" in the Unicode
standard as far as I know, and the characters you mentioned are very
different in Unicode:

http://www.fileformat.info/info/unicode/char/0065/index.htm
http://www.fileformat.info/info/unicode/char/00e9/index.htm

They have different names, they have different glyphs, and they are
functionally different.

Dmitry
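
For anyone who wants to check the code points discussed above, here is a minimal sketch using Python's standard `unicodedata` module. It is only an illustration, not something from the thread: the helper `canonically_equivalent` is a hypothetical name, and comparing NFC-normalized forms is one assumed way of testing what Unicode calls canonical equivalence. It looks up the names of U+0065 and U+00E9, then compares the precomposed U+00E9 against the decomposed sequence U+0065 U+0301.

```python
import unicodedata

E = "\u0065"                   # LATIN SMALL LETTER E
E_ACUTE = "\u00e9"             # precomposed: LATIN SMALL LETTER E WITH ACUTE
E_DECOMPOSED = "\u0065\u0301"  # decomposed: e + COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two code points named in the thread carry different character names.
names = (unicodedata.name(E), unicodedata.name(E_ACUTE))

# U+0065 and U+00E9 stay distinct even after normalization.
plain_vs_acute = canonically_equivalent(E, E_ACUTE)

# The precomposed and decomposed spellings differ as raw strings,
# but normalize to the same sequence.
raw_equal = E_ACUTE == E_DECOMPOSED
pre_vs_decomposed = canonically_equivalent(E_ACUTE, E_DECOMPOSED)

print(names)
print(plain_vs_acute, raw_equal, pre_vs_decomposed)
```

In this sketch, equivalence only shows up between the precomposed and decomposed spellings of the same accented letter, never between U+0065 and U+00E9 themselves.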