Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> So the whole "but they _look_ the same" argument is just total BS. In just
> about all character encodings there has always been unique and different
> "characters" that _look_ the same on screen, and it has never really made
> them actually *be* the same, and it has never been a valid argument for
> them being considered the same.

With the exception of Unicode. If you check the standard, two different
codepoint sequences (i.e. the numeric values that get encoded and stored on
disk) *can* map to the same character, hence they are the same. They don't
just look the same, they are the same character -- even if the codepoints
are different (i.e. precomposed vs. decomposed characters).

In fact, part of the Unicode standard deals with that. (Technically,
Unicode calls it canonical equivalence, but what the hey).

In other words, Unicode treats e.g. both the precomposed U+00E9 and the
decomposed sequence U+0065 U+0301 as fundamentally the same character.

This comes even more into play in such alphabets as Hangul (Korean) and
the Japanese Kana.

-- 
JM Ibanez
Software Architect
Orange & Bronze Software Labs, Ltd. Co.

jm@xxxxxxxxxxxxxxxxxxx
http://software.orangeandbronze.com/
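
For anyone who wants to check the precomposed vs. decomposed point above, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not anything git does: the helper name canonically_equivalent and the sample strings are assumptions made up for this example, and comparing after NFC normalization is just one common way to test canonical equivalence.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: treat two strings as the same text if they
    are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Latin: precomposed U+00E9 vs. 'e' (U+0065) + COMBINING ACUTE ACCENT (U+0301)
precomposed = "\u00e9"
decomposed = "\u0065\u0301"

# Hangul: precomposed syllable U+D55C vs. its three conjoining jamo
hangul_syllable = "\ud55c"
hangul_jamo = "\u1112\u1161\u11ab"

# Kana: precomposed GA (U+304C) vs. KA (U+304B) + combining voiced mark (U+3099)
kana_precomposed = "\u304c"
kana_decomposed = "\u304b\u3099"

# The raw codepoint sequences differ, so a byte-wise comparison fails ...
raw_equal = precomposed == decomposed

# ... but the pairs are canonically equivalent once normalized.
latin_equivalent = canonically_equivalent(precomposed, decomposed)
hangul_equivalent = canonically_equivalent(hangul_syllable, hangul_jamo)
kana_equivalent = canonically_equivalent(kana_precomposed, kana_decomposed)

# A bare 'e' with no accent is a different character altogether.
plain_e_equivalent = canonically_equivalent("e", precomposed)

if __name__ == "__main__":
    print(raw_equal, latin_equivalent, hangul_equivalent,
          kana_equivalent, plain_e_equivalent)
```

The raw comparison is what a filesystem or tool that compares names byte-for-byte sees, which is why two names that Unicode considers the same character can still end up as two distinct entries.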