Re: git on MacOSX and files with decomposed utf-8 file names

JM Ibanez <jm@xxxxxxxxxxxxxxxxxxx> · Fri, 18 Jan 2008 06:01:13 +0800

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> So the whole "but they _look_ the same" argument is just total BS. In just 
> about all character encodings there has always been unique and different 
> "characters" that _look_ the same on screen, and it has never really made 
> them actually *be* the same, and it has never been a valid argument for 
> them being considered the same.

With the exception of Unicode. If you check the standard, two different
sequences of Unicode codepoints (i.e. the numeric values that get
encoded and stored on disk) *can* represent the same character, hence
they are the same. They don't just look the same, they are the same
character -- even if the codepoints are different (e.g. precomposed
vs. decomposed characters). In fact, part of the Unicode standard deals
with exactly that. (Technically, Unicode calls it canonical
equivalence, but what the hey.)

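In other words, Unicode treats e.g. both U+00E9 (precomposed "é") and
the sequence U+0065 U+0301 ("e" followed by a combining acute accent)
as fundamentally the same character. This comes even more into play in
such scripts as Hangul (Korean) and the Japanese Kana, where syllables
and voiced kana also have both precomposed and decomposed forms.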
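
For anyone who wants to check the equivalence for themselves, here is a
minimal sketch. It assumes Python 3 and its standard unicodedata module,
neither of which has anything to do with git itself, and the helper name
canonically_equivalent is made up for illustration. It compares strings
after normalizing both to NFD (the decomposed form):

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"   # U+00E9, LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# As raw codepoint sequences the two strings differ ...
print(precomposed == decomposed)                         # False
# ... but they are canonically equivalent.
print(canonically_equivalent(precomposed, decomposed))   # True
# A plain "e" is a different character, not an equivalent one.
print(canonically_equivalent("e", precomposed))          # False

# Hangul syllable U+D55C decomposes into three conjoining jamo.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\ud55c")])
# ['U+1112', 'U+1161', 'U+11AB']

# Hiragana GA (U+304C) decomposes into KA plus a combining voiced mark.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u304c")])
# ['U+304B', 'U+3099']
```

Comparing NFC forms instead of NFD would give the same answers here; the
sketch uses NFD only because decomposed names are the subject of this
thread.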

-- 
JM Ibanez
Software Architect
Orange & Bronze Software Labs, Ltd. Co.

jm@xxxxxxxxxxxxxxxxxxx
http://software.orangeandbronze.com/