On Mon, Jan 21, 2008 at 10:12:01AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 21 Jan 2008, Kevin Ballard wrote:
> > On Jan 21, 2008, at 9:14 AM, Peter Karlsson wrote:
> > >
> > > I happen to prefer the text-as-string-of-characters (or code points,
> > > since you use the other meaning of characters in your posts), since I
> > > come from the text world, having worked a lot on Unicode text
> > > processing.
> > >
> > > You apparently prefer the text-as-sequence-of-octets, which I tend to
> > > dislike because I would have thought computer engineers would have
> > > evolved beyond this when we left the 1900s.
> >
> > I agree. Every single problem that I can recall Linus bringing up as a
> > consequence of HFS+ treating filenames as strings [..]
>
> You say "I agree", BUT YOU DON'T EVEN SEEM TO UNDERSTAND WHAT IS GOING ON.
>
> The fact is, text-as-string-of-codepoints (let's make the "codepoints"
> obvious, so that there is no ambiguity, but I'd also like to make it clear
> that a codepoint *is* how a Unicode character is defined, and a Unicode
> "string" is actually *defined* to be a sequence of codepoints, and totally
> independent of normalization!) is fine.
>
> That was never the issue at all. Unicode codepoints are wonderful.
>
> Now, git _also_ heavily depends on the actual encoding of those
> codepoints, since we create hashes etc, so in fact, as far as git is
> concerned, names have to be in some particular encoding to be hashed, and
> UTF-8 is the only sane encoding for Unicode. People can blather about
> UCS-2 and UTF-16 and UTF-32 all they want, but the fact is, UTF-8 is
> simply technically superior in so many ways that I don't even understand
> why anybody ever uses anything else.

Maybe because it's 1.5 times bigger for any text in Chinese, Japanese or
Korean?

Mike
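
The 1.5x figure can be checked with a short Python sketch. The sample strings and the helper name `encoded_sizes` are made up for illustration and come from nobody in the thread. The sketch assumes pure CJK text in the Basic Multilingual Plane, where each character takes 3 bytes in UTF-8 and 2 bytes in UTF-16. Text that mixes in ASCII (spaces, digits, markup, path separators) comes out below 1.5x, and pure ASCII is half the size in UTF-8.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical samples: 8 BMP Japanese characters, and a 10-character ASCII string.
sample_cjk = "日本語のテキスト"
sample_ascii = "git status"

for label, text in (("CJK", sample_cjk), ("ASCII", sample_ascii)):
    sizes = encoded_sizes(text)
    ratio = sizes["utf-8"] / sizes["utf-16"]
    print(f"{label}: utf-8={sizes['utf-8']} utf-16={sizes['utf-16']} ratio={ratio}")
# CJK:   utf-8=24 utf-16=16 ratio=1.5
# ASCII: utf-8=10 utf-16=20 ratio=0.5
```

So 1.5x is the worst case for BMP-only CJK text. It does not hold for "any text" once ASCII is mixed in.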