On Mon, 21 Jan 2008, Kevin Ballard wrote:
>
> I'm really surprised that, after all of this, you're still horribly
> misunderstanding my argument. I never said it was invisible. NEVER.

You said it was invisible when you treat things "as text". Here's the
quote:

   .. when you treat filenames as text, it DOESN'T MATTER if the string
   gets normalized ..

Without ever apparently realizing that "as text" is part of the problem
in itself. What is "text" to one person is gibberish to another.

In particular, the biggest reason to not normalize is that you don't
know it's text or Unicode in the first place. Which is why git doesn't
do it.

And no, even with filenames you don't know that they are "text". People
encode stuff in them. And people don't always use UTF-8.

Of course, you could ask everybody to create OS X-only programs that
know that under OS X, you only have a subset of filenames. If so, you're
complaining about the wrong tool. Especially when the whole point of the
tool was to be distributed (not to mention coming from an environment
that simply doesn't have the same silly limitations OS X has).

So here's a few clues:

 - "as text" isn't "as unicode": it may well be Latin1 or EUC-JP or
   something. Yes, it's still used. Git doesn't care, and very
   consciously has avoided forcing character sets, even if the
   *default* (and notice how it's overridable) commit message encoding
   may be utf-8.

 - In fact, even in unicode, the difference between "identical" and
   "equivalent" strings exists, and even in the standard, unicode
   strings are very much defined to be arbitrary codepoint sequences,
   not normalized.

So even for the very specific case of unicode text, it's simply not true
that "it doesn't matter if the string gets normalized".

The unicode spec itself talks about cases where even canonical
normalization makes a difference. Search for this quote:

   "Not all processes are required to respect canonical equivalence.
   For example:

    * A function that collects a set of the General_Category values
      present in a string will and should produce a different value
      for <angstrom sign, semicolon> than for <A, combining ring
      above, greek question mark>, even though they are canonically
      equivalent.

    * A function that does a binary comparison of strings will also
      find these two sequences different."

and notice that first case. Even things that are *very*much* aware of
Unicode text do actually have cases where canonical equivalence doesn't
mean crud.

		Linus
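
Both cases in the Unicode quote can be checked directly. What follows is
only a minimal sketch, assuming Python and its standard unicodedata
module; the helper names (categories, canonically_equivalent) are made
up for illustration and come from neither git nor the Unicode spec:

```python
import unicodedata

# The two sequences named in the Unicode quote.
a = "\u212B\u003B"        # ANGSTROM SIGN, SEMICOLON
b = "\u0041\u030A\u037E"  # A, COMBINING RING ABOVE, GREEK QUESTION MARK


def categories(s):
    """Collect the set of General_Category values present in a string."""
    return {unicodedata.category(ch) for ch in s}


def canonically_equivalent(x, y):
    """Hypothetical helper: compare the canonical decompositions (NFD)."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)


# Canonically equivalent, yet different as raw bytes ...
equivalent = canonically_equivalent(a, b)
binary_equal = a.encode("utf-8") == b.encode("utf-8")

# ... and different to a Unicode-aware function as well.
cats_a = categories(a)
cats_b = categories(b)

print(equivalent, binary_equal, sorted(cats_a), sorted(cats_b))
```

The first sequence has no combining mark, so its category set lacks the
"Mn" entry that the second one has, even though both normalize to the
same string.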