On Jan 21, 2008, at 4:33 PM, Linus Torvalds wrote:
> On Mon, 21 Jan 2008, Kevin Ballard wrote:
>> I'm not sure what you mean. I stated a fact - at least on OS X, the filename does not contribute to the listed filesize, so changing the encoding of the filename doesn't change the filesize. This isn't a philosophical point, it's a factual statement.
>
> And my point was that your *whole* argument boils down to "normalization is invisible".
>
> When it isn't. It's not invisible for filenames, it's not invisible for file contents.
>
> You're trying to claim that normalization cannot matter. I'm just pointing out that it sure as hell can. Exactly because lots of things don't actually look at data other than as just a Unicode string. They do look at the raw format. And that's true both of file contents and file names.
>
>> I don't, but I do think this discussion revolves around filenames, therefore it should not surprise you when I talk about filenames.
>
> I'm surprised that you make generalized sweeping statements about how it's ok to normalize because normalization is "invisible", and then when I point out that that isn't true, you try to limit it.
>
> And no, that normalization is not invisible EVEN IN FILENAMES. If it was, git wouldn't ever have noticed it, would it?
I'm really surprised that, after all of this, you're still horribly misunderstanding my argument. I never said it was invisible. NEVER.
I'm also surprised that you seem to care more about this argument than my offer to stop arguing and work towards fixing the problem.
>>> And git tries to be a general data tool, not a Unicode-specific one.
>>
>> Yes, I realize that. See my previous message about discussing ideal vs practicality.
>
> I don't know which argument you're talking about. Git (and, btw, Linux) does the "ideal" thing (don't screw up people's data), and it turns out to be the "practical" thing too (it can handle a wider range of cases than OS X can). So no, this is not "ideal" vs "practical". They aren't in any conflict here.
You misunderstand my point. In a previous email I specifically used the words "ideal" and "practical" to describe arguments, which is what I was referring to here.
>> I could argue against this, but frankly, I'm really tired of arguing this same point. I suggest we simply agree to disagree, and move on to actually fixing the problem
>
> ... and people have even suggested how. Hide the idiotic OS X choices by making an OS X-specific wrapper around readdir() that turns it into NFC.
And I've responded to that suggestion, multiple times, saying that this doesn't actually fix the problem, it only hides it.
> That's just about the best we can do. We can't *fix* the thing that OS X loses information, but at least we can then show the lost information in the same form it _probably_ was in originally. But no, it won't "fix" git on OS X.
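For illustration only, the readdir() wrapper idea quoted above could be sketched roughly as follows. This is Python rather than git's C, the names `nfc_names` and `listdir_nfc` are hypothetical, and it uses the standard NFC form from `unicodedata`; HFS+ stores a variant of the decomposed form, so a real wrapper might need extra care.

```python
import os
import unicodedata


def nfc_names(names):
    """Return the given file names normalized to NFC."""
    return [unicodedata.normalize("NFC", name) for name in names]


def listdir_nfc(path="."):
    """Hypothetical readdir()-style wrapper: list a directory, reporting names in NFC."""
    return nfc_names(os.listdir(path))


decomposed = "re\u0301sume\u0301.txt"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "r\u00e9sum\u00e9.txt"      # precomposed U+00E9

# A decomposed name coming back from the filesystem is reported in composed form.
print(nfc_names([decomposed]) == [composed])

# The wrapper cannot tell which of the two forms the name was created with.
print(nfc_names([composed]) == nfc_names([decomposed]))
```

Both comparisons hold, which is the point made on both sides: the wrapper shows a plausible original spelling, but it cannot recover which spelling was actually used.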
Quite a while ago it was suggested that git use a table that maps the original byte sequence as seen in the index to the form returned by readdir(). So far this has sounded like the best solution, but as I've said before I don't know git's internals well enough (or, really, at all) to be able to work on this myself.
This solution should only "lose" information in the case where the index has 2 filenames that HFS+ treats as a single filename.
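A rough sketch of that table idea, under several assumptions: it is Python rather than git's C, it works on `str` instead of raw index bytes, it uses plain NFD as a stand-in for whatever form the filesystem returns, and it ignores HFS+ case-insensitivity. The names `fs_form`, `build_name_table` and `to_index_name` are hypothetical.

```python
import unicodedata


def fs_form(name):
    """Stand-in for the form the filesystem hands back (plain NFD, an assumption)."""
    return unicodedata.normalize("NFD", name)


def build_name_table(index_names):
    """Map each on-disk form to the index names that collapse onto it."""
    table = {}
    for name in index_names:
        table.setdefault(fs_form(name), []).append(name)
    return table


def to_index_name(table, readdir_name):
    """Translate a name from readdir() back to the spelling used in the index.

    Untracked names are returned unchanged. ValueError is raised when two
    index entries share one on-disk name, the only case that loses information.
    """
    candidates = table.get(fs_form(readdir_name), [readdir_name])
    if len(candidates) > 1:
        raise ValueError("ambiguous on-disk name: %r" % (candidates,))
    return candidates[0]


table = build_name_table(["r\u00e9sum\u00e9.txt", "README"])
print(to_index_name(table, "re\u0301sume\u0301.txt") == "r\u00e9sum\u00e9.txt")
```

The lookup key is recomputed from the readdir() name, so the table keeps working even if the filesystem returns a name that is not in exactly the expected form.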
Is there some reason this won't work?

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com