Re: git on MacOSX and files with decomposed utf-8 file names

On Jan 21, 2008, at 4:33 PM, Linus Torvalds wrote:

On Mon, 21 Jan 2008, Kevin Ballard wrote:

I'm not sure what you mean. I stated a fact - at least on OS X, the filename does not contribute to the listed filesize, so changing the encoding of the filename doesn't change the filesize. This isn't a philosophical point, it's a factual statement.

And my point was that your *whole* argument boils down to "normalization
is invisible".

When it isn't. It's not invisible for filenames, it's not invisible for
file contents.

You're trying to claim that normalization cannot matter. I'm just pointing
out that it sure as hell can. Exactly because lots of things don't
actually look at data other than as just a Unicode string. They do look at
the raw format.

And that's true both of file contents and file names.
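
To make the "raw format" point concrete, here is a minimal sketch using Python's standard unicodedata module. The name "café" is only an illustrative example, not one taken from this thread. The composed (NFC) and decomposed (NFD) forms render identically but are different byte sequences, which is what any tool comparing raw names will see:

```python
import unicodedata

# Illustrative name only; U+00E9 is the precomposed "e with acute accent".
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # composed: one code point for the accented e
nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' followed by U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc == nfd)                       # False
print(nfc_bytes)                        # b'caf\xc3\xa9'
print(nfd_bytes)                        # b'cafe\xcc\x81'
print(len(nfc_bytes), len(nfd_bytes))   # 5 6
```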

I don't, but I do think this discussion revolves around filenames, therefore
it should not surprise you when I talk about filenames.

I'm surprised that you make generalized sweeping statements about how it's
ok to normalize because normalization is "invisible", and then when I
point out that that isn't true, you try to limit it.

And no, that normalization is not invisible EVEN IN FILENAMES. If it was,
git wouldn't ever have noticed it, would it?

I'm really surprised that, after all of this, you're still horribly misunderstanding my argument. I never said it was invisible. NEVER.

I'm also surprised that you seem to care more about this argument than my offer to stop arguing and work towards fixing the problem.

And git tries to be a general data tool, not a Unicode-specific one.

Yes, I realize that. See my previous message about discussing ideal vs
practicality.

I don't know which argument you're talking about. Git (and, btw, Linux) does the "ideal" thing (don't screw up people's data), and it turns out to be the "practical" thing too (it can handle a wider range of cases than OS X can).

So no, this is not "ideal" vs "practical". They aren't in any conflict
here.

You misunderstand my point. In a previous email I specifically used the words "ideal" and "practical" to describe arguments, which is what I was referring to here.

I could argue against this, but frankly, I'm really tired of arguing this same point. I suggest we simply agree to disagree, and move on to actually fixing the problem.

.. and people have even suggested how. Hide the idiotic OS X choices by making an OS X-specific wrapper around readdir() that turns it into NFC.

And I've responded to that suggestion, multiple times, saying that this doesn't actually fix the problem, it only hides it.

That's just about the best we can do. We can't *fix* the thing that OS X loses information, but at least we can then show the lost information in the same form it _probably_ was in originally.

But no, it won't "fix" git on OS X.
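
For illustration, the readdir() wrapper idea could look roughly like the sketch below. It is written in Python rather than in git's C, and the helper names nfc_names and listdir_nfc are hypothetical, not anything in git. It shows why the result is only "probably" the original form: a name that was created in some other form also comes back as NFC.

```python
import os
import unicodedata


def nfc_names(names):
    """Return the given names recomposed to NFC (hypothetical helper)."""
    return [unicodedata.normalize("NFC", n) for n in names]


def listdir_nfc(path="."):
    """Hypothetical stand-in for an NFC-normalizing readdir() wrapper.

    Whatever form the filesystem reports, callers only ever see NFC.
    The original form is not recovered, merely guessed to be NFC.
    """
    return nfc_names(os.listdir(path))
```

A caller would use listdir_nfc(".") wherever it would otherwise call os.listdir("."). A name that was stored decomposed on purpose is indistinguishable afterwards from one the filesystem decomposed on its own.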

Quite a while ago it was suggested that git use a table that maps the original byte sequence as seen in the index to the form returned by readdir(). So far this has sounded like the best solution, but as I've said before I don't know git's internals well enough (or, really, at all) to be able to work on this myself.

This solution should only "lose" information in the case where the index has 2 filenames that HFS+ treats as a single filename.

Is there some reason this won't work?
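
A rough sketch of the table idea, in Python for brevity. All function names here are hypothetical and none of this reflects git's actual internals. It also assumes plain Unicode NFD as a stand-in for the form readdir() returns on HFS+, which is only an approximation of what the filesystem really does. The table is keyed by the decomposed name and stores the exact bytes from the index; two index entries that decompose to the same key are reported as the one case where information would be lost.

```python
import unicodedata


def fs_form(name: str) -> str:
    # Assumption: plain NFD approximates what readdir() returns on HFS+.
    return unicodedata.normalize("NFD", name)


def build_name_table(index_names):
    """Map the filesystem form of each name to the original index bytes.

    Returns (table, collisions). A collision is a pair of distinct index
    entries that the filesystem would treat as a single file.
    """
    table = {}
    collisions = []
    for raw in index_names:
        # surrogateescape lets byte sequences that are not valid UTF-8
        # be decoded instead of raising.
        key = fs_form(raw.decode("utf-8", "surrogateescape"))
        if key in table and table[key] != raw:
            collisions.append((table[key], raw))
        else:
            table[key] = raw
    return table, collisions


def to_index_name(table, fs_name: str) -> bytes:
    """Translate a name from readdir() back to the bytes in the index.

    Names not in the table (for example untracked files) are passed
    through as their own UTF-8 bytes.
    """
    fallback = fs_name.encode("utf-8", "surrogateescape")
    return table.get(fs_form(fs_name), fallback)
```

With an index containing the composed bytes for "café.txt", a decomposed name coming back from the directory listing maps to the original composed bytes, so the tracked file is not reported as missing alongside a new untracked one.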

-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com



