Re: git on MacOSX and files with decomposed utf-8 file names

Kevin Ballard <kevin@xxxxxx> · Mon, 21 Jan 2008 19:47:49 -0500

Please read to the bottom of this email. As near as I can figure out,  
you haven't done that on any of my previous emails.

On Jan 21, 2008, at 6:44 PM, Linus Torvalds wrote:

> On Mon, 21 Jan 2008, Kevin Ballard wrote:
>
>> I find it amusing that you keep arguing against having git treat
>> filenames as unicode when

> NO I DO NOT!
>
> Dammit, stop this idiocy.
>
> I think it's fine having git treat filenames "as unicode", as long as you
> don't do any munging on it.

When I say "treat filenames as unicode" I'm implying the equivalence  
comparisons and everything else that we've been talking about.
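To make the distinction concrete, here is a minimal Python sketch (an illustration only, not code from git or from this thread). It compares the precomposed "é" (U+00E9) with the decomposed "e" followed by U+0301 COMBINING ACUTE ACCENT, the kind of decomposed name the subject line refers to on Mac OS X. Byte for byte the two differ; under canonical equivalence (checked here by normalizing both sides to NFC with Python's `unicodedata` module) they are the same name. The helper names `bytes_equal` and `canonically_equal` are made up for the example.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
precomposed = "\u00e9"
# "e" followed by COMBINING ACUTE ACCENT (U+0301): the decomposed form,
# as HFS+ on Mac OS X stores file names (a variant of NFD).
decomposed = "e\u0301"


def bytes_equal(a: str, b: str) -> bool:
    """Compare the raw UTF-8 byte sequences, as a byte-oriented tool would."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to NFC (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed.encode("utf-8"))                 # b'\xc3\xa9'
print(decomposed.encode("utf-8"))                  # b'e\xcc\x81'
print(bytes_equal(precomposed, decomposed))        # False
print(canonically_equal(precomposed, decomposed))  # True
```

The whole disagreement is about which of these two comparisons a tool like git should apply to path names.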

> Why? Because if it's utf-8, then treating them "as unicode" means exactly
> the same as treating them "as a user-specified string".

If that's what "as unicode" meant, then the phrase "as unicode" has zero
meaning.

> So stop lying about this whole thing. I have never *ever* argued against
> unicode per se.

No, you've argued against unicode equivalency in filenames. Can't you  
figure out, when the entire time I've been talking about equivalency,  
that I'm *still* talking about equivalency?

> All my complaints - every single one of them - come down to making the
> idiotic choice of trying to munge those strings (not even strictly
> "normalize") into something they are not.

Yes, I understand quite well that you are against munging strings.

> And what you don't seem to understand is that once you accept _unmodified_
> raw UTF-8 as a good unicode transport mechanism, suddenly other encodings
> are possible. I'm not out to force my world-view on users. If they are
> using legacy encodings (whether in filenames *or* in commit texts or in
> their file contents), that's *their* choice.

You're not using raw UTF-8, you're just using raw bytes. Calling it  
UTF-8 doesn't mean anything, since you don't actually know that's what  
it is. But this is fairly irrelevant.
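A minimal Python sketch of the "raw bytes" point, assuming file names arrive as uninterpreted byte strings. The hypothetical helper `describe_name` only checks whether the bytes are well-formed UTF-8. The Latin-1 bytes for "café" fail that check, while the Latin-1 encoding of the two characters "Ã©" happens to be exactly the UTF-8 encoding of "é", so passing the check still does not prove the writer meant UTF-8.

```python
def describe_name(raw: bytes) -> str:
    """Report whether a raw file-name byte string is well-formed UTF-8.

    Passing this check shows only that the bytes *can* be read as UTF-8,
    not that whoever created the name intended UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (some other encoding, or arbitrary bytes)"
    return f"valid UTF-8: {text!r}"


utf8_name = "caf\u00e9".encode("utf-8")       # b'caf\xc3\xa9'
latin1_name = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
ambiguous = "\u00c3\u00a9".encode("latin-1")  # "Ã©" in Latin-1: b'\xc3\xa9'

print(describe_name(utf8_name))    # valid UTF-8: 'café'
print(describe_name(latin1_name))  # not valid UTF-8 (...)
print(describe_name(ambiguous))    # valid UTF-8: 'é', though written as Latin-1
```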

> I actually personally happen to use UTF-8-encoded unicode.
>
> I'm just not stupid enough to think that (a) corrupting it is a good idea,
> *or* (b) that I should force every Asian installation of git to also force
> people to use unicode (or even having all the conversion libraries and
> overheads!)
>
> So stop this idiotic "unicode == normalization" crap.
>
> I'm a huge fan of UTF-8. But that does not mean that I think normalization
> is a good idea.

How many times must I say the same thing over and over? I'm not  
arguing that forced normalization is a good thing. I'm arguing that,  
in a system which is unicode-aware top to bottom, forced normalization  
is irrelevant to the user, since they don't care about the exact byte  
sequence. And I'm also arguing that git should have some solution to  
this problem. I find it interesting that you're perfectly happy to  
rant and rail against your misperception of my argument, and yet you  
consistently and repeatedly ignore my offers to stop this argument and  
work towards a solution, as well as my comments on existing proposed  
solutions.
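One direction such a solution could take, sketched in Python purely as a hypothetical illustration (it is not git's implementation, and the function names `build_lookup` and `match_tracked` are invented): keep every tracked path's bytes exactly as stored, and only when an on-disk name has no exact byte match, fall back to comparing NFC-normalized forms. A decomposed name returned by the filesystem can then be matched to a precomposed tracked path without rewriting either one, and names that are not valid UTF-8 are only ever matched byte for byte.

```python
import unicodedata
from typing import Optional


def build_lookup(tracked: list[bytes]) -> dict[str, bytes]:
    """Map the NFC form of each tracked path to its exact stored bytes.

    Paths that are not valid UTF-8 are skipped here; they can still be
    matched byte-for-byte by match_tracked.
    """
    lookup: dict[str, bytes] = {}
    for raw in tracked:
        try:
            key = unicodedata.normalize("NFC", raw.decode("utf-8"))
        except UnicodeDecodeError:
            continue
        lookup.setdefault(key, raw)  # first tracked spelling wins
    return lookup


def match_tracked(on_disk: bytes, tracked: list[bytes]) -> Optional[bytes]:
    """Return the tracked path that on_disk refers to, or None.

    An exact byte match wins; otherwise fall back to canonical equivalence.
    The stored bytes are returned unchanged, never rewritten.
    """
    if on_disk in tracked:
        return on_disk
    try:
        key = unicodedata.normalize("NFC", on_disk.decode("utf-8"))
    except UnicodeDecodeError:
        return None
    # Rebuilt on every call for brevity; a real tool would cache this.
    return build_lookup(tracked).get(key)


# A precomposed path as committed elsewhere, and the decomposed bytes a
# filesystem like HFS+ would hand back for the same name.
tracked = ["caf\u00e9.txt".encode("utf-8")]
on_disk = unicodedata.normalize("NFD", "caf\u00e9.txt").encode("utf-8")
print(match_tracked(on_disk, tracked))  # b'caf\xc3\xa9.txt'
```

The design choice here is that equivalence is used only for lookup: the bytes the user committed are what get reported and stored.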

Are you even reading to the end of my emails?

- Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com
