Re: git on MacOSX and files with decomposed utf-8 file names

David Kastrup <dak@xxxxxxx> · Thu, 17 Jan 2008 00:58:47 +0100

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Wed, 16 Jan 2008, Kevin Ballard wrote:
>> 
>> There's a difference between "looks similar" as in "Polish" vs "polish", and
>> actually is the same string as in "Ma<UMLAUT MODIFIER>rchen" vs "M<A WITH
>> UMLAUT>rchen". Capitalization has a valid semantic meaning, normalization
>> doesn't. 
>
> That simply isn't true.
>
> Normalization actually has real semantic meaning. If it didn't, there
> would never ever be a reason why you'd use the non-normalized form in
> the first place.

Actually, there is no good reason for non-normalized forms (deficient
software not able to deal with some of the normalized forms is not a
good reason: such software should be fixed).

It is just that the file system is a rather quirky place for enforcing
the normalization.  It should not be easy to create unnormalized forms
in the first place, whether from the command line or from a script.
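
To make the two forms under discussion concrete, here is a minimal
Python sketch.  It assumes the standard unicodedata module; the helper
name same_after_nfc is made up for illustration and appears nowhere in
git.  The composed and decomposed spellings of "Märchen" are different
code point (and byte) sequences, yet compare equal once both are
brought to the same normalization form.

```python
import unicodedata

# Composed: "a with diaeresis" as a single code point (U+00E4).
composed = "M\u00e4rchen"
# Decomposed: plain "a" followed by a combining diaeresis (U+0308).
decomposed = "Ma\u0308rchen"


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ, both as code points and as UTF-8 bytes.
print(composed == decomposed)
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# After normalization they are the same string.
print(same_after_nfc(composed, decomposed))
```

Which form (NFC or NFD) is used as the canonical one is a separate
choice; the sketch picks NFC only as an example.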

> And there *are* cases where there are distinctions. Especially inside
> computers. For one thing, you may not be talking about "characters on
> screen", but you may be talking about "key sequences". And suddenly
> "a<UMLAUT MODIFIER>" is a two-key sequence, and "<a WITH UMLAUT>" is a
> single-key sequence, and THEY ARE DIFFERENT.
>
> See?

No.  Input methods are not the same as their resulting string.  I can
even produce some ASCII characters on my keyboard in more than one way
and would not expect them to lead to different codes.

>> How do you figure? When I type "Märchen", I'm typing a string, not a
>> byte sequence. I have no control over the normalization of the
>> characters.  Therefore, depending on what program I'm typing the name
>> in, I might use the same normalization as the filename, or I might
>> miss. It's completely out of my control. This is why the filesystem
>> has to step in and say "You composed that character differently, but
>> I know you were trying to specify this file".
>
> Pure and utter garbage.
>
> What you are describing is an *input method* issue, not a filesystem
> issue.
>
> The fact that you think this has anything what-so-ever to do with
> filesystems, I cannot understand.

How nice.  We are actually in agreement here.

> See? Putting the conversion in the filesystem IS INSANE. You wouldn't
> make the filesystem convert the characters in the data stream (because
> it would cause strange data conversion issues) AND FOR EXACTLY THE
> SAME REASON it shouldn't do it for filenames either!

Yup.  But that does not mean that normalization is a bad idea.  It is
just that the filesystem is not the right place for it.
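
As a rough sketch of what "normalization, but not in the filesystem"
could look like, the Python below normalizes at the point where a typed
name is matched against stored names, and leaves the stored names
untouched.  The function names nfc and find_by_name are hypothetical,
and this is only one possible arrangement, not something git does.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the NFC-normalized form of a name."""
    return unicodedata.normalize("NFC", name)


def find_by_name(stored_names, typed):
    """Return the stored names that match `typed` up to normalization.

    The stored names are returned exactly as they were stored; only
    the comparison is done on normalized copies.
    """
    wanted = nfc(typed)
    return [s for s in stored_names if nfc(s) == wanted]


# One stored name is decomposed; the user types the composed form.
stored = ["Ma\u0308rchen.txt", "polish.txt", "Polish.txt"]
print(find_by_name(stored, "M\u00e4rchen.txt"))
print(find_by_name(stored, "polish.txt"))
```

Note that the case difference between "polish.txt" and "Polish.txt" is
preserved: normalization does not fold case, so only the exact-case
entry matches.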

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum