Re: git on MacOSX and files with decomposed utf-8 file names

On Mon, Jan 21, 2008 at 02:05:51PM -0500, Kevin Ballard wrote:
> >
> >But that is *entirely* a separate issue from "normalization".
> >
> >Kevin, you seem to think that normalization is somehow forced on you  
> >by
> >the "text-as-codepoints" decision, and that is SIMPLY NOT TRUE.
> >Normalization is a totally separate decision, and it's a STUPID one,
> >because it breaks so many of the _nice_ properties of using UTF-8.
> 
> I'm not saying it's forced on you, I'm saying when you treat filenames  
> as text,

"Treating as text" can mean different things to different people. Some
may prefer "fi" and the "fi" ligature to be treated as the same in some
contexts.
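
A small sketch of the ligature case, using Python's standard unicodedata
module purely as an illustration (nothing here is what HFS+ or git runs).
Canonical normalization keeps the ligature distinct from "fi"; only
compatibility normalization folds the two together:

```python
import unicodedata

ligature = "\ufb01"   # U+FB01 LATIN SMALL LIGATURE FI
plain = "fi"          # two separate letters, U+0066 U+0069

# Canonical normalization (NFC) leaves the ligature untouched.
nfc = unicodedata.normalize("NFC", ligature)

# Compatibility normalization (NFKC) replaces it with "f" + "i".
nfkc = unicodedata.normalize("NFKC", ligature)

print("canonical match:    ", nfc == plain)
print("compatibility match:", nfkc == plain)
```

So whether the two names are "the same text" depends on which notion of
sameness the reader has in mind.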

> it DOESN'T MATTER if the string gets normalized. As long as  
> the string remains equivalent,

As a matter of fact it does matter; otherwise the characters would be
the same and we would not be having this conversation at all. Strings
can be equivalent and not equivalent at the same time, because there
are different equivalence relations. Finally, what HFS+ does
is not even normalization. In the technote, Apple explains
that they decompose some characters but not others for better
compatibility. So, you see, there is a PROBLEM here.
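
To make "equivalent and not equivalent at the same time" concrete, here
is a minimal sketch with two hypothetical helper functions (the names
same_exact and same_canonical are mine, for illustration only). The same
pair of strings is unequal under one relation and equal under the other:

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_exact(a: str, b: str) -> bool:
    """Equivalence as identical code point sequences."""
    return a == b


def same_canonical(a: str, b: str) -> bool:
    """Equivalence under Unicode canonical equivalence (compared via NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print("exact:    ", same_exact(precomposed, decomposed))
print("canonical:", same_canonical(precomposed, decomposed))
```

A tool that tracks names by their exact character sequence and a file
system that silently applies the second relation will disagree about
this pair.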

> YOU DON'T CARE about the underlying  
> byte stream.

It is not about the byte stream. After all, if it were UTF-16 instead
of UTF-8, it would be a one-to-one conversion for each character.
So, what gets corrupted by HFS+ are Unicode *characters*.
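
A rough illustration of why the encoding is beside the point, again only
a sketch with Python's unicodedata (the helper code_points is a name I
made up). The decomposition changes the list of code points, and that
change shows up identically whether the result is written out as UTF-8
or UTF-16:

```python
import unicodedata

precomposed = "\u00e9"                                  # one character
decomposed = unicodedata.normalize("NFD", precomposed)  # two characters


def code_points(s: str) -> list[str]:
    """List the Unicode code points of a string, independent of encoding."""
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(precomposed))
print(code_points(decomposed))

# Re-encoding is lossless in either direction; the difference between the
# two strings exists before any bytes are produced.
for encoding in ("utf-8", "utf-16-le"):
    for s in (precomposed, decomposed):
        assert s.encode(encoding).decode(encoding) == s
    print(encoding, precomposed.encode(encoding).hex(),
          decomposed.encode(encoding).hex())
```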

> 
> Alright, fine. I'm not saying HFS+ is right in storing the normalized  
> version, but I do believe the authors of HFS+ must have had a reason  
> to do that,

I am not saying they did that without *any* reason. I suppose all the
Apple developers in the Copland project also had reasons for what they
did, but the outcome was not very good...

> The only information you lose when doing canonical normalization is  
> what the original byte sequence was. 

Not true. You lose the original sequence of *characters*.
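
As a sketch of what is lost (using Python's unicodedata with full NFD,
which is an assumption for illustration and not the exact HFS+ variant):
three different character sequences collapse into one, so the normalized
name alone cannot tell you which one was originally given.

```python
import unicodedata

# Three distinct character sequences that are canonically equivalent.
originals = [
    "\u212b",    # U+212B ANGSTROM SIGN
    "\u00c5",    # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    "A\u030a",   # U+0041 followed by U+030A COMBINING RING ABOVE
]

# Normalization is many-to-one: all three map to the same result.
normalized = {unicodedata.normalize("NFD", s) for s in originals}

print(len(set(originals)), "distinct inputs")
print(len(normalized), "distinct output(s) after NFD")
```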

Dmitry