Re: git on MacOSX and files with decomposed utf-8 file names

On Jan 16, 2008, at 4:51 PM, Jakub Narebski wrote:

On the other hand, what you create the file with may not
be what you read back later, since the name has been standardized.
It's hard to say one is better than the other, they're just different
ways of doing it.

But using one encoding to create a file, and another when reading
filenames, is strange. It is IMHO better to simply refuse to create
filenames which are outside the chosen encoding / normalization. But
having different encodings used for reading and writing on the level
of filesystem access (not on the level of UI) is strange.

It's not using different encodings, it's all Unicode. However, it accepts different normalization variants of Unicode, since it can read them all and it would be folly to require everybody to conform to its own special internal variant. But it does have to normalize them, otherwise how would it detect the same filename using different normalizations? Also, it may seem strange to have different names between reading and writing, but that's only if you think of the name as a sequence of bytes - when treated as a sequence of characters, you get the same result. In other words, you're used to filenames as bytes, HFS+ treats filenames as strings.
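To make the bytes-versus-characters point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The filename is made up for illustration, and standard NFD is used as a stand-in for the decomposed form HFS+ stores, which is close to NFD but not identical to it.

```python
import unicodedata

# Hypothetical filename written two ways (illustration only).
composed = "caf\u00e9.txt"      # 'é' as one precomposed code point
decomposed = "cafe\u0301.txt"   # 'e' followed by a combining acute accent

def same_name(a: str, b: str) -> bool:
    """Compare two names as character strings by normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# As byte sequences the two names differ...
bytes_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# ...but once normalized they are the same string.
chars_equal = same_name(composed, decomposed)

print(bytes_equal, chars_equal)
```

A tool that compares names byte for byte sees two different files here, while a filesystem that normalizes sees one.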

However, I have noticed that everybody who's voiced
an opinion on this list in favor of the encoding-agnostic approach
seems to be unwilling to accept that any other approach might have
validity, to the extent of calling an OS/filesystem that does things
differently stupid or insane. This strikes me as extremely elitist and
risks alienating what I expect to be a fast-growing group of users
(i.e. OS X users).

First, it is Git's philosophy and the very core of its design to be
encoding-agnostic (to be a "content tracker"). Second, using the same
sequence of bytes on the filesystem, in the index, and in 'tree' objects
ensures good performance... this is something to think about if you want
to add patches which would deal with HFS+ API/UI quirks.

Sure, it makes sense from a performance perspective, but it causes problems with HFS+ and any other filesystem that behaves the same way. In the previous discussion about case-sensitivity, somebody suggested using a lookup table to map between git's internal representation and the name the filesystem returns, which seems like a decent idea and one that could be enabled with a config parameter to avoid penalizing repos on other filesystems. But I don't know enough about the internals of git to even think of trying to implement it myself.
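As a rough illustration of the lookup-table idea, and not a description of git's internals, the following Python sketch maps whatever name the filesystem returns back to the name recorded in the index. The function names and the `normalize` flag (standing in for the suggested config parameter) are hypothetical, and NFD is assumed as an approximation of what HFS+ returns.

```python
import unicodedata

def build_name_table(index_names, normalize=True):
    """Map the normalized form of each tracked name to the name as stored.

    With normalize=False the table is an identity mapping, so repositories
    on other filesystems see no change in behavior.
    """
    if not normalize:
        return {name: name for name in index_names}
    return {unicodedata.normalize("NFD", name): name for name in index_names}

def to_index_name(table, fs_name, normalize=True):
    """Translate a name returned by the filesystem to the stored name.

    Names not in the table (e.g. untracked files) are returned unchanged.
    """
    key = unicodedata.normalize("NFD", fs_name) if normalize else fs_name
    return table.get(key, fs_name)

# Hypothetical example: the index holds the precomposed name, while the
# filesystem hands back the decomposed one.
table = build_name_table(["caf\u00e9.txt"])
stored = to_index_name(table, "cafe\u0301.txt")
```

The cost is one normalization per lookup plus the memory for the table, which is why making it opt-in seems reasonable.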

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com

