Re: git on MacOSX and files with decomposed utf-8 file names

Kevin Ballard <kevin@xxxxxx> · Wed, 16 Jan 2008 17:06:05 -0500

On Jan 16, 2008, at 4:51 PM, Jakub Narebski wrote:

On the other hand, what you create the file with may not
be what you read back later, since the name has been standardized.
It's hard to say one is better than the other, they're just different
ways of doing it.

But using one encoding to create a file, and another when reading
filenames, is strange. It is IMHO better to simply refuse to create
filenames which are outside the chosen encoding / normalization. But
having different encodings used for reading and writing at the level
of filesystem access (not at the level of the UI) is strange.

It's not using different encodings, it's all Unicode. However, it  
accepts different normalization variants of Unicode, since it can read  
them all and it would be folly to require everybody to conform to its  
own special internal variant. But it does have to normalize them,  
otherwise how would it detect the same filename using different  
normalizations? Also, it may seem strange to have different names  
between reading and writing, but that's only if you think of the name  
as a sequence of bytes - when treated as a sequence of characters, you  
get the same result. In other words, you're used to filenames as
bytes, whereas HFS+ treats filenames as strings.
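
To make the bytes-versus-characters point concrete, here is a small sketch using Python's standard unicodedata module. It uses the standard NFC and NFD forms as a stand-in; HFS+ uses its own variant of decomposition, so treat this as an approximation of the behavior, not a description of what the filesystem does internally. The helper same_name is a made-up name for illustration.

```python
import unicodedata

# "café" with a precomposed e-acute (NFC) and with "e" followed by a
# combining acute accent (NFD). Standard NFD only approximates the
# variant HFS+ uses; that is an assumption of this sketch.
composed = unicodedata.normalize("NFC", "caf\u00e9")
decomposed = unicodedata.normalize("NFD", "caf\u00e9")

# The two forms look identical to a reader but differ as byte sequences.
composed_bytes = composed.encode("utf-8")      # b'caf\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")  # b'cafe\xcc\x81'


def same_name(a: str, b: str) -> bool:
    """Compare two names as characters by normalizing both sides first."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

Comparing composed and decomposed directly reports them as different, while same_name(composed, decomposed) reports them as equal, which is the kind of comparison a normalizing filesystem has to make.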

However, I have noticed that everybody who's voiced
an opinion on this list in favor of the encoding-agnostic approach
seems to be unwilling to accept that any other approach might have
validity, to the extent of calling an OS/filesystem that does things
differently stupid or insane. This strikes me as extremely elitist and
risks alienating what I expect to be a fast-growing group of users
(i.e. OS X users).

First, it is Git's philosophy and the very core of its design to be encoding
agnostic (to be a "content tracker"). Second, using the same sequence of
bytes on filesystem, in the index, and in 'tree' objects ensures good
performance... this is something to think about if you want to add
patches which would deal with HFS+ API/UI quirks.

Sure, it makes sense from a performance perspective, but it causes  
problems with HFS+ and any other filesystem that behaves the same way.  
In the previous discussion about case-sensitivity, somebody suggested  
using a lookup table to map between git's internal representation and  
the name the filesystem returns, which seems like a decent idea and  
one that could be enabled with a config parameter to avoid penalizing  
repos on other filesystems. But I don't know enough about the  
internals of git to even think of trying to implement it myself.
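
Purely as an illustration of the lookup-table idea, and not as anything resembling git's actual code, a sketch in Python might look like the following. The names build_name_map and to_tracked_name are hypothetical, and standard NFD is assumed as the comparison key, which may not match what HFS+ really returns.

```python
import unicodedata


def build_name_map(tracked_names):
    """Map the normalized form of each tracked name to the name as stored.

    If two tracked names normalize to the same key, the later one wins;
    a real implementation would have to decide how to handle that case.
    """
    return {unicodedata.normalize("NFD", name): name for name in tracked_names}


def to_tracked_name(name_map, fs_name):
    """Translate a name returned by the filesystem back to the tracked name.

    Names with no entry in the table are returned unchanged.
    """
    key = unicodedata.normalize("NFD", fs_name)
    return name_map.get(key, fs_name)


# Hypothetical usage: the repository stores the precomposed spelling,
# while the filesystem hands back the decomposed one.
tracked = ["caf\u00e9.txt", "README"]
name_map = build_name_map(tracked)
resolved = to_tracked_name(name_map, "cafe\u0301.txt")
```

Here resolved is the precomposed "caf\u00e9.txt" as stored, even though the filesystem reported the decomposed spelling. The table would only need to be built when the config parameter is turned on, so repos on other filesystems would pay nothing.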

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com
