On Jan 16, 2008, at 4:51 PM, Jakub Narebski wrote:
On the other hand, what you create the file with may not be what you read back later, since the name has been standardized. It's hard to say one is better than the other, they're just different ways of doing it.
But using one encoding to create a file, and another when reading filenames, is strange. It is IMHO better to simply refuse to create filenames which are outside the chosen encoding / normalization. But having different encodings used for reading and writing at the level of filesystem access (not at the level of the UI) is strange.
It's not using different encodings, it's all Unicode. However, it accepts different normalization variants of Unicode, since it can read them all and it would be folly to require everybody to conform to its own special internal variant. But it does have to normalize them, otherwise how would it detect the same filename using different normalizations? Also, it may seem strange to have different names between reading and writing, but that's only if you think of the name as a sequence of bytes - when treated as a sequence of characters, you get the same result. In other words, you're used to filenames as bytes, HFS+ treats filenames as strings.
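To make the bytes-versus-characters point concrete, here is a minimal sketch in Python. It uses the standard `unicodedata` module, and standard NFD stands in for the filesystem's internal form; HFS+ uses its own decomposition variant, so this is an approximation, and the filename is made up for illustration.

```python
import unicodedata

# The same visible name in two normalization forms:
# a precomposed "é" (U+00E9) versus "e" followed by a combining acute accent (U+0301).
composed = "caf\u00e9.txt"
decomposed = "cafe\u0301.txt"

# Compared as byte sequences, the two names are different.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False


def same_name(a, b, form="NFD"):
    """Compare two names as characters by normalizing both to one form.

    NFD is an assumption here, standing in for the filesystem's own variant.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Compared after normalization, they are the same name.
print(same_name(composed, decomposed))  # True
```

A tool that compares names byte for byte sees two files here; a filesystem that normalizes sees one.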
However, I have noticed that everybody who's voiced an opinion on this list in favor of the encoding-agnostic approach seems to be unwilling to accept that any other approach might have validity, to the extent of calling an OS/filesystem that does things differently stupid or insane. This strikes me as extremely elitist and risks alienating what I expect to be a fast-growing group of users (i.e. OS X users).
First, it is Git's philosophy, and the very core of its design, to be encoding agnostic (to be a "content tracker"). Second, using the same sequence of bytes on the filesystem, in the index, and in 'tree' objects ensures good performance... this is something to think about if you want to add patches which would deal with HFS+ API/UI quirks.
Sure, it makes sense from a performance perspective, but it causes problems with HFS+ and any other filesystem that behaves the same way. In the previous discussion about case-sensitivity, somebody suggested using a lookup table to map between git's internal representation and the name the filesystem returns, which seems like a decent idea and one that could be enabled with a config parameter to avoid penalizing repos on other filesystems. But I don't know enough about the internals of git to even think of trying to implement it myself.
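The lookup-table idea could look roughly like the sketch below. This is only an illustration in Python, not how git works internally: `build_name_table`, `to_tracked_name`, and the `enabled` switch (standing in for the config parameter) are all hypothetical names, and standard NFD again stands in for whatever form the filesystem returns.

```python
import unicodedata


def build_name_table(tracked_names, form="NFD"):
    """Map the normalized form of each tracked name to the exact stored name."""
    return {unicodedata.normalize(form, name): name for name in tracked_names}


def to_tracked_name(fs_name, table, enabled=True, form="NFD"):
    """Translate a name returned by the filesystem back to the stored name.

    With enabled=False (the hypothetical config switch turned off) the name
    passes through untouched, so other filesystems pay no lookup cost.
    Names not found in the table are also returned unchanged.
    """
    if not enabled:
        return fs_name
    return table.get(unicodedata.normalize(form, fs_name), fs_name)


# Names as stored by the tracker (precomposed "é").
tracked = ["caf\u00e9.txt", "README"]
table = build_name_table(tracked)

# The filesystem hands back the decomposed spelling of the same name.
fs_name = "cafe\u0301.txt"
print(to_tracked_name(fs_name, table) == "caf\u00e9.txt")  # True
```

The table is built once from the tracked names, so the extra cost per lookup is one normalization and one dictionary access, and only for repositories that turn the option on.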
-- Kevin Ballard http://kevin.sb.org kevin@xxxxxx http://www.tildesoft.com