Sorry, I forgot to reply all.
Robin Rosenberg wrote:
On Sunday, 8 October 2006 at 12:16, Jakub Narebski wrote:
File content encoding (if it goes outside US-ASCII, of course) is something
for which you would want either a default convention or a declaration
embedded in the file itself (like XML, HTML, or Emacs' file variables),
so that the file can be read _outside_ the SCM.
Except for CR/LF, this is best solved outside of the SCM. There aren't that
many tools/users to warrant the complexity or performance hit that I imagine
solving it would bring.
Path name encoding is a global property of a repository,
I think. We have the i18n.commitEncoding configuration variable; we could
add i18n.pathnameEncoding quite easily, I think (and some way for Git to
detect the current filesystem pathname encoding, if possible). BTW, I also
think that the i18n.commitEncoding information should be made persistent
and copied when cloning a repository.
*I* think git should use UTF-8 internally. Always. Clients could then have
the option to convert to local conventions.
Same for pathname. Internally all paths should be UTF-8 encoded. Encoding
commit info that way would make the i18n option obsolete also.
I am afraid it's not a good idea to convert file content to UTF-8 encoding.
GIT also manages non-text files, and it's not safe for a VCS to modify file
content silently.
But I agree with using UTF-8 for path names in tree objects, or with adding an
encoding property (not a user-defined property) to the head of the tree object,
so GIT won't do a useless enc -> UTF-8 -> same_enc conversion. The second way
has a drawback: two tree objects with the same content in different encodings
have different SHA1 digests.
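
That drawback can be sketched as follows. This is only an illustration in
Python, assuming the usual tree object layout ("<mode> <name>\0<20-byte id>"
entries wrapped in a "tree <size>\0" header); the helper tree_object_id and the
example file name are hypothetical, and no encoding property is modelled, only
the raw name bytes.

```python
import hashlib


def tree_object_id(name_bytes: bytes, blob_id: bytes) -> str:
    """Hash a one-entry tree whose file name is given as raw bytes."""
    # Entry layout: "<mode> <name>\0<20-byte blob id>".
    entry = b"100644 " + name_bytes + b"\0" + blob_id
    # Object layout: "tree <size>\0<entries>".
    header = b"tree " + str(len(entry)).encode("ascii") + b"\0"
    return hashlib.sha1(header + entry).hexdigest()


# Id of an empty blob; its exact value does not matter for the comparison.
blob_id = hashlib.sha1(b"blob 0\0").digest()

# The same file name, as the user sees it, stored in two encodings.
name = "r\u00e4ksm\u00f6rg\u00e5s.txt"
id_latin1 = tree_object_id(name.encode("iso-8859-1"), blob_id)
id_utf8 = tree_object_id(name.encode("utf-8"), blob_id)

# The name bytes differ, so the two trees get different ids.
print(id_latin1 != id_utf8)
```

A pure US-ASCII name would hash the same either way, so the problem only shows
up once a path name leaves US-ASCII.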
I have a patch for both of these, but it's very ugly and probably has some memory
management problems, so I'll refrain from submitting it for now. Knowing that it
exists may perhaps serve as a starting point for discussion. It encodes
filenames, as well as commit messages, in UTF-8, using LC_CTYPE as the local
encoding. An exception is when the input already looks like UTF-8, in which case
it is not converted on its way into git. When UTF-8 cannot be converted to the
local encoding on its way out of git, the data remains in UTF-8. Branch and
tag names are not managed (yet, at least).
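
The conversion rules described above might look roughly like this. It is a
sketch in Python, not the patch itself: the function names are hypothetical,
"looks like UTF-8" is assumed to mean "decodes as valid UTF-8", and the
local_encoding parameter stands in for whatever LC_CTYPE selects.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Assumed heuristic: the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_git(data: bytes, local_encoding: str) -> bytes:
    """Convert a name or message from the local encoding on its way in."""
    if looks_like_utf8(data):
        # Already UTF-8 (or plain US-ASCII): store it unchanged.
        return data
    return data.decode(local_encoding).encode("utf-8")


def from_git(data: bytes, local_encoding: str) -> bytes:
    """Convert stored UTF-8 to the local encoding on its way out."""
    try:
        return data.decode("utf-8").encode(local_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not representable locally: leave the data as it is stored.
        return data


latin1_name = "sm\u00f6rg\u00e5s".encode("iso-8859-1")
stored = to_git(latin1_name, "iso-8859-1")
restored = from_git(stored, "iso-8859-1")
print(stored == "sm\u00f6rg\u00e5s".encode("utf-8"), restored == latin1_name)
```

One known weakness of such a heuristic: some byte sequences are valid in both
encodings (the ISO-8859-1 bytes C3 A4 are also the UTF-8 encoding of a single
character), so the check can misjudge input that is really in the local
encoding.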
Good. I hope GIT can deal with path names that are in neither 8859_1 nor UTF-8 encoding.