Re: Non-ASCII paths and git-cvsserver

Robin Rosenberg <robin.rosenberg.lists@xxxxxxxxxx> writes:

> On Monday 13 November 2006 15:20, Jakub Narebski wrote:
>> sf wrote:
>> > Thanks, Junio. Paths with umlauts are returned correctly now both in
>> > UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
>> > as git core.
>>
>> By the way, now that git has per user config file, ~/.gitconfig, perhaps
>> it is time to add i18n.filesystemEncoding configuration variable, to
>> automatically convert between filesystem encoding (something you usually
>> don't have any control over) and UTF-8 encoding of paths in tree objects.
>
> I'd prefer git to store filenames and comments in UTF-8 and convert on 
> input/output when and if it is necessary rather than forcing everybody to 
> take the hit. Most systems, but far from all, already use UTF-8 so it's a 
> noop for them. The only reason I want conversion is for the years to come 
> where we still live in two worlds of non-utf-8 and utf-8 and then forget 
> about everything non-utf-8, rather than carry around the baggage forever.

Pathnames in git core are encoding agnostic just like UNIX
pathnames are.  As you say, if the project convention is UTF-8
then it would not make any difference either way, so the status
quo is fine for people living in a UTF-8-only world.

To people for whom it is inconvenient to work with UTF-8,
including me, it is always wrong to record UTF-8 at the core
level and try to autoconvert.  If (non-git) tools, libraries and
legacy-to-unicode roundtrip conversion were perfect, we would
have already converted and be living in a UTF-8-only world.  Projects
that choose to run with legacy pathname encoding should be
allowed to do so without taking the roundtrip risk converting to
and from UTF-8.

Interestingly enough, Linus mentioned this once, a lot better
than I would have, here:

http://thread.gmane.org/gmane.comp.version-control.git/12240/focus=12279

Having said that, I am not opposed to having an option that makes
the external interfaces do the pathname conversion.  If your
project chooses to use euc-jp for commit messages, your
configuration variable i18n.commitencoding is set to euc-jp, and
if gitweb always wants to do its thing in utf-8 (which is
probably a sensible thing to do), it would make a lot of sense
to take the commit message and convert it from euc-jp to utf-8
before rendering it in HTML.  Maybe i18n.pathnameencoding could
be used for similar purposes for external interfaces.
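
To make the gitweb case above concrete, here is a minimal sketch in
Python of conversion at the external interface only.  The function
name to_utf8_html and its parameters are hypothetical, not anything
in git or gitweb; the idea is just that the stored bytes stay as they
are and the interface decodes them with the project's configured
encoding before emitting UTF-8 HTML.

```python
import html


def to_utf8_html(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Convert stored bytes to HTML-escaped UTF-8 for display.

    `source_encoding` stands in for whatever the project configured
    (for example the value of i18n.commitencoding); the name and the
    default are assumptions made for this sketch.
    """
    # Decode with the project's encoding. Undecodable bytes become
    # U+FFFD so that rendering never fails on bad input.
    text = raw.decode(source_encoding, errors="replace")
    # Escape HTML metacharacters, then emit UTF-8 for the web page.
    return html.escape(text).encode("utf-8")


if __name__ == "__main__":
    stored = "日本語 <fix>".encode("euc-jp")  # bytes as the project stores them
    print(to_utf8_html(stored, "euc-jp").decode("utf-8"))
```

The same shape would presumably work for pathnames if something like
i18n.pathnameencoding existed: the conversion happens only on the way
out to the web page, never on the stored data.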

But the core will stay encoding agnostic; pathnames stored in
the index and tree are what you can feed stat() and open(), and
what you read from readdir().  Maybe we could revisit this
decision in five years, but not now.
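
As an illustration of what "encoding agnostic" means here, the
following Python sketch treats a filename purely as bytes: it creates
a file, reads the directory back, and checks that the name returned
is byte-for-byte what was passed in.  The helper name
roundtrip_raw_name is made up for this example, and the behaviour
assumes a typical Linux filesystem that neither validates nor
normalizes names; some other platforms do rewrite or reject them.

```python
import os
import tempfile


def roundtrip_raw_name(name: bytes) -> bool:
    """Create a file named `name` (raw bytes) and check that listing
    the directory returns exactly the same bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)            # work with bytes paths throughout
        path = os.path.join(tmp_b, name)
        with open(path, "wb") as f:         # open() accepts the raw bytes
            f.write(b"x")
        listed = os.listdir(tmp_b)          # bytes in, bytes out: no decoding
        os.stat(os.path.join(tmp_b, listed[0]))  # the listed name is usable as-is
        return listed == [name]


if __name__ == "__main__":
    # The same visible name in two encodings is two different byte strings.
    print(roundtrip_raw_name("ü.txt".encode("utf-8")))
    print(roundtrip_raw_name("ü.txt".encode("iso-8859-1")))
```

Nothing in this sketch needs to know which encoding the name is in,
which is the property the index and tree objects keep.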

