Robin Rosenberg <robin.rosenberg.lists@xxxxxxxxxx> writes:

> On Monday 13 November 2006 15:20, Jakub Narebski wrote:
>> sf wrote:
>> > Thanks, Junio. Paths with umlauts are returned correctly now both in
>> > UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
>> > as git core.
>>
>> By the way, now that git has per user config file, ~/.gitconfig, perhaps
>> it is time to add i18n.filesystemEncoding configuration variable, to
>> automatically convert between filesystem encoding (something you usually
>> don't have any control over) and UTF-8 encoding of paths in tree objects.
>
> I'd prefer git to store filenames and comments in UTF-8 and convert on
> input/output when and if it is necessary rather than forcing everybody to
> take the hit. Most systems, but far from all, already use UTF-8 so it's a
> noop for them. The only reason I want conversion is for the years to come
> where we still live in two worlds of non-utf-8 and utf-8 and then forget
> about everything non-utf-8, rather than carry around the baggage forever.

Pathnames in git core are encoding agnostic, just like UNIX pathnames
are. As you say, if the project convention is UTF-8 then it would not
make any difference either way, so the status quo is fine for people
living in a UTF-8 only world.

For people for whom it is inconvenient to work with UTF-8, including
me, it is always wrong to record UTF-8 at the core level and try to
autoconvert. If (non-git) tools, libraries and legacy-to-unicode
roundtrip conversion were perfect, we would all have converted already
and be living in a UTF-8 only world. Projects that choose to run with
a legacy pathname encoding should be allowed to do so without taking
the risk of a roundtrip conversion to and from UTF-8.
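
To make the roundtrip risk concrete, here is a minimal Python sketch. It is
not git code; `naive_roundtrip` is a hypothetical helper made up for
illustration, and it assumes a tool that guesses every pathname is UTF-8.
It shows that the same name is two different byte strings under ISO-8859-1
and UTF-8, and that a wrong guess does not give the original bytes back.

```python
# Illustration only: pathnames treated as raw bytes, the way a UNIX
# filesystem sees them. Nothing here is part of git.

name = "m\u00e5ndag.txt"  # a name containing one non-ASCII letter

latin1_path = name.encode("iso-8859-1")  # bytes on an ISO-8859-1 system
utf8_path = name.encode("utf-8")         # bytes on a UTF-8 system


def naive_roundtrip(path: bytes, assumed: str = "utf-8") -> bytes:
    """Hypothetical converter: decode with an assumed encoding, then
    encode again. Undecodable bytes become U+FFFD, so they are lost."""
    return path.decode(assumed, errors="replace").encode(assumed)


if __name__ == "__main__":
    print(latin1_path == utf8_path)                      # the two differ
    print(naive_roundtrip(utf8_path) == utf8_path)       # right guess survives
    print(naive_roundtrip(latin1_path) == latin1_path)   # wrong guess does not
```

An encoding-agnostic core never runs a step like `naive_roundtrip`, so it
cannot damage a name this way.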

Interestingly enough, Linus made this point once, a lot better than I
would have, here:

    http://thread.gmane.org/gmane.comp.version-control.git/12240/focus=12279

Having said that, I am not opposed to having an option that makes the
external interfaces do the pathname conversion. If your project
chooses to use euc-jp for commit messages, your configuration variable
i18n.commitencoding is set to euc-jp, and if gitweb always wants to do
its thing in utf-8 (which is probably a sensible thing to do), it
would make a lot of sense to take the commit message and convert it
from euc-jp to utf-8 before rendering it in HTML. Maybe
i18n.pathnameencoding could be used for similar purposes for external
interfaces.

But the core will stay encoding agnostic; pathnames stored in the
index and tree are what you can feed stat() and open(), and what you
read from readdir(). Maybe we could revisit this decision in five
years, but not now.
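
The commit message conversion described for gitweb can be sketched in a few
lines of Python. This is an illustration under stated assumptions, not
gitweb's actual code: `render_commit_message` is a hypothetical helper, and
the encoding is passed in by hand where a real interface would read
i18n.commitencoding from the repository configuration.

```python
import html


def render_commit_message(raw: bytes, commit_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: convert a commit message from the project's
    encoding to UTF-8 and escape it for inclusion in an HTML page."""
    # Decode with the project's declared encoding; fail loudly rather than
    # silently guessing if the bytes do not match it.
    text = raw.decode(commit_encoding, errors="strict")
    # Escape &, <, > and quotes, then emit UTF-8 for the web page.
    return html.escape(text).encode("utf-8")


if __name__ == "__main__":
    # A commit message as a euc-jp project would have stored it.
    stored = "\u65e5\u672c\u8a9e <fix>".encode("euc_jp")
    print(render_commit_message(stored, "euc_jp").decode("utf-8"))
```

The stored bytes are never rewritten here; only the copy handed to the
external interface is converted, which is the split argued for above.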