On 15.06.2015 at 02:12, Junio C Hamano wrote:
> Karsten Blees <karsten.blees@xxxxxxxxx> writes:
>
>> diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
>> index e9a1d5d..e5f6233 100644
>> --- a/Documentation/i18n.txt
>> +++ b/Documentation/i18n.txt
>> @@ -1,18 +1,28 @@
>> -At the core level, Git is character encoding agnostic.
>> -
>> - - The pathnames recorded in the index and in the tree objects
>> -   are treated as uninterpreted sequences of non-NUL bytes.
>> -   What readdir(2) returns are what are recorded and compared
>> -   with the data Git keeps track of, which in turn are expected
>> -   to be what lstat(2) and creat(2) accepts.  There is no such
>> -   thing as pathname encoding translation.
>> +Git is to some extent character encoding agnostic.
>
> I do not think the removal of the text makes much sense here unless
> you add the equivalent to the new text below.
>
>>   - The contents of the blob objects are uninterpreted sequences
>>     of bytes.  There is no encoding translation at the core
>>     level.
>>
>> - - The commit log messages are uninterpreted sequences of non-NUL
>> -   bytes.
>> + - Pathnames are encoded in UTF-8 normalization form C. This
>
> That is true only on some systems like OSX (with HFS+) and Windows,
> no?  BSDs in general and Linux do not do any such mangling IIRC.

Modern Unices don't need any such mangling because UTF-8 NFC should be
the default system encoding. I'm not sure for BSDs, but it has been the
default on all major Linux distros for more than 10 years.

> I
> am OK with mangling described as a notable oddball to warn users,
> though; i.e. not as a norm as your new text suggests but as an
> exception.
>

I would guess that non-UTF-8 Unices (or file systems) are the oddball
case, which is why I described them last. But I could be wrong.

>> +   platforms. If file system APIs don't use UTF-8 (which may be
>> +   file system specific), it is recommended to stick to pure
>> +   ASCII file names.
>
> Hmph, who endorsed such a recommendation?
> It is recommended to
> stick to whatever naming scheme that would not cause troubles to
> project participants.  If your participants all want to (and can)
> use ISO-8859-1, we do not discourage them from doing so.
>

ISO-8859-x file names may be fine if you won't ever need to:
- use git-web, JGit, gitk, git-gui...
- exchange repos with "normal" (UTF-8) Unices, Mac and Windows systems
- publish your work on a git hosting service (and expect file and ref
  names to show up correctly in the web interface)
- store the repo on Unicode-based file systems (JFS, Joliet, UDF, exFat,
  NTFS, HFS, CIFS...)

These restrictions are not that obvious when you start a new git project,
and while converting file names after the fact is possible (e.g. using the
recodetree script we shipped with Git for Windows 1.7.10), it will destroy
history. Thus I think we should strongly discourage users from using
anything but UTF-8.
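
For illustration only (this is not part of the patch or of Git itself), here
is a minimal Python sketch of why the byte-level view matters. It takes one
hypothetical file name and encodes it three ways: UTF-8 in NFC, UTF-8 in NFD
(the decomposed form that HFS+ is known for), and ISO-8859-1. Since pathnames
are recorded as byte sequences, these count as three different names, and the
ISO-8859-1 bytes are not even valid UTF-8. The file name and the helper
`is_valid_utf8` are made up for this example.

```python
import unicodedata

# Hypothetical file name, used only for illustration: "B\u00fcro.txt"
# contains U+00FC (LATIN SMALL LETTER U WITH DIAERESIS).
name = "B\u00fcro.txt"

# The same visible name as three different byte sequences.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")   # precomposed
nfd = unicodedata.normalize("NFD", name).encode("utf-8")   # decomposed
latin1 = name.encode("iso-8859-1")                         # legacy 8-bit


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print("NFC    :", nfc)
print("NFD    :", nfd)
print("Latin-1:", latin1)
print("Latin-1 bytes valid as UTF-8?", is_valid_utf8(latin1))
```

A tool that assumes UTF-8 pathnames cannot display the ISO-8859-1 variant
correctly, and a byte-wise comparison treats the NFC and NFD variants as two
distinct files.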