On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
> Alexey Shumkin wrote:
> > On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> >
> >> 1. Log messages use the configured log output encoding, which is
> >> meant to be whatever encoding works best with local terminals
> >> (and does not have much to do with what encoding should be used
> >> for email)
> >>
> >> 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
> >> port (which uses Unicode filesystem APIs), always UTF-8
> >
> > I cannot say exactly if it makes sense for THIS patch, but I'd like to
> > remind about Cygwin port, which definitely does not use UTF-8 encoding
> > (in my case it is Windows-1251) for filenames.
> >
> >>
> >> 3. The "This is an automated email" preface uses a project description
> >> from .git/description, which is typically in UTF-8 to support
> >> gitweb.
>
> Thanks for clarifying. So in the context you describe, (1) is
> configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
> there is no way with current git facilities to force the email to use
> a single encoding unless (3) happens to contain no special characters.
>
> What is the value of the "[i18n] commitEncoding" setting in your
> project?

commitEncoding is the same as the filename encoding: Windows-1251, of course.

> What encoding do the raw commit messages (shown with
> "git log --format=raw") use for their text, and what do they declare
> with an in-commit 'encoding' header, if any?

Well, despite what `git log --help` says:

--8<--
raw
    The raw format shows the entire commit exactly as stored in the commit object.
--8<--

on a Linux box (UTF-8) I see "readable" commit messages, even though they are stored in Windows-1251 (so they are converted to UTF-8).
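The conversion observed above can be illustrated with a minimal Python sketch, assuming a UTF-8 log output encoding: read the `encoding` header of a raw commit object (as printed by `git cat-file commit`), decode the message from that encoding, and re-encode it for output. This is not git's actual implementation; the function name `reencode_message` and the sample commit are made up, and only the message body is recoded (author and committer lines are left alone).

```python
def reencode_message(raw: bytes, output_encoding: str = "utf-8") -> bytes:
    """Re-encode the message of a raw commit object for display.

    `raw` is the commit object as printed by `git cat-file commit`:
    header lines, a blank line, then the message in the commit's encoding.
    """
    header_block, _, message = raw.partition(b"\n\n")
    # Without an 'encoding' header, git treats the message as UTF-8.
    commit_encoding = "utf-8"
    for line in header_block.split(b"\n"):
        if line.startswith(b"encoding "):
            commit_encoding = line[len(b"encoding "):].decode("ascii").strip()
    # Only the message body is recoded here; author/committer names are untouched.
    text = message.decode(commit_encoding, errors="replace")
    return text.encode(output_encoding, errors="replace")


# A hypothetical commit made with i18n.commitEncoding = Windows-1251
# (the tree id is git's well-known empty tree; the author line is made up).
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1375600000 +0400\n"
    b"committer A U Thor <author@example.com> 1375600000 +0400\n"
    b"encoding Windows-1251\n"
    b"\n"
) + "Привет\n".encode("cp1251")

utf8_message = reencode_message(sample)  # the same text, now as UTF-8 bytes
```

`git cat-file commit` would show the Windows-1251 bytes of `sample` as they are, while the re-encoded result is what a UTF-8 terminal can display.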
To be sure, I checked their actual content with `git cat-file commit`.
Actually, to be honest, I usually use a modified version of Git (see ecaee8050cec23eb4cf082512e907e3e52c20b57 in the 'next' branch), which could affect the results, so I checked `git log --format=raw` with an unmodified Git v1.8.3.3.

But let's get back to your question: the commit encoding stored as a header in the raw commit objects is 'Windows-1251'.

>
> Does everyone on this project use Cygwin?

This is a "closed" (commercial) project and every developer uses Cygwin, except me. I use a Linux box as a desktop (mail, IM, web browsing; development itself happens on Cygwin). Sometimes I run the utility scripts included in that project on my desktop (since Linux works with files much faster than Cygwin does ;)).
Also, the Git server is a coLinux box (http://www.colinux.org/) on Windows Server 2003, but I guess that does not matter much here.

> That should be fine, but
> I'd expect there to be problems as soon as someone wants to try the
> Mingw port ("Git for Windows").

Yep, one of our developers tried a recent version of TortoiseGit with the MinGW port of Git. That was a failure, because since v1.7.9 the MinGW port transcodes filenames to store them internally in UTF-8. This problem could be solved by converting those non-ASCII filenames to UTF-8 once, but I do not want to use the MinGW port. I like the Cygwin "infrastructure", which is more Linux-like than MinGW.

>
> I wonder if there should be an "[i18n] repositoryPathEncoding"
> configuration item to support this kind of repository. Then git could
> be aware of the intended encoding of paths, could recode them for
> display to a terminal, and at least on Linux and Mingw could recode
> them for use in filenames on disk. "repositoryPathEncoding = none"
> would mean the current behavior of treating paths as raw sequences of
> bytes.

I'd be happy if such a setting existed.
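The proposed setting could behave roughly like the following Python sketch. Note that `repositoryPathEncoding` is only a proposal in this thread, not an existing git option, and `recode_path` is a hypothetical helper: a stored path is decoded from the configured repository encoding and re-encoded for a UTF-8 terminal or filesystem, while `none` returns the bytes untouched.

```python
def recode_path(raw_path: bytes, repository_path_encoding: str,
                target_encoding: str = "utf-8") -> bytes:
    """Recode a path as stored in the repository for a terminal or filesystem.

    `repository_path_encoding` stands in for the *proposed*, hypothetical
    "[i18n] repositoryPathEncoding" setting; git has no such option today.
    """
    if repository_path_encoding.lower() == "none":
        # Current behaviour: paths are opaque byte sequences.
        return raw_path
    # Strict decoding: a path that is invalid in the declared encoding raises
    # UnicodeDecodeError instead of being silently mangled.
    return raw_path.decode(repository_path_encoding).encode(target_encoding)


# A hypothetical filename committed from Cygwin with a Windows-1251 locale.
stored = "Документ.txt".encode("cp1251")

utf8_path = recode_path(stored, "Windows-1251")  # UTF-8 bytes for a Linux terminal
unchanged = recode_path(stored, "none")          # the original Windows-1251 bytes
```

Recoding in the other direction (from a UTF-8 filesystem back into the repository) would simply swap the two encodings; pure-ASCII paths come out identical either way.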
A repositoryPathEncoding setting could solve many problems for cross-platform projects with non-ASCII filenames. Indeed, the MinGW port already solves that problem, in its own way!

>
> What do you think?
> Jonathan

--
Alexey Shumkin