On Wed, 9 Jan 2008, David Miller wrote:
>
> > How do you generate those MSG_FILE/PATCH_FILE things? Using
> > "git-mailinfo"? Do you by any chance give it the "-n" flag to make it
> > *not* do the conversion to UTF8?
>
> I create them by hand in my editor.

Ok. Apparently you get them in latin1, and save them as such.

If you can make your editor/mail setup (I assume it's Gnu "bovine
excrement" Emacs, since you say that you use your editor for email) use
utf8 natively for saving any results, then all your problems should go
away.

That said, I suspect we could make git-commit just do the same thing that
git-am already does, namely if it's not given an explicit character set
for the input/output _and_ it's supposed to be in utf8, it could do the
"guess_charset()" thing on a per-line basis.

It's not perfect, but the reason git-am does that (through "git mailinfo")
is exactly the fact that it's very easy indeed to have mixed messages with
some parts in UTF-8 (the body, for example) and others *not* in utf-8 (eg
have headers in Latin1).

Doing the "check each line one at a time, see if it is already in UTF-8,
otherwise assume it's the traditional Latin1" is kind of hacky, but it's
probably better than just accepting a non-utf8 commit message and writing
random data.

For people who really want to use Latin1 (or any other non-utf8 model), we
already have a way to get the current behaviour, by forcing something like

	[i18n]
		commitencoding = binary

but we seem to have ended up with UTF-8 being the default encoding, so we
should probably just make sure that we do end up writing valid utf-8
unless some other explicit commit encoding has been set up.

So I think it's really your own fault for basically giving a latin1
message (and not using the tools that know how to convert emails correctly
from *many* different encodings).

But I *also* think that git probably should at least have warned you (I
think it does, if you use "git commit" rather than "git commit-tree"), and
preferably have refused to write an invalid encoding or just converted
from what is the most common one (and even if I feel a bit bad about just
saying "latin1 is the default non-utf8 encoding", I think it makes sense
for historical reasons).

		Linus
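
To make the per-line idea concrete ("check each line one at a time, see if
it is already in UTF-8, otherwise assume it's the traditional Latin1"),
here is a minimal sketch in Python. It is not git's actual guess_charset()
code: the function names is_valid_utf8() and to_utf8_per_line() and the
"fallback" parameter are made up for illustration, and the sketch assumes
the message arrives as raw bytes with "\n" line endings.

```python
def is_valid_utf8(line: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8_per_line(message: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode a possibly mixed-encoding message as UTF-8, line by line.

    Lines that are already valid UTF-8 pass through untouched; any other
    line is assumed (hypothetically) to be in the fallback charset.
    """
    out = []
    for line in message.split(b"\n"):
        if is_valid_utf8(line):
            out.append(line)
        else:
            # latin-1 maps every byte to a code point, so this cannot fail
            out.append(line.decode(fallback).encode("utf-8"))
    return b"\n".join(out)
```

For example, a message with a Latin1 header line and a UTF-8 body line
comes out as valid UTF-8 throughout, while pure-ASCII or already-UTF-8
input is returned unchanged. The heuristic can still guess wrong: a
Latin1 line whose bytes happen to form valid UTF-8 sequences is left
alone, which is part of why this is "kind of hacky".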