On 2019-11-01 12:59:21 -0400, Jeff King wrote:
> On Fri, Nov 01, 2019 at 03:25:11PM +0700, Doan Tran Cong Danh wrote:
>
> >   for encoding in utf-8 iso-8859-1; do
> >     # commit using the encoding
> >     echo $encoding >file && git add file
> >     echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
> >       git -c i18n.commitEncoding=$encoding commit -F -
> >     # and then fixup without it
> >     echo "$encoding fixed" >file && git add file
> >     git commit --fixup HEAD
> >   done
> >   git rebase -i --autosquash --root
>
> Is it worth adding this as a test in t3900?

I think yes, but with a little more work.
I'll make it a separate patch in a re-roll.

> >  	parse_commit(item->commit);
> > -	commit_buffer = get_commit_buffer(item->commit, NULL);
> > +	commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
>
> I think there are several other spots in this file that could use the
> same treatment. But I can live with it if you want to just fix the one
> that's bugging you and move on. It's still a strict improvement.

There are 6 more occurrences of get_commit_buffer in sequencer.c, and
13 occurrences in other C source files.
I'll try to figure out whether it's safe to change them.

Anyway, if we're going to work with a single encoding internally,
can we take the other extreme approach: reencode the commit message to
UTF-8 before writing the commit object? (Are there any code points in
other encodings that can't be reencoded to UTF-8?)

git-log and friends currently do a two-step conversion of the commit
message (reencode to UTF-8 first, then reencode again to
get_log_output_encoding()). With this new approach, the first step is
likely a no-op (but it must be kept for backward compatibility).

--
Danh
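
To illustrate the two-step conversion mentioned above, here is a rough
sketch that uses iconv(1) as a stand-in for git's internal reencoding.
The reencode helper is hypothetical and only wraps iconv; it does not
call into git and is not how git implements this. It also doesn't
answer the question of whether every legacy encoding converts cleanly
to UTF-8; it only shows the byte-level effect for one character.

```shell
#!/bin/sh
# Hypothetical helper (not a git function): convert stdin from
# encoding $1 to encoding $2 using iconv(1).
reencode () {
	iconv -f "$1" -t "$2"
}

# Hypothetical helper: print stdin as a compact hex string.
hex () {
	od -An -tx1 | tr -d ' \n'
}

# "é" in ISO-8859-1 is the single byte 0xE9 (octal 351).
stored=$(printf '\351' | hex)

# Step 1: stored encoding -> UTF-8 (the internal form).
internal=$(printf '\351' | reencode ISO-8859-1 UTF-8 | hex)

# Step 2: UTF-8 -> output encoding (ISO-8859-1 again here).
output=$(printf '\351' |
	reencode ISO-8859-1 UTF-8 |
	reencode UTF-8 ISO-8859-1 | hex)

# If the message were already stored as UTF-8 (bytes 0xC3 0xA9),
# step 1 would leave the bytes unchanged.
noop=$(printf '\303\251' | reencode UTF-8 UTF-8 | hex)

echo "stored:   $stored"
echo "internal: $internal"
echo "output:   $output"
echo "noop:     $noop"
```

With messages stored as UTF-8 up front, only the last case would apply
to new commits, while the ISO-8859-1 path would still be needed for
existing history.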