On 2019-10-31 15:26:50 -0400, Jeff King wrote:
> I'm confused about a few things here, though. I agree with you that the
> subjects here are only used for finding the fixup/squash relationships.
> But I don't understand the musl connection.

You're right. The problem only shows up earlier because of musl's iconv
implementation.

> Wouldn't failure to reencode here always be a problem? E.g., if I do:
>
>   for encoding in utf-8 iso-8859-1; do
>     # commit using the encoding
>     echo $encoding >file && git add file
>     echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
>       git -c i18n.commitEncoding=$encoding commit -F -
>     # and then fixup without it
>     echo "$encoding fixed" >file && git add file
>     git commit --fixup HEAD
>   done
>
>   GIT_EDITOR='echo; grep -v ^#' git rebase -i --root --autosquash
>
> then the resulting todo-list output (on my glibc system) is:
>
>   pick 3a5bace éñcödèd with utf-8
>   fixup aa9f09c fixup! éñcödèd with utf-8
>   pick 6e85d32 éñcödèd with iso-8859-1
>   pick 3ceac05 fixup! éñcödèd with iso-8859-1
>
> I.e., we don't actually match up the second pair, and I think we
> probably ought to.

Yes, we ought to match up the second pair, and after changing
get_commit_buffer() to logmsg_reencode(), we do.

>
> I guess the test in t3900 is less exotic; it uses the same encoding for
> both commits. And it's just that "foo" and "!fixup foo" can (and do in
> musl) end up with different encodings (because of the specific language,
> and the vagaries of each iconv implementation).
>
> Would we have similar problems in all of the other functions which use
> get_commit_buffer() without reencoding?
> For instance if I do this:
>
>   echo base >file && git add file && git commit -m base
>   for encoding in utf-8 iso-8859-1; do
>     echo $encoding >file && git add file
>     echo "éñcödèd with $encoding" | iconv -f utf-8 -t $encoding |
>       git -c i18n.commitEncoding=$encoding commit -F -
>   done
>   git checkout -b side HEAD~2
>   git cherry-pick master master^
>   cat .git/sequencer/todo
>
> then the resulting todo file has a mix of iso-8859-1 and utf-8.
>
> It seems to me that we should always be working with the subjects in a
> single encoding internally,

I'm in favour of this idea.

> and likewise outputting in that format
> (which should probably be git_log_output_encoding(), for the instances
> where we show it to the user).

This is Git's current behaviour, but the function is
get_log_output_encoding(), not git_log_output_encoding().

> I.e., we should always call logmsg_reencode() instead of
> get_commit_buffer().

-- 
Danh