On Mon, May 13, 2019 at 04:17:24PM -0700, Elijah Newren wrote:
> When fast-export encounters a commit with an 'encoding' header, it tries
> to reencode in utf-8 and then drops the encoding header. However, if it
> fails to reencode in utf-8 because e.g. one of the characters in the
> commit message was invalid in the old encoding, then we need to retain
> the original encoding or otherwise we lose information needed to
> understand all the other (valid) characters in the original commit
> message.

Minor question: "utf-8" or "UTF-8" ? Mostly we use UTF-8 in Git.

>
> Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> ---
>  builtin/fast-export.c                        |  7 +++++--
>  t/t9350-fast-export.sh                       | 21 ++++++++++++++++++++
>  t/t9350/broken-iso-8859-7-commit-message.txt |  1 +
>  3 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 t/t9350/broken-iso-8859-7-commit-message.txt
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 9e283482ef..7734a9f5a5 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -642,9 +642,12 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
>  	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
>  	if (show_original_ids)
>  		printf("original-oid %s\n", oid_to_hex(&commit->object.oid));
> -	printf("%.*s\n%.*s\ndata %u\n%s",
> +	printf("%.*s\n%.*s\n",
>  	       (int)(author_end - author), author,
> -	       (int)(committer_end - committer), committer,
> +	       (int)(committer_end - committer), committer);
> +	if (!reencoded && encoding)
> +		printf("encoding %s\n", encoding);
> +	printf("data %u\n%s",
>  	       (unsigned)(reencoded
>  			  ? strlen(reencoded) : message
>  			  ? strlen(message) : 0),
> diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
> index c721026260..4fd637312a 100755
> --- a/t/t9350-fast-export.sh
> +++ b/t/t9350-fast-export.sh
> @@ -118,6 +118,27 @@ test_expect_success 'iso-8859-7' '
>  		 ! grep ^encoding actual)
>  '
>
> +test_expect_success 'encoding preserved if reencoding fails' '
> +
> +	test_when_finished "git reset --hard HEAD~1" &&
> +	test_config i18n.commitencoding iso-8859-7 &&
> +	echo rosten >file &&
> +	git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
> +	git fast-export wer^..wer >iso-8859-7.fi &&
> +	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
> +		(cd new &&
> +		 git fast-import &&
> +		 git cat-file commit i18n-invalid >actual &&
> +		 # Make sure the commit still has the encoding header
> +		 grep ^encoding actual &&
> +		 # Verify that the commit has the expected size; i.e.
> +		 # that no bytes were re-encoded to a different encoding.
> +		 test 252 -eq "$(git cat-file -s i18n-invalid)" &&
> +		 # ...and check for the original special bytes
> +		 grep $(printf "\360") actual &&
> +		 grep $(printf "\377") actual)
> +'
> +
>  test_expect_success 'import/export-marks' '
>
>  	git checkout -b marks master &&
> diff --git a/t/t9350/broken-iso-8859-7-commit-message.txt b/t/t9350/broken-iso-8859-7-commit-message.txt
> new file mode 100644
> index 0000000000..d06ad75b44
> --- /dev/null
> +++ b/t/t9350/broken-iso-8859-7-commit-message.txt
> @@ -0,0 +1 @@
> +Pi: ?; Invalid: ?
> \ No newline at end of file
> --
> 2.21.0.782.gd8be4ee826
>
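
For readers following the commit message, the rule it describes (emit an "encoding" line only when re-encoding failed and the commit carried an encoding header) can be sketched in isolation. This is a minimal standalone illustration, not code from builtin/fast-export.c: format_message_block() is a hypothetical helper, and it assumes, as the quoted hunk suggests, that `reencoded` is NULL when re-encoding failed and `encoding` is NULL when the commit had no encoding header.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): write the optional
 * "encoding" line followed by the "data" block into buf.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int format_message_block(char *buf, size_t len,
				const char *message,
				const char *reencoded,
				const char *encoding)
{
	/* Prefer the re-encoded message, else the raw one, else empty. */
	const char *body = reencoded ? reencoded : message ? message : "";
	int n = 0, r;

	/* Keep the header only when no re-encoded message is available. */
	if (!reencoded && encoding) {
		r = snprintf(buf, len, "encoding %s\n", encoding);
		if (r < 0 || (size_t)r >= len)
			return -1;
		n = r;
	}

	r = snprintf(buf + n, len - n, "data %u\n%s",
		     (unsigned)strlen(body), body);
	if (r < 0 || (size_t)r >= len - n)
		return -1;
	return n + r;
}
```

Calling it with message "Pi: \360", a NULL `reencoded`, and encoding "iso-8859-7" yields an "encoding iso-8859-7" line ahead of the data block; passing a non-NULL `reencoded` omits that line, which matches the pre-patch behaviour for the success case.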