On Sat, Sep 13, 2014 at 09:01:20PM -0400, Jeff King wrote: > [1] We do use the mbox format in git, and AFAIK do not do any > From-quoting of this nature. I haven't tested, but I suspect that > certain format-patch output would be corrupted when reading back via > "git am", let alone other random mbox readers. If we wanted to do > the QP magic brian suggests, it would probably make sense to do it > as part of format-patch. It looks like we have a reasonably sane is_from_line() function. So at least _we_ will not generally break on reading our own output, except in some extreme circumstances (you'd have to come up with something contrived like "From me, at 10:30 30 minutes before 11!"). We can pretty easily reuse this to make the from-line check in mailinfo more robust. Here's a replacement for the patch I sent earlier that keeps the "magically ignore extra >From headers" behavior but fixes the case that started this discussion. We could still do the QP thing to protect against other downstream tools (or to cover the contrived cases as above), but I think with this patch we at least cover the sane cases. -- >8 -- Subject: mailinfo: make ">From" in-body header check more robust Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line inside body, 2006-05-21), we have treated lines like ">From" in the body as headers. This makes "git am" work for people who erroneously paste the whole mbox entry: From 12345abcd... From: them Subject: [PATCH] whatever into their email body (assuming that an mbox writer then quotes "From" as ">From", as otherwise we would actually mailsplit on the in-body line). However, this has false positives if somebody actually has a commit body that starts with "From "; in this case we erroneously remove the line entirely from the commit message. We can make this check more robust by making sure the line actually looks like a real mbox "From" line. To do this we pull the "is_from_line" definition out of mailsplit, and make it available for general use. Signed-off-by: Jeff King <peff@xxxxxxxx> --- Makefile | 1 + builtin/mailinfo.c | 2 +- builtin/mailsplit.c | 31 ------------------------------- cache.h | 6 ++++++ mbox.c | 32 ++++++++++++++++++++++++++++++++ t/t5100-mailinfo.sh | 18 ++++++++++++++++++ t/t5100/embed-from.expect | 5 +++++ t/t5100/embed-from.in | 13 +++++++++++++ t/t5100/quoted-from.expect | 3 +++ t/t5100/quoted-from.in | 10 ++++++++++ 10 files changed, 89 insertions(+), 32 deletions(-) create mode 100644 mbox.c create mode 100644 t/t5100/embed-from.expect create mode 100644 t/t5100/embed-from.in create mode 100644 t/t5100/quoted-from.expect create mode 100644 t/t5100/quoted-from.in diff --git a/Makefile b/Makefile index e0f15a3..e018450 100644 --- a/Makefile +++ b/Makefile @@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o LIB_OBJS += mailmap.o LIB_OBJS += match-trees.o +LIB_OBJS += mbox.o LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c index cf11c8d..a434d39 100644 --- a/builtin/mailinfo.c +++ b/builtin/mailinfo.c @@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line, /* for inbody stuff */ if (starts_with(line->buf, ">From") && isspace(line->buf[5])) { - ret = 1; /* Should this return 0? */ + ret = is_from_line(line->buf + 1, line->len - 1); goto check_header_out; } if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) { diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 763cda0..11ba281 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -12,37 +12,6 @@ static const char git_mailsplit_usage[] = "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]"; -static int is_from_line(const char *line, int len) -{ - const char *colon; - - if (len < 20 || memcmp("From ", line, 5)) - return 0; - - colon = line + len - 2; - line += 5; - for (;;) { - if (colon < line) - return 0; - if (*--colon == ':') - break; - } - - if (!isdigit(colon[-4]) || - !isdigit(colon[-2]) || - !isdigit(colon[-1]) || - !isdigit(colon[ 1]) || - !isdigit(colon[ 2])) - return 0; - - /* year */ - if (strtol(colon+3, NULL, 10) <= 90) - return 0; - - /* Ok, close enough */ - return 1; -} - static struct strbuf buf = STRBUF_INIT; static int keep_cr; diff --git a/cache.h b/cache.h index dfa1a56..9e71ca5 100644 --- a/cache.h +++ b/cache.h @@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv, int fd); int versioncmp(const char *s1, const char *s2); +/* + * Returns true if the line appears to be an mbox "From" line starting a new + * message. + */ +int is_from_line(const char *line, int len); + #endif /* CACHE_H */ diff --git a/mbox.c b/mbox.c new file mode 100644 index 0000000..75f3150 --- /dev/null +++ b/mbox.c @@ -0,0 +1,32 @@ +#include "cache.h" + +int is_from_line(const char *line, int len) +{ + const char *colon; + + if (len < 20 || memcmp("From ", line, 5)) + return 0; + + colon = line + len - 2; + line += 5; + for (;;) { + if (colon < line) + return 0; + if (*--colon == ':') + break; + } + + if (!isdigit(colon[-4]) || + !isdigit(colon[-2]) || + !isdigit(colon[-1]) || + !isdigit(colon[ 1]) || + !isdigit(colon[ 2])) + return 0; + + /* year */ + if (strtol(colon+3, NULL, 10) <= 90) + return 0; + + /* Ok, close enough */ + return 1; +} diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh index 3e64a7a..9e1ad1c 100755 --- a/t/t5100-mailinfo.sh +++ b/t/t5100-mailinfo.sh @@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' ' ' +test_expect_success 'mailinfo finds headers after embedded From line' ' + mkdir embed-from && + git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in && + test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 && + git mailinfo embed-from/msg embed-from/patch \ + <embed-from/0001 >embed-from/out && + test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out +' + +test_expect_success 'mailinfo on message with quoted >From' ' + mkdir quoted-from && + git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in && + test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 && + git mailinfo quoted-from/msg quoted-from/patch \ + <quoted-from/0001 >quoted-from/out && + test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg +' + test_done diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect new file mode 100644 index 0000000..06a3a38 --- /dev/null +++ b/t/t5100/embed-from.expect @@ -0,0 +1,5 @@ +Author: Commit Author +Email: commit@xxxxxxxxxxx +Subject: patch subject +Date: Sat, 13 Sep 2014 21:13:23 -0400 + diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in new file mode 100644 index 0000000..5f3f84e --- /dev/null +++ b/t/t5100/embed-from.in @@ -0,0 +1,13 @@ +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 +From: Email Author <email@xxxxxxxxxxx> +Date: Sun, 25 May 2008 00:38:18 -0700 +Subject: [PATCH] email subject + +>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 +From: Commit Author <commit@xxxxxxxxxxx> +Date: Sat, 13 Sep 2014 21:13:23 -0400 +Subject: patch subject + +patch body +--- +patch diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect new file mode 100644 index 0000000..8c9d48c --- /dev/null +++ b/t/t5100/quoted-from.expect @@ -0,0 +1,3 @@ +>From the depths of history, we are stuck with the +flaky mbox format. + diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in new file mode 100644 index 0000000..847e1c4 --- /dev/null +++ b/t/t5100/quoted-from.in @@ -0,0 +1,10 @@ +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001 +From: Author Name <somebody@xxxxxxxxxxx> +Date: Sun, 25 May 2008 00:38:18 -0700 +Subject: [PATCH] testing quoted >From + +>From the depths of history, we are stuck with the +flaky mbox format. + +--- +patch -- 2.1.0.373.g91ca799 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html