Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Sep 13, 2014 at 09:01:20PM -0400, Jeff King wrote:

> [1] We do use the mbox format in git, and AFAIK do not do any
>     From-quoting of this nature.  I haven't tested, but I suspect that
>     certain format-patch output would be corrupted when reading back via
>     "git am", let alone other random mbox readers.  If we wanted to do
>     the QP magic brian suggests, it would probably make sense to do it
>     as part of format-patch.

It looks like we have a reasonably sane is_from_line() function. So at
least _we_ will not generally break on reading our own output, except in
some extreme circumstances (you'd have to come up with something
contrived like "From me, at 10:30 30 minutes before 11!").

We can pretty easily reuse this to make the from-line check in mailinfo
more robust. Here's a replacement for the patch I sent earlier that
keeps the "magically ignore extra >From headers" behavior but fixes the
case that started this discussion.

We could still do the QP thing to protect against other downstream tools
(or to cover the contrived cases as above), but I think with this patch
we at least cover the sane cases.

-- >8 --
Subject: mailinfo: make ">From" in-body header check more robust

Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
inside body, 2006-05-21), we have treated lines like ">From"
in the body as headers. This makes "git am" work for people
who erroneously paste the whole mbox entry:

  From 12345abcd...
  From: them
  Subject: [PATCH] whatever

into their email body (assuming that an mbox writer then
quotes "From" as ">From", as otherwise we would actually
mailsplit on the in-body line).

However, this has false positives if somebody actually has a
commit body that starts with "From "; in this case we
erroneously remove the line entirely from the commit
message. We can make this check more robust by making sure
the line actually looks like a real mbox "From" line.

To do this we pull the "is_from_line" definition out of
mailsplit, and make it available for general use.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 Makefile                   |  1 +
 builtin/mailinfo.c         |  2 +-
 builtin/mailsplit.c        | 31 -------------------------------
 cache.h                    |  6 ++++++
 mbox.c                     | 32 ++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh        | 18 ++++++++++++++++++
 t/t5100/embed-from.expect  |  5 +++++
 t/t5100/embed-from.in      | 13 +++++++++++++
 t/t5100/quoted-from.expect |  3 +++
 t/t5100/quoted-from.in     | 10 ++++++++++
 10 files changed, 89 insertions(+), 32 deletions(-)
 create mode 100644 mbox.c
 create mode 100644 t/t5100/embed-from.expect
 create mode 100644 t/t5100/embed-from.in
 create mode 100644 t/t5100/quoted-from.expect
 create mode 100644 t/t5100/quoted-from.in

diff --git a/Makefile b/Makefile
index e0f15a3..e018450 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
+LIB_OBJS += mbox.o
 LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index cf11c8d..a434d39 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line,
 
 	/* for inbody stuff */
 	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
-		ret = 1; /* Should this return 0? */
+		ret = is_from_line(line->buf + 1, line->len - 1);
 		goto check_header_out;
 	}
 	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 763cda0..11ba281 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -12,37 +12,6 @@
 static const char git_mailsplit_usage[] =
 "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
 
-static int is_from_line(const char *line, int len)
-{
-	const char *colon;
-
-	if (len < 20 || memcmp("From ", line, 5))
-		return 0;
-
-	colon = line + len - 2;
-	line += 5;
-	for (;;) {
-		if (colon < line)
-			return 0;
-		if (*--colon == ':')
-			break;
-	}
-
-	if (!isdigit(colon[-4]) ||
-	    !isdigit(colon[-2]) ||
-	    !isdigit(colon[-1]) ||
-	    !isdigit(colon[ 1]) ||
-	    !isdigit(colon[ 2]))
-		return 0;
-
-	/* year */
-	if (strtol(colon+3, NULL, 10) <= 90)
-		return 0;
-
-	/* Ok, close enough */
-	return 1;
-}
-
 static struct strbuf buf = STRBUF_INIT;
 static int keep_cr;
 
diff --git a/cache.h b/cache.h
index dfa1a56..9e71ca5 100644
--- a/cache.h
+++ b/cache.h
@@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv, int fd);
 
 int versioncmp(const char *s1, const char *s2);
 
+/*
+ * Returns true if the line appears to be an mbox "From" line starting a new
+ * message.
+ */
+int is_from_line(const char *line, int len);
+
 #endif /* CACHE_H */
diff --git a/mbox.c b/mbox.c
new file mode 100644
index 0000000..75f3150
--- /dev/null
+++ b/mbox.c
@@ -0,0 +1,32 @@
+#include "cache.h"
+
+int is_from_line(const char *line, int len)
+{
+	const char *colon;
+
+	if (len < 20 || memcmp("From ", line, 5))
+		return 0;
+
+	colon = line + len - 2;
+	line += 5;
+	for (;;) {
+		if (colon < line)
+			return 0;
+		if (*--colon == ':')
+			break;
+	}
+
+	if (!isdigit(colon[-4]) ||
+	    !isdigit(colon[-2]) ||
+	    !isdigit(colon[-1]) ||
+	    !isdigit(colon[ 1]) ||
+	    !isdigit(colon[ 2]))
+		return 0;
+
+	/* year */
+	if (strtol(colon+3, NULL, 10) <= 90)
+		return 0;
+
+	/* Ok, close enough */
+	return 1;
+}
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 3e64a7a..9e1ad1c 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' '
 
 '
 
+test_expect_success 'mailinfo finds headers after embedded From line' '
+	mkdir embed-from &&
+	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailinfo embed-from/msg embed-from/patch \
+	  <embed-from/0001 >embed-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+'
+
+test_expect_success 'mailinfo on message with quoted >From' '
+	mkdir quoted-from &&
+	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailinfo quoted-from/msg quoted-from/patch \
+	  <quoted-from/0001 >quoted-from/out &&
+	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+'
+
 test_done
diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
new file mode 100644
index 0000000..06a3a38
--- /dev/null
+++ b/t/t5100/embed-from.expect
@@ -0,0 +1,5 @@
+Author: Commit Author
+Email: commit@xxxxxxxxxxx
+Subject: patch subject
+Date: Sat, 13 Sep 2014 21:13:23 -0400 
+
diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
new file mode 100644
index 0000000..5f3f84e
--- /dev/null
+++ b/t/t5100/embed-from.in
@@ -0,0 +1,13 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Email Author <email@xxxxxxxxxxx>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] email subject
+
+>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Commit Author <commit@xxxxxxxxxxx>
+Date: Sat, 13 Sep 2014 21:13:23 -0400
+Subject: patch subject
+
+patch body
+---
+patch
diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
new file mode 100644
index 0000000..8c9d48c
--- /dev/null
+++ b/t/t5100/quoted-from.expect
@@ -0,0 +1,3 @@
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
new file mode 100644
index 0000000..847e1c4
--- /dev/null
+++ b/t/t5100/quoted-from.in
@@ -0,0 +1,10 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: Author Name <somebody@xxxxxxxxxxx>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted >From
+
+>From the depths of history, we are stuck with the
+flaky mbox format.
+
+---
+patch
-- 
2.1.0.373.g91ca799

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]