Re: [RFC/PATCH] mailinfo: do not treat ">From" lines as in-body headers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> It looks like we have a reasonably sane is_from_line() function. So at
> least _we_ will not generally break on reading our own output, except in
> some extreme circumstances (you'd have to come up with something
> contrived like "From me, at 10:30 30 minutes before 11!").
>
> We can pretty easily reuse this to make the from-line check in mailinfo
> more robust. Here's a replacement for the patch I sent earlier that
> keeps the "magically ignore extra >From headers" behavior but fixes the
> case that started this discussion.

Why cache.h when this is still only between mail{info,split}.c both
of which do not really deal with any "Git" data?

For mailsplit, we are trying to detect mbox boundary various MUAs
would use in their output, and is_from_line() may be appropriate,
but I am not sure if the same logic is appropriate for mailinfo.
What are we trying to protect us against?  Those who paste a single
e-mail output from format-patch in full?  Do people paste a single
e-mail received to their mailbox from somebody else and do we need
to protect against that, skipping the ">From " thing, while risking
to skip "From me at 10:30 30 minutes..."?

If we only want to skip ">?From" in pasted format-patch output, we
would want a rule in mailinfo that is tighter than is_from_line() in
mailsplit.

By the way, I see ">From " in is_rfc2822_header(), too.  Do we have
to worry about this comparison as well?

> -- >8 --
> Subject: mailinfo: make ">From" in-body header check more robust
>
> Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
> inside body, 2006-05-21), we have treated lines like ">From"
> in the body as headers. This makes "git am" work for people
> who erroneously paste the whole mbox entry:
>
>   From 12345abcd...
>   From: them
>   Subject: [PATCH] whatever
>
> into their email body (assuming that an mbox writer then
> quotes "From" as ">From", as otherwise we would actually
> mailsplit on the in-body line).
>
> However, this has false positives if somebody actually has a
> commit body that starts with "From "; in this case we
> erroneously remove the line entirely from the commit
> message. We can make this check more robust by making sure
> the line actually looks like a real mbox "From" line.
>
> To do this we pull the "is_from_line" definition out of
> mailsplit, and make it available for general use.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  Makefile                   |  1 +
>  builtin/mailinfo.c         |  2 +-
>  builtin/mailsplit.c        | 31 -------------------------------
>  cache.h                    |  6 ++++++
>  mbox.c                     | 32 ++++++++++++++++++++++++++++++++
>  t/t5100-mailinfo.sh        | 18 ++++++++++++++++++
>  t/t5100/embed-from.expect  |  5 +++++
>  t/t5100/embed-from.in      | 13 +++++++++++++
>  t/t5100/quoted-from.expect |  3 +++
>  t/t5100/quoted-from.in     | 10 ++++++++++
>  10 files changed, 89 insertions(+), 32 deletions(-)
>  create mode 100644 mbox.c
>  create mode 100644 t/t5100/embed-from.expect
>  create mode 100644 t/t5100/embed-from.in
>  create mode 100644 t/t5100/quoted-from.expect
>  create mode 100644 t/t5100/quoted-from.in
>
> diff --git a/Makefile b/Makefile
> index e0f15a3..e018450 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
>  LIB_OBJS += mailmap.o
>  LIB_OBJS += match-trees.o
> +LIB_OBJS += mbox.o
>  LIB_OBJS += merge.o
>  LIB_OBJS += merge-blobs.o
>  LIB_OBJS += merge-recursive.o
> diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
> index cf11c8d..a434d39 100644
> --- a/builtin/mailinfo.c
> +++ b/builtin/mailinfo.c
> @@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line,
>  
>  	/* for inbody stuff */
>  	if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
> -		ret = 1; /* Should this return 0? */
> +		ret = is_from_line(line->buf + 1, line->len - 1);
>  		goto check_header_out;
>  	}
>  	if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 763cda0..11ba281 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -12,37 +12,6 @@
>  static const char git_mailsplit_usage[] =
>  "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
>  
> -static int is_from_line(const char *line, int len)
> -{
> -	const char *colon;
> -
> -	if (len < 20 || memcmp("From ", line, 5))
> -		return 0;
> -
> -	colon = line + len - 2;
> -	line += 5;
> -	for (;;) {
> -		if (colon < line)
> -			return 0;
> -		if (*--colon == ':')
> -			break;
> -	}
> -
> -	if (!isdigit(colon[-4]) ||
> -	    !isdigit(colon[-2]) ||
> -	    !isdigit(colon[-1]) ||
> -	    !isdigit(colon[ 1]) ||
> -	    !isdigit(colon[ 2]))
> -		return 0;
> -
> -	/* year */
> -	if (strtol(colon+3, NULL, 10) <= 90)
> -		return 0;
> -
> -	/* Ok, close enough */
> -	return 1;
> -}
> -
>  static struct strbuf buf = STRBUF_INIT;
>  static int keep_cr;
>  
> diff --git a/cache.h b/cache.h
> index dfa1a56..9e71ca5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv, int fd);
>  
>  int versioncmp(const char *s1, const char *s2);
>  
> +/*
> + * Returns true if the line appears to be an mbox "From" line starting a new
> + * message.
> + */
> +int is_from_line(const char *line, int len);
> +
>  #endif /* CACHE_H */
> diff --git a/mbox.c b/mbox.c
> new file mode 100644
> index 0000000..75f3150
> --- /dev/null
> +++ b/mbox.c
> @@ -0,0 +1,32 @@
> +#include "cache.h"
> +
> +int is_from_line(const char *line, int len)
> +{
> +	const char *colon;
> +
> +	if (len < 20 || memcmp("From ", line, 5))
> +		return 0;
> +
> +	colon = line + len - 2;
> +	line += 5;
> +	for (;;) {
> +		if (colon < line)
> +			return 0;
> +		if (*--colon == ':')
> +			break;
> +	}
> +
> +	if (!isdigit(colon[-4]) ||
> +	    !isdigit(colon[-2]) ||
> +	    !isdigit(colon[-1]) ||
> +	    !isdigit(colon[ 1]) ||
> +	    !isdigit(colon[ 2]))
> +		return 0;
> +
> +	/* year */
> +	if (strtol(colon+3, NULL, 10) <= 90)
> +		return 0;
> +
> +	/* Ok, close enough */
> +	return 1;
> +}
> diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
> index 3e64a7a..9e1ad1c 100755
> --- a/t/t5100-mailinfo.sh
> +++ b/t/t5100-mailinfo.sh
> @@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name works' '
>  
>  '
>  
> +test_expect_success 'mailinfo finds headers after embedded From line' '
> +	mkdir embed-from &&
> +	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
> +	git mailinfo embed-from/msg embed-from/patch \
> +	  <embed-from/0001 >embed-from/out &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
> +'
> +
> +test_expect_success 'mailinfo on message with quoted >From' '
> +	mkdir quoted-from &&
> +	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
> +	git mailinfo quoted-from/msg quoted-from/patch \
> +	  <quoted-from/0001 >quoted-from/out &&
> +	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
> +'
> +
>  test_done
> diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
> new file mode 100644
> index 0000000..06a3a38
> --- /dev/null
> +++ b/t/t5100/embed-from.expect
> @@ -0,0 +1,5 @@
> +Author: Commit Author
> +Email: commit@xxxxxxxxxxx
> +Subject: patch subject
> +Date: Sat, 13 Sep 2014 21:13:23 -0400 
> +
> diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
> new file mode 100644
> index 0000000..5f3f84e
> --- /dev/null
> +++ b/t/t5100/embed-from.in
> @@ -0,0 +1,13 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Email Author <email@xxxxxxxxxxx>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] email subject
> +
> +>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Commit Author <commit@xxxxxxxxxxx>
> +Date: Sat, 13 Sep 2014 21:13:23 -0400
> +Subject: patch subject
> +
> +patch body
> +---
> +patch
> diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
> new file mode 100644
> index 0000000..8c9d48c
> --- /dev/null
> +++ b/t/t5100/quoted-from.expect
> @@ -0,0 +1,3 @@
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
> new file mode 100644
> index 0000000..847e1c4
> --- /dev/null
> +++ b/t/t5100/quoted-from.in
> @@ -0,0 +1,10 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Author Name <somebody@xxxxxxxxxxx>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] testing quoted >From
> +
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> +---
> +patch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]