[PATCH] mailsplit add option to include sanitized subject in filename

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Jacob Keller <jacob.keller@xxxxxxxxx>

git-am makes use of git-mailsplit to split an mbox into individual files
before attempting to apply the patches. In most cases this works fine,
but it can fail to apply patches in cases where the mbox file is not
properly sorted. This can sometimes happen due to clock skew, or other
issues with the software which saved the mbox.

For example, if you download a t.mbox.gz from a public inbox server such
as lore.kernel.org it may sort the messages in the thread by arrival
time to the list. Due to clock skew or other issues this may not be the
correct order of the patches to apply.

A savvy user may then attempt to directly use git mailsplit to split the
mailbox, only to find that the files are unhelpfully named "0001",
"0002", etc. It requires further digging to figure out which message is
which patch.

Git has a format_sanitized_subject() function which is used by code to
generate a suitable filename from a subject. Add a new --name-by-subject
option to git mailsplit. If enabled, scan for lines beginning with the
"Subject:" header when splitting mail. If found, extract the subject and
pass it to format_sanitized_subject(). Use this to create a new filename
which appends the sanitized subject to the standard sequence number. A
savvy user can invoke git mailsplit with --name-by-subject to help
analyze why the mailbox was not split the intended way.

I originally wanted to avoid the need for an option, but git-am
currently depends on the strict sequence number filenames. It is unclear
how difficult it would be to refactor git-am to work with names that
include the extra subject data.

Suggested-by: Junio C Hamano <gitster@xxxxxxxxx>
Signed-off-by: Jacob Keller <jacob.keller@xxxxxxxxx>
---
 Documentation/git-mailsplit.txt |  5 +++++
 builtin/mailsplit.c             | 25 ++++++++++++++++++++++++-
 t/t5100-mailinfo.sh             | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-mailsplit.txt b/Documentation/git-mailsplit.txt
index 3f0a6662c81e..2e5ba45e1988 100644
--- a/Documentation/git-mailsplit.txt
+++ b/Documentation/git-mailsplit.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git mailsplit' [-b] [-f<nn>] [-d<prec>] [--keep-cr] [--mboxrd]
+		[--name-by-subject]
 		-o<directory> [--] [(<mbox>|<Maildir>)...]
 
 DESCRIPTION
@@ -52,6 +53,10 @@ OPTIONS
 	Input is of the "mboxrd" format and "^>+From " line escaping is
 	reversed.
 
+--name-by-subject::
+	Include the sanitized subject in the generated filenames, in
+	addition to the sequence number.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 3af9ddb8ae5c..df81782d05b3 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -8,9 +8,10 @@
 #include "gettext.h"
 #include "string-list.h"
 #include "strbuf.h"
+#include "pretty.h"
 
 static const char git_mailsplit_usage[] =
-"git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
+"git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] [--name-by-subject] -o<directory> [(<mbox>|<Maildir>)...]";
 
 static int is_from_line(const char *line, int len)
 {
@@ -46,6 +47,7 @@ static int is_from_line(const char *line, int len)
 static struct strbuf buf = STRBUF_INIT;
 static int keep_cr;
 static int mboxrd;
+static int name_by_subject;
 
 static int is_gtfrom(const struct strbuf *buf)
 {
@@ -66,6 +68,9 @@ static int is_gtfrom(const struct strbuf *buf)
  */
 static int split_one(FILE *mbox, const char *name, int allow_bare)
 {
+	struct strbuf sanitized_filename = STRBUF_INIT;
+	const char *subject_start;
+	size_t subject_len;
 	FILE *output;
 	int fd;
 	int status = 0;
@@ -101,10 +106,26 @@ static int split_one(FILE *mbox, const char *name, int allow_bare)
 			}
 			die_errno("cannot read mbox");
 		}
+
+		/* Get a sanitized filename from the subject */
+		if (name_by_subject && !sanitized_filename.len &&
+		    skip_prefix_mem(buf.buf, buf.len, "Subject:",
+				    &subject_start, &subject_len)) {
+			strbuf_addf(&sanitized_filename, "%s-", name);
+			format_sanitized_subject(&sanitized_filename,
+						 subject_start,
+						 subject_len);
+		}
+
 		if (!is_bare && is_from_line(buf.buf, buf.len))
 			break; /* done with one message */
 	}
 	fclose(output);
+
+	if (name_by_subject && sanitized_filename.len)
+		rename(name, sanitized_filename.buf);
+	strbuf_release(&sanitized_filename);
+
 	return status;
 }
 
@@ -296,6 +317,8 @@ int cmd_mailsplit(int argc, const char **argv, const char *prefix)
 			usage(git_mailsplit_usage);
 		} else if ( arg[1] == 'b' && !arg[2] ) {
 			allow_bare = 1;
+		} else if (!strcmp(arg, "--name-by-subject")) {
+			name_by_subject = 1;
 		} else if (!strcmp(arg, "--keep-cr")) {
 			keep_cr = 1;
 		} else if ( arg[1] == 'o' && arg[2] ) {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index c8d06554541c..4826735c6033 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -44,6 +44,31 @@ do
 	'
 done
 
+test_expect_success 'split sample box with --name-by-subject' '
+	mkdir name-by-subject &&
+	git mailsplit --name-by-subject -oname-by-subject "$DATA/sample.mbox" >last &&
+	last=$(cat last) &&
+	echo total is $last &&
+	test $(cat last) = 18
+'
+
+check_mailinfo_name_by_subject () {
+	mail=$1
+	mo="$(basename "$mail" | cut -c1-4)"
+	echo "$(basename "$mail")" >"sanitized$mo" &&
+	git mailinfo -u "msg$mo" "patch$mo" <"$mail" >"info$mo" &&
+	test_cmp "$DATA/msg$mo" "msg$mo" &&
+	test_cmp "$DATA/patch$mo" "patch$mo" &&
+	test_cmp "$DATA/info$mo" "info$mo" &&
+	test_cmp "$DATA/sanitized$mo" "sanitized$mo"
+}
+
+for mail in name-by-subject/00*
+do
+	test_expect_success "check --name-by-subject $mail" '
+		check_mailinfo_name_by_subject "$mail"
+	'
+done
 
 test_expect_success 'split box with rfc2047 samples' \
 	'mkdir rfc2047 &&
-- 
2.44.0.53.g0f9d4d28b7e6





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux