From: Jacob Keller <jacob.keller@xxxxxxxxx>

git-am makes use of git-mailsplit to split an mbox into individual files
before attempting to apply the patches. In most cases this works fine,
but it can fail to apply patches in cases where the mbox file is not
properly sorted. This can sometimes happen due to clock skew, or other
issues with the software which saved the mbox.

For example, if you download a t.mbox.gz from a public inbox server such
as lore.kernel.org it may sort the messages in the thread by arrival
time to the list. Due to clock skew or other issues this may not be the
correct order of the patches to apply.

A savvy user may then attempt to directly use git mailsplit to split the
mailbox, only to find that the files are unhelpfully named "0001",
"0002", etc. It requires further digging to figure out which message is
which patch.

Git has a format_sanitized_subject() function which is used by code to
generate a suitable filename from a subject.

Add a new --name-by-subject option to git mailsplit. If enabled, scan
for lines beginning with the "Subject:" header when splitting mail. If
found, extract the subject and pass it to format_sanitized_subject().
Use this to create a new filename which appends the sanitized subject to
the standard sequence number.

A savvy user can invoke git mailsplit with --name-by-subject to help
analyze why the mailbox was not split the intended way.

I originally wanted to avoid the need for an option, but git-am
currently depends on the strict sequence number filenames. It is unclear
how difficult it would be to refactor git-am to work with names that
include the extra subject data.

Suggested-by: Junio C Hamano <gitster@xxxxxxxxx>
Signed-off-by: Jacob Keller <jacob.keller@xxxxxxxxx>
---
 Documentation/git-mailsplit.txt |  5 +++++
 builtin/mailsplit.c             | 25 ++++++++++++++++++++++++-
 t/t5100-mailinfo.sh             | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-mailsplit.txt b/Documentation/git-mailsplit.txt
index 3f0a6662c81e..2e5ba45e1988 100644
--- a/Documentation/git-mailsplit.txt
+++ b/Documentation/git-mailsplit.txt
@@ -9,6 +9,7 @@ SYNOPSIS
 --------
 [verse]
 'git mailsplit' [-b] [-f<nn>] [-d<prec>] [--keep-cr] [--mboxrd]
+		[--name-by-subject]
 		-o<directory> [--] [(<mbox>|<Maildir>)...]
 
 DESCRIPTION
@@ -52,6 +53,10 @@ OPTIONS
 	Input is of the "mboxrd" format and "^>+From " line escaping is
 	reversed.
 
+--name-by-subject::
+	Include the sanitized subject in the generated filenames, in
+	addition to the sequence number.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 3af9ddb8ae5c..df81782d05b3 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -8,9 +8,10 @@
 #include "gettext.h"
 #include "string-list.h"
 #include "strbuf.h"
+#include "pretty.h"
 
 static const char git_mailsplit_usage[] =
-"git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory> [(<mbox>|<Maildir>)...]";
+"git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] [--name-by-subject] -o<directory> [(<mbox>|<Maildir>)...]";
 
 static int is_from_line(const char *line, int len)
 {
@@ -46,6 +47,7 @@ static int is_from_line(const char *line, int len)
 static struct strbuf buf = STRBUF_INIT;
 static int keep_cr;
 static int mboxrd;
+static int name_by_subject;
 
 static int is_gtfrom(const struct strbuf *buf)
 {
@@ -66,6 +68,9 @@ static int is_gtfrom(const struct strbuf *buf)
  */
 static int split_one(FILE *mbox, const char *name, int allow_bare)
 {
+	struct strbuf sanitized_filename = STRBUF_INIT;
+	const char *subject_start;
+	size_t subject_len;
 	FILE *output;
 	int fd;
 	int status = 0;
@@ -101,10 +106,26 @@ static int split_one(FILE *mbox, const char *name, int allow_bare)
 			}
 			die_errno("cannot read mbox");
 		}
+
+		/* Get a sanitized filename from the subject */
+		if (name_by_subject && !sanitized_filename.len &&
+		    skip_prefix_mem(buf.buf, buf.len, "Subject:",
+				    &subject_start, &subject_len)) {
+			strbuf_addf(&sanitized_filename, "%s-", name);
+			format_sanitized_subject(&sanitized_filename,
+						 subject_start,
+						 subject_len);
+		}
+
 		if (!is_bare && is_from_line(buf.buf, buf.len))
 			break; /* done with one message */
 	}
 	fclose(output);
+
+	if (name_by_subject && sanitized_filename.len)
+		rename(name, sanitized_filename.buf);
+	strbuf_release(&sanitized_filename);
+
 	return status;
 }
 
@@ -296,6 +317,8 @@ int cmd_mailsplit(int argc, const char **argv, const char *prefix)
 			usage(git_mailsplit_usage);
 		} else if ( arg[1] == 'b' && !arg[2] ) {
 			allow_bare = 1;
+		} else if (!strcmp(arg, "--name-by-subject")) {
+			name_by_subject = 1;
 		} else if (!strcmp(arg, "--keep-cr")) {
 			keep_cr = 1;
 		} else if ( arg[1] == 'o' && arg[2] ) {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index c8d06554541c..4826735c6033 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -44,6 +44,31 @@ do
 	'
 done
 
+test_expect_success 'split sample box with --name-by-subject' '
+	mkdir name-by-subject &&
+	git mailsplit --name-by-subject -oname-by-subject "$DATA/sample.mbox" >last &&
+	last=$(cat last) &&
+	echo total is $last &&
+	test $(cat last) = 18
+'
+
+check_mailinfo_name_by_subject () {
+	mail=$1
+	mo="$(basename "$mail" | cut -c1-4)"
+	echo "$(basename "$mail")" >"sanitized$mo" &&
+	git mailinfo -u "msg$mo" "patch$mo" <"$mail" >"info$mo" &&
+	test_cmp "$DATA/msg$mo" "msg$mo" &&
+	test_cmp "$DATA/patch$mo" "patch$mo" &&
+	test_cmp "$DATA/info$mo" "info$mo" &&
+	test_cmp "$DATA/sanitized$mo" "sanitized$mo"
+}
+
+for mail in name-by-subject/00*
+do
+	test_expect_success "check --name-by-subject $mail" '
+		check_mailinfo_name_by_subject "$mail"
+	'
+done
 
 test_expect_success 'split box with rfc2047 samples' \
   'mkdir rfc2047 &&
-- 
2.44.0.53.g0f9d4d28b7e6
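
As an illustration of the naming scheme described in the commit message (sequence number, a dash, then the sanitized subject), here is a rough standalone sketch. It is not the patch's code: sketch_sanitize() and sketch_name() are hypothetical helpers, and the sanitizer is a simplified stand-in for format_sanitized_subject() that only assumes alphanumerics, '.' and '_' are kept and that any run of other characters collapses into a single '-'. The real function has extra handling (for example around dots and trailing characters) that is left out here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: which characters the sketch keeps verbatim. */
static int is_title_char(unsigned char c)
{
	return isalnum(c) || c == '.' || c == '_';
}

/*
 * Hypothetical, simplified stand-in for format_sanitized_subject():
 * append subject[0..len) to the NUL-terminated string in out, turning
 * each run of non-title characters into one '-'. No '-' is emitted
 * before the first kept character or after the last one.
 */
static void sketch_sanitize(char *out, size_t cap,
			    const char *subject, size_t len)
{
	size_t o = strlen(out);
	size_t start = o;
	int pending = 0;
	size_t i;

	assert(cap > o);
	/* o + 2 < cap leaves room for '-', one character and the NUL */
	for (i = 0; i < len && o + 2 < cap; i++) {
		unsigned char c = (unsigned char)subject[i];

		if (is_title_char(c)) {
			if (pending && o > start)
				out[o++] = '-';
			pending = 0;
			out[o++] = (char)c;
		} else {
			pending = 1;
		}
	}
	out[o] = '\0';
}

/*
 * Hypothetical: build "<seq_name>-<sanitized subject>" when line is a
 * "Subject:" header, otherwise fall back to the plain sequence name.
 */
static void sketch_name(char *out, size_t cap,
			const char *seq_name, const char *line)
{
	static const char hdr[] = "Subject:";
	size_t hdr_len = sizeof(hdr) - 1;

	if (strncmp(line, hdr, hdr_len)) {
		snprintf(out, cap, "%s", seq_name);
		return;
	}
	snprintf(out, cap, "%s-", seq_name);
	sketch_sanitize(out, cap, line + hdr_len, strlen(line + hdr_len));
}
```

Under these assumptions, calling sketch_name() with the sequence name "0001" and the line "Subject: [PATCH v2] mailsplit: add option" would produce "0001-PATCH-v2-mailsplit-add-option", while a line that is not a "Subject:" header leaves the name as "0001".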