On Sun, Jan 18, 2009 at 10:50:12AM -0800, Junio C Hamano wrote:
> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>
>     AUTHOR_EMAIL=john.doe@xz
>     AUTHOR_NAME="John (zzz) Doe (Comment)"
>
> and leave it like so, I think.

Ok, here you are:

Subject: [PATCH 1/3] mailinfo: cleanup extra spaces for complex 'From'

As described in RFC822 (3.4.3 COMMENTS, and A.1.4.), comments, as e.g.

    John (zzz) Doe <john.doe@xz> (Comment)

should "NOT [be] included in the destination mailbox"

On the other hand, quoting Junio:

> The above quote from the RFC is irrelevant.  Note that it is only about
> how you extract the e-mail address, discarding everything else.
>
> What mailinfo wants to do is to separate the human-readable name and the
> e-mail address, and we want to use _both_ results from it.
>
> We separate a few example From: lines like this:
>
>     Kirill Smelkov <kirr@xxxxxxxxxx>
>     ==> AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>
>     kirr@xxxxxxxxxx (Kirill Smelkov)
>     ==> AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>
> Traditionally, the way people spelled their name on From: line has been
> either one of the above form.  Typically comment form (i.e. the second
> one) adds the name at the end, while "Name <addr>" form has the name at
> the front.  But I do not think RFC requires that, primarily because it is
> all about discarding non-address part to find the e-mail address aka
> "destination mailbox".  It does not specify how humans should interpret
> the human readable name and the comment.
>
> Now, why is the name not AUTHOR_NAME="(Kirill Smelkov)" in the latter
> form?
>
> It is just common sense transformation.  Otherwise it looks simply ugly,
> and it is obvious that the parentheses is not part of the name of the
> person who used "kirr@xxxxxxxxxx (Kirill Smelkov)" on his From: line.
>
> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>
>     AUTHOR_EMAIL=john.doe@xz
>     AUTHOR_NAME="John (zzz) Doe (Comment)"
>
> and leave it like so, I think.

So let's just correctly remove extra spaces which could be left inside name.

We need this functionality to pass all RFC2047 based tests in the next commit.

Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx>
---
 builtin-mailinfo.c  |   21 +++++++++++++++++----
 t/t5100/info0001    |    2 +-
 t/t5100/sample.mbox |    4 ++--
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index dacc8ac..8030823 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -29,6 +29,9 @@ static struct strbuf **p_hdr_data, **s_hdr_data;
 #define MAX_HDR_PARSED 10
 #define MAX_BOUNDARIES 5
 
+static void cleanup_space(struct strbuf *sb);
+
+
 static void get_sane_name(struct strbuf *out, struct strbuf *name, struct strbuf *email)
 {
 	struct strbuf *src = name;
@@ -109,10 +112,14 @@ static void handle_from(const struct strbuf *from)
 	strbuf_add(&email, at, el);
 	strbuf_remove(&f, at - f.buf, el + (at[el] ? 1 : 0));
 
-	/* The remainder is name.  It could be "John Doe <john.doe@xz>"
-	 * or "john.doe@xz (John Doe)", but we have removed the
-	 * email part, so trim from both ends, possibly removing
-	 * the () pair at the end.
+	/* The remainder is name.  It could be
+	 *
+	 * - "John Doe <john.doe@xz>"			(a), or
+	 * - "john.doe@xz (John Doe)"			(b), or
+	 * - "John (zzz) Doe <john.doe@xz> (Comment)"	(c)
+	 *
+	 * but we have removed the email part, so trim from both ends, possibly
+	 * removing the () pair at the end for case 'b'.
 	 */
 	strbuf_trim(&f);
 	if (f.buf[0] == '(' && f.len && f.buf[f.len - 1] == ')') {
@@ -120,6 +127,12 @@ static void handle_from(const struct strbuf *from)
 		strbuf_setlen(&f, f.len - 1);
 	}
 
+	/* Otherwise we want comments to stay. It's just time to cleanup extra
+	 * spaces
+	 */
+	cleanup_space(&f);
+	strbuf_trim(&f);
+
 	get_sane_name(&name, &f, &email);
 	strbuf_release(&f);
 }
diff --git a/t/t5100/info0001 b/t/t5100/info0001
index 8c05277..f951538 100644
--- a/t/t5100/info0001
+++ b/t/t5100/info0001
@@ -1,4 +1,4 @@
-Author: A U Thor
+Author: A (zzz) U Thor (Comment)
 Email: a.u.thor@xxxxxxxxxxx
 Subject: a commit.
 Date: Fri, 9 Jun 2006 00:44:16 -0700
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index 38725f3..4f80b82 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -2,10 +2,10 @@
 From nobody Mon Sep 17 00:00:00 2001
-From: A
+From: A (zzz)
       U Thor
-	<a.u.thor@xxxxxxxxxxx>
+	<a.u.thor@xxxxxxxxxxx> (Comment)
 Date: Fri, 9 Jun 2006 00:44:16 -0700
 Subject: [PATCH] a commit.
-- 
1.6.1.79.g92b9.dirty

Is it ok?

And by the way, please pull the whole updated series from

    git://repo.or.cz/git/kirr.git for-junio-maint

Kirill Smelkov (3):
      mailinfo: cleanup extra spaces for complex 'From'
      mailinfo: add explicit test for mails like '<a.u.thor@xxxxxxxxxxx> (A U Thor)'
      mailinfo: tests for RFC2047 examples

Thanks,
Kirill
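
P.S. For anyone who wants to try the splitting rule outside of git, here is a
rough standalone sketch of what the patched handle_from() is meant to do for
cases (a), (b) and (c). It is an illustration only, not the code from
builtin-mailinfo.c: split_from() and squeeze_spaces() are hypothetical names,
plain char buffers stand in for strbuf, and the later get_sane_name() step is
left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: collapse whitespace runs to one space, trim both ends. */
static void squeeze_spaces(char *s)
{
	char *dst = s;
	int pending_space = 0;

	for (const char *src = s; *src; src++) {
		if (isspace((unsigned char)*src)) {
			pending_space = (dst != s);	/* ignore leading whitespace */
			continue;
		}
		if (pending_space) {
			*dst++ = ' ';
			pending_space = 0;
		}
		*dst++ = *src;
	}
	*dst = '\0';	/* a trailing run is never written out */
}

/*
 * Hypothetical helper: split a From: value into name and email.
 * Returns 0 on success, -1 if there is no '@' or a buffer is too small.
 */
static int split_from(const char *from, char *name, size_t name_sz,
		      char *email, size_t email_sz)
{
	size_t len, el, nl;
	const char *at, *start, *after;

	assert(from && name && email);
	len = strlen(from);
	at = strchr(from, '@');
	if (!at || len + 1 > name_sz)
		return -1;

	/* Walk back from '@' to whitespace or '<': start of the address. */
	start = at;
	while (start > from && !isspace((unsigned char)start[-1]) &&
	       start[-1] != '<')
		start--;

	/* The address ends at whitespace, '>' or end of string. */
	el = strcspn(start, " \n\t\r\v\f>");
	if (el + 1 > email_sz)
		return -1;
	memcpy(email, start, el);
	email[el] = '\0';

	/* The name is whatever surrounds the address, minus the <> pair. */
	nl = (size_t)(start - from);
	if (nl && from[nl - 1] == '<')
		nl--;
	after = start + el;
	if (*after)
		after++;	/* skip the '>' or whitespace delimiter */
	snprintf(name, name_sz, "%.*s %s", (int)nl, from, after);
	squeeze_spaces(name);

	/* Case (b): the whole remainder is "(Name)", so drop the outer pair. */
	nl = strlen(name);
	if (nl >= 2 && name[0] == '(' && name[nl - 1] == ')') {
		memmove(name, name + 1, nl - 2);
		name[nl - 2] = '\0';
		squeeze_spaces(name);
	}
	return 0;
}
```

With this, "John (zzz) Doe <john.doe@xz> (Comment)" should come out as email
"john.doe@xz" and name "John (zzz) Doe (Comment)", and a folded header such as
the one in sample.mbox collapses to a single-spaced name. Like the original, it
only strips the () pair when the whole remainder is wrapped in it.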