Re: John (zzz) Doe <john.doe@xz> (Comment)

Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx> writes:

> On Sun, Jan 18, 2009 at 10:50:12AM -0800, Junio C Hamano wrote:
>> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>> 
>> 	AUTHOR_EMAIL=john.doe@xz
>>         AUTHOR_NAME="John (zzz) Doe (Comment)"
>> 
>> and leave it like so, I think.
>
> Ok, here you are:
>
> Subject: [PATCH 1/3] mailinfo: cleanup extra spaces for complex 'From'
>
> As described in RFC822 (3.4.3 COMMENTS, and  A.1.4.), comments, as e.g.
>
>     John (zzz) Doe <john.doe@xz> (Comment)
>
> should "NOT [be] included in the destination mailbox"
>
> On the other hand, quoting Junio:
>
>> The above quote from the RFC is irrelevant.  Note that it is only about
>> how you extract the e-mail address, discarding everything else.
>>
>> What mailinfo wants to do is to separate the human-readable name and the
>> e-mail address, and we want to use _both_ results from it.
>>
>> We separate a few example From: lines like this:
>>
>> 	Kirill Smelkov <kirr@xxxxxxxxxx>
>> ==>	AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>>
>> 	kirr@xxxxxxxxxx (Kirill Smelkov)
>> ==>	AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>>
>> Traditionally, the way people spelled their name on From: line has been
>> either one of the above form.  Typically comment form (i.e. the second
>> one) adds the name at the end, while "Name <addr>" form has the name at
>> the front.  But I do not think RFC requires that, primarily because it is
>> all about discarding non-address part to find the e-mail address aka
>> "destination mailbox".  It does not specify how humans should interpret
>> the human readable name and the comment.
>>
>> Now, why is the name not AUTHOR_NAME="(Kirill Smelkov)" in the latter
>> form?
>>
>> It is just common sense transformation.  Otherwise it looks simply ugly,
>> and it is obvious that the parentheses is not part of the name of the
>> person who used "kirr@xxxxxxxxxx (Kirill Smelkov)" on his From: line.
>>
>> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>>
>> 	AUTHOR_EMAIL=john.doe@xz
>>         AUTHOR_NAME="John (zzz) Doe (Comment)"
>>
>> and leave it like so, I think.
>
> So let's just correctly remove extra spaces which could be left inside
> name.
>
> We need this functionality to pass all RFC2047 based tests in the next commit.
>
> Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx>
> ---
> ...
> Is it ok?

I think the patch text looks good, but what you have as the proposed
commit log message does not look anything like the log messages we
usually write.

 - If you agree with my comment that "should NOT be included" from the RFC
   you quoted is irrelevant, then I do not think you would even want to
   have anything before "On the other hand,...".

 - If you disagree, then why are you bending the patch text to match what
   I say? ;-)

Also, I am not sure "passing RFC2047 based tests" is a valid purpose or
justification for this patch (because you could argue that the tests,
and the results the tests expect, are faulty).

The way we (and your patches) do things is to have the desired outcome
designed first, and then have tests to make sure that the implementation
matches the desired outcome.

Given that, what I would probably suggest would be...

-- 8< -- cut from here -- 8< --

mailinfo: handle comment fields in From: line sanely

Most commonly, people have their human readable name and their e-mail
address on their From: line in one of two formats:

    Kirill Smelkov <kirr@xxxxxxxxxx>
    kirr@xxxxxxxxxx (Kirill Smelkov)

In addition, you can have "Comments" in parentheses at random places, like
this:

    John (zzz) Doe <john.doe@xz> (Comment)

RFC822 defines rules to extract the "destination mailbox" out of such
input (and we correctly extract "kirr@xxxxxxxxxx" and "john.doe@xz" from
these examples).  However, it does not specify how to pick up the human
readable name from the remainder, and the existing code randomly dropped
pieces of information in <<<this and that way --- explain the breakage you
wanted to fix with your patch, perhaps "and left a newline in the result"
may be one of the breakages>>>.

This patch changes the rule so that <<<explain what it does here.  I think
what the code does is (1) remove the e-mail (and angle brackets around
it), (2) sanitize LF into a single SP to keep the result a single line,
and (3) as a special case, if the result is enclosed by () pair, remove
them---this rule is to format the second common case listed above
sanely>>>.

A subsequent patch using From: lines taken from the example section of
RFC2047 will test this feature.

-- >8 --
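
To make the three steps I guessed at above concrete, here is a minimal
sketch in Python.  It is not the C code in mailinfo, and it does not try
to be a full RFC822 parser; the function name split_from and both regular
expressions are made up for illustration only.

```python
import re

def split_from(line):
    """Split a From: value into (name, email); illustrative sketch only."""
    # (1) Find the e-mail address: prefer the "<addr>" form, and fall
    #     back to a bare "addr" token.  Both patterns are simplifications.
    m = re.search(r"<([^<>\s]+@[^<>\s]+)>", line)
    if m:
        email = m.group(1)
    else:
        m = re.search(r"[^\s()<>]+@[^\s()<>]+", line)
        if m is None:
            # No address found; treat everything as the name.
            return " ".join(line.split()), ""
        email = m.group(0)

    # Remove the address (and the angle brackets, if any) from the input.
    rest = line[:m.start()] + " " + line[m.end():]

    # (2) Turn LF and runs of whitespace into a single SP, and trim.
    name = " ".join(rest.split())

    # (3) Special case: if the whole remainder is enclosed in a () pair,
    #     drop the pair.  This only looks at the first and last character.
    if len(name) >= 2 and name[0] == "(" and name[-1] == ")":
        name = name[1:-1].strip()

    return name, email


for example in (
    "Kirill Smelkov <kirr@xxxxxxxxxx>",
    "kirr@xxxxxxxxxx (Kirill Smelkov)",
    "John (zzz) Doe <john.doe@xz> (Comment)",
):
    print(split_from(example))
```

With the three From: lines used in this thread, the sketch yields
"Kirill Smelkov" for the first two and "John (zzz) Doe (Comment)" for the
third, which is the separation discussed earlier.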





