Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx> writes:

> On Sun, Jan 18, 2009 at 10:50:12AM -0800, Junio C Hamano wrote:
>> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>>
>>     AUTHOR_EMAIL=john.doe@xz
>>     AUTHOR_NAME="John (zzz) Doe (Comment)"
>>
>> and leave it like so, I think.
>
> Ok, here you are:
>
> Subject: [PATCH 1/3] mailinfo: cleanup extra spaces for complex 'From'
>
> As described in RFC822 (3.4.3 COMMENTS, and A.1.4.), comments, as e.g.
>
>     John (zzz) Doe <john.doe@xz> (Comment)
>
> should "NOT [be] included in the destination mailbox"
>
> On the other hand, quoting Junio:
>
>> The above quote from the RFC is irrelevant. Note that it is only about
>> how you extract the e-mail address, discarding everything else.
>>
>> What mailinfo wants to do is to separate the human-readable name and the
>> e-mail address, and we want to use _both_ results from it.
>>
>> We separate a few example From: lines like this:
>>
>>     Kirill Smelkov <kirr@xxxxxxxxxx>
>>     ==> AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>>
>>     kirr@xxxxxxxxxx (Kirill Smelkov)
>>     ==> AUTHOR_EMAIL="kirr@xxxxxxxxxx" AUTHOR_NAME="Kirill Smelkov"
>>
>> Traditionally, the way people spelled their name on From: line has been
>> either one of the above form. Typically comment form (i.e. the second
>> one) adds the name at the end, while "Name <addr>" form has the name at
>> the front. But I do not think RFC requires that, primarily because it is
>> all about discarding non-address part to find the e-mail address aka
>> "destination mailbox". It does not specify how humans should interpret
>> the human readable name and the comment.
>>
>> Now, why is the name not AUTHOR_NAME="(Kirill Smelkov)" in the latter
>> form?
>>
>> It is just common sense transformation. Otherwise it looks simply ugly,
>> and it is obvious that the parentheses is not part of the name of the
>> person who used "kirr@xxxxxxxxxx (Kirill Smelkov)" on his From: line.
>>
>> So we can separate "John (zzz) Doe <john.doe@xz> (Comment)" into:
>>
>>     AUTHOR_EMAIL=john.doe@xz
>>     AUTHOR_NAME="John (zzz) Doe (Comment)"
>>
>> and leave it like so, I think.
>
> So let's just correctly remove extra spaces which could be left inside
> name.
>
> We need this functionality to pass all RFC2047 based tests in the next commit.
>
> Signed-off-by: Kirill Smelkov <kirr@xxxxxxxxxxxxxxxxxxx>
> ---
> ...
> Is it ok?

I think the patch text looks good, but what you have as the proposed
commit log message does not look anything like log message we usually use.

 - If you agree with my comment that "should NOT be included" from the
   RFC you quoted is irrelevant, then I do not think you would even want
   to have anything before "On the other hand,...".

 - If you disagree, then why are you bending the patch text to match what
   I say? ;-)

Also I am not sure "passing RFC2047 based tests" is a valid purpose nor
justification for adding this patch (because you could argue that the
tests and the results the tests expect are faulty). The way we (and your
patches) do things is to have the desired outcome designed first, and then
have tests to make sure that the implementation matches the desired
outcome.

Given that, what I would probably suggest would be...

-- 8< -- cut from here -- 8< --
mailinfo: handle comment fields in From: line sanely

Most commonly, people have their human readable name and their e-mail
address on their From: line in one of two formats:

    Kirill Smelkov <kirr@xxxxxxxxxx>
    kirr@xxxxxxxxxx (Kirill Smelkov)

In addition, you can have "Comments" in parentheses at random places, like
this:

    John (zzz) Doe <john.doe@xz> (Comment)

RFC822 defines rules to extract the "destination mailbox" out of such
input (and we correctly extract "kirr@xxxxxxxxxx" and "john.doe@xz" from
these examples).
It however does not specify how to pick up the human readable name from
the remainder, and the existing code randomly dropped pieces of
information in <<<this and that way --- explain the breakage you wanted to
fix with your patch, perhaps "and left a newline in the result" may be one
of the breakages>>>.

This patch changes the rule so that <<<explain what it does here. I think
what the code does is (1) remove the e-mail (and angle brackets around
it), (2) sanitize LF into a single SP to keep the result a single line,
and (3) as a special case, if the result is enclosed by () pair, remove
them---this rule is to format the second common case listed above
sanely>>>.

A subsequent patch using From: lines taken from the example section of
RFC2047 will test this feature.
-- >8 --
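
For illustration only, the three-step rule guessed at in the placeholder above could be sketched in Python roughly as follows. This is not the mailinfo code (which is C): the helper name `split_from_line` and both regular expressions are made up for the example. Collapsing runs of spaces is borrowed from the subject line of the patch under discussion, and declining to strip the outer parentheses when the name has other parentheses inside is an extra assumption of this sketch.

```python
import re


def split_from_line(line):
    """Split a From: value into (name, email). Hypothetical helper, not git's code."""
    # Step 1: locate the e-mail, preferring the <addr> form, and cut it out
    # together with its angle brackets.
    m = re.search(r"<([^<>\s]+@[^<>\s]+)>", line)
    if m:
        email = m.group(1)
    else:
        # Fall back to a bare addr as in "addr (Name)".
        m = re.search(r"[^\s<>()]+@[^\s<>()]+", line)
        if m is None:
            return " ".join(line.split()), ""
        email = m.group(0)
    rest = line[:m.start()] + " " + line[m.end():]

    # Step 2: turn LF (and any other whitespace run) into a single SP so the
    # name stays on one line without leftover double spaces.
    name = " ".join(rest.split())

    # Step 3: special case, "(Name)" becomes "Name". Assumption of this
    # sketch: only strip when no other parentheses remain inside.
    if len(name) >= 2 and name[0] == "(" and name[-1] == ")":
        inner = name[1:-1]
        if "(" not in inner and ")" not in inner:
            name = inner.strip()
    return name, email


for sample in (
    "Kirill Smelkov <kirr@xxxxxxxxxx>",
    "kirr@xxxxxxxxxx (Kirill Smelkov)",
    "John (zzz) Doe <john.doe@xz> (Comment)",
):
    print(split_from_line(sample))
# ('Kirill Smelkov', 'kirr@xxxxxxxxxx')
# ('Kirill Smelkov', 'kirr@xxxxxxxxxx')
# ('John (zzz) Doe (Comment)', 'john.doe@xz')
```

The last sample matches the AUTHOR_NAME / AUTHOR_EMAIL split proposed earlier in the thread: the comments stay in the name, and only the address and its brackets go away.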