RE: [PATCH 2/2] mailinfo: Understand forwarded patches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Jonathan Tan [mailto:jonathantanmy@xxxxxxxxxx]
> On 01/12/2017 01:20 AM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>
> >
> > Extend the --scissors mechanism to strip off the preamble created by
> > forwarding a patch.  There are a couple of extra headers ("Sent" and
> > "To") added by forwarding, but other than that, the --scissors option
> > will now remove patches forwarded from Microsoft Outlook to a Linux
> > email account.
> >
> > Signed-off-by: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>
> 
> Also add a test showing the kind of message that the current code
> doesn't handle, and that this commit addresses.

OK.  For the sake of discussion, here's what the base64-decoded message looks like:

--- 8< ---

-----Original Message-----
From: Rehas Sachdeva [mailto:aquannie@xxxxxxxxx]
Sent: Wednesday, January 4, 2017 11:55 AM
To: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; riel@xxxxxxxxxxx
Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

Make the output of radix tree test suite less verbose by default and add
-v and -vv command line options for increasing level of verbosity.

--- >8 ---

> > ---
> >  mailinfo.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/mailinfo.c b/mailinfo.c
> > index 2059704a8..fc1275532 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi,
> struct strbuf *subject)
> >
> >  #define MAX_HDR_PARSED 10
> >  static const char *header[MAX_HDR_PARSED] = {
> > -	"From","Subject","Date",
> > +	"From","Subject","Date","Sent","To",
> 
> Are these extra headers used in both the "real" e-mail headers and the
> in-body headers, or only one of them? (If the latter, they should
> probably be handled only in the relevant function - my previous patches
> to this file were in that direction too, if I remember correctly.) Also,
> I suspect that these will have to be handled differently to the other 3,
> but that will be clearer when you add the test with an example message.

Without this part of the patch, using --scissors means it stops parsing headers when it hits the 'Sent' line.  So all I'm looking for is to have 'is_inbody_header()' return true.  If there's a better way to achieve that, I'm all for it.  Without this hunk of the patch, the commit looks like this:

Without any of this patch, the commit looks like this:

Author: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>
Date:   Sat Jan 7 18:33:43 2017 +0000

    FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    -----Original Message-----
    From: Rehas Sachdeva [mailto:aquannie@xxxxxxxxx]
    Sent: Wednesday, January 4, 2017 11:55 AM
    To: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; riel@xxxxxxxxxxx
    Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.

Without this hunk of the patch, the commit looks like this:

Author: Rehas Sachdeva <[mailto:aquannie@xxxxxxxxx]>
Date:   Sat Jan 7 18:33:43 2017 +0000

    FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Sent: Wednesday, January 4, 2017 11:55 AM
    To: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>; riel@xxxxxxxxxxx
    Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.

And with this hunk, it turns into this:

Author: Rehas Sachdeva <[mailto:aquannie@xxxxxxxxx]>
Date:   Sat Jan 7 18:33:43 2017 +0000

    radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.


There's more work to do here, turning that silly <[mailto:]> into a proper email address, for example, and parsing Sent: like we parse Date:, but I thought it worth sending out patches 1 & 2 before writing patch 3.

> > @@ -685,6 +685,13 @@ static int is_scissors_line(const char *line)
> >  			c++;
> >  			continue;
> >  		}
> > +		if (!memcmp(c, "Original Message", 16)) {
> 
> 1) You can use starts_with or skip_prefix.

I was following the existing logic in this function that looks for 8< or >8 or ...

> 2) This seems vulnerable to false positives. If "Original Message"
> always follows a certain kind of line, it might be better to check for
> that. (Again, it will be clearer when we have an example message.)

I don't think it's terribly vulnerable to false positives.  The logic at the end of the function tries to make sure that the scissor marks make up a substantial component of the line.

We could do this differently; I'm not sure if there's a regex library built into git, but we could even custom-write a separate matcher that looks only for ^-{3,}Original Message-{3,}$

Or we could use a different option; eg --forwarded that will trim off the extra gunk associated with forwarding messages, instead of overloading --scissors.

> > +			in_perforation = 1;
> > +			perforation += 16;
> > +			scissors += 16;
> > +			c += 15;
> 
> Why 15? Also, can skip_prefix avoid these magic numbers?

Again, following the rest of the function.  c has already been advanced by 1, now it needs to be advanced to the end of the 16 character string "Original Message".





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]