Re: [PATCH] git-mailinfo fixes for patch munging

Junio C Hamano <junkio@xxxxxxx> · Fri, 30 Mar 2007 14:19:42 -0700

Don Zickus <dzickus@xxxxxxxxxx> writes:

> Don't translate the patch to UTF-8, instead preserve the data as is.  Also
> allow overwriting the primary mail headers (addresses Linus's concern).  
>
> I also revert a test case that was included in the original patch.  Now it
> makes sense why it was the way it was. :)
>
> Cheers,
> Don

Thanks.  Sign-off would have been nice.

> diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
> index d94578c..71b6457 100644
> --- a/builtin-mailinfo.c
> +++ b/builtin-mailinfo.c
> @@ -294,14 +294,14 @@ static char *header[MAX_HDR_PARSED] = {
>  	"From","Subject","Date",
>  };
>  
> -static int check_header(char *line, char **hdr_data)
> +static int check_header(char *line, char **hdr_data, int overwrite)
>  {
>  	int i;
>  
>  	/* search for the interesting parts */
>  	for (i = 0; header[i]; i++) {
>  		int len = strlen(header[i]);
> -		if (!hdr_data[i] &&
> +		if ((!hdr_data[i] || overwrite) &&
>  		    !strncasecmp(line, header[i], len) &&
>  		    line[len] == ':' && isspace(line[len + 1])) {
>  			/* Unwrap inline B and Q encoding, and optionally

This check_header is called from each multi-part boundary with
overwrite=1, so if you have two parts and you have From: or
Subject: in the multi-part header (not in-body), wouldn't they
overwrite what we already have?  That is not desired, I would
think.

For non multi-part case, what we traditionally have done is:

	* Take Subject:, Date:, and From: from RFC2822 headers
          to prime the title and authorship information.

	* The first lines of the body of the message (i.e. after
          the blank line that separates 2822 headers and the
          body) can look like the above header lines to
          override.

	* A line that does not look like an overriding in-body
          header line is the first line of the commit log
          message.  After that, nothing is taken as an
          overriding in-body header.

For a multi-part, I think we only processed the first part as
the commit log message, potentially starting with the overriding
in-body headers.  In other words, in-body headers are what the
user *types* to override what the MUA says in RFC2822 headers.
As the stuff that follow the multi-part boundary (like
content-type and transfer encoding) are of the MUA kind, I
suspect we do not want it to override what the sender said in
the earlier parts of the message.

> @@ -614,6 +614,7 @@ static int find_boundary(void)
>  
>  static int handle_boundary(void)
>  {
> +	char newline[]="\n";
>  again:
>  	if (!memcmp(line+content_top->boundary_len, "--", 2)) {
>  		/* we hit an end boundary */
> @@ -628,7 +629,7 @@ again:
>  					"can't recover\n");
>  			exit(1);
>  		}
> -		handle_filter("\n");
> +		handle_filter(newline);
>  
>  		/* skip to the next boundary */
>  		if (!find_boundary())

These two hunks certainly do not hurt, but why?  Is this about
the constness of the first parameter to handle_filter() and its
call chain?

Having said that, the result of the patch is much better.  

In fact, I couldn't "git am" this patch (the part that reverts
the test vector) with the current tip of 'master' because of the
breakage you are fixing with it ;-).

Now I can.  So I'd probably take this patch for now.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html