Re: [PATCH] combine-diff: use textconv for combined diff format

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 15 Apr 2011 11:34:27 -0700

Michael J Gruber <git@xxxxxxxxxxxxxxxxxxxx> writes:

> git diff -m produces a combined diff!

Hmm, what is the rest of your command line?  I thought -m was a way to ask
pairwise diff with each parent.

> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>  {
>  	struct diff_queue_struct *q = &diff_queued_diff;
>  	struct combine_diff_path *p;
> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>  
>  			hashcpy(p->sha1, q->queue[i]->two->sha1);
>  			p->mode = q->queue[i]->two->mode;
> +			if (textconv)
> +				p->textconv = get_textconv(q->queue[i]->two);
>  			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>  			p->parent[n].mode = q->queue[i]->one->mode;
>  			p->parent[n].status = q->queue[i]->status;
> +			if (textconv)
> +				p->parent[n].textconv = get_textconv(q->queue[i]->one);

This code attempts to handle different textconv set for each different
parents.  But I have to wonder if that is really worth it.

The attribute to decide the content type of the blob is read from the same
set of .gitattributes files, regardless of which parent you are looking at
(and this is not likely to change---the exact procedure that is applied
comes from .git/config that is not even versioned, so there is not much
point in reading from the .gitattributes from the parent tree, trying to
be "precise").

If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
would be the same because one and two came from the same path.  If it is a
rename, they by definition consist of similar contents, and the user would
want the same textconv conversion applied to them to make them comparable.
Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
the right thing to do in theory, doing so wouldn't make a difference in
practice.  More importantly, even if the two textconvs specify different
conversions, it is likely that it is an user error (e.g. the preimage had
"img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
attributes mechanism does not say ".jqg" is a JPEG that wants to get
"exif" run to be texualized for the purpose of diffing, or something).

Besides, if you really want to support "left hand side and right hand
side, depending on which parent we are talking about, may use different
textconv", you would need to defeat the optimization in show_patch_diff()
that calls reuse_combine_diff() when sha1 are the same from other parent
we have already compared---the parent we are looking at may be using a
different textconv procedure.  Even worse, if parent and child have the
same sha1, the result of running parent textconv on the parent blob may be
different from that of the child, which you would never even see in this
codepath.

So I suspect that using only one textconv per "struct combine_diff_path"
would make both the code simpler, and more importantly would make the
result more correct from the end user's point of view.

> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>  			close(fd);
>  	}
>  
> +	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
> +		struct diff_filespec *df = alloc_filespec(elem->path);
> +		fill_filespec(df, elem->sha1, elem->mode);
> +		result_size = fill_textconv(elem->textconv, df, &result);
> +	}

I suspect that these three lines have to become a small helper function to
be used to convert the final blob (done here), and parent blob (done in
combine_diff).  With the "binary" support, it would eventually need to be
enhanced to something like:

	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
        	if (textconv) {
                	do these three lines;
		} else if (is binary) {
                	"Binary blob $SHA-1";
		}
	}

and having a small helper function early in the series would help that
process.

> +		paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));

As an internal API within this file, I would rather see "opt" as a whole
passed to intersect_paths().  We may probably want to determine if the
blob is binary in that function depending on other "opt" fields.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html