Junio C Hamano venit, vidit, dixit 15.04.2011 20:34: > Michael J Gruber <git@xxxxxxxxxxxxxxxxxxxx> writes: > >> git diff -m produces a combined diff! > > Hmm, what is the rest of your command line? I thought -m was a way to ask > pairwise diff with each parent. Sure, but it does not always work like that. Just look at the test from my patch, or do any "git merge --no-commit" and then "git diff -m". I would expect that to compare the worktree to each parent, but in fact it runs "diff --cc". At least I thought that's the only way how combine-diff would ever have to deal with a merge result in the worktree as opposed to a blob. And it seems that "diff -m" does not handle this but relays to "diff --cc" for current git. I have not checked the "-m" codepath. >> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv) >> { >> struct diff_queue_struct *q = &diff_queued_diff; >> struct combine_diff_path *p; >> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, >> >> hashcpy(p->sha1, q->queue[i]->two->sha1); >> p->mode = q->queue[i]->two->mode; >> + if (textconv) >> + p->textconv = get_textconv(q->queue[i]->two); >> hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1); >> p->parent[n].mode = q->queue[i]->one->mode; >> p->parent[n].status = q->queue[i]->status; >> + if (textconv) >> + p->parent[n].textconv = get_textconv(q->queue[i]->one); > > This code attempts to handle different textconv set for each different > parents. But I have to wonder if that is really worth it. > > The attribute to decide the content type of the blob is read from the same > set of .gitattributes files, regardless of which parent you are looking at > (and this is not likely to change---the exact procedure that is applied > comes from .git/config that is not even versioned, so there is not much > point in reading from the .gitattributes from the parent tree, trying to > be "precise"). > > If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv > would be the same because one and two came from the same path. If it is a > rename, they by definition consist of similar contents, and the user would > want the same textconv conversion applied to them to make them comparable. > Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1 > blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be > the right thing to do in theory, doing so wouldn't make a difference in > practice. More importantly, even if the two textconvs specify different > conversions, it is likely that it is an user error (e.g. the preimage had > "img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the > attributes mechanism does not say ".jqg" is a JPEG that wants to get > "exif" run to be texualized for the purpose of diffing, or something). > > Besides, if you really want to support "left hand side and right hand > side, depending on which parent we are talking about, may use different > textconv", you would need to defeat the optimization in show_patch_diff() > that calls reuse_combine_diff() when sha1 are the same from other parent > we have already compared---the parent we are looking at may be using a > different textconv procedure. Even worse, if parent and child have the > same sha1, the result of running parent textconv on the parent blob may be > different from that of the child, which you would never even see in this > codepath. > > So I suspect that using only one textconv per "struct combine_diff_path" > would make both the code simpler, and more importantly would make the > result more correct from the end user's point of view. I'd be happy to take the simpler approach. While I still think the other one is "more correct" (modulo the reuse issue) it should not matter in most cases. > >> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, >> close(fd); >> } >> >> + if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) { >> + struct diff_filespec *df = alloc_filespec(elem->path); >> + fill_filespec(df, elem->sha1, elem->mode); >> + result_size = fill_textconv(elem->textconv, df, &result); >> + } > > I suspect that these three lines have to become a small helper function to > be used to convert the final blob (done here), and parent blob (done in > combine_diff). With the "binary" support, it would eventually need to be > enhanced to something like: > > if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) { > if (textconv) { > do these three lines; > } else if (is binary) { > "Binary blob $SHA-1"; > } > } > "diff -m --oneline" says something like aa01ae1 (from 64c0923) Merge branch 'master' into somebranch diff --git a/a b/a index 72594ed..d8323da 100644 Binary files a/a and b/a differ aa01ae1 (from e85049e) Merge branch 'master' into somebranch diff --git a/a b/a index 86e041d..d8323da 100644 Binary files a/a and b/a differ so I'm wondering whether we shouldn't stay closer to that with "--cc also", e.g.: aa01ae1 Merge branch 'master' into somebranch diff --cc a index 72594ed,86e041d..d8323da Binary files a/a and b/a differ BTW: Currently, "--cc --oneline" produces an extra newline before the diff line, and also note how the diff lines differ ("a/a b/a" vs. "a"). But those are different issues. > and having a small helper function early in the series would help that > process. > >> + paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV)); > > As an internal API within this file, I would rather see "opt" as a whole > passed to intersect_paths(). We may probably want to determine if the > blob is binary in that function depending on other "opt" fields. Yep. Michael [Resent today, sorry. Couldn't get myself to reboot that box yesterday after a disk gave up.] -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html