Re: [PATCH v2] range-diff: optionally include merge commits' diffs in the analysis

Elijah Newren <newren@xxxxxxxxx> · Mon, 11 Nov 2024 08:51:12 -0800

On Sun, Nov 10, 2024 at 4:37 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@xxxxxxxxx>
> writes:
>
[...]
> > +--diff-merges=<format>::
> > +     Instead of ignoring merge commits, generate diffs for them using the
> > +     corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +     and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use, as it is consistent with the idea that a merge is kind of a "meta
> > +patch", comprising all the merged commits' patches into a single one.
>
> I'll let you and Elijah figure out the best phrasing around here,
> including whether we recommend only first-parent or give other
> options.

I'd probably suggest:

Note: When trying to see how users adjusted the contents of a merge
commit, `remerge-diff` is a natural choice.  When viewing merges as a
"meta patch", comprising a squash of all the merged commits' patches
into a single one (similar to `git log --first-parent -p`),
`first-parent` is a natural choice.

> For me personally, I find that use of `first-parent` here is
> consistent with the mindset of those users who prefer to read "git
> log --first-parent -p", and to a smaller degree, those who use
> squash merges.  To them, a merge is merely bringing in lots of
> changes into the mainline as a single unit, and other than that
> these lots of changes are broken out for finer grained inspection
> if/when they wanted to, they prefer to treat the changes brought in
> by merges just like a single-parent commit.  And from that point of
> view, (1) reordering the changes inside the side branch is
> immaterial for the purpose of talking about the result, the net
> effect the merged topic makes to the mainline, and (2) dropping or
> adding new changes to the side branch is treated just like a newer
> iteration of a single parent commit having fewer or more changes.
> So it is very natural that first-parent helps to make the world view
> very simplistic, and more readable range-diff is all about how you
> can present the two iterations in a simple and flattened form.
>
> There _may_ need a tweak of the matching algorithm to allow the
> "same" merge on both sides to match, even if they diverge a lot,
> though.  A range-diff that pairs a merge in the previous iteration
> with a single patch in the updated iteration may be hard to read.

Sounds like you are arguing that there is an angle/usecase from which
`first-parent` makes sense, namely the viewpoint that "a merge is
merely bringing in lots of changes into the mainline as a single unit"
as you put it.  What was surprising to me, though, is that it's a
completely different usecase than the one that was brought up in the
commit message for this feature, namely "so-called 'evil merges' are
sometimes necessary and need to be reviewed too".

If you want to review "evilness" or the differences between the
changes that `git merge` would make and what got recorded in the
commit, then `first-parent` is not that great an approximation of what
you are looking for, BUT it happens to work beautifully in common
cases where you are merely transplanting commits due to the fact that
the huge amounts of extras being put into the diff happen to be equal
on the two sides, so they cancel out and leave you with the "evilness"
you are interested in.  But if you stray from that common case (by
e.g. inserting, removing, or modifying commits while rebasing) and are
still interested in "evilness", your approximation quickly drops from
accurately pulling out just the bits you are interested in towards
being completely swamped by the noise.

My concern with the commit message and patch as they stand, is that I
feel the `first-parent` mode is being recommended as a solution for a
certain usecase where it is really only an approximation that luckily
is highly accurate in special but common scenarios while being very
inaccurate in others.  I think that will lead to users getting
negatively surprised when they move outside those common cases.