Re: Current state / standard advice for rebasing merges without information loss/re-entry?

Martin von Zweigbergk <martinvonz@xxxxxxxxx> · Wed, 20 Apr 2022 16:54:36 -0700

On Tue, Apr 19, 2022 at 10:43 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Martin von Zweigbergk <martinvonz@xxxxxxxxx> writes:
>
> > On Tue, Apr 19, 2022 at 6:57 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >>
> >> Philip Oakley <philipoakley@iee.email> writes:
> >>
> >> > So, essentially, it's taking a small part of the rerere-train at each
> >> > step in the replay, so that it's more focussed.
> >>
> >> That reminds me of one topic.
> >
> > And it reminds me of a discussion about first-class conflicts vs
> > rerere I had recently [1] (Philip's email hasn't been delivered to me
> > yet). As I wrote there, I think most of rerere's use cases can be
> > fulfilled by first-class conflicts. I understand that it would be a
> > huge project (much more than appropriate for GSoC :)) to add such
> > support to Git. I just want to make sure the project is aware of the
> > idea.
> >
> > [1] https://github.com/martinvonz/jj/issues/175#issuecomment-1079831788
>
> I saw that before, but neither of these two "use cases" solve a
> problem relevant to what I have to do often.  It may be a case where
> you have a hammer while rerere is a screwdriver, perhaps?  Each is
> useful in its own ways and is good at different applications.

Yes, that's probably true. I understand that there are scenarios that
rerere helps with that first-class conflicts (at least the way I
implemented them) do not.

> Rebuilding of 'seen' multiple times every day may superficially be
> similar to "test merge" case you mention there, but the desired end
> result from keeping multiple topics in master..seen chain, and have
> selected ones (not necessarily in the order in 'master..seen')
> graduate while keeping others and rebuilding 'seen' with them never
> involves artificially linearized history in the end, and that is an
> explicit goal---to avoid the last-minute rebasing to the upstream,
> which can introduce unnecessary bugs.
>
> When I merge topics from 'seen' to 'next', I first reorder the
> topics so that these topics that are planned to be merged to 'next'
> come directly on top of the tree that matches 'next' in the
> 'master..seen' chain, so that the exact state planned to be in
> 'next' in the next iteration appears in 'seen' and is tested.  The
> merge of these topics to 'next' happens in the next integration
> iteration after this preparatory step passes.  It is the same way
> when topics that have been cooking in 'next' are (first planned to
> and then actually) merged to 'master'.  There is no "final last
> minute" rebase involved.

Thanks for explaining it in such detail. I'm afraid I still don't
understand how it's related to first-class conflicts vs rerere (I've
read the text at least 5 times).

> Another thing that I didn't quite see in your "I see rebase as
> replaying the change between parent and child" is how different
> order of merging is handled.  It often happens that topic A and
> topic B have funny interactions, and the resolution rerere records
> when I first merge topic A to 'seen' and then topic B (at which time
> the conflict we are interested in happens) is later cleanly reused
> if topic B turns out to go first long before topic C graduates.
> When such a reordering happens, topic B will be merged first
> (without causing the conflict between topics A and B), then topic A
> is merged.  Dealing with such a reordering of topics was an explicit
> goal of 'rerere' and it works reasonably well, but it is not clear
> how [1] you cited above handles such a use case.

Good point! That's not a use case I had considered. To make sure I
understand you correctly, the reordering you're talking about is
something like the difference between the following two graphs
(children on top, not on the right).

  N
  |\
  M |
 /| |
X Y Z

  P
 /|
| O
| |\
X Y Z

The problem (for my tool) here is that commit N contains resolutions
for conflicts between X and Z *and* between Y and Z, so when the
merges are done in the opposite order, you'll want to put some of the
conflict resolutions from N in O and some in P. There are commands for
moving changes (including conflict resolutions) between commits, so
you could use that here, but rerere is way smoother since it's
automatic.

> Most importantly, at the philosophical level, in order to allow
> earlier mistakes to be corrected later, Git tries to avoid casting
> heuristic decisions in immutable objects when possible.
>
> Not recording "in this commit, parent and child trees rename path A
> to B, combine some contents of path C and D to create a new path E"
> and instead computing renames when we actually compare these two
> trees, is an example of the application of the philosophy.  It
> allows rename detection heuristics at the runtime to improve over
> time and a commit you made 5 years ago will be shown better with the
> improved rename detection logic.  We do avoid recomputing the same
> information over and over again by having long lived cache data
> structure like commit-graph, but they are left out of the central
> data structure and can be reproduced.
>
> Keeping the rerere database outside the commit object is another
> application of the same philosophy.  There needs to be a clear way
> to nuke an earlier recorded resolution that was faulty without
> having to rewrite the history, and having it outside the commit
> object is a must, and having the database in .git/rr-cache/ is one
> possible implementation to achieve that goal.

I agree with all of that. I guess there's some implication about
first-class conflicts vs rerere here too? Is the concern that if you
leave some conflict unresolved for years, it might be that the tool
now could have actually resolved that conflict instead of marking it
as a conflict in a file? So by not being forced to redo the merge, you
end up resolving by hand a conflict that is now auto-resolvable. Yes, that
is a problem, but it seems very small. I'm probably missing a more
serious problem.