Re: first-class conflicts?

Elijah Newren <newren@xxxxxxxxx> · Tue, 7 Nov 2023 23:31:00 -0800

Hi Martin,

On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
<martinvonz@xxxxxxxxxx> wrote:
>
[...]
> > One thing to think about if we ever want to implement this is what other
> > data we need to store along with the conflict trees to preserve the
> > context in which the conflict was created. For example the files that
> > are read by "git commit" when it commits a conflict resolution. For a
> > single cherry-pick/revert it would probably be fairly straightforward
> > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
> > transferred along with the conflicts. For a sequence of cherry-picks or
> > a rebase it is more complicated to preserve the context of the conflict.
> > Even "git merge" can create several files in addition to MERGE_HEAD
> > which are read when the conflict resolution is committed.
>
> Good point. We actually don't store any extra data in jj. The old
> per-path conflict model was prepared for having some label associated
> with each term of the conflict but we never actually used it.
>
> If we add such metadata, it would probably have to be something that
> makes sense even after pushing the conflict to another repo, so it
> probably shouldn't be commit ids, unless we made sure to also push
> those commits. Also note that if you `jj restore --from <commit with
> conflict>`, you can get a conflict into a commit that didn't have
> conflicts previously. Or if you already had conflicts in the
> destination commit, your root trees (the multiple root trees
> constituting the conflict) will now have conflicts that potentially
> were created by two completely unrelated operations, so you would kind
> of need different labels for different paths.
>
> https://github.com/martinvonz/jj/issues/1176 has some more discussion
> about this.

Interesting link; thanks for sharing.

I am curious to know more about the data you do store.  My fuzzy memory is
that you store a commit header involving something of the form "A + B
- C", where those are all commit IDs.  Is that correct?  Is this in
addition to a normal "tree" header as in Git, or is one of A or B
found in the tree header?  I think you said there was also the
possibility for more than three terms.  Are those for when a
conflicted commit is merged with another branch that adds more
conflicts, or are there other cases too?  (Octopus merges?)
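
To make the question about more than three terms concrete, here is a
minimal Python sketch of the term arithmetic I have in mind.  It is only
an illustration under my own assumptions: the (adds, removes)
representation and the names simplify() and merge_terms() are
hypothetical, not jj's actual storage format or API.

```python
from collections import Counter

def simplify(adds, removes):
    """Cancel any term that appears on both the "+" and the "-" side."""
    a, r = Counter(adds), Counter(removes)
    common = a & r          # terms present on both sides
    a -= common
    r -= common
    return sorted(a.elements()), sorted(r.elements())

def merge_terms(left, right, base):
    """Terms of "left + right - base"; each argument is (adds, removes).

    Subtracting `base` flips its signs: its adds become removes and
    its removes become adds.
    """
    adds = left[0] + right[0] + base[1]
    removes = left[1] + right[1] + base[0]
    return simplify(adds, removes)

# A conflicted commit "A + B - C" merged with D:
conflicted = (["A", "B"], ["C"])
d = (["D"], [])
# ...when the merge base is B, the B terms cancel and three terms remain.
print(merge_terms(conflicted, d, (["B"], [])))
# ...when the merge base is an unrelated commit E, five terms remain.
print(merge_terms(conflicted, d, (["E"], [])))
```

Under this model, anything beyond three terms comes from merging an
already-conflicted commit whose terms do not cancel against the new
merge base; whether that matches what jj stores is exactly what I am
asking.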

What about recursive merges, i.e. merges where the two sides do not
have a unique merge base?  What is the form of those?  (Would "- C" be
replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
merge base V and then do a " - V"?  Or do we only have "A + B"?)

You previously mentioned that if someone goes to edit a commit with
conflicts, and resolves the conflicts in just one file, then you can
modify each of the trees A, B, and C such that a merging of those
trees gives the partially resolved result.  How does one do that with
special conflicts, such as:
   * User modifies file D on both sides of history, in conflicting
ways, and also renames D -> E on one side of history.  User checks out
this conflicted commit and fixes the conflicts in E (but not other
files) and does a "git add E".  When they go to commit, does the
machinery need a mapping to figure out that it needs to adjust "D" in
two of the trees while adjusting "E" in the other?
   * Similar to the above, but the side that doesn't rename D renames
olddir/ -> newdir/, and the side that renames D instead renames
D->olddir/E.  For this case, the file will end up at newdir/E; do we
need the backward mapping from newdir/E to both olddir/E and D?
   * Slightly different than the above: User renames D -> E on one
side of history, and D -> F on the other.  That's a rename/rename
(1to2) conflict.  User checks out this conflicted commit and does a
"git add F", marking it as okay, but leaving E conflicted.  How can
one adjust the tree such that no conflict for F appears, but one still
appears for E?
   * Similar to above with an extra wrinkle: User renames D -> E on
one side of history, and on the other side both renames D -> F and
adds a slightly different file named E.  That's both a rename/rename
(1to2) conflict for E & F, and an add/add conflict for E.  User
checks out this conflicted commit and resolves the textual conflict in E
(in favor of the "other side"), and does a "git add E", marking it as
resolved.  When they go to commit, we not only need to worry about
making sure a conflict for F appears, we also need to figure out how
to adjust the tree such that the merge result gives you the expected
value in E without affecting F.  How can that be done?

On the first two bullet points, there's no such thing as a reverse
mapping from conflicted files to original files from previous commits
in current Git.  Creating one, if possible, would be a fair amount of
work.  But, I'm not so sure it's even possible, due to the fact that
conflicts and files do not always have one-to-one (or even one-to-many
or many-to-one) relationships; many-to-many relationships can exist, as
I've started alluding to in the last two bullet points (see also
https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/Documentation/git-merge-tree.txt#L266-L271).
In fact, they can get even more complicated (e.g.
https://github.com/git/git/blob/master/t/t6422-merge-rename-corner-cases.sh#L1017-L1022).

> > > But we'd also have to be careful and think through usecases, including
> > > in the surrounding community.  People would probably want to ensure
> > > that e.g. "Protected" or "Integration" branches don't accept
> > > fetches or pushes of conflicted commits,
> >
> > I think this is a really important point, while it can be useful to
> > share conflicts so they can be collaboratively resolved we don't want to
> > propagate them into "stable" or production branches. I wonder how 'jj'
> > handles this.
>
> Agreed. `jj git push` refuses to push commits with conflicts, because
> it's very unlikely that the remote will be able to make any sense of
> it. Our commit backend at Google does support conflicts, so users can
> check out each other's conflicted commits there (except that we
> haven't even started dogfooding yet).

I'm curious to hear what happens when you do start dogfooding, on
projects with many developers and which are jj-only.  Do commits with
conflicts accidentally end up in mainline branches, or are there good
ways to make sure they don't hit anything considered stable?

> > > git status would probably
> > > need some special warnings or notices, git checkout would probably
> > > benefit from additional warnings/notices checks for those cases, git
> > > log should probably display conflicted commits differently, we'd need
> > > to add special handling for higher order conflicts (e.g. a merge with
> > > conflicts is itself involved in a merge) probably similar to what jj
> > > has done, and audit a lot of other code paths to see what would be
> > > needed.
> >
> > As you point out there is a lot more to this than just being able to
> > store the conflict data in a commit - in many ways I think that is the
> > easiest part of the solution to sharing conflicts.
>
> Yes, I think it would be a very large project. Unlike jj, Git of
> course has to worry about backwards compatibility. For example, you
> would have to decide if your goal - even in the long term - is to make
> `git rebase` etc. not get interrupted due to conflicts.

...and whether to copy jj's other feature in this area in some form:
auto-rebasing any descendants when you check out and amend an old
commit (e.g. to resolve conflicts).  :-)