Re: first-class conflicts?

Elijah Newren <newren@xxxxxxxxx> · Fri, 10 Nov 2023 13:41:43 -0800

Hi Martin,

On Wed, Nov 8, 2023 at 10:23 AM Martin von Zweigbergk
<martinvonz@xxxxxxxxxx> wrote:
> On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@xxxxxxxxx> wrote:
> > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
> > <martinvonz@xxxxxxxxxx> wrote:
> > >
[...]
> > I am more curious about the data you do store.  My fuzzy memory is
> > that you store a commit header involving something of the form "A + B
> > - C", where those are all commit IDs.  Is that correct?
>
> We actually store it outside the Git repo (together with the "change
> id"). We have avoided using commit headers because I wasn't sure how
> well different tools deal with unexpected commit headers, and because
> I wanted commits to be indistinguishable from commits created by a
> regular Git binary. The latter argument doesn't apply to commits with
> conflicts since those are clearly not from a regular Git binary
> anyway, and we don't allow pushing them to a remote.
>
> >  Is this in
> > addition to a normal "tree" header as in Git, or is one of A or B
> > found in the tree header?
>
> It's in addition. For the tree, we actually write a tree object with
> three subtrees:
>
> .jjconflict-base-0: C
> .jjconflict-side-0: A
> .jjconflict-side-1: B
>
> The tree is not authoritative - we use the Git-external storage for
> that. The reason we write the trees is mostly to prevent them from
> getting GC'd.

Oh, that seems like a clever way to handle reachability and make sure
the relevant trees are automatically included in any pushes or pulls.

> Also, if a user does `git checkout <conflicted commit>`,
> they'll see those subdirectories and will hopefully be reminded that
> they did something odd (perhaps we should drop the leading `.` so `ls`
> will show them...). They can also diff the directories in a diff tool
> if they like.

Oh, so they don't get a regular-looking top-level tree with
possibly conflicted files present?  Or is this in addition to the
regular repository contents?  If in addition, are you worried about
users ever creating real entries named ".jjconflict-base-<N>" in their
repository?
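
For concreteness, the layout described above could be sketched as
follows.  This is only an illustration in Python, not jj's code: the
function names are hypothetical, and the assumption that the numbering
simply continues (base-1, side-2, ...) for higher-arity conflicts is
not something stated in this thread.

```python
CONFLICT_PREFIXES = (".jjconflict-base-", ".jjconflict-side-")


def conflict_tree_entries(sides, bases):
    """Map top-level entry names to tree IDs for a conflicted commit.

    Hypothetical helper: `sides` are the added terms (A, B, ...) and
    `bases` the removed terms (C, ...) of an "A + B - C" conflict.
    """
    entries = {}
    for i, tree_id in enumerate(bases):
        entries[f".jjconflict-base-{i}"] = tree_id
    for i, tree_id in enumerate(sides):
        entries[f".jjconflict-side-{i}"] = tree_id
    return entries


def looks_like_conflict_entry(name):
    """True if a top-level name would be confused with the entries above."""
    for prefix in CONFLICT_PREFIXES:
        if name.startswith(prefix) and name[len(prefix):].isdigit():
            return True
    return False
```

With sides A and B and base C this yields the three entries quoted
above.  The second function is the kind of check the name-collision
question would call for if these subtrees sat next to regular
repository contents.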

> >  I think you said there was also the
> > possibility for more than three terms.  Are those for when a
> > conflicted commit is merged with another branch that adds more
> > conflicts, or are there other cases too?  (Octopus merges?)
>
> Yes, they can happen in both of those cases you mention. More
> generally, whenever you apply a diff between two trees onto another
> tree, you might end up with a higher-arity conflict. So merging in
> another branch can do that, or doing an octopus merge (which is the
> same thing at the tree level, just different at the commit level), or
> rebasing or reverting a commit.
>
> We simplify conflicts algebraically, so rebasing a commit multiple
> times does not increase the arity - the intermediate parents were both
> added and removed and thus cancel out. These simple algorithms for
> simplifying conflicts are encapsulated in
> https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of
> them are independent of the type of values being merged; they can be
> used for doing algebra on tree ids, content hunks, refs, etc. (in the
> test cases, we mostly merge integers because integer literals are
> compact).

It's done on content hunks as well?  That's interesting.

When exactly would it be done on refs, though?  I'm not following that one.

And what else is in that "etc."?
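
To make the cancellation in the quoted paragraph concrete, here is a
minimal sketch in Python.  It is not the code in merge.rs: the
representation (two lists, one of added terms and one of removed
terms) and the names `simplify` and `rebase` are assumptions made for
illustration, and the terms can be any comparable values.

```python
from collections import Counter


def simplify(adds, removes):
    """Cancel each term that appears once as an add and once as a remove.

    Returns the remaining (adds, removes), preserving their order.
    """
    remaining_removes = Counter(removes)
    kept_adds = []
    for term in adds:
        if remaining_removes[term] > 0:
            # This add is matched by a remove, so the pair cancels out.
            remaining_removes[term] -= 1
        else:
            kept_adds.append(term)
    kept_removes = []
    for term in removes:
        if remaining_removes[term] > 0:
            # This remove was not cancelled by any add above.
            remaining_removes[term] -= 1
            kept_removes.append(term)
    return kept_adds, kept_removes


def rebase(adds, removes, old_parent, new_parent):
    """Apply the commit's diff onto new_parent: + new_parent - old_parent."""
    return simplify(adds + [new_parent], removes + [old_parent])
```

Rebasing a commit with tree X from P1 onto P2 gives X + P2 - P1.
Rebasing that result from P2 onto P3 gives X + P2 + P3 - P1 - P2,
which simplifies to X + P3 - P1, so the arity stays at three terms
instead of growing with each rebase.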

> > What about recursive merges, i.e. merges where the two sides do not
> > have a unique merge base.  What is the form of those?  (Would "- C" be
> > replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
> > merge base V and then do a " - V"?  Or do we only have "A + B"?)
>
> We do that by recursively creating a virtual tree just like Git does,
> I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71).
> I think the main difference is that by modeling conflicts, we can
> avoid recursive conflict markers (if that's what Git does), and we can
> even automatically resolve some cases where the virtual tree has a
> conflict.

Okay, but that talks about the mechanics of creating a recursive
merge, omitting all the details about how the conflict header is
written when you record the merge.  Is the virtual merge base
represented in the algebraic "A + B - C" expressions, or is the "- C"
part omitted?  If it is represented, and the virtual merge base had
conflicts which you could not automatically resolve, what exactly does
the conflicted header for the outer merge get populated with?

[...]

> Great questions! We don't have support for renames, so we haven't had
> to worry about these things. We have talked a little about divergent
> renames and the need for recording that in the commit so we can tell
> the user about it and maybe ask them which name they want to keep. I
> had not considered the interaction with partial conflict resolution,
> so thanks for bringing that up. I don't have any answers now, but
> we'll probably need to start thinking about this soon.

I was wondering if that might be the answer.  When you do tackle this,
I'd be interested to hear your thoughts.  I'm wondering if we just
need to augment the data in the conflict header to handle such cases
(though I guess this could risk having commit objects that are
significantly bigger than normal in theoretical cases where many such
paths are involved?).

> > I'm curious to hear what happens when you do start dogfooding, on
> > projects with many developers and which are jj-only.  Do commits with
> > conflicts accidentally end up in mainline branches, or are there good
> > ways to make sure they don't hit anything considered stable?
>
> That won't happen at Google because our source of truth for "merged
> PRs" (in GitHub-speak) is in our existing VCS. We will necessarily
> have to translate from jj's data model to that VCS's data model
> before a commit can even be sent for review.

That makes sense, but I was just hoping we'd have an example to look
to for how to keep things safe if we were to implement this.  Sadly, I
don't think we have the benefit of relying on folks to first push
their commits into some other VCS which lacks this feature.  ;-)