Hi Phillip, On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote: > > Hi Elijah > > [I've cc'd Martin to see if he has anything to add about how "jj" > manages the issues around storing conflicts.] +1. I'll add some other questions for him too while we're at it, separately in this thread. [...] > > Martin also gave us an update at the 2023 Git Contributors summit, and > > in particular noted a significant implementation change to not have > > per-file storage of conflicts, but rather storing at the commit level > > the multiple conflicting trees involved. That model might be > > something we could implement in Git. And if we did, it'd solve > > various issues such as people wanting to be able to stash conflicts, > > or wanting to be able to partially resolve conflicts and fix it up > > later, or be able to collaboratively resolve conflicts without having > > everyone have access to the same checkout. > > One thing to think about if we ever want to implement this is what other > data we need to store along with the conflict trees to preserve the > context in which the conflict was created. For example the files that > are read by "git commit" when it commits a conflict resolution. For a > single cherry-pick/revert it would probably be fairly straight forward > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets > transferred along with the conflicts. This is a great thing to think about and bring up. However, I'm not sure what part of it actually needs to be preserved; in fact, it's not clear to me that any of it needs preserving -- especially not the files read by "git commit". A commit was already created, after all. It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to clue in that we are in-the-middle-of-<op>, and the conflict header (the "tree A + tree B - tree C" thing; whatever that's called) similarly provides signal that the commit still has conflicts. Secondarily, these files contain information about the tree we came from and its parent tree, which allows users to investigate the diff between those...but that information is also available from the conflict header in the recorded commit. The CHERRY_PICK_HEAD and REVERT_HEAD files could also be used to access the commit message, but that would have been stored in the conflicted commit as well. Are there any other pieces of information I'm missing? > For a sequence of cherry-picks or > a rebase it is more complicated to preserve the context of the conflict. I think the big piece here is whether we also want to adopt jj's behavior of automatically rebasing all descendant commits when checking out and amending some historical commit (or at least having the option of doing so). That behavior allows users to amend commits to resolve conflicts without figuring out complicated interactive rebases to fix all the descendant commits across all relevant branches. Without that feature, I agree this might be a bit more difficult, but with that feature, I'm having a hard time figuring out what context we actually need to preserve for a sequence of cherry-picks or a rebase. Digging into a few briefly... Many of the state files are about the status of the in-progress operation (todo-list, numbers of commits done and to do, what should be done with not-yet-handled commits, temporary refs corresponding to temporary labels that need to be deleted, rescheduling failed execs, dropping or keeping redundant commits, etc.), but if the operation has completed and new commits created (potentially with multiple files with conflict headers), I don't see how this information is useful anymore. There are some special state files related to half-completed operations (e.g. squash commits when we haven't yet reached the final one in the sequence, a file to note that we want to edit a commit message once the user has finished resolving conflicts, whether we need to create a new root commit), but again, the operation has completed and commits have been created with appropriate parentage and commit messages so I don't think these are useful anymore either. Other state files are related to things needing to be done at the end of the operation, like invoke the post-rewrite hook or pop the autostash (with knowledge of what was rewritten to what). But the operation would have been completed and those things done already, so I don't see how this is necessary either. Some state files are for controlling how commits are created (setting committer date to author date, gpg signing options, whether to add signoff), but, again, commits have already been created, and can be further amended as the user wants (hopefully including resolving the conflicts). The biggest issue is perhaps that REBASE_HEAD is used in the implementation of `git rebase --show-current-patch`, but all information stored in that is still accessible -- the commit message is stored in the commit, the author time is stored in the commit, and the trees involved are in the conflict header. The only thing missing is committer timestamp, which isn't relevant anyway. The only ones I'm pausing a bit on are the strategy and strategy-options. Those might be useful somehow...but I can't currently quite put my finger on explaining how they would be useful and I'm not sure they are. Am I missing anything? > Even "git merge" can create several files in addition to MERGE_HEAD > which are read when the conflict resolution is committed. That's a good one to bring up too, but I'm not sure I understand how these could be useful to preserve either. Am I missing something? My breakdown: * MERGE_HEAD: was recorded in the commit as a second parent, so we already have that info * MERGE_MSG: was recorded in the commit as the commit message, so again we already have that info * MERGE_AUTOSTASH: irrelevant since the stashed stuff isn't part of the commit and was in fact unstashed after the merge-commit-with-conflicts was created * MERGE_MODE: irrelevant since it's only used for reducing heads at time of git-commit, and git-commit has already been run * MERGE_RR: I think this is irrelevant; the conflict record (tree A + tree B - tree C) lets us redo the merge if needed to get the list of conflicted files and textual conflicts found therein So I don't see how any of the information in these files need to be recorded as additional auxiliary information. However, that last item might depend upon the strategy and strategy-options, which currently is not recorded...hmm.... > > But we'd also have to be careful and think through usecases, including > > in the surrounding community. People would probably want to ensure > > that e.g. "Protected" or "Integration" branches don't get accept > > fetches or pushes of conflicted commits, > > I think this is a really important point, while it can be useful to > share conflicts so they can be collaboratively resolved we don't want to > propagate them into "stable" or production branches. I wonder how 'jj' > handles this. Yeah, figuring this out might be the biggest sticking point. > > git status would probably > > need some special warnings or notices, git checkout would probably > > benefit from additional warnings/notices checks for those cases, git > > log should probably display conflicted commits differently, we'd need > > to add special handling for higher order conflicts (e.g. a merge with > > conflicts is itself involved in a merge) probably similar to what jj > > has done, and audit a lot of other code paths to see what would be > > needed. > > As you point out there is a lot more to this than just being able to > store the conflict data in a commit - in many ways I think that is the > easiest part of the solution to sharing conflicts. Yeah, another one I just thought of is that the trees referenced in the conflicts would also need to affect reachability computations as well, to make sure they both don't get gc'ed and that they are transferred when appropriate. There are lots of things that would be involved in implementing this idea.