Re: Should --update-refs exclude refs pointing to the current HEAD?

Elijah Newren <newren@xxxxxxxxx> · Wed, 6 Mar 2024 21:36:47 -0800

On Wed, Mar 6, 2024 at 1:00 PM Stefan Haller <lists@xxxxxxxxxxxxxxxx> wrote:
>
> On 06.03.24 03:57, Elijah Newren wrote:
>
> > 1) What if there is a branch that is "just a copy" of one of the
> > branches earlier in the "stack"?  Since it's "just a copy", shouldn't
> > it be excluded for similar reasons to what you are arguing?  And, if
> > so, which branch is the copy?
>
> This is a good point, but in my experience it's a lot more rare. Maybe
> I'm looking at all this just from my own experience, and there might be
> other use cases that are very different from mine, but as far as I am
> concerned, copies of branches are not long-lived.

> There is no point in having two branches point at the same commit.

But isn't that what you're doing?

> When I create a copy of a
> branch, I do that only to rebase the copy somewhere else _immediately_,
> leaving the original branch where it was.

If it is inherently tied like this, why not create the new branch
immediately after the rebase (with active_branch@{1} as the start
point), instead of creating it immediately before?
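
A minimal sketch of that "rebase first, branch afterwards" order, run in a
throwaway repository. The branch names (main, topic, topic-copy) and the file
names are made up for illustration, and it assumes a plain `git rebase` with
the default reflog settings of a non-bare repository.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt; git add base.txt; git commit -q -m base
git checkout -q -b topic
echo topic > topic.txt; git add topic.txt; git commit -q -m topic
old_tip=$(git rev-parse topic)

# Let main move ahead so that the rebase has something to do.
git checkout -q main
echo more > more.txt; git add more.txt; git commit -q -m more

# Rebase first ...
git checkout -q topic
git rebase -q main

# ... then create the copy at the position topic had before the rebase.
git branch topic-copy 'topic@{1}'
copy_tip=$(git rev-parse topic-copy)
new_tip=$(git rev-parse topic)
```

Because the copy does not exist while the rebase runs, there is nothing for
--update-refs to move in the first place.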

> Which means that I encounter
> copied branches only at the top of the stack, not in the middle. Which
> means that I'm fine with keeping the current behavior of "rebase
> --update-refs" to update both copies of that middle-of-the-stack branch,
> because it never happens in practice for me.

You've really lost me here; are you saying you're fine changing the
design to add inherent edge-case bugs to the code because those edge
cases "never happen in practice for me"?  I've spent a lot of time
dealing with built-up cruft in git from partial solutions and fixes
that overlooked subsets of relevant test cases, so I'm not a fan of
that statement and in particular the last two words of it.  Perhaps
I'm reading it wrong, and if so I apologize, but it triggered unhappy
memories of mine from merge-recursive.c and dir.c and elsewhere.

> > 2) Further, a "stack", to me at least, suggests a linear history
> > without branching (i.e. each commit has at most one parent _and_ at
> > most one child among the commits in the stack).  I designed `git
> > replay` to handle diverging histories (i.e. rebasing multiple branches
> > that _might_ share a subset of common history but none necessarily
> > need to fully contain the others, though perhaps the branches do share
> > some other contained branches), and I want it to handle replaying
> > merges as well.  While `git rebase --update-refs` is absolutely
> > limited to "stacks", and thus your argument might make sense in the
> > context of `git rebase`, since you are bringing `git replay` into the
> > mix, it needs to apply beyond a stack of commits.  It's not clear to
> > me how to genericize your suggestions to handle cases other than a
> > simple stack of commits, though.
>
> I don't see a contradiction here. I don't tend to do this in practice,
> but I can totally imagine a tree of stacked branches that share some
> common base branches in the beginning and then diverge into different
> branches from there. It's true that "rebase --update-refs", when told to
> rebase one of the leaf branches, will destroy this tree because it pulls
> the base branches away from under the other leaf branches, but this is
> unrelated to my proposal, it has this problem today already. And it's
> awesome that git replay has a way to avoid this by rebasing the whole
> tree at once, keeping everything intact. Still, I don't see what's bad
> about excluding branches that point at the same commits as the leaf
> branches it is told to rebase when using "replay --contained".

By "leaf branches", do you mean (a) those commits explicitly mentioned
on the command line for being replayed, (b) only the subset of the
branches mentioned on the command line which aren't an ancestor of
another commit being replayed, or (c) something else?

> (I suppose
> what I'm suggesting is to treat "--contained" to mean "is included in the
> half-open interval from base to tip" of the revision range you are
> rebasing, rather than the closed interval.)

"half-open interval"?  To me that again implies a simple stack, which,
since we're trying to address the more general case, leaves me more
confused rather than less.

Let me re-ask my question another way.  If someone runs
    git replay --onto A --contained ^B ^C D E F
when branches G, H, & I are in the revision range of "^B ^C D E F",
with G in particular pointing where D does and H pointing where E
does, and E contains D in its history, and F contains commits that are
in neither D nor E, how do I figure out which of D-I should be
updated?

> Maybe I should make this more explicit again: I'm not trying to solve
> the problem of making a copy of a stack of branches, and rebasing that
> copy somewhere else. I think this can't be solved except by making
> branch stacks a new concept in git, which I'm not sure we want to do.

Oh, I hadn't even thought of that.  Yeah, that'd be even more complex.

> > 3) This is mostly covered in (1) and (2), but to be explicit: `git
> > replay` is completely against the HEAD-is-special assumptions that are
> > pervasive within `git rebase`, and your problem is entirely phrased as
> > HEAD-is-special due to your call out of "the current branch".  Is your
> > argument limited to such special cases?  (If so, it might still be
> > valid for `git rebase`, of course.)
>
> No, I don't think I need HEAD to be special. "The thing that I'm
> rebasing" is special, and it is always HEAD for git rebase, but it can
> be something else for replay.

But what exactly should that something else be?  I still don't
understand what that is from your explanation so far.

> > 4a) `git replay` does what Junio suggests naturally, since it doesn't
> > update the refs but instead gives commands which can be fed to `git
> > update-ref --stdin`.  Thus, users can inspect the output of `git
> > replay` and only perform the updates they want (by feeding a subset of
> > the lines to update-ref --stdin).
>
> At this point I probably need to explain that I'm rarely using the
> command line. I'm a user and co-maintainer of lazygit, and I want to
> make lazygit work in such a way that "it does the right thing" in as
> many cases as possible.

...and I'm pointing out that `git replay` has the necessary tools to
enable you to do so.  Unlike `git rebase --update-refs` it doesn't
automatically update the branches, but just creates the new commits
and tells you what it could update each branch to, in a format that
you can pass along to another tool to actually do the updates of the
branches.  As such, you can write your tool to take that output, pick
out the bits you like, and only pass those bits along so that only
some of the branches are updated.
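
A rough sketch of such a filter, assuming `git replay` prints one
"update <ref> <new> <old>" line per branch as described above. The helper
name filter_updates and the ref names are hypothetical.

```shell
# Drop the update commands for the named refs from `git replay`-style
# output read on stdin and pass every other line through unchanged.
# Usage: filter_updates refs/heads/copy [more refs...]
filter_updates() {
    awk -v skip="$*" '
        BEGIN {
            n = split(skip, names, " ")
            for (i = 1; i <= n; i++) drop[names[i]] = 1
        }
        # Print a line unless it is an "update" for a ref in the drop list.
        !($1 == "update" && ($2 in drop))
    '
}

# Possible use (not run here; branch names are placeholders):
#   git replay --onto main --contained main..topic |
#       filter_updates refs/heads/topic-copy |
#       git update-ref --stdin
```

The new commits are still created for every branch in the range; only the
ref updates for the skipped branches are left out.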

> > 4b) For `git replay`, --contained is just syntactic sugar -- it isn't
> > necessary.  git replay will allow you to list multiple branches that
> > you want replayed, so you can specify which branches are relevant to
> > you.
>
> That's great, even if it means that I have to redo some of the work that
> --contained would already do for me, just because I want a slightly
> different behavior.

Right, but I thought you were maintaining lazygit, meaning that
programming it to select the branches you want is a one-time cost?

Something like

    git log --format=%D --decorate-refs=refs/heads/ ${base}..HEAD^1 | grep -v ^$

plus adding in the current branch, right?

Or is the concern with this suggestion the performance hit you'd take
(which admittedly might be a problem with this solution, since you
walk the commits an extra time)?
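
As a sketch of how a tool might wrap the `git log` invocation suggested
above, here is a hypothetical helper, branches_below_head, that prints one
branch name per line. It assumes no branch name contains a comma, since %D
joins several names on the same commit with ", ".

```shell
# List the local branches that point at commits in <base>..HEAD^1,
# one name per line. The caller still has to add the current branch.
# Usage: branches_below_head <base>
branches_below_head() {
    git log --format=%D --decorate-refs=refs/heads/ "$1"..HEAD^1 |
        grep -v '^$' |    # commits with no branch print an empty line
        tr ',' '\n' |     # split "a, b" into one name per line ...
        sed 's/^ //'      # ... and strip the space that followed the comma
}
```

This still walks the commits one extra time, so it does nothing about the
performance question.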

> > 4c) For `git rebase --update-refs`, you can add `--interactive` and
> > then delete the `update-ref` line(s) corresponding to the refs you
> > don't want updated.
>
> Yes, that's what I always do today to work around the problem. It's just
> easy to forget, and I find it annoying that I have to take this extra
> step every time.

And if you forget, then after the rebase it's trivial to move the
updated branch back to where you want it, right?

   git branch -f ${copy_branch_name} ${current_branch_name}@{1}

In fact, that's probably easier than making the rebase interactive,
and should be easier to remember since you only ever create these
branches precisely when you want to do a rebase.

> One last remark: whenever I describe my use case involving copies of
> branches, people tell me not to do that, use detached heads instead, or
> other ways to achieve what I want. But then I don't understand why my
> proposal would make a difference for them. If you don't use copied
> branches, then why do you care whether "rebase --update-refs" or "replay
> --contained" moves those copies or not? I still haven't heard a good
> argument for why the current behavior is desirable, except for the one
> example of a degenerate stack that Phillip Wood described in [1].

The current behavior is easy to describe and explain to users, and
generalizes nicely to cases of replaying multiple diverging and
converging branches.

To me, the behavior you're proposing doesn't seem to share either of
those qualities, at least not as you've explained it so far.

But, perhaps that's because I still don't really understand your
use case.  I'm trying to, and it's possible I could be convinced there
is a proposal here that is easy to explain to users and generalizes
nicely.  My way of attempting to get that out is to make
counter-proposals and ask questions as a way of teasing out what your
use case is and what a refined proposal might be.  Currently, it seems
there are two trivial alternative solutions that would solve this
problem more cleanly (namely, either creating the branch after the
fact instead of beforehand, or simply updating the branch after the
fact)...but maybe I'm still just missing something?