Re: [PATCH v3 15/15] rebase: change the default backend from "am" to "merge"

Elijah Newren <newren@xxxxxxxxx> · Thu, 16 Jan 2020 13:30:38 -0800

On Thu, Jan 16, 2020 at 12:35 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>
> > Of course, arguing against myself, we could always tell them to just
> > implement a post-rewrite hook.
>
> That's an interesting facet I hadn't thought about before.
>
> It's tempting since it kind of minimizes the blast surface while
> providing people a way to keep doing what they had wanted.
>
> Historically "git cherry-pick" ran "git commit", causing it to run the

What do you mean "historically"?  I'm pretty sure this piece of code
is shared in the sequencer between rebase -i, rebase --merge, and
cherry-pick, so all three currently call post-commit unless I'm
mistaken.

> post-commit hook.  There's some related discussion at
> https://lore.kernel.org/git/20f33df8-7ba8-af26-e0c8-16152345c85b@xxxxxxxxxxxx/
> of other side effects of having run "git commit".
>
> ... oh! git cherry-pick doesn't run post-rewrite.  Interesting.
>
> To summarize what we've discussed so far:
>
>  A. run post-commit hook consistently in rebase --am and --merge modes

Could you spell these out a bit more?  I think by this option you mean
"Always run post-commit when commits are created", which includes
whether those commits were created by git-commit, git-rebase (any
backend), git-merge, git-cherry-pick, git-revert, git-am, etc. (but
with a glaring exception for fast-import and things that call it, much
like the exception post-rewrite already carves out for it).

Much as I don't see a reason to make rebase --merge different than
rebase --am, I don't see a reason to differentiate between git-merge
and git-commit.  I also don't see why rebase --am should be different
than am.

>     Advantages:
>     * consistent with historic "rebase -i" behavior
>     * supports hooks like ctags generation that want to update state to
>       match what has been committed in the worktree (especially when
>       preparing for the rebase to stop due to conflicts)
>     * consistent with other commands like "git cherry-pick" that run the
>       post-commit hook
>
>     Disadvantages:
>     * invokes hooks more often in a setting they are not used to being
>       invoked in
>     * slows down rebase
>     * when post-commit hooks are used for deployment, exposes
>       intermediate states in the middle of a rebase to the deployment
>       environment
>     * inconsistent with "git am"

I don't want to introduce additional inconsistencies.  git-am and git
merge should be changed at the same time if we go this route.

>     Mitigations:
>     * could go through the normal warn about change, opt-in config,
>       opt-out config cycle to make the change smoother
>     * could provide new hooks (e.g., one run after a batch of objects
>       is created for applications similar to "git gc --auto", one run
>       when HEAD is updated for applications like the ctags one, one
>       run on ref update for applications like the deployment case) and
>       encourage authors to migrate to them
>     * could run the post-commit hook in "git am", too

>  B. stop running the post-commit hook in rebase

AND sometimes in both cherry-pick and git-revert, depending on the
number of commit(s) picked/reverted (see below).

Also, even if we go this route, I think the post-commit hook should be
added to git-merge whenever it creates a merge commit.

>     Advantages:
>     * makes --am and --merge consistent with minimal user impact

...and also makes --interactive and --exec and other cases consistent too.

>     Disadvantages:
>     * meaning of the post-commit hook remains a muddle

Why?  "Commands which create no more than one commit (git-commit,
git-merge, maybe single-commit git-revert or git-cherry-pick) call
post-commit, commands which create several commits derived from others
(git-am, git-rebase, git-cherry-pick, sometimes git-revert ) call
post-rewrite instead for performance reasons" (with the standard
exception for fast-import; also, git-merge could probably call
post-rewrite in fast-forward cases since no new commit is created to
call post-commit on).

Or we could just say that commands which can create multiple commits
(cherry-pick & revert) only call post-rewrite even in cases where they
are only called with one commit (since am and rebase can also be
called on just one commit).

>     * in particular, "git am" and "git rebase" would not be consistent
>       with "git cherry-pick"

Yeah, I'd rather not introduce more inconsistencies; we should pick a
rule and enforce it universally.  Thus, I'd say option B would
probably be thought of as the "always creates at most one commit" vs.
"creates multiple commits" split for post-commit vs. post-rewrite.

>     Mitigations:
>     * could provide new hooks for people to migrate to to replace that
>       muddle
>
>  C. stop running the post-commit hook in rebase --merge or commit --amend
>
>     Advantages:
>     * produces a consistent definition of post-commit ("it's about new
>       commits only")

But cherry-pick never creates "new" commits if commit --amend doesn't,
so we'd also want to turn the post-commit hook off for cherry-pick,
right?  And we'd want to turn post-commit on for git-merge, at least
for non-fast-forward cases since those create new commits.

>     * hook authors get the same expressiveness using the post-rewrite hook

Isn't this also an advantage for case B?  Any reason it was left off up there?

>     Disadvantages:
>     * the change to historical "git commit --amend" behavior is likely
>       to be surprising

That is certainly one issue, but I'd say this case is riddled with
problems and is the most muddled of all.  Once you change git commit
--amend behavior because it's about *new* commits only then you've got
a huge can of worms from rebase -i:

For interactive rebases, we have to avoid the post-commit hook for
anything that is a pick, but if they break or edit and insert new
commits then we have to call the post-commit hook for those.  If the
rebase stops due to a merge conflict, then the next commit created
needs to avoid the post-commit hook.  If they pick to squash or fixup
commits, then we have to deploy rename/break detection to determine
whether the commit is "new" or not and thus whether the post-commit
hook should be triggered for it.  Actually, maybe we need the
rename/break-detection logic when the rebase stops at a conflict too
due to user edits.  And then we start getting into questions about
whether the normal heuristics for rename/break-detection apply or if
"new-ness" should have its own set of heuristics...

Also,  I can't tell if a reverted commit count as "new" or not.  Even
if it does, what about the case of a reverted revert, or higher order
iterations of reverts (which I see occasionally)?

Does this definition mean we should use the git-cherry machinery on
any new commits to search for "newness" (maybe even with an extension
to search across all branches) before we decide whether to call the
post-commit hook?

I think case C is opening up a pandora's box of craziness.

>     Mitigations:
>     * could go through the warn, opt-in config, opt-out config cycle
>
> Any bits I've missed?
>
> Thanks,
> Jonathan

I think it was useful to phrase it this way and list the various
advantages, disadvantages, and mitigations; thanks.

Is there anything I've missed in my additional details?  Anything
sound crazy (beyond what I said I think case C leads to)?

Elijah