Re: rebase invoking pre-commit

Sean Allred <allred.sean@xxxxxxxxx> · Sun, 31 Dec 2023 04:52:00 -0600

Elijah Newren <newren@xxxxxxxxx> writes:

> On Thu, Dec 21, 2023 at 12:59 PM Sean Allred <allred.sean@xxxxxxxxx> wrote:
>> Is there a current reason why pre-commit shouldn't be invoked during
>> rebase, or is this just waiting for a reviewable patch?
>>
>> This was brought up before at [1] in 2015, but that thread is so old at
>> this point that it seemed prudent to double-check before investing time
>> in developing and testing a patch.
>>
>> [1]: https://lore.kernel.org/git/1m55i3m.1fum4zo1fpnhncM%25lists@xxxxxxxxxxxxxxxx/
>
> I'm very opinionated here.  I'm just one person, so definitely take
> this with a grain of salt, but in my view...
>
> Personally, I think implementing any per-commit hook in rebase by
> default is a mistake. It enforces a must-be-in-a-worktree-and-the-
> worktree-must-be-updated-with-every-replayed-commit mindset, which I
> find highly problematic[2], even if that's "what we always used to
> do".
>
> [2] https://lore.kernel.org/git/20231124111044.3426007-1-christian.couder@xxxxxxxxx/

I'm not hip with what most pre-commit hooks do, but I'll point out that
a hook like pre-commit assuming there is a worktree is the fault of the
hook implementation, not of the infrastructure that invokes the hook. I
imagine most folks on this list are aware that a worktree is not needed
to create a commit and update a branch to point at it.

FWIW, I would also find such a mindset to be highly problematic :-) I'll
take a moment here to thank you, Christian, and everyone else in that
effort for your interest in and work on git-replay; I've been trying to
watch its activity on-list closely in the hopes that we can adopt it
into our system once it's ready.

> Because of that, I would prefer to see this at most be a command line
> flag. However, we've already got a command line flag that while not
> identical, is essentially equivalent: "--exec $MY_SCRIPT" (it's not
> the same because it's a post-commit check, but you get notification of
> any problematic commits, and an immediate stop of the rebase for you
> to fix up the problematic commit; fixing up the commit shouldn't be
> problematic since you are, after all, already rebasing).

Indeed, and an

    --exec 'git hook run pre-commit || git reset --soft HEAD~'

would probably get you farther. I can certainly see an argument for
this, but from the perspective of designing a system for other
developers to use, such a rebase would have to be triggered
automatically (perhaps on pre-push).
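
For the narrower goal of catching conflict markers, a post-commit check
along the lines Elijah describes could be sketched as below. This is
only an illustration, not something proposed in the thread: the helper
name `rebase_checking_markers` is made up, and it relies on
`git diff --check`, which exits non-zero when a change introduces
conflict markers or whitespace errors.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): replay the current branch onto
# "$1" and stop after the first replayed commit whose diff against its
# parent introduces conflict markers. `git diff --check` also flags
# whitespace errors, so this is stricter than a pure marker check.
rebase_checking_markers() {
    git rebase --exec 'git diff --check HEAD~ HEAD' "$1"
}
```

Called as, say, `rebase_checking_markers origin/main` (the upstream name
is an assumption), the rebase stops right after the offending commit,
which can then be amended before running `git rebase --continue`.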

> I see Phillip already responded and suggested not running the
> pre-commit hook with every commit, but only upon the first commit
> after a "git rebase --continue".  That seems far more reasonable to me
> than running on every commit...though even that worries me in regards
> to what assumptions that entails about what is present in the working
> tree.

It's worth noting that my goal here is to prevent developers from
committing conflict markers, so running the hook only on the first
commit after a `git rebase --continue` would actually be exactly
sufficient.

Invoking pre-commit at this time would also be consistent with the
behaviors of prepare-commit-msg, commit-msg, and post-commit -- at least
when I reword a commit during a rebase.

However, post-commit is executed after each picked commit during a
rebase, so invoking pre-commit before each picked commit would also be
consistent.
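
To make the conflict-marker use case mentioned above concrete, here is a
minimal sketch of the kind of pre-commit hook in question. It is an
assumption about what such a hook might look like, not the hook we
actually run; the function names are hypothetical. It reads only the
index (via `git diff --cached` and `git show :<path>`), so it does not
depend on the contents of a working tree.

```shell
#!/bin/sh
# Hypothetical pre-commit hook sketch; function names are illustrative.

# Succeed if stdin contains a line starting with a 7-character
# "<<<<<<<" or ">>>>>>>" marker followed by a space or end of line.
# "=======" is deliberately not matched: it is also a common underline
# in plain-text and Markdown headings.
has_conflict_markers() {
    grep -Eq '^(<<<<<<<|>>>>>>>)( |$)'
}

# Scan the staged (index) version of every added, copied, or modified
# path. Assumes paths without newlines or other characters that git
# would quote in its output.
check_staged() {
    git diff --cached --name-only --diff-filter=ACM | (
        status=0
        while IFS= read -r path; do
            if git show ":$path" | has_conflict_markers; then
                echo "conflict markers staged in $path" >&2
                status=1
            fi
        done
        exit "$status"
    )
}

# Only act when installed as a hook named pre-commit.
case "${0##*/}" in
    pre-commit) check_staged ;;
esac
```

Saved as an executable `.git/hooks/pre-commit`, this rejects a commit
whose staged content still contains markers.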

> (For example, what about folks with large repositories, so large that
> a branch switch or full checkout is extremely costly, and which could
> benefit from resolving conflicts in a separate sparse-checkout
> worktree, potentially much more sparse than their main checkout?

As it happens, a single checkout of our source runs upwards of 2GB, so
I'm exactly in the population you're describing :-) The main reason
we're moving to Git from SVN is that an SVN checkout can take upwards of
an hour for us today -- even with some real shenanigans to make them go
faster. On the Git side, we've also looked into (though I don't recall
if we had much success with) narrowing the sparsity patterns to just the
conflicts for conflict resolution workflows -- particularly when moving
feature code between separate trunks. So I guess I'm also glad we
weren't too far off in left field on that one! (As I recall, one of the
main challenges we faced there was ensuring there was enough stuff
'still around' so that both binary and project references could resolve
and folks could use that information to help resolve conflicts.
Hopefully git-replay can be smart enough to allow some customization on
that front. We found some success with feeding the list of conflicted
files into some arbitrary logic that spat out the sparsity pattern to
use.)

> And what if people like that really fast rebase resolution (namely,
> done in a separate very sparse checkout which also has the advantage
> of not polluting your current working tree) so much that they use it
> on smaller repositories as well? Can I not even experiment with this
> idea because of the historical per-commit-at-least-as-full-as-main
> -worktree-checkout assumptions we've baked into rebase?)

I'd be interested in reading more about these baked-in assumptions. Are
they mostly laid out in replay-design-notes.txt[3]?

> While at it, I should also mention that I'm not a fan of the broken
> pre-rebase hook; its design ties us to doing a single branch at a
> time.  Maybe that hook is not quite as bad, though, since we already
> broke that hook and no one seemed to care (in particular, when
> --update-refs was implemented).  But if no one seems to care about
> broken hooks, I think the proper answer is to either get rid of the
> hook or fix it.

If I were to guess, this likely stems either from an inexact definition
of the hook in documentation (ultimately resulting in incomplete tests)
or folks incorrectly assuming what each hook should do based purely on
its name.

Which leads to an interesting point: pre-commit specifically states that
it is invoked by git-commit -- not that it's invoked whenever a commit
is created. So perhaps the correct thing to do here (if a hook is in
fact needed) would be to define a new hook -- but I worry about doing
that in the current state where there doesn't *seem* to be very rigid
coordination of when client hooks are invoked in terms of plumbing
rather than porcelain.

> Anyway, as I mentioned, I'm quite opinionated here.  To the point that
> I deemed git-rebase in its current form to possibly be unfixable
> (after first putting a lot of work into improving it over the past
> years) and eventually introduced a new "git-replay" command which I
> hope to make into a competent replacement for it.  Given that I do
> have another command to do my experiments, others on the list may
> think it's fine to send rebase further down this route, but I was
> hoping to avoid further drift so that there might be some way of
> re-implementing rebase on top of my "git-replay" ideas/design.

I appreciate your perspective; you've certainly thought a lot about this
space -- and I definitely share your goal of consolidating
implementations for obvious reasons.

So I suppose that leaves me with four possible paths forward:

1. Pursue invoking pre-commit before each commit in `git rebase` (likely
   generic in the sequencer) to be consistent with post-commit.

   It sounds like this isn't a popular option, but I'm curious about
   folks' thoughts on the noted behavior of post-commit here.

2. Pursue invoking pre-commit on `git rebase --continue` (likely on any
   --continue in the sequencer). This has the benefit of using existing
   configuration on developer machines to purportedly 'do the right
   thing' when it's likely humans are touching code during conflict
   resolution. It's worth noting that conflict resolution isn't the only
   reason you might --continue, though: a naive implementation of this
   approach doesn't account for sequencer commands like 'break'. It
   could probably just do what commit-msg does.

3. Define and implement a new hook that is called whenever a new commit
   is about to be (or has been?) written. Such a hook could be
   specifically designed to discourage assuming there's a working copy,
   though we're kidding nobody by thinking it won't be used downstream
   with that assumption. At least we could be explicit about
   expectations, though.

   This is *probably* a lot more design work than this little paragraph
   lets on, but I've not personally watched the introduction of a new
   hook so I don't have context for what to expect.

4. Trigger a rebase --exec in our pre-push. This is certainly the least
   work in git.git (i.e., no work at all), but it comes with the
   distinct disadvantage of playing whiplash with the developer's focus.
   During conflict resolution, they're thinking about conflicts. By the
   time they're ready to push, it's likely that they're no longer
   thinking about conflicts.

Does the behavior of post-commit here change any minds?

[3]: https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/replay-design-notes.txt#L162

--
Sean Allred