Re: [PATCH v3 0/6] config-based hooks restarted

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 24 Aug 2021 22:29:33 +0200

On Wed, Aug 18 2021, Emily Shaffer wrote:

> This is the config-based hooks topic rebased onto v4 of Ævar's
> branch[1]. There is a happy CI build of it on GitHub[2].
> [...]
> Right now I'm trying to focus on this series first and foremost, hence
> sending two rerolls based on the same version of Ævar's base restart.
> I'll try to perform a code review on Ævar's latest tomorrow.
> [...]

First thank for the review on the base topic at
https://lore.kernel.org/git/YR2jLdYQA5CVzX5h@xxxxxxxxxx/ &
https://lore.kernel.org/git/YR7r3h1AG4Zyn7x7@xxxxxxxxxx/ and related.

I'll re-roll it soon pending your feedback on comments I left on this
series that pertain to it. I.e. there's some things that I think are
better just fixed up in the base topic (from trivial things like
whitespace changes, to some behavior changes), makes for easier reading
rather than going back & forth between the two.

I've read this whole series through, as promised in
https://lore.kernel.org/git/87lf4qeh86.fsf@xxxxxxxxxxxxxxxxxxx/:

    [I'll discuss my opinions on the new and revised config schema in
    another mail, just commenting on the patch here in terms of its stated
    goals].

I.e. here's some general comments, numbered for ease of reference:

0)

I think this is in much much better shape vis-as-vis the simplified
config schema that's now being proposed re our discussion starting
around https://lore.kernel.org/git/87bl6ttmlv.fsf@xxxxxxxxxxxxxxxxxxx/

I.e. the main complexity of the "skip" mechanism is gone, and also the
conflation of hook names with hook commands (the "rm -rf /" as a
<name-or-cmd> discussed in the above).

So before going any further, I'll just say that I wouldn't object much
to this design going in as-is. What I'm about to mention here below is
much closer to bikeshedding in my mind than "this is really to complex
to go in-tree", which was my opinion of the config schema before.

1)

On the current config design: First, before going into any detail on
that, I think whatever anyone's opinion is on that that the
design-focused patches as they stand could really use more a more
extended discussion of the design.

I.e. talk about the previously considered config schema, why it evolved
into its current form. The trade-offs involved, and why the patch
proposed to implement the schema it's implementing over another earlier
or alternate design.

I.e. https://lore.kernel.org/git/20210819033450.3382652-6-emilyshaffer@xxxxxxxxxx
is two very short paragraphs. We won't be able to summarize all our
month-long discussion on the config design in one commit message, but I
think at least discussing it somewhat / linking to relevant on-list
discussions would make future source spelunking easier.

2)

So that out of the way, a comment on the current config design, which
should be read in the context of what I noted in #0. I.e. I'm *much*
happier with this version.

That being said I'm still not convinced that the simple 1=1 mapping of
"hook.<name>.command" and its "value" should be followed by the 1=many
mapping of "hook.<name>.event" and its "value".

I.e. we're making the trade-off of saving the user from typing out or
specifying:

    [hook "my-pre-commit"]
    command = ~/hooks/pre-commit-or-push
    event = pre-commit
    [hook "my-pre-push"]
    command = ~/hooks/pre-commit-or-push
    event = pre-push

And instead being able to do:

    [hook "my-pre-commit-or-push"]
    command = ~/hooks/pre-commit-or-push
    event = pre-commit
    event = pre-push

So for the very common case, saving two config lines. "Two" because as
we discussed[1] as there's currently no GIT_HOOK_TYPE env var. So this
form will work pretty much only for that case.

I.e. unlike with .git/hook/<name> the hook run via config can't
determine what <hook-type> it's being run at, so as it stands this is
only useful for those hooks listed in githooks(5) where someone would
want to do the exact same thing for one or more <hook-name> names. You
can't use it as a general routing mechanism for any hook type as it
stands.

I *think* that's only these two, perhaps "update" and "pre-receive",
with the hook seeing if it consume stdin/has arguments to disambiguate
the two.

But even with a GIT_HOOK_TYPE passed the trade-off, as discussed in [1],
and downthread in [2], is that by having it 1=many we're closing the
door on any future hook.<name>.<whatever>. I.e. config that would change
the behavior of that hook, but you'd want to change it in another way
for at least one of the N event types.

Well, "closing the door" as in if you'd want that you'd have to split up
the section from the "my-pre-commit-or-push" example above to the
"my-pre-commit" and "my-pre-push" example.

But again, on the "is the complexity worth it" we're then having to
explain to users that they can do it one way if the want no config other
than hook.<name>.{command,event*}, but another if they have another key
in that namespace.

You've said that you wanted to add something like a GIT_HOOK_TYPE
environment variable. Fair enough, and I guess we could add it in a
re-roll of this series. I'm mainly commenting on the end-state of *this*
series in particular. I.e. I think it leaves the user & implementation
with a config schema that still seems to be needlessly complex for the
very limited benefits that complexity brings us in what you're able to
do with it now.

But some of that goes back to the comments I had on 5/6[3], i.e. I'm
willing to be convinced, but I think that the current commit message &
added docs aren't really selling the idea of why it's worth it.

3) 

As an extension to my comments on 5/6[3], I think this whole notion of
"git hook run <name>" as invoked by a user of git is just more confusing
the more I think about it.

I.e. 5/6[4] is apparently seeking to implement a way to just make that
facility a general way for users to run some command on their system to
do whatever, instead of say using /usr/bin/parallel or a shell alias.

But then we also use that command as our own dispatch mechanism for our
own known hooks, except mostly not, since we mostly use the C API
ourselves directly.

It's particularly confusing that if you say run "git hook run
pre-auto-gc" as a user to test your hook you'll have that hook run in
the same way that git-gc(1) would run it. So someone developing a hook
might think they can use "git hook run" for testing it.

But if you do the same with say "git hook run pre-receive" or anything
else that feeds arguments or stdin (e.g. "update", or "pre-receive"),
you'll have your hook happily being run by git, but in a way that's not
at all how such a hook will be run when it's run by git "for real".

So I wonder if we shouldn't just have the thing die() if you try to run
any hook that's in githooks(5) itself, except for sendemail-validate and
the p4 hooks, since we need to run those ourselves.

Or have those use an internal-only "git hook--helper", and start out
with "git hook" just supporting "git hook list", and then later on have
"git hook run" (or perhaps "git hook run-configured"?) be an entry point
for this facility of running some arbitrary script that's not a "real"
hook.

I don't know, maybe I'm the only one that finds this confusing...

1. https://lore.kernel.org/git/87bl6ttmlv.fsf@xxxxxxxxxxxxxxxxxxx/
2. https://lore.kernel.org/git/877dh0n1b3.fsf@xxxxxxxxxxxxxxxxxxx/
3. https://lore.kernel.org/git/87lf4qeh86.fsf@xxxxxxxxxxxxxxxxxxx/
4. https://lore.kernel.org/git/20210819033450.3382652-6-emilyshaffer@xxxxxxxxxx/