Re: [PATCH 8/9] hook: teach 'hookcmd' config to alias hook scripts

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 23 Jul 2021 09:41:35 +0200

On Thu, Jul 22 2021, Emily Shaffer wrote:

> On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> 
>> On Thu, Jul 15 2021, Emily Shaffer wrote:
>> 
>> > To enable fine-grained options which apply to a single hook executable,
>> > and to make it easier for a single executable to be run on multiple hook
>> > events, teach "hookcmd.<alias>.config". These can be configured as
>> > follows:
>> > [...]
>> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
>> > index a97b980cca..5b35170664 100644
>> > --- a/Documentation/config/hook.txt
>> > +++ b/Documentation/config/hook.txt
>> > @@ -3,6 +3,11 @@ hook.<command>.command::
>> >  	executable on your device, a oneliner for your shell, or the name of a
>> >  	hookcmd. See linkgit:git-hook[1].
>> >  
>> > +hookcmd.<name>.command::
>> > +	A command to execute during a hook for which <name> has been specified
>> > +	as a command. This can be an executable on your device or a oneliner for
>> > +	your shell. See linkgit:git-hook[1].
>> > +
>> > [...]
>> > +Global config
>> > +----
>> > +  [hook "post-commit"]
>> > +    command = "linter"
>> > +    command = "~/typocheck.sh"
>> > +
>> > +  [hookcmd "linter"]
>> > +    command = "/bin/linter --c"
>> > +----
>> > +
>> > +Local config
>> > +----
>> > +  [hook "prepare-commit-msg"]
>> > +    command = "linter"
>> > +  [hook "post-commit"]
>> > +    command = "python ~/run-test-suite.py"
>> > +----
>> > +
>> > +With these configs, you'd then run post-commit hooks in this order:
>> > +
>> > +  /bin/linter --c
>> > +  ~/typocheck.sh
>> > +  python ~/run-test-suite.py
>> > +  .git/hooks/post-commit (if present)
>> > +
>> > +and prepare-commit-msg hooks in this order:
>> > +
>> > +  /bin/linter --c
>> > +  .git/hooks/prepare-commit-msg (if present)
>> >  
>> 
>> I still have outstanding feedback on the fundamental design
>> here. I.e. why is this not:
>> 
>>     hook.<name>.event = post-commit
>>     hook.<name>.command = <path>
>> 
>> See:
>> 
>> https://lore.kernel.org/git/87mtv8fww3.fsf@xxxxxxxxxxxxxxxxxxx/
>> 
>> As noted there I don't see why not, and more complexity results from the
>> design choice of doing it the way you're proposing. I.e. we can't
>> discover hooks based on config prefixes, and we end up sticking full FS
>> paths in config keys.
>
> Interesting. My gut says that it would make it harder to neatly write a
> config with the same hook running at the very beginning of one event and
> the very end of another, but I'm not sure whether that's a case to
> optimize for.
>
> I would also need twice as many lines to run a script I wrote as a hook
> - that is, the base case which is probably all most people will need. So
> with your proposal I *must* name every hook I want to add. Instead of
> "hook.pre-commit.command = ~/check-for-debug-strings" I need to
> configure "hook.check-debug-strings.event = pre-commit" and
> "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
> little frustrating, and if I never want to skip that check or move it
> later or something, it will be extra overhead for no gain for me.

The gain is that "git hook list" becomes a trivial "git config
-show-origin --show-scope --get-regexp" wrapper.

So the series either doesn't need "git hook list" or such a thing
becomes much less complex, especially given the proposed addition of
other features in the area like "git hook edit", i.e. (quoting the
linked E-Mail):

    As just one example; surely "git config edit <name>" would need to
    run around and find config files to edit, then open them in a loop
    for you, no?

    Which we'd eventually want for "git config" in general with an
    --edit-regexp option or whatever, which brings us (well, at least
    me) back to "then let's just add it to git-config?".

> I do agree that your approach bans the regrettably awkward
> "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure
> whether or not it's worth it.

That design choice also means that you can't expand the path using "git
config --get --type=path.

We do have that with the "includeIf" construct, but if we can avoid it
we should, it makes it play nicer with other assumptions and features of
the config system.

As noted in the follow-up reply while we don't case normalize the LeVeL"
part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1
mapping on some
FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@xxxxxxxxxxxxxxxxxxx/

> To help us visualize the change, here's some common and complicated
> tasks and how they look with either schema (let mine be A and yours be
> B):

Before diving into that, I'll just say I don't care about the trivial
specifics of how this is done, i.e. the bikeshedding of what the config
keys etc. are named.

Just in (as noted above) design choices here forcing avoidable
complexity in other areas.

> #1 - Add a oneliner (not executing a script)
> A:
> [hook "post-commit"]
> 	command = echo post commit
> B:
> [hook "oneliner"]
> 	event = post-commit
> 	command = echo post commit
> #2 - Execute a script
> A:
> [hook "post-commit"]
> 	command = ~/my-post-commit-hook
> B:
> [hook "script"]
> 	event = post-commit
> 	command = ~/my-post-commit-hook

...

> #3 - Run a handful of scripts in a specific order on one event
> A:
> [hook "post-commit"]
> 	command = ~/my-post-commit-1
> 	command = ~/my-post-commit-2
> 	command = ~/my-post-commit-3
> B:
> [hook "script 1"]
> 	event = post-commit
> 	command = ~/my-post-commit-1
> [hook "script 2"]
> 	event = post-commit
> 	command = ~/my-post-commit-2
> [hook "script 3"]
> 	event = post-commit
> 	command = ~/my-post-commit-3

To reply to all the above, yes, your suggestion comes out ahead in being
less verbose.

But isn't the real difference not in the differing prefixes, i.e. hook.*
and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly
focusing on, i.e. what requires the added complexity.

But that in that in your proposed way of doing it it's:

    hook.<event-or-name>.*

V.s. my suggestion of:

    hook.<name>.*

And thus whenever you have a <event-or-name> that just happens to be a
built-in hook listed in githooks(5) we in (A) implicitly expand config
like:

    hook.post-commit.command = echo foo

To:

    hook.post-commit.command = echo hi
    hook.post-commit.type    = post-commit

But not knowing about "foo" we don't expand:

    hook.foo.command = echo foo

To:

    hook.foo.command = echo hi
    hook.foo.type    = foo # This would be an error, or ignored.

But rather leave "dangling" for the user to later supply the "*.event"
themselves, i.e.:

    hook.foo.command = echo hi
    hook.foo.event = post-commit

And means that you presumably need to detect this case and error about
it, but my proposed model does not:

    hook.post-commit.command = echo hi
    # User error: "*.type" and <event-or-name>" does not match for
    # "hook.*.command"
    hook.post-commit.type    = pre-commit

And furthermore, that if someone in your model were to do:

    hook.verify-commit.command = echo hi

It's "dangling" today, but if a later version of git learns about a
"verify-commit" hook we'll start running it unexpectedly.

Because your design conflates the slot for known hook type names and
user-supplied names.

On balance I think it's better just to always supply two lines per-hook,
but whether we have this proposed shorthand or not is mostly orthogonal
to everything I mentioned in
https://lore.kernel.org/git/87mtv8fww3.fsf@xxxxxxxxxxxxxxxxxxx/

I.e. my proposed version could also have it, but thinking about it I
think it's not worth it, we should always use <name>, not
<event-or-name> for the reasons noted if you'll read ahead...

> #4 - Skip a script configured more globally
> A:
> (original config)
> [hook "post-commit"]
> 	command = /corp/stuff/corp-post-commit
> (local config)
> [hookcmd "/corp/stuff/corp-post-commit"]
> 	skip = true
> OR,
> (original config)
> [hookcmd "corp-pc"]
> 	command = /corp/stuff/corp-post-commit
> [hook "post-commit"]
> 	command = corp-pc
> (local config)
> [hookcmd "corp-pc"]
> 	skip = true
> B:
> (original config)
> [hook "corp-pc"]
> 	event = post-commit
> 	command = /corp/stuff/corp-post-commit
> (local config)
> [hook "corp-pc"]
> 	skip = true

...which are among other things that no, my proposed version doesn't
have "skip" at all, see point #3 at
https://lore.kernel.org/git/87mtv8fww3.fsf@xxxxxxxxxxxxxxxxxxx/

I.e. I think the "skip" is a thing that falls out of a complexity of
your design that I'm proposing to do away with.

That complexity being that you use <event-or-name> and I use <name>, and
you want to in turn support N number of "*.command" for any
"hook.<event-or-name>".

I'm suggesting we use the typical "last key wins" semantics, if you want
multiple commands for a given hook type you'll just supply multiple
"hook.<name>" sections with the same "hook.*.event = type" in all.

The way to skip hooks in my proposal is:

    hook.<name>.command = true

Or whatever noop command you want on the RHS. In practice we wouldn't
invoke "true", just like we don't invoke "cat" for "PAGER=cat".

But unlike "*.skip" that doesn't require complexity in implementation or
user understanding, it just falls naturally out of the rest of the
model.

> #5 - Execute one script for multiple events
> A:
> [hookcmd "reusable-hook"]
> 	command = /long/path/reusable-hook
> [hook "post-commit"]
> 	command = reusable-hook
> [hook "pre-commit"]
> 	command = reusable-hook
> [hook "prepare-commit-msg"]
> 	command = reusable-hook
> B:
> [hook "reusable-hook"]
> 	command = /long/path/reusable-hook
> 	event = post-commit
> 	event = pre-commit
> 	event = prepare-commit-msg

It's been so long since I wrote the original E-Mail that I'm having to
skim my own summary there & come up with things as I go along, so maybe
this conflicts with something I said earlier.

But no, I don't think I meant that *.event should be multi-value, or
that that magic is worthwhile. I.e. I think we just want:

    [hook "not-a-reusable-section-1"]
        command = /long/path/reusable-hook
        event = post-commit
    [hook "not-a-reusable-section-2"]
        command = /long/path/reusable-hook
        event = pre-commit
    [hook "not-a-reusable-section-3"]
        command = /long/path/reusable-hook
        event = prepare-commit-msg

I.e. is wanting to use the same command for N different hooks really
that common a use-case to avoid a little verbosity, and quite a lot of
complexity?

How would such a hook even be implemented?

We don't have anything in your hook execution model (and I don't
remember you adding this) which passes down a "you are this type of
hook" to hooks.

It's now implicit in that the hook invoked at .git/hooks/post-commit or
.git/hooks/pre-commit can check its own basename, but it won't be with
configurable hooks.

We could start passing down a GIT_HOOK_TYPE in the environment or
whatever, but I think the simpler case of just having the user do it is
better.

I'm assuming that mainly because people have wanted a
"/long/path/reusable-hook" router script because we have not supported
executing N hooks, or supported concurrency. Once we do that the
complexity of such routing scripts (mostly, not everyone's needs will be
met) will be replaced by a little bit of config.

I don't see why it's worth it to micro-optimize for lines in that config
at the cost of increased config complexity.

For example, how do you "skip" an "event" type? You have it for
"*.command"? So let's say in your model (A) I have system config like:

    [hook "my-reusable"]
    command = /some/commit-msg-hook
    event = prepare-commit-msg

Now I want to skip that, easy enough, in my local config:

    [hook "my-reusable"]
    skip = true

Or if I wanted to also invoke it for commit-msg events, in my local
config (which piggy-backs on the system one):

    # Now runs for both prepare-commit-msg and commit-message
    [hook "my-reusable"]
    event = commit-msg

Or, if I want to skip the command name and substitute my own:

    [hook "my-reusable"]
    skip = true
    command = /my/commit-msg-hook

I'm just assuming you mean that to work, as discussed above I think all
this is leading us down the road of too much complexity.

But anyway, now I want to instead not run it as a prepare-commit-msg
hook, but a commit-msg hook.

As far as I can tell you haven't specified that, but I think something
consistent with your model would be

    [hook "my-reusable"]
    event = skip
    event = commit-msg

Or rather, because that conflates "skip" with a hook event type name:

    [hook "my-reusable"]
    skipEvent = true
    event = commit-msg

I.e. the "skip" you have now really means "skipCommand", so we'll need a
skipSomeName to skip other "hook.*.someName".

As noted above I think all of this is too complex, let's just do the
system config like this (exact same as the above):

    [hook "my-reusable"]
    command = /some/commit-msg-hook
    event = prepare-commit-msg

To skip it, in my local config:

    [hook "my-reusable"]
    command = true

And to re-use /some/commit-msg-hook without having to re-include it in
your local config the answer is that we don't support that level of
cleverness. Just do this:

    # First skip the system hook
    [hook "my-reusable"]
    command = true

    # Set up another hook, just copy/pasting the /some/commit-msg-hook command
    [hook "my-local"]
    command = /some/commit-msg-hook
    event = commit-msg

I.e. I think these cases of someone having say a /etc/gitconfig hook on
their system not under their control that they want to not run, *but*
not as the "event" there, but as another event type is too specific a
use-case for us to worry about.

> #6 - Execute the same script early for one event and late for another
> event
> A:
> (global)
> [hookcmd "reusable-hook"]
> 	command = /long/path/reusable-hook
> [hook "pre-commit"]
> 	command = reusable-hook
> (local)
> [hook "post-commit"]
> 	command = other
> 	command = hooks
> 	command = reusable-hook

Even with I think it's fair to say deep knowledge of your proposal at
this point I still needed to read this a few times to see if that:

    command = reusable-hook

Is referring to:

    [hookcmd "reusable-hook"]

I.e. is it going to run:

    command = /long/path/reusable-hook

Or is it just re-specifying /long/path/reusable-hook but relying on a
$PATH lookup?

Having reasoned through that I think the answer is the former. But that
also means that in your model:

    [hookcmd "rm -rf /"]
    command = echo this will not actually destroy your data
    [hook "pre-commit"]
    command = rm -rf /

Is going to run that friendly "echo" command, since "command = rm -rf /"
just refers to the "rm -rf /" <name>, not <command>, right the "hookcmd"
line is removed, at which point we'll stop treating it as a <name> and
run it as a <command>?

In practice I think users are unlikely to use actively harmful names
like that.

I'm just making the point that I should not need to know about previous
config to see if a "hook.pre-commit.command = rm -rf /" is harmless or
not, or need to carefully squint to see if the "reusable-hook" is
referring to a section name or command name.

Or am I just confused at this point?

> B:
> (global)
> [hook "reusable-hook"]
> 	command = /long/path/reusable-hook
> 	event = pre-commit
> (local)
> [hook "other"]
> 	event = post-commit
> 	command = other
> [hook "hooks"]
> 	event = post-commit
> 	command = hooks
> [hook "reusable-hook"]
> 	event = reusable-hook

No, that's not what I'm proposing. I.e. "*.event" is reserved for
built-in events listed in githooks(5) that git itself knows about.

If you say "event = reusable-hook" that's going to be one of ignored,
warned or errored about.

(Probably "ignored" is the best thing to support multi-version setups,
but that's not hook-specific, just how we treat unknown config in
general)

My version of this would be the same as noted with "we don't support
that level of cleverness[...]" above.

I.e. you'd not re-use that "/long/path/reusable-hook", you can skip it
through "command = true", then just copy the relevnt part into your new
config. So:

        (global)
	[hook "reusable-hook"]
		command = /long/path/reusable-hook
		event = pre-commit

	(local)
	[hook "reusable-hook"]
		command = true # skip it

        # The "hooks" name is arbitrary, "my-hooks" or whatever would be
        # clearer, but just going with your example...

	[hook "hooks"]
		event = post-commit
		command = hooks
        # Not very reusable then...   
	[hook "reusable-hook"]
                command = /long/path/reusable-hook
		event = pre-commit

> [...]Please feel free to chime in with more use cases that you think would
> be handy which I forgot :)

I couldn't find this at a quick glance but I think we also had a
disussion at some point about hooks controlling parallelism. AFAICT your
current implementation just has global:

    hook.jobs=N

And we then blacklist certain hooks to always have hook.jobs=1, e.g. the
commit-msg hook that needs an implicit "lock" on that file (or rather,
we think that's the most common use-case).

I think my version of always having hook.<name>.{event,command} be one
value is also better in that case, i.e. we'd then:

    [hook "myhook"]
    command = some-command
    event = pre-receive
    parallel = true # the default

    [hook "myhook2"]
    command = some-command2
    event = pre-receive
    parallel = true # the default

    [hook "myhook3"]
    command = some-unconcurrent-command
    event = pre-receive
    parallel = false # I'm not OK with concurrency

To go along with something like:

    hook
	jobs = 8

Or:

    [hookEvent "pre-receive"]
    jobs = 4

But if you have N numer of "command" in a section it gets murky, does
"parallel = false" then apply to the whole section, but sections can
have one or more values. So we'd need both a
"hookEvent.pre-receive.jobs=N" and per-section config to
control/suppress parallelism?

I haven't thought about it deeply, but have a hunch that having sections
be a 1=1 mapping with a specific command instead of either 1=1 or 1=many
is going to make that easier to implement and understand.