Re: [PATCH v4 1/9] doc: propose hooks managed by the config

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 29 Oct 2020 21:04:23 +0100

On Thu, Oct 29 2020, Emily Shaffer wrote:

> On Fri, Oct 23, 2020 at 09:10:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> You already use "hookdir" for something else though, so that's a bit
>> >> confusing, perhaps s/hookcmd/definehookcmd/ would be less confusing, or
>> >> perhaps more confusing...
>> >
>> > "Hookdir" might be the wrong word to use, too - maybe it's better to
>> > mirror "hookspath" there. Either way, "hookdir" and "hookspath" are
>> > similar enough that I think it would be confusing, and "hookcmd" is
>> > already getting some side-eye from me for not being a great choice.
>> >
>> > Some thoughts for "a path to a directory in which multiple scripts for a
>> > single hook live":
>> >  - hookset
>> >  - hookbatch (ugh, redundant with MS scripting)
>> >  - hook.pre-commit.all-of = ~/last-minute-checks/
>> >  -  "   "  .everything-in = "   "
>> > ...?
>> >
>> > I think I named a couple silly ideas for "hookcmd" in another mail.
>> 
>> To both of the above: Yeah I'm not saying you need to do the work, just
>> that I think it would be a useful case to bikeshed now since it seems
>> inevitable that we'll get a "find hooks in this dir by glob" once we
>> have this facility. So having a config syntax for that which isn't
>> overly confusing / extensible to that case would be useful, i.e. as the
>> current syntax uses "dir" already.
>
> Yeah. I'm not sure that it needs to happen right away. Because
> hook.*.command // hookcommand.*.command gets passed right into
> run_command()-with-shell, it's possible for a user who's keen to also
> set `hook.*.command = find /some/path -type f | xargs` in the meantime.
> And also because it's passed right into run_command()-with-shell, it's
> hard to do some smart wildcarding on the .command config and try to
> figure out the right syntax. I'd just as soon see something explicit
> like the configs I mentioned above, which can be added pretty easily
> after the fact. I think what you're mostly saying, though, is "Leave
> some words for glob execution!" and that I can appreciate.

Yeah, or rather: when naming the config keys now, think about whether
the naming still makes sense once it's extended to support such glob
inclusion, which seems like a desired addition. But I won't belabor
that point.

Just one thing to add: We don't really need to come up with a syntax &
semantics for glob inclusion special to this, we'd use the sort of glob
patterns "Conditional includes" use, as documented in git-config(1).

>> > Hum. This seems to say "folks who started their hooks with the same
>> > number agree that their hooks should also run simultaneously" - which
>> > sounds like an even harder problem than "how do I know my ordering
>> > number isn't the same as someone else's in another config file". Or else
>> > I'm misunderstanding your pseudo :)
>> 
>> The prefix number isn't meaningful in that way, i.e. if you have 10
>> threads and 5 hooks starting with 250-* they won't all be invoked at the
>> same time.
>
> Ok. I misunderstood, then.
>
>> > I know I rambled a lot - I was trying to convince myself :) For now, I'd
>> > prefer to add more detail to the "future work" section of the doc and
>> > then not touch this problem with a very long pole... ;) Thoughts
>> > welcome.
>> 
>> I'm replying to much of the above in general here, particularly since
>> much of it was in the form of a question you answered yourself later :)
>> 
>> Yes as you point out the reason I'm raising the parallel thing now is
>> "keep users from assuming serial execution", i.e. any implementation
>> that isn't like that from day 1 will need more verbose syntax to opt-in
>> to that.
>> 
>> I think parallel is the sane default, although there's a really strong
>> case as you point out with the "commit-msg" hook for treating that on a
>> hook-type basis. E.g. commit-msg (in-place editing of a single file)
>> being non-parallel by default, but e.g. post-commit, pre-applypatch,
>> pre-receive and other "should we proceed?" hooks being parallel.
>
> Yeah. I think you've sold me. So what I will do is thus: before I send
> the next reroll (as I'm pretty much done, locally, and hope to be ready
> for nits next time) I'll take a look in 'git help githooks' and see
> which ones expect writes to occur. I think there are more than just
> "commit-msg". I'll add a bit to run_hooks() and a corresponding flag to
> 'git hook run', plus relevant documentation. I'll also plan to add
> explicit documentation to 'git help githooks' mentioning parallel vs.
> serial execution.

Sounds good.

> But I will plan on writing it stupidly - user configurable job number
> but no dependency checking; and let the user turn off parallel execution
> for everyone (hook.jobs=1) or for just one hook
> (hook.pre-commit.parallel = false (?)). Like you and Jonathan N say, we
> can add more sugar like hookcmd.*.depends later on when we need it.

Yeah, that sounds great. As long as there's parallelism that stuff can
always be tweaked later.
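
As a rough illustration of the "stupid" version described above (a
configurable job count, no dependency checking, and a way to force
serial execution), here is a minimal Python sketch. It is not git code:
the function name and its parameters are hypothetical stand-ins for the
proposed hook.jobs and hook.pre-commit.parallel settings, and each
command is simply handed to the shell.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_hooks(commands, jobs=4, parallel=True):
    """Run each hook command via the shell; return exit codes in input order.

    `jobs` and `parallel` are hypothetical stand-ins for the proposed
    hook.jobs and hook.<name>.parallel settings; nothing here checks
    dependencies between hooks.
    """
    # parallel=False (or jobs=1) degrades to one worker, i.e. serial execution.
    workers = max(1, jobs) if parallel else 1

    def run_one(cmd):
        return subprocess.run(cmd, shell=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `commands` in its results.
        return list(pool.map(run_one, commands))
```

With jobs=1 or parallel=False the same code path runs the hooks one at
a time, which is roughly the opt-out described in the quoted text.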

>> 
>> But I'm also raising a general concern with the design of the API /
>> command around this.
>> 
>> I don't see the need for having a git hook list/edit/add command at
>> all. We should just keep this simpler and be able to point to "git
>> config --add/--get-regexp" etc.
>> 
>> It seems the reason to introduce this command API around it is because
>> you're imagining that git needs to manage hooks whose relative execution
>> order is important, and to later on once this lands aim to implement a
>> much more complex dependency management schema.
>
> No, I don't think that's the reason to have list/edit/add. The reason is
> more for discoverability (if I 'git help git' or 'git^TAB', do I see
> something handy in the command list that I didn't know about before?)
> and user friendliness ("I can't remember the right config options to set
> this up every dang time"). And 'list', I think, is handy for giving
> users a dry run of what they can expect to see happen (and where to fix
> them, since it lists the origin). Yes, a user could put it all together
> from invocations of 'git config', but I personally think it's more
> useful for Git to tell me what Git is going to do/what Git wants than
> for my meat brain to try and guess :)

Okay, that makes sense & I've got nothing against that, just clarifying
since it *looked* like it was the first step in some future addition of
complexity around this.

It would be nice if the docs for the new command were modified to state
that clearly, even to the point of saying "this is really just sugar for
this similar git-config invocation".

>> 
>> I just can't imagine a case that needs that where say those 10 hooks
>> need to execute in exact order 1/2/3/4 where the author of that tight
>> coupling wouldn't also desire to roll that all into one script, or at
>> least that it's an obscure enough case that we can just say "do that".
>> 
>> Whereas I do think "run a bunch of independent checks, if all pass
>> proceed" is *the* common case, e.g. adding a bunch of pre-receive
>> hooks. If we tell the user we'll treat those as independent programs we
>> can run them in parallel. The vast majority of users will benefit from
>> the default faster execution.
>> 
>> The "glob order" case I mentioned is extra complexity on top of that,
>> yes, but I think that concession is sane for the common case of "yes
>> parallel, but I want to always run the always-exit-0 log
>> hook". E.g. I've used this to set up a hook to run push
>> attempts/successes in a hook framework that runs N pre-receive hooks.
>
> Reading this, I think I'm still missing something key about what you
> think glob ordering provides. 

For context, I feel strongly that something like this should be
parallel by default; it's great that, per the above discussion, you're
open to that.

This "glob ordering" is an entirely separate idea that I'm not strongly
advocating; there are pros & cons to doing that vs. config ordering:

 * Con: less obvious than config order, you write hooks "a c b" in the
   config and we execute in "a b c" order.

 * Pro: Sidesteps the issues you noted in "Execution ordering" in the
   docs you're adding. With config ordering it's impossible to execute
   a repo-local hook before a system-wide one; with glob ordering you
   can override that by having a local one called "000-something".

   I.e. with config ordering we read the config in the normal config
   order, and thus if there's a system hook there's no way to define a
   local hook to run first, until we get some sort of override for
   that.

> I'm not following why having the log hook set early requires glob
> ordering over config ordering (since the config ordering schema allows
> reordering via replacement)
> [...]
>  and I'm not following why it's required to halt on failure.

I realize I didn't elaborate on this; there's some past discussion[1][2]
about it.

I.e. when running N hooks sometimes you'd want to run them all (e.g. to
send notifications), but for others such as pre-receive.d guard checks
you don't have to run all N, if one check (say one checks commit format
validity, another code syntax) fails you'd like to abort early.

So halting on failure is just about saving CPU: you might have 10 hooks
that each take 1 second, and there's no point in making the user wait
10 seconds for all 10 checks if a failure of any one of them fails the
push.

But OTOH you have other use-cases where users want to run them all
(talked about in the [1][2] discussion above), so it's been anticipated
as something we'd grow config for with multi-hook support.
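
To make the two behaviours concrete, here is a small Python sketch of
"abort on first failure" vs. "run them all". It is only an illustration
under assumptions: the function and its stop_on_failure flag are
hypothetical, not an existing or proposed git option, and the checks
are run serially for simplicity.

```python
import subprocess


def run_checks(commands, stop_on_failure=True):
    """Run commands serially; return (all_passed, number_of_commands_run).

    `stop_on_failure` is a hypothetical knob: True models guard checks
    such as pre-receive.d, False models "run everything" use-cases such
    as notifications.
    """
    all_passed = True
    ran = 0
    for cmd in commands:
        ran += 1
        if subprocess.run(cmd, shell=True).returncode != 0:
            all_passed = False
            if stop_on_failure:
                break  # later checks are skipped; the push fails anyway
    return all_passed, ran
```

With three checks where the second one fails, the early-abort mode runs
two commands and the run-all mode runs three; both report failure.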

Combined with such early abort, glob ordering allows for common cases
that aren't possible with config order.

E.g. consider a server with some common system-wide pre-receive.d hook
(e.g. author e-mail envelope check), and a SOX/PCI controlled repository
where some compliance thing says all push attempts must be logged.

You could then do:

    /etc/git/hooks/pre-receive.d/email-check
    /path/to/repo/hooks/pre-receive.d/000-log-push-attempt-to-db
    /path/to/repo/hooks/pre-receive.d/some-other-check

And we'd always run the 000-* hook first, whereas in the current schema
you can't do that without editing the system-wide config.
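
A minimal Python sketch of that ordering, assuming "glob order" simply
means sorting the collected hooks by file name regardless of which
directory they came from (the function name is hypothetical, and the
full path is used only as a tie-breaker):

```python
import posixpath


def glob_order(hook_paths):
    """Sort hook paths by basename, ignoring which directory they live in."""
    return sorted(hook_paths, key=lambda p: (posixpath.basename(p), p))


# Config order: the system-wide hook is read first, then the repo-local ones.
config_order = [
    "/etc/git/hooks/pre-receive.d/email-check",
    "/path/to/repo/hooks/pre-receive.d/000-log-push-attempt-to-db",
    "/path/to/repo/hooks/pre-receive.d/some-other-check",
]

for path in glob_order(config_order):
    print(path)
```

Sorted this way, the repo-local 000-log-push-attempt-to-db comes before
the system-wide email-check, whereas in config order it would come
second.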

>> 
>> All that being said I'm open to being convinced, I just don't see what
>> the target user is, and the submitted docs don't really make a case for
>> it. I.e. there's plenty of "what" not "why would someone want this...".
>
> ACK. I'll try and go over the doc again before I reroll.
>
>  - Emily

1. https://lore.kernel.org/git/87wojjsv9p.fsf@xxxxxxxxxxxxxxxxxxx/
2. https://public-inbox.org/git/CACBZZX6j6q2DUN_Z-Pnent1u714dVNPFBrL_PiEQyLmCzLUVxg@xxxxxxxxxxxxxx/