Re: [PATCH v3 0/6] config-based hooks restarted

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Thu, 2 Sep 2021 15:01:56 -0700

On Tue, Aug 24, 2021 at 10:29:33PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Wed, Aug 18 2021, Emily Shaffer wrote:
> 0)
> 
> I think this is in much much better shape vis-as-vis the simplified
> config schema that's now being proposed re our discussion starting
> around https://lore.kernel.org/git/87bl6ttmlv.fsf@xxxxxxxxxxxxxxxxxxx/
> 
> I.e. the main complexity of the "skip" mechanism is gone, and also the
> conflation of hook names with hook commands (the "rm -rf /" as a
> <name-or-cmd> discussed in the above).
> 
> So before going any further, I'll just say that I wouldn't object much
> to this design going in as-is. What I'm about to mention here below is
> much closer to bikeshedding in my mind than "this is really to complex
> to go in-tree", which was my opinion of the config schema before.
> 
> 1)
> 
> On the current config design: First, before going into any detail on
> that, I think whatever anyone's opinion is on that that the
> design-focused patches as they stand could really use more a more
> extended discussion of the design.
> 
> I.e. talk about the previously considered config schema, why it evolved
> into its current form. The trade-offs involved, and why the patch
> proposed to implement the schema it's implementing over another earlier
> or alternate design.
> 
> I.e. https://lore.kernel.org/git/20210819033450.3382652-6-emilyshaffer@xxxxxxxxxx
> is two very short paragraphs. We won't be able to summarize all our
> month-long discussion on the config design in one commit message, but I
> think at least discussing it somewhat / linking to relevant on-list
> discussions would make future source spelunking easier.

Hum. I don't think that it's necessary to summarize the whole discussion
in the commit message, but I think it's worth it to describe the
rationale ("we like this design because it makes x good practice easy
and y bug hard") and link out to the discussion for parties who are
interested in reading more.

> 
> 2)
> 
> So that out of the way, a comment on the current config design, which
> should be read in the context of what I noted in #0. I.e. I'm *much*
> happier with this version.
> 
> That being said I'm still not convinced that the simple 1=1 mapping of
> "hook.<name>.command" and its "value" should be followed by the 1=many
> mapping of "hook.<name>.event" and its "value".
> 
> I.e. we're making the trade-off of saving the user from typing out or
> specifying:
>     
>     [hook "my-pre-commit"]
>     command = ~/hooks/pre-commit-or-push
>     event = pre-commit
>     [hook "my-pre-push"]
>     command = ~/hooks/pre-commit-or-push
>     event = pre-push
> 
> And instead being able to do:
> 
>     [hook "my-pre-commit-or-push"]
>     command = ~/hooks/pre-commit-or-push
>     event = pre-commit
>     event = pre-push
> 
> So for the very common case, saving two config lines. "Two" because as
> we discussed[1] as there's currently no GIT_HOOK_TYPE env var. So this
> form will work pretty much only for that case.
> 
> I.e. unlike with .git/hook/<name> the hook run via config can't
> determine what <hook-type> it's being run at, so as it stands this is
> only useful for those hooks listed in githooks(5) where someone would
> want to do the exact same thing for one or more <hook-name> names. You
> can't use it as a general routing mechanism for any hook type as it
> stands.
> 
> I *think* that's only these two, perhaps "update" and "pre-receive",
> with the hook seeing if it consume stdin/has arguments to disambiguate
> the two.
> 
> But even with a GIT_HOOK_TYPE passed the trade-off, as discussed in [1],
> and downthread in [2], is that by having it 1=many we're closing the
> door on any future hook.<name>.<whatever>. I.e. config that would change
> the behavior of that hook, but you'd want to change it in another way
> for at least one of the N event types.

I'm not really sure that's the case, to be honest. Even with the config
scheme as is in this iteration, you can still define it the way you're
describing with no problem, and in fact it makes it easier for users to
apply some special config to all-but-one invocation.

Let's say for example the "git-secrets" hook, which we do have defined
for at least 3 different events at Google today, and a hypothetical
"parallelize me" config; for the sake of argument let's presume that
this is the far future and we've added a GIT_HOOK_TYPE envvar:

[hook "git-secrets"]
  command = /bin/git-secrets
  event = pre-commit
  event = pre-merge-commit
  parallelizable = true

[hook "git-secrets-mutexed"]
  command = /bin/git-secrets
  event = prepare-commit-msg
  event = commit-msg
  parallelizable = false

If we don't allow "hook.myhookname.event" to be multiply configurable,
then the user gets this really tedious task of defining every single
"hook.git-secrets-$hookevent.parallelizable" config.

> 
> Well, "closing the door" as in if you'd want that you'd have to split up
> the section from the "my-pre-commit-or-push" example above to the
> "my-pre-commit" and "my-pre-push" example.
> 
> But again, on the "is the complexity worth it" we're then having to
> explain to users that they can do it one way if the want no config other
> than hook.<name>.{command,event*}, but another if they have another key
> in that namespace.

I am not so worried about "this will be hard to explain, so we should
not do it" - I think we can make the documentation useful with enough
effort and expertise. (And yes, I'm feeling optimistic because I have an
actual technical writer taking a look at the manpage right now.)

> 
> You've said that you wanted to add something like a GIT_HOOK_TYPE
> environment variable. Fair enough, and I guess we could add it in a
> re-roll of this series. I'm mainly commenting on the end-state of *this*
> series in particular. I.e. I think it leaves the user & implementation
> with a config schema that still seems to be needlessly complex for the
> very limited benefits that complexity brings us in what you're able to
> do with it now.

I think that limiting ourselves in the way you're describing will make
it more difficult to bring additional benefits later. It is certainly
possible and valid to write your configuration the way you are
describing, without changing this schema.

> 
> But some of that goes back to the comments I had on 5/6[3], i.e. I'm
> willing to be convinced, but I think that the current commit message &
> added docs aren't really selling the idea of why it's worth it.

Ok. I think your point is not "the schema is still wrong" as much as it
is "the documentation could be much better", and I agree.

> 
> 3) 
> 
> As an extension to my comments on 5/6[3], I think this whole notion of
> "git hook run <name>" as invoked by a user of git is just more confusing
> the more I think about it.
> 
> I.e. 5/6[4] is apparently seeking to implement a way to just make that
> facility a general way for users to run some command on their system to
> do whatever, instead of say using /usr/bin/parallel or a shell alias.
> 
> But then we also use that command as our own dispatch mechanism for our
> own known hooks, except mostly not, since we mostly use the C API
> ourselves directly.
> 
> It's particularly confusing that if you say run "git hook run
> pre-auto-gc" as a user to test your hook you'll have that hook run in
> the same way that git-gc(1) would run it. So someone developing a hook
> might think they can use "git hook run" for testing it.
> 
> But if you do the same with say "git hook run pre-receive" or anything
> else that feeds arguments or stdin (e.g. "update", or "pre-receive"),
> you'll have your hook happily being run by git, but in a way that's not
> at all how such a hook will be run when it's run by git "for real".
> 
> So I wonder if we shouldn't just have the thing die() if you try to run
> any hook that's in githooks(5) itself, except for sendemail-validate and
> the p4 hooks, since we need to run those ourselves.

I think this again falls to a documentation issue. I would love to have
a tighter loop when developing a hook that takes weird mid-merge
arguments or whatever; making it easier to test the hook in isolation
with known inputs sounds like a good thing to me.

Personally, "I ran 'git hook run pre-receive' without being in the
middle of a receive operation and it didn't behave like it was in the
middle of a receive operation!" doesn't sound all that surprising to me.

> 
> Or have those use an internal-only "git hook--helper", and start out
> with "git hook" just supporting "git hook list", and then later on have
> "git hook run" (or perhaps "git hook run-configured"?) be an entry point
> for this facility of running some arbitrary script that's not a "real"
> hook.
> 
> I don't know, maybe I'm the only one that finds this confusing...
> 
> 1. https://lore.kernel.org/git/87bl6ttmlv.fsf@xxxxxxxxxxxxxxxxxxx/
> 2. https://lore.kernel.org/git/877dh0n1b3.fsf@xxxxxxxxxxxxxxxxxxx/
> 3. https://lore.kernel.org/git/87lf4qeh86.fsf@xxxxxxxxxxxxxxxxxxx/
> 4. https://lore.kernel.org/git/20210819033450.3382652-6-emilyshaffer@xxxxxxxxxx/

Besides "make the docs more obvious", I don't think there is anything in
this mail that I want to act on. I am very happy with the config schema
as it is, as well as the behavior of 'git hook run'.

As I mentioned here and somewhere else in the review, I am in the
process of getting feedback on the manpage from a tech writer this week,
so do not expect a reroll from me until at least next week. I saw your
reroll today and I'll try and look at it (or at least the interdiff)
tomorrow or Monday.

 - Emily