Re: [PATCH] hooks: propose repository owner configured hooks

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 19 Mar 2021 11:27:29 +0100

On Thu, Mar 18 2021, Albert Cui via GitGitGadget wrote:

> We propose adding native Git functionality to allow project maintainers to
> specify hooks that a user ought to install and utilize in their development
> workflows. This patch documents the requirements we propose for this feature
> as well as a design sketch for implementation.

I like this goal. It's something I proposed (off-list) before and did a
small write-up of here:
https://lore.kernel.org/git/87zi6eakkt.fsf@xxxxxxxxxxxxxxxxxxx/

As far as I recall the response in the room at the time was the expected
combination of "sure that would be neat" and "let's see the
patches". I.e. I don't think there's hard objections to the existence of
such a facility, but of course the devel is in the details, and
obviously I never followed-up with actually trying to implement it.

With config-based hooks this'll be much easier for the hook case, and
I've tried to help that along[A]. I'd be interested in reviewing any
effort in this area as well. The ".githooks/*" case in that proposal
goes away with config-based hooks (since they wouldn't be special
anymore, just config).

> +Example workflow
> +^^^^^^^^^^^^^^^^
> +
> +* Perform fetch as normal
> +
> +* After fetch is complete, Git checks for a "magic" config branch (e.g.
> ++origin/refs/recommended-config+) which contains information about config lines
> +an end-user may want (including hooks).

Why collapse the many-to-many branch/config relationship to many-one
this way instead of having a .gitconfig at the top-level? Then you can
seamlessly have per-branch config, and it'll work if you later locally
clone the repo or just transport that branch (e.g. via bundle).

But reading ahead...

> +* As part of the fetch subcommand, Git prompts users to install the configs
> +contained there.
> +
> +    ** User responses to that prompt could be "sticky" - e.g. a user could reply
> +    "no (this time)", "no (never)", "yes (this time)", or "yes (always)".
> +    Always/never indicate that the user trusts the remote this config is coming
> +    from, and should not apply to configs fetched from other remotes.

As noted in the proposal I linked I think anyone thinking about this
would do well to examine the trade-off Emacs's implementation of this
uses, since it manages to safely open arbitrary files that'll
potentially run arbitrary code on-open, but only if the user opts-in.

> +Later, we might want to do this before the initial clone is performed; that
> +workflow looks like:
> +
> +* During clone, perform ls-refs as normal
> +
> +* If the server has a "magic" config branch, fetch only that config branch.

...the reason for the magic branch is this before-clone use-case?

Unless there's a really strong use-case for that I'd think the
per-branch .gitconfig would be a better trade-off, but even then there
would be ways to make that work obviously in the many-many case, and
still e.g. have a branch advertise a config blob for its "main" branch
as a special case or something.

I also suspect an unstated constraint of having this in a magic branch
is the limitation of some git hosting provider's ACL
implementations. More on that later...

> +* Prompt users as described above.
> +
> +* Perform the rest of the clone.
> [...]
> +* Repository owners have a method for providing recommended config for
> +contributors.
> +
> +* Installation flow happens without additional user intervention.
> +
> +* Keeping config branch and history separate from code branch and history means
> +it is versioned, but not tied to user's checkout.
> +
> +* Letting users specify "always" or "never" reduces amount of pain introduced by
> +interactive "configuration wizard" flow.
> +
> +Cons
> +^^^^
> +
> +* Requires addition of step to fetch (and later clone).
> +
> +* Turning a "set and forget" command like clone into an interactive session with
> +the user is not ideal; care must be taken to avoid breaking bots.
> +
> +* Inflating configs and executables from a remote tracking branch which is never
> +checked out could be slow.
> +
> +Future Work
> +~~~~~~~~~~~
> +
> +* Extending this to allow repository owners to specify specific configurations
> +in general e.g. this repository should use partial-clone with these parameters.

I don't see why such a proposal should narrowly confine itself to hooks.

Once we have config-based hooks then hooks are just configuration.

If we're going to pick up arbitrary configuration from remotes/repos on
request then we'd be starting with the most dangerous configuration.

I think a better way to start such an effort incrementally would be:

1. Audit git's config parser for the safety of parsing arbitrary config,
   we parse .gitmodules now, do we feel it's safe enough to parse
   arbitrary config (probably, but worth checking).

2. Add reflection to git's own config variables. Right now we have this
   in the binary generated via a grep from the documentation. Similar to
   Emacs's implementation we should/could categorize all our known
   config variables by safety.

   Hooks are the least safe, something like a diff.context=N setting the
   repo wants to suggest a -U<n> setting much safer (just tweaking our
   existing diff algorithm) etc. But even in those cases we'd want to
   proceed slowly, e.g. is that config parsing for that -U<n> defensive
   enough to be safe for arbitrary data?

3. Users should be able to e.g. configure "yes, for any repo I clone
   they can tweak 'safe'" variables. This would reduce user dialog
   fatigue, and thus increase safety. I.e. a repo who wants to set a
   thing like hooks would stand out, but something that e.g. wants to
   change the diff order would pass on existing global configuration.

4. Smarter ACL that magic remote+magic branch: It seems like an obvious
   thing to me to want that if I clone e.g. a random clone of git.git
   that I'd want to setup config for it IFF the .gitconfig in it is
   reachable from a tag GPG signed by <approved key>.

   Git's a distributed system, so while I don't think it's bad to have
   some feature like "I always trust config from this remote" (e.g. a
   corporate environment where you know its .gitconfig is
   guarded/audited) we should think about more distributed-friendly
   solutions first and if possible guide users towards those.

A. https://lore.kernel.org/git/87mtv8fww3.fsf@xxxxxxxxxxxxxxxxxxx/