[RFC] Hook management via 'git hooks' command

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Fri, 15 Nov 2019 17:11:25 -0800

Hi all.

The topic of multihook support has come up before; here are a handful of
conversations I could find on it:

 brian m. carlson's proposal for multiple hooks via .git/hooks/pre-commit.d/:
   https://public-inbox.org/git/20190424004948.728326-1-sandals@xxxxxxxxxxxxxxxxxxxx/

 Jonathan Nieder's discussion of hooks as an attack vector:
   https://public-inbox.org/git/20171002234517.GV19555@xxxxxxxxxxxxxxxxxxxxxxxxx/

 Ævar Arnfjörð Bjarmason's RFC for multihooks in a similar manner to brian's:
   https://public-inbox.org/git/CACBZZX6j6q2DUN_Z-Pnent1u714dVNPFBrL_PiEQyLmCzLUVxg@xxxxxxxxxxxxxx/

As I consider these conversations, I think about what goals we are aiming
for:

 1. Let a user run more than one hook on a single event.
 2. Let a user determine the order of those hooks.
 3. Let a user version-control their hooks without kludging. (This is actually
    largely solved by core.hookName, thanks to Ævar.)
 4. Make it hard for a user to run a hook they did not expect to run.
 5. Don't break users who currently have hooks in $REPO/.git/hooks, but provide
    a path for them to stop using hooks from that directory.

It seems that brian's thread stalled out in v2, but as I understand it, the
method was:

 - For each codepath which executes a hook:
   - Check whether .git/hooks/${hookname} is valid (exists, executable)
   - If not, check whether .git/hooks/${hookname}.d exists, and run each script
     within, in alphabetical order.

This approach seems similar to one that Ævar suggested in their RFC linked
above.

Benefits of that approach include:
 - A familiar Unixlike interface
 - Previously configured hooks will Just Work

Drawbacks of that approach include:
 - Still an attack vector, unless core.hooksDirectory is set
 - Still need to configure a hook for each repo, unless core.hooksDirectory is
   set
 - Hooks are still not version controlled, unless core.hooksDirectory is set

This meets goals 1, 2, and 5. Using core.hookPath makes this approach meet 3 as
well.

An alternate suggestion from Jonathan Nieder seems to look something like:

 - Learn a core.${hookname} config.
 - Teach all hook executing codepaths to check core.${hookname}
 - Either add multiple core.${hookname} entries and order ...TBD? or put lots of
   hooks on a single config entry and run in order on that line.

Benefits of the config approach include:
 - Easy to configure global, repo-wide, worktree-wide hooks
 - Long-term, the decision of which hooks to run is left entirely to the user
 - The hooks themselves can be version controlled, even distributed with the
   repo (a user could specify ${repopath}/hooks/pre-push-lint if they wanted,
   and we chose to allow it)

Drawbacks of the config approach include:
 - Different enough from the current approach to be unusual. Coming away from
   the transition period will require some thought.
 - The ordering of hooks may be confusing.
 - It may be difficult to choose _not_ to run a global hook from a local
   project.

This meets goals 3, 4, and 5; it could meet 1 and 2 but needs to be fleshed out
more.

Meanwhile, I have some other concerns when I read through the source:

 - I don't think I saw a codified list of all the possible hooks. Meaning,
   find_hook() is called with a freeform string, not from some enum, and it's
   hard to programmatically list all places where a hook could be used. This
   came up when I was looking at bugreport; even the template
   ${hookname}.sample files prepopulated in .git/hooks aren't a comprehensive
   list of all hooks mentioned in `git help hooks`. This feels like a lack of
   assurance that all hooks are well-documented.

Please do let me know if I missed something in my attempt to frame the "story so
far".

Here's my suggestion.

 - Let configs handle the hook list and ordering, via 'hook.{hookname}' which
   can be specified more than once.
   - global config: hook.update = /path/to/firsthook
     user config: hook.update = /path/to/secondhook
     worktree config: hook.update = -/path/to/firsthook #eliminate the firsthook
      call
   - global config: hook.update = /path/to/firsthook
     repo config: hook.update = /path/to/secondhook
     repo config: hook.update = ^/path/to/firsthook #move firsthook execution
       after secondhook for this project
 - Let users decide whether they want to check core.hookDir in addition to their
   config, instead of their config, or not at all, via a config
   'hook.runHookDir' "hookdir-first", "hookdir-only", "config-only", etc.
 - Let users ask to be notified if running a hook from .git/hooks via a config
   'hook.warnHookDir'. (Mentioning .git/hooks rather than core.hookDir is
   intentional here.) Alternatively, ask for 'hook.warn' with values "hookdir",
   "all" depending on how trusting the user feels.
   - If we want to phase out .git/hooks, we can make this notification opt-out
     instead of opt-in, someday.
   - between runHookDir and warnHookDir, users are able to phase out
     .git/hooks scripts at their own pace.
 - Abstract most of this via a 'git hooks' command.
   - 'git hooks add <hookname> [--<scope>] <path>' to append another hook;
     let the scope setting match 'git config'.
   - 'git hooks show [<hookname>]' to indicate which hooks will be run, either
     on a specific event or for all events, and in which order, as well as each
     hook's config origin
   - 'git hooks edit <hookname>' to modify the hook order, or choose not to run
     a hook locally
   - By managing it through a command, we can reorder the contents of config
     files if it makes sense to do so.
 - Add a hook library.
   - Optionally specify all hook events via enum, so we aren't string-typing
     calls to find_hook() anymore.
   - Resolve configs into a list of hooks to run by concatenating them together
     in config precedence order.
     - Also allow configs formatted like "-<path>" to remove an
       earlier-configured invocation of that hook, or "^<path>" to move the
       earlier-configured invocation to this point in the execution order
   - Move the find_hook() implementation to here, to account for the multihook
     ordering

I think it reaches the goals I mentioned:

 1. User can specify more than one hook per event by adding more configs or
    running 'git hooks add <event>'.
 2. User can determine the order by reordering their configs in file, using
    "^/hookpath" notation, or using 'git hooks edit'.
 3. User can point to a hook script which is checked into the repo they want it
    to run on, another repo in their filesystem, or anywhere at all. Contrary to
    the behavior of core.hookDir, users can manage each hook in a different repo
    if they wish (maybe the git-secrets hooks all live in the git-secrets repo,
    and the Gerrit Change-Id hook lives in the gerrit repo, and each project's
    pre-commit formatting hooks live in their own projects' repos).
 4. Users who have "bought in" to the config-based hook management can ask to be
    warned if a hook is run from .git/hooks, and can turn off running hooks from
    .git/hooks entirely if they wish.
 5. By defaulting to hook.runHookDir=hookdir-first or hookdir-only, we can avoid
    breaking users who are using .git/hooks/ today. The former gives value-add -
    the user can start using config hooks seamlessly - while the latter gives
    old functionality only.

The other benefits I can think of:
 - Having a friendly porcelain command to manage hooks makes life easier for the
   Git and Unix uninitiated.
 - If 'git hooks edit' UI looks like 'git rebase --interactive' UI, then users
   may already feel at home using it.
 - If organizations are already distributing global configs, they can also
   distribute hooks such as git-secrets or other internal hooks if they wish.
 - Users who want to use a hook that comes with a repo may still be subject to
   attack if the hook shipped by the repo changes under their feet (although
   they have explicitly asked Git to run that hook, so the threat doesn't feel
   different from running a shell script from a repo manually).

The drawbacks I can think of:
 - It is a much more complicated approach than adding .git/hooks/${hookname}.d.
 - The config syntax can clutter the output if users want to examine their
   config with 'git config --list'.
 - It is still possible for someone to attack via a zip file if they manipulate
   .git/config to use a hook they specified in the repo.

The issue with .git/config is still a pretty big one, but my own thought is that
we should treat it as a separate piece. Yes, those kinds of attacks are still
possible as long as either .git/config or .git/hooks are present in the repo
directory. But coding around .git/config will lead to some cruft that won't make
sense once repo-local config files are made more secure (for example, one
solution might be to ignore local and worktree configs when evaluating the
hooks list, which would be inconvenient and no longer make sense if those config
scopes are secured).

I look forward to the discussion. Thanks for your thoughts.
 - Emily