Re: [PATCH 00/27] Base for "config-based-hooks"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 17, 2021 at 12:22:34PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> This v3 of the "Base for 'config-based-hooks'" topic is >95% a
> slimmed-down versio nof Emily Schaffer's work to introduce a mechanism
(Tiniest of nits - I only have a 'c' in my middle name ;) )
> to drive hooks via config.
> 
> This topic doesn't do that, but moves all hook execution (C libary and
> Perl|Python script) to either the new hook.[ch] library code, or the
> "git hook run" utility.
> 
> See previous iterations for more details:
> 
>  v0 (Emily's): http://lore.kernel.org/git/20210527000856.695702-1-emilyshaffer@xxxxxxxxxx
>  v1: https://lore.kernel.org/git/cover-00.31-00000000000-20210528T110515Z-avarab@xxxxxxxxx/
>  v2: http://lore.kernel.org/git/cover-00.30-00000000000-20210614T101920Z-avarab@xxxxxxxxx
> 
> This series gained two new dependencies since v2, my just-submitted
> preparatory topics:
> 
>     https://lore.kernel.org/git/cover-0.3-0000000000-20210617T095827Z-avarab@xxxxxxxxx/
>     https://lore.kernel.org/git/cover-0.3-0000000000-20210617T100239Z-avarab@xxxxxxxxx/
> 
> And hopefully addreses all the feedback on v2, mainly/entirely from
> Emily at https://lore.kernel.org/git/YMfLO9CT+iIDR3OA@xxxxxxxxxx
> 
> I'd normally have waited longer for a v3, but as discussed in the
> small hook.[ch] topic that precedes this one building the new
> hook-list.h had an error on Windows CI due to a missing CMake change,
> that's now fixed.
> 
> Other changes can be seen in the range-diff, all rather trivial fixes
> like comment fixes, trimming off "argc/argv" before we pass things to
> "git hook run", removing a redundant test setup etc.
> 
> I also added a GIT_TEST_FAKE_HOOKS=true for use in the test suite to
> make us support a "test-hook" and a "does-not-exist" hook, we'd
> previously accept those outside the test environment.

I initially wasn't sure this is necessary, and I'm still not entirely
sure - it makes more sense to me to see test-hook and does-not-exist use
essentially
https://lore.kernel.org/git/20210715232603.3415111-8-emilyshaffer@xxxxxxxxxx.

Would it make sense for you to absorb that much - "when we're 'git hook
run'ing, don't validate hookname" - into your series to avoid this
test-fake-hooks thing, and also teach the test-helper to use (or not
use) the allow_unknown flag?

[...]

> Range-diff:
>  1:  447d349c73 !  1:  cf4b06bfdf hook: add 'run' subcommand
>     @@ Documentation/git-hook.txt (new)
>     @@ Documentation/githooks.txt: and "0" meaning they were not.
>     -@@ Makefile: LIB_OBJS += hash-lookup.o
>     - LIB_OBJS += hashmap.o
>     - LIB_OBJS += help.o
>     - LIB_OBJS += hex.o
>     -+LIB_OBJS += hook.o
>     - LIB_OBJS += ident.o
>     - LIB_OBJS += json-writer.o
>     - LIB_OBJS += kwset.o

Ah, no longer adding hook.o to the Makefile because it's been added in
one of your splintered-off dependency topics. Ok.

>     @@ builtin/hook.c (new)
>      +#include "strvec.h"
>      +
>      +static const char * const builtin_hook_usage[] = {
>     ++	N_("git hook <command> [...]"),
>     ++	N_("git hook run <hook-name> [-- <hook-args>]"),
>     ++	NULL
>     ++};
>     ++
>     ++static const char * const builtin_hook_run_usage[] = {
>      +	N_("git hook run <hook-name> [-- <hook-args>]"),
>      +	NULL
>      +};
Cool, so now we have a separate 'git hook<enter>' usage and 'git hook
run<enter>' usage.

I was going to complain about the strings being duplicated, but it
appears that other commands that use this pattern - like 'git remote' -
do it the same way as you're doing here. So seems like that's a problem
to ponder later, unrelated to this topic ;)

>     @@ builtin/hook.c (new)
>      +	};
>      +
>      +	argc = parse_options(argc, argv, prefix, run_options,
>     -+			     builtin_hook_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
>     ++			     builtin_hook_run_usage,
>     ++			     PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
>      +
>     -+	if (argc > 2) {
>     -+		if (strcmp(argv[2], "--") &&
>     -+		    strcmp(argv[2], "--end-of-options"))
>     ++	if (argc > 1) {
>     ++		if (strcmp(argv[1], "--") &&
>     ++		    strcmp(argv[1], "--end-of-options"))
>      +			/* Having a -- for "run" is mandatory */
>      +			usage_with_options(builtin_hook_usage, run_options);
>      +		/* Add our arguments, start after -- */
>     -+		for (i = 3 ; i < argc; i++)
>     ++		for (i = 2 ; i < argc; i++)

Nice, so the implementation of run_hooks() has one less stray arg to
worry about.

>      +			strvec_push(&opt.args, argv[i]);
>      +	}
>      +
>      +	/* Need to take into account core.hooksPath */
>      +	git_config(git_default_config, NULL);
>      +
>     -+	hook_name = argv[1];
>     ++	/*
>     ++	 * We are not using run_hooks() because we'd like to detect
>     ++	 * missing hooks. Let's find it ourselves and call
>     ++	 * run_found_hooks() instead.
>     ++	 */
>     ++	hook_name = argv[0];

And a more explicit comment. Thanks.

>      +	hook_path = find_hook(hook_name);
>      +	if (!hook_path) {
>      +		error("cannot find a hook named %s", hook_name);
>     @@ builtin/hook.c (new)
>      +	struct option builtin_hook_options[] = {
>      +		OPT_END(),
>      +	};
>     ++	argc = parse_options(argc, argv, NULL, builtin_hook_options,
>     ++			     builtin_hook_usage, PARSE_OPT_STOP_AT_NON_OPTION);
>     ++	if (!argc)
>     ++		usage_with_options(builtin_hook_usage, builtin_hook_options);

Nice - parse_options() removes the 'hook' part of the invocation for us,
so we don't need to carry it around into the subcommands. Thanks.

>      +
>     -+	if (!strcmp(argv[1], "run"))
>     ++	if (!strcmp(argv[0], "run"))
>      +		return run(argc, argv, prefix);
>     -+	usage_with_options(builtin_hook_usage, builtin_hook_options);
>     -+	return 1;
>     ++	else
>     ++		usage_with_options(builtin_hook_usage, builtin_hook_options);
>      +}

Ah, and here we lose the awkward "return something even though
usage_with_options() exits". Ok.

>     + ## hook.c ##
>     + static int known_hook(const char *name)
>     + {
>     + 	const char **p;
>     + 	size_t len = strlen(name);
>     ++	static int test_hooks_ok = -1;
>     ++

Setting aside whether or not I think it's a good idea, this seems
correct - we either allow test hooks for the entire lifetime of the
process (e.g. from the test runner) or enforce hookname validation for
the entire lifetime of the process (e.g. 'git hook run' or normal Git
operation). So it makes sense to cache it in a static. Ok.

>     + 	for (p = hook_name_list; *p; p++) {
>     + 		const char *hook = *p;
>     + 
>     +@@ hook.c: static int known_hook(const char *name)
>     + 			return 1;
>     + 	}
>     + 
>     ++	if (test_hooks_ok == -1)
>     ++		test_hooks_ok = git_env_bool("GIT_TEST_FAKE_HOOKS", 0);
>     ++
>     ++	if (test_hooks_ok &&
>     ++	    (!strcmp(name, "test-hook") ||
>     ++	     !strcmp(name, "does-not-exist")))
>     ++		return 1;
>     ++

As I mentioned above, this still seems too restrictive to me, but I
don't need to rehash it here.

>     @@ hook.c (new)
>      +	struct hook_cb_data *hook_cb = pp_cb;
>      +	struct hook *run_me = hook_cb->run_me;
>      +
>     -+	if (!run_me)
>     -+		BUG("did we not return 1 in notify_hook_finished?");
>     -+

I think the logic here is not quite correct, and I patched it in my
series on top when I enabled multiple hooks.

My understanding of the contract with run_processes_parallel() is this:
 - When you're done assigning tasks, return '0' from pick_next_hook()
 - If you want to stop all task processing because of one task's exit
   state, return '1' from notify_hook_finished()

But you instead are unconditionally returning '1' from
'notify_hook_finished()', and not handling "no more tasks to assign" in
pick_next_hook(). I think it is more correct to return 0 from
'notify_hook_finished()' - "this hook ran correctly, or if it ran
poorly, at least we don't want to prevent anybody else from running, as
that's not our problem". And then in your pick_next_hook(), since you
know you only want to run one hook and you aren't using a list yet, at
the end after you prepare the child_process, you can set hook_cb->run_me
= NULL - "there is no next hook to pick" - and then use "if (!run_me)"
to return 0 to tell run_processes_parallel() that there are no more
tasks to assign.

I  mention it because without noticing your changed flow in this patch,
I wrote a fairly pernicious bug in my rebase on top. By leaving
'notify_hook_finished()' to return 1, but enabling parallelism,
multihooks *appeared* to work - because all the tasks were assigned and
started when running in parallel, even though the first one to finish
said "wait don't start any more tasks please". But when running multiple
hooks with j=1, only the first hook would run, even though the list was
populated correctly - and stepping through pick_next_hook() didn't
indicate why.

I *think* you did it this way to avoid calling pick_next_hook() again with
nothing to iterate over, but since run_processes_parallel() is written
for lists of tasks, I don't think that kludge is the "correct" approach.

>     +@@ hook.h: const char *find_hook(const char *name);
>     -+	/* Number of threads to parallelize across */
>     ++	/*
>     ++	 * Number of threads to parallelize across, currently a stub,
>     ++	 * we use the parallel API for future-proofing, but we always
>     ++	 * have one hook of a given name, so this is always an
>     ++	 * implicit 1 for now.
>     ++	 */
>      +	int jobs;

Makes me wonder whether you really even want to leave it in here. I
guess it makes your call to run_processes_parallel() cleaner. But it'd
also be handy to swap '1' for 'options->jobs' in
https://lore.kernel.org/git/20210715232603.3415111-3-emilyshaffer@xxxxxxxxxx
instead, if you'd prefer.

>      +};
>      +
>     @@ hook.h (new)
>      +	.args = STRVEC_INIT, \
>      +}
>      +
>     -+/*
>     -+ * Callback provided to feed_pipe_fn and consume_sideband_fn.
>     -+ */

Why delete the comment? What was it hurting? ;)

>     @@ t/t1800-hook.sh (new)
>     ++test_expect_success 'git hook run: nonexistent hook' '

Ah, thanks for '%s/--/:/'. I know it was a nit, but that was bugging me :)

>      +test_expect_success 'git -c core.hooksPath=<PATH> hook run' '
>      +	mkdir my-hooks &&
>     -+	write_script my-hooks/test-hook <<-EOF &&
>     -+	echo Hook ran >>actual
>     ++	write_script my-hooks/test-hook <<-\EOF &&
>     ++	echo Hook ran $1 >>actual
>      +	EOF
>      +
>      +	cat >expect <<-\EOF &&
>      +	Test hook
>     -+	Hook ran
>     -+	Hook ran
>     -+	Hook ran
>     -+	Hook ran
>     ++	Hook ran one
>     ++	Hook ran two
>     ++	Hook ran three
>     ++	Hook ran four
>      +	EOF
>      +
>      +	# Test various ways of specifying the path. See also
>      +	# t1350-config-hooks-path.sh
>      +	>actual &&
>     -+	git hook run test-hook 2>>actual &&
>     -+	git -c core.hooksPath=my-hooks hook run test-hook 2>>actual &&
>     -+	git -c core.hooksPath=my-hooks/ hook run test-hook 2>>actual &&
>     -+	git -c core.hooksPath="$PWD/my-hooks" hook run test-hook 2>>actual &&
>     -+	git -c core.hooksPath="$PWD/my-hooks/" hook run test-hook 2>>actual &&
>     ++	git hook run test-hook -- ignored 2>>actual &&
>     ++	git -c core.hooksPath=my-hooks hook run test-hook -- one 2>>actual &&
>     ++	git -c core.hooksPath=my-hooks/ hook run test-hook -- two 2>>actual &&
>     ++	git -c core.hooksPath="$PWD/my-hooks" hook run test-hook -- three 2>>actual &&
>     ++	git -c core.hooksPath="$PWD/my-hooks/" hook run test-hook -- four 2>>actual &&

Thanks - this will be much easier to debug when someone (me, probably)
breaks one of these by accident ;)

>      +	test_cmp expect actual
>      +'
>      +
>     -+test_expect_success 'set up a pre-commit hook in core.hooksPath' '
>     -+	>actual &&
>     -+	mkdir -p .git/custom-hooks .git/hooks &&
>     -+	write_script .git/custom-hooks/pre-commit <<-\EOF &&
>     -+	echo CUSTOM >>actual
>     -+	EOF
>     -+	write_script .git/hooks/pre-commit <<-\EOF
>     -+	echo NORMAL >>actual
>     -+	EOF
>     -+'

Hm, is this removed because we feel like the test-hook test above gives
us enough coverage? Ok.

>  9:  8d8b2d2645 !  7:  12347d901b git hook run: add an --ignore-missing flag
>       ## t/t1800-hook.sh ##
>     +-test_expect_success 'git hook run: basic' '
>     ++test_expect_success 'git hook run: nonexistent hook with --ignore-missing' '
>      +	git hook run --ignore-missing does-not-exist 2>stderr.actual &&
>      +	test_must_be_empty stderr.actual
>      +'
>      +
>     - test_expect_success 'git hook run -- basic' '
>     ++test_expect_success 'git hook run -- basic' '

Ah, I see the range-diff caught s/--/:/ earlier but that step was missed
in patch 9, which made this bit of the diff a little confused.

> 11:  aa970a8175 !  9:  246a82b55b git-p4: use 'git hook' to run hooks
>     @@ Commit message
>          Python, we can directly call 'git hook run'. We emulate the existence
>          check with the --ignore-missing flag.
>      
>     +    As this is the last hook execution in git.git to not go through "git
>     +    hook run" or the hook.[ch] library we can now be absolutely sure that
>     +    our assertion in hook.c that only hooks known by the generated (from
>     +    githooks(5)) hook-list.h are permitted.
>     +

\o/

> 17:  c4f60db606 ! 15:  b7c0ee9719 hook: support passing stdin to hooks
>       ## builtin/hook.c ##
>       static const char * const builtin_hook_usage[] = {
>     + 	N_("git hook <command> [...]"),
>      -	N_("git hook run <hook-name> [-- <hook-args>]"),
>     ++	N_("git hook run [<args>] <hook-name> [-- <hook-args>]"),
>     + 	NULL
>     + };
>     + 
>     + static const char * const builtin_hook_run_usage[] = {
>     + 	N_("git hook run <hook-name> [-- <hook-args>]"),
>      +	N_("git hook run [--to-stdin=<path>] <hook-name> [-- <hook-args>]"),

Nice, and here the strings in builtin_hook_usage and
builtin_hook_run_usage start to differ, which makes my earlier concern
entirely moot. Thanks.

--

Aside from the questions about a) this test hook envvar and b) specific
mechanics of terminating run_processes_parallel(), this series looks
good to me.

 - Emily



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux