Re: [PATCH v7 12/17] hook: allow parallel hook execution

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Sun, 31 Jan 2021 22:04:22 -0800

> In many cases, there's no reason not to allow hooks to execute in
> parallel. run_processes_parallel() is well-suited - it's a task queue
> that runs its housekeeping in series, which means users don't
> need to worry about thread safety on their callback data. True
> multithreaded execution with the async_* functions isn't necessary here.
> Synchronous hook execution can be achieved by only allowing 1 job to run
> at a time.
> 
> Teach run_hooks() to use that function for simple hooks which don't
> require stdin or capture of stderr.

Which hooks would be run in parallel, and which hooks in series? I don't
see code that distinguishes between them.

> 
> Signed-off-by: Emily Shaffer <emilyshaffer@xxxxxxxxxx>
> ---
> 
> Notes:
>     Per AEvar's request - parallel hook execution on day zero.
>     
>     In most ways run_processes_parallel() worked great for me - but it didn't
>     have great support for hooks where we pipe to and from. I had to add this
>     support later in the series.
>     
>     Since I modified an existing and in-use library I'd appreciate a keen look on
>     these patches.

What is the existing and in-use library that you're modifying?

> @@ -246,11 +255,96 @@ void run_hooks_opt_clear(struct run_hooks_opt *o)
>  	strvec_clear(&o->args);
>  }
>  
> +
> +static int pick_next_hook(struct child_process *cp,
> +			  struct strbuf *out,
> +			  void *pp_cb,
> +			  void **pp_task_cb)
> +{
> +	struct hook_cb_data *hook_cb = pp_cb;
> +
> +	struct hook *hook = list_entry(hook_cb->run_me, struct hook, list);
> +
> +	if (hook_cb->head == hook_cb->run_me)
> +		return 0;
> +
> +	cp->env = hook_cb->options->env.v;
> +	cp->stdout_to_stderr = 1;
> +	cp->trace2_hook_name = hook->command.buf;
> +
> +	/* reopen the file for stdin; run_command closes it. */
> +	if (hook_cb->options->path_to_stdin) {
> +		cp->no_stdin = 0;
> +		cp->in = xopen(hook_cb->options->path_to_stdin, O_RDONLY);
> +	} else {
> +		cp->no_stdin = 1;
> +	}
> +
> +	/*
> +	 * Commands from the config could be oneliners, but we know
> +	 * for certain that hookdir commands are not.
> +	 */
> +	if (hook->from_hookdir)
> +		cp->use_shell = 0;
> +	else
> +		cp->use_shell = 1;
> +
> +	/* add command */
> +	strvec_push(&cp->args, hook->command.buf);
> +
> +	/*
> +	 * add passed-in argv, without expanding - let the user get back
> +	 * exactly what they put in
> +	 */
> +	strvec_pushv(&cp->args, hook_cb->options->args.v);

I just skimmed over this setup-process-for-hook part - it would have
been much clearer if it was refactored into its own function before this
patch (or better yet, written as its own function in the first place).
As it is, there are some unnecessary rewritings - e.g. setting stdin
after env, and the use_shell setup.

> diff --git a/hook.h b/hook.h

[snip]

> +/*
> + * Callback provided to feed_pipe_fn and consume_sideband_fn.
> + */
> +struct hook_cb_data {
> +	int rc;
> +	struct list_head *head;
> +	struct list_head *run_me;
> +	struct run_hooks_opt *options;
> +};

Could this be in hook.c instead?

Also, I think it's clearer if run_me was a struct hook, and set to NULL
when iteration reaches the end. If you disagree, I think it needs some
documentation (e.g. "the embedded linked list part of the hook that must
be run next; if equal to head, then iteration has ended" or something
like that).

> +#define RUN_HOOKS_OPT_INIT_SYNC  {   		\
>  	.env = STRVEC_INIT, 			\
>  	.args = STRVEC_INIT, 			\
>  	.path_to_stdin = NULL,			\
> +	.jobs = 1,				\
>  	.run_hookdir = configured_hookdir_opt()	\
>  }

This is not used anywhere.