On Sun, Jan 31, 2021 at 10:04:22PM -0800, Jonathan Tan wrote: > > > In many cases, there's no reason not to allow hooks to execute in > > parallel. run_processes_parallel() is well-suited - it's a task queue > > that runs its housekeeping in series, which means users don't > > need to worry about thread safety on their callback data. True > > multithreaded execution with the async_* functions isn't necessary here. > > Synchronous hook execution can be achieved by only allowing 1 job to run > > at a time. > > > > Teach run_hooks() to use that function for simple hooks which don't > > require stdin or capture of stderr. > > Which hooks would be run in parallel, and which hooks in series? I don't > see code that distinguishes between them. It's up to the caller, who can set run_hooks_opt.jobs. In part II of this series I made a guess at which ones should run in parallel or in series and specified it in Documentation/githooks.txt. > > > > > Signed-off-by: Emily Shaffer <emilyshaffer@xxxxxxxxxx> > > --- > > > > Notes: > > Per AEvar's request - parallel hook execution on day zero. > > > > In most ways run_processes_parallel() worked great for me - but it didn't > > have great support for hooks where we pipe to and from. I had to add this > > support later in the series. > > > > Since I modified an existing and in-use library I'd appreciate a keen look on > > these patches. > > What is the existing and in-use library that you're modifying? Hm, this note wasn't super specific. From this point onwards in the series I make changes to the run-command.h:run_processes_parallel() library, although not in this commit itself. I think I meant "from here on out, help me look at run-command.h". I'll try to make the note a little better next series, sorry for the confusion :) :) > > > @@ -246,11 +255,96 @@ void run_hooks_opt_clear(struct run_hooks_opt *o) > > strvec_clear(&o->args); > > } > > > > + > > +static int pick_next_hook(struct child_process *cp, > > + struct strbuf *out, > > + void *pp_cb, > > + void **pp_task_cb) > > +{ > > + struct hook_cb_data *hook_cb = pp_cb; > > + > > + struct hook *hook = list_entry(hook_cb->run_me, struct hook, list); > > + > > + if (hook_cb->head == hook_cb->run_me) > > + return 0; > > + > > + cp->env = hook_cb->options->env.v; > > + cp->stdout_to_stderr = 1; > > + cp->trace2_hook_name = hook->command.buf; > > + > > + /* reopen the file for stdin; run_command closes it. */ > > + if (hook_cb->options->path_to_stdin) { > > + cp->no_stdin = 0; > > + cp->in = xopen(hook_cb->options->path_to_stdin, O_RDONLY); > > + } else { > > + cp->no_stdin = 1; > > + } > > + > > + /* > > + * Commands from the config could be oneliners, but we know > > + * for certain that hookdir commands are not. > > + */ > > + if (hook->from_hookdir) > > + cp->use_shell = 0; > > + else > > + cp->use_shell = 1; > > + > > + /* add command */ > > + strvec_push(&cp->args, hook->command.buf); > > + > > + /* > > + * add passed-in argv, without expanding - let the user get back > > + * exactly what they put in > > + */ > > + strvec_pushv(&cp->args, hook_cb->options->args.v); > > I just skimmed over this setup-process-for-hook part - it would have > been much clearer if it was refactored into its own function before this > patch (or better yet, written as its own function in the first place). > As it is, there are some unnecessary rewritings - e.g. setting stdin > after env, and the use_shell setup. Yeah, that makes sense. Will see if I can change it for next round :) > > > diff --git a/hook.h b/hook.h > > [snip] > > > +/* > > + * Callback provided to feed_pipe_fn and consume_sideband_fn. > > + */ > > +struct hook_cb_data { > > + int rc; > > + struct list_head *head; > > + struct list_head *run_me; > > + struct run_hooks_opt *options; > > +}; > > Could this be in hook.c instead? It ends up being needed publicly by https://lore.kernel.org/git/20201222000435.1529768-17-emilyshaffer@xxxxxxxxxx (receive-pack: convert receive hooks to hook.h), which writes its own stdin provider callback. (In a later commit, a "void* options" gets added to this struct.) At that point it's needed because the run-command callback structure can provide one context pointer for the overall work queue, and one context pointer for the individual task; this one is the "overall work queue" pointer. >From hook.h's perspective, the entire hook_cb_data is needed for pick_next_hook; but run-command.h:run_processes_parallel() doesn't have a way to tease out a smaller amount of the context pointer for various callbacks. If we wanted to obfuscate "hook_cb_data" we'd need to add another indirection and call back to hook.h first, who could then tease out the client-provided context and then call the client callback, but to me it sounds unnecessarily complex. > > Also, I think it's clearer if run_me was a struct hook, and set to NULL > when iteration reaches the end. If you disagree, I think it needs some > documentation (e.g. "the embedded linked list part of the hook that must > be run next; if equal to head, then iteration has ended" or something > like that). Yeah, I don't see a huge reason not to do that, sure. > > > +#define RUN_HOOKS_OPT_INIT_SYNC { \ > > .env = STRVEC_INIT, \ > > .args = STRVEC_INIT, \ > > .path_to_stdin = NULL, \ > > + .jobs = 1, \ > > .run_hookdir = configured_hookdir_opt() \ > > } > > This is not used anywhere. It is used in part II by hooks which are not able to be parallelized. - Emily