On Thu, Aug 12, 2021 at 10:51:01AM -0700, Junio C Hamano wrote: > > Emily Shaffer <emilyshaffer@xxxxxxxxxx> writes: > > > In many cases, there's no reason not to allow hooks to execute in > > parallel. run_processes_parallel() is well-suited - it's a task queue > > that runs its housekeeping in series, which means users don't > > need to worry about thread safety on their callback data. True > > multithreaded execution with the async_* functions isn't necessary here. > > Synchronous hook execution can be achieved by only allowing 1 job to run > > at a time. > > > > Teach run_hooks() to use that function for simple hooks which don't > > require stdin or capture of stderr. > > Since run_hooks() has been using run_processes_parallel_tr2() > already in the ab/config-based-hooks-base topic, the above > description puzzled me quite a bit. I think you meant to say with > this step you update the callers to pass either .jobs=1 or .jobs=0 > in the hooks_opt, so that hooks _can_ run in parallel (as opposed > to using hardcoded jobs=1 in run_hooks() when calling the underlying > run_processes_parallel_tr2() function). I think you are right, and this commit message is from a much older version of the series. Ok; I will reword it. I think it can be much simpler. > > I do not think SYNC and ASYNC are great adjectives for what you are > doing in this step. Wouldn't ASYNC mean "the caller tells the hooks > to run, continues without waiting for them to finish"? In order to > let more than one hooks to run at the same time, the caller has to > continue without waiting for the first one it started so that it can > start the second one before the first one finishes (otherwise these > two hooks will not run at the same time), but ASYNC implys a lot > more than that. After starting all of these hooks, the caller may > continue doing things that are not even related to these spawned > hooks if a hook is run asynchronously. > > But I do not think that is what is happening here. After all, > run_processes_parallel_tr2() will not return until all of the > subprocesses are culled. > > Perhaps the distinction you are trying to express is easier to > convey to readers if you instead used the verb TO RUN with the > adverb SERIALLY vs IN PARALLEL (as opposed to SYNCHRONOUSLY vs > ASYNCHRONOUSLY)? Good point. I think you are right and I'll change the init macros to SERIAL/PARALLEL instead. > > > diff --git a/builtin/hook.c b/builtin/hook.c > > index 4d39c9e75e..12c9126032 100644 > > --- a/builtin/hook.c > > +++ b/builtin/hook.c > > @@ -22,7 +22,7 @@ static const char * const builtin_hook_run_usage[] = { > > static int run(int argc, const char **argv, const char *prefix) > > { > > int i; > > - struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT; > > + struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT_SYNC; > > int ignore_missing = 0; > > const char *hook_name; > > struct list_head *hooks; > > @@ -32,6 +32,8 @@ static int run(int argc, const char **argv, const char *prefix) > > N_("exit quietly with a zero exit code if the requested hook cannot be found")), > > OPT_STRING(0, "to-stdin", &opt.path_to_stdin, N_("path"), > > N_("file to read into hooks' stdin")), > > + OPT_INTEGER('j', "jobs", &opt.jobs, > > + N_("run up to <n> hooks simultaneously")), > > OPT_END(), > > }; > > int ret; > > OK, so by default "git hook" runs 1 at a time but we can instruct to > run the optimal number of jobs via "git hook -j0" etc. Is it surprising for an explicitly provided "-j0" to still be overridden by a config if one was provided? That is, "-j0" doesn't mean "-j$(nproc)" as it does in many other Unixy tools, here. It means "-j(hook.jobs OR $(nproc))". I guess you could also say that "if you configured hook.jobs, it is because you decided nproc was less optimal than the value you put into hook.jobs" - so maybe it is fine? > > > diff --git a/hook.c b/hook.c > > index 80e150548c..37f682c6d8 100644 > > --- a/hook.c > > +++ b/hook.c > > @@ -228,6 +228,28 @@ static int notify_hook_finished(int result, > > return 0; > > } > > > > +/* > > + * Determines how many jobs to use after we know we want to parallelize. First > > + * priority is the config 'hook.jobs' and second priority is the number of CPUs. > > + */ > > +static int configured_hook_jobs(void) > > +{ > > + /* > > + * The config and the CPU count probably won't change during the process > > + * lifetime, so cache the result in case we invoke multiple hooks during > > + * one process. > > + */ > > + static int jobs = 0; > > + if (jobs) > > + return jobs; > > + > > + if (git_config_get_int("hook.jobs", &jobs)) > > + /* if the config isn't set, fall back to CPU count. */ > > + jobs = online_cpus(); > > + > > + return jobs; > > +} > > Not a suggestion to make this improvement as a part of this series, > but there already are more than several codepaths doing essentially > the same thing as above, taking their own configuration or > environment variables and falling back to online_cpus(). > > It may be worth introducing a shared helper function so that this > can become > > static int configured_hook_jobs(void) > { > static int jobs; /* no need for "= 0" */ > > if (!jobs) > jobs = default_parallelism("hook.jobs", NULL); > return jobs; > } > > where the new helper takes the configuration and environment > variable names. Sounds handy. - Emily