Re: [PATCH 2/9] hook: allow parallel hook execution

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 16, 2021 at 10:36:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Jul 15 2021, Emily Shaffer wrote:
> 
> > In many cases, there's no reason not to allow hooks to execute in
> > parallel. run_processes_parallel() is well-suited - it's a task queue
> > that runs its housekeeping in series, which means users don't
> > need to worry about thread safety on their callback data. True
> > multithreaded execution with the async_* functions isn't necessary here.
> > Synchronous hook execution can be achieved by only allowing 1 job to run
> > at a time.
> >
> > Teach run_hooks() to use that function for simple hooks which don't
> > require stdin or capture of stderr.
> 
> This doesn't mention...
> 
> >  	int ret;
> > -	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
> > +	struct run_hooks_opt opt;
> >  
> > +	run_hooks_opt_init_sync(&opt);
> 
> 
> ...why we need to bring the s/macro/func/ init pattern, back, but looking ahead...
> 
> > +int configured_hook_jobs(void)a
> > +{
> > +	int n = online_cpus();
> > +	git_config_get_int("hook.jobs", &n);
> > +
> > +	return n;
> > +}
> > +
> >  int hook_exists(const char *name)
> >  {
> >  	return !!find_hook(name);
> > @@ -117,6 +125,26 @@ struct list_head* hook_list(const char* hookname)
> >  	return hook_head;
> >  }
> >  
> > +void run_hooks_opt_init_sync(struct run_hooks_opt *o)
> > +{
> > +	strvec_init(&o->env);
> > +	strvec_init(&o->args);
> > +	o->path_to_stdin = NULL;
> > +	o->jobs = 1;
> > +	o->dir = NULL;
> > +	o->feed_pipe = NULL;
> > +	o->feed_pipe_ctx = NULL;
> > +	o->consume_sideband = NULL;
> > +	o->invoked_hook = NULL;
> > +	o->absolute_path = 0;
> > +}
> > +
> > +void run_hooks_opt_init_async(struct run_hooks_opt *o)
> > +{
> > +	run_hooks_opt_init_sync(o);
> > +	o->jobs = configured_hook_jobs();
> > +}
> 
> ...okey, so it's because you brought back the "call jobs function" in
> one of the init functions.
> 
> I had a comment in a previous round, I found
> https://lore.kernel.org/git/87lf7bzbrk.fsf@xxxxxxxxxxxxxxxxxxx/, but I
> think there was a later one where I commented on the "jobs" field
> specifically.
> 
> Anyway, it seems much easier to me to just keep the simpler macro init
> and then:
> 
> > -	if (options->jobs != 1)
> > -		BUG("we do not handle %d or any other != 1 job number yet", options->jobs);
> > -
> >  	run_processes_parallel_tr2(options->jobs,
> >  				   pick_next_hook,
> >  				   notify_start_failure,
> 
> There's this one place where we use the "jobs" parameter, just do
> something like this there:
>         
>         int configured_hook_jobs(void)
>         {
>                 static int jobs;
>                 if (!jobs)
>                     return jobs;
>                 if (git_config_get_int("hook.jobs", &jobs))
>                     jobs = online_cpus();
>                 return jobs;
>         }
> 
> I.e. you also needlessly call online_cpus() when we're about to override
> it in the config. The git_config_get_int()'s return value indicates
> whether you need to do that. Then just:
> 
>     int jobs = options->jobs ? options->jobs : configured_hook_jobs();
>     run_processes_parallel_tr2(jobs, [...]);

Ahh, and then let RUN_HOOKS_OPT_INIT_ASYNC set jobs to 0 ("go look it
up"). Yeah, that makes sense.

Shout if somehow you meant to leave just one initializer macro;
otherwise, I'll do it this way - with RUN_HOOKS_OPT_INIT_ASYNC and
RUN_HOOKS_OPT_INIT_SYNC. I think it's valuable for hook callers to make
it very plain at the callsite whether they're parallelizable or not, and
I think

 struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
 opt.jobs = 0;

doesn't make that as obvious.

> 
> Or some such, i.e. we can defer getting the job number away from startup
> to when we actually need to start those jobs, and your whole use of a
> function init pattern came down to doing that really early.
> 
> As an aside if you /do/ need to do init-via-function my 5726a6b4012 (*.c
> *_init(): define in terms of corresponding *_INIT macro, 2021-07-01) in
> "next" shows a much nicer way to do that. I.e. you'd just do:
> 
>     void run_hooks_opt_init_sync(struct run_hooks_opt *o)
>     {
>          struct run_hooks_opt blank = RUN_HOOKS_OPT_INIT;
>          memcpy(o, &blank, sizeof(*o));
>     }
> 
>     void run_hooks_opt_init_async(struct run_hooks_opt *o)
>     {
>         run_hooks_opt_init_sync(o);
>         o->jobs = configured_hook_jobs();
>     }
> 
> In some cases we do actually need to do init via functions, but can init
> a large option via the macro, which IMO is nicer to read, but here I
> think we don't need the functions at all per the above.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux