Re: [PATCH v7 15/17] hook: provide stdin by string_list or callback

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Tue, 23 Feb 2021 11:52:56 -0800

On Sun, Jan 31, 2021 at 11:04:48PM -0800, Jonathan Tan wrote:
> 
> > In cases where a hook requires only a small amount of information via
> > stdin, it should be simple for users to provide a string_list alone. But
> > in more complicated cases where the stdin is too large to hold in
> > memory, let's provide a callback the users can populate line after line
> > with instead.
> 
> [snip]
> 
> > diff --git a/hook.h b/hook.h
> > index 8a7542610c..0ac83fa7ca 100644
> > --- a/hook.h
> > +++ b/hook.h
> > @@ -2,6 +2,7 @@
> >  #include "list.h"
> >  #include "strbuf.h"
> >  #include "strvec.h"
> > +#include "run-command.h"
> >  
> >  struct hook
> >  {
> > @@ -14,6 +15,12 @@ struct hook
> >  	/* The literal command to run. */
> >  	struct strbuf command;
> >  	int from_hookdir;
> > +
> > +	/*
> > +	 * Use this to keep state for your feed_pipe_fn if you are using
> > +	 * run_hooks_opt.feed_pipe. Otherwise, do not touch it.
> > +	 */
> > +	void *feed_pipe_cb_data;
> 
> When would we need per-hook state? I see in patch 14 that you give each
> running process little by little (in pp_buffer_stdin()), perhaps so that
> each hook can make progress at roughly the same pace, but I don't think
> we can expect all hooks to work the same, so I don't think it's worth
> complicating the design for all that.

I agree that this is a complicated way of doing it, and if you have a
better design I'd be really excited to hear it.

It seemed like this was what was necessary for hooks like
https://lore.kernel.org/git/20201222000435.1529768-15-emilyshaffer@xxxxxxxxxx
where the hook and the invoking process talk back and forth, or like
https://lore.kernel.org/git/20201222000435.1529768-17-emilyshaffer@xxxxxxxxxx
which generates stdin on the fly for hooks which cannot be parallelized
(and so won't run at the same pace).

The former example - proc-receive - does have a constraint that multiple
hooks can't be specified, so we could theoretically keep the old
implementation and just pick up the single hook's location from the new
hook library. But the latter example still makes me think this much
complexity is needed.

> 
> >  };
> >  
> >  /*
> > @@ -57,12 +64,24 @@ struct run_hooks_opt
> >  
> >  	/* Path to file which should be piped to stdin for each hook */
> >  	const char *path_to_stdin;
> > +	/* Pipe each string to stdin, separated by newlines */
> > +	struct string_list str_stdin;
> > +	/*
> > +	 * Callback and state pointer to ask for more content to pipe to stdin.
> > +	 * Will be called repeatedly, for each hook. See
> > +	 * hook.c:pipe_from_stdin() for an example. Keep per-hook state in
> > +	 * hook.feed_pipe_cb_data (per process). Keep initialization context in
> > +	 * feed_pipe_ctx (shared by all processes).
> > +	 */
> > +	feed_pipe_fn feed_pipe;
> > +	void *feed_pipe_ctx;
> 
> Instead of 3 fields, I think 2 suffice - the function and the data
> (called "ctx" here). We can supply a function that treats the data as a
> string_list.

Nice catch, sure.

 - Emily