On Thu, Feb 25, 2021 at 12:56:11PM -0800, Jonathan Tan wrote: > > > > When would we need per-hook state? I see in patch 14 that you give each > > > running process little by little (in pp_buffer_stdin()), perhaps so that > > > each hook can make progress at roughly the same pace, but I don't think > > > we can expect all hooks to work the same, so I don't think it's worth > > > complicating the design for all that. > > > > I agree that this is a complicated way of doing it, and if you have a > > better design I'd be really excited to hear it. > > > > It seemed like this was what was necessary for hooks like > > https://lore.kernel.org/git/20201222000435.1529768-15-emilyshaffer@xxxxxxxxxx > > where the hook and the invoking process talk back and forth, or like > > https://lore.kernel.org/git/20201222000435.1529768-17-emilyshaffer@xxxxxxxxxx > > which generates stdin on the fly for hooks which cannot be parallelized > > (and so won't run at the same pace). > > > > The former example - proc-receive - does have a constraint that multiple > > hooks can't be specified, so we could theoretically keep the old > > implementation and just pick up the single hook's location from the new > > hook library. But the latter example still makes me think this much > > complexity is needed. > > Ah, I see. From your explanation, in these 2 cases, only one hook > executes at a time (in the former case, because there is only one hook, > and in the latter case, you said that the hooks cannot be parallelized). > So it seems to me that the global state (in struct run_hooks_opt) would > be sufficient to keep track of what's going on. (The feed_pipe_fn > function can use pp_cb to keep track of the last executing pp_task_cb > and then compare it against the new pp_task_cb, I think, to keep track > of when a new hook has started.) > > Even in the case of multiple hooks run in series (as opposed to a single > hook), I would think that the reason they can't be run in parallel is > because the nature of execution of a hook depends on what happened > during the execution of the previous hook, which seems to me to be even > more reason to centralize the state in struct run_hooks_opt. > > Having said that, if my suggestion of not having per-hook state makes > certain patches more complicated, then that might be reason enough to > have per-hook state. In that case, you should write "per-hook state, > though strictly not necessary, makes <case> simpler" (or something like > that) in the commit message. Jonathan and I discussed this a little more offline and agreed to leave the implementation as is. Jonathan had suggested "have one callback invocation apply to all hooks that are running now", either by having the callback iterate over the task queue or by having the run-command lib take the result from the callback and have *that* iterate over the task queue. The idea being, one pointer to one copy of source material is easier to handle than many. I suggested that the callback's implementation of the second version of that, where the library takes care of the "and do it for each task in progress" part, would be pretty much identical to the callback's implementation as it is in this patch, except that as it is here the context pointer is per-task and as Jonathan suggests the context pointer is per-entire-hook-invocation - so there isn't much complexity difference between the two, from the user's perspective. We also talked about cases where N=# of hooks > M=# of jobs, that is, where some hooks must wait for other hooks to finish executing before that could start. In this case, users' callback implementations would need to be able to start over from the beginning of the source material, and a long-running hook would block other short-running hooks from beginning (because the long-running hook would be confused by hearing the source material to its stdin again). Hopefully this diagram illustrates better based on my understanding of Jonathan's suggestion: A B <- "hey everyone, stdin 'foo'" A B <- "hey everyone, stdin 'bar'" A B <- "hey everyone, stdin 'baz'" B B B B C <- "hey everyone, stdin 'foo'" C <- "hey everyone, stdin 'bar'" C <- "hey everyone, stdin 'baz'" C Anyway, since the complexity is probably about the same to the end user and using per-hook context means we don't have to wait like this, we agreed to stick with the implementation as is. - Emily