Re: [PATCH v3 1/6] run-command: add pipe_output_fn to run_processes_parallel_opts

Calvin Wan <calvinwan@xxxxxxxxxx> · Mon, 24 Oct 2022 10:13:08 -0700

On Thu, Oct 20, 2022 at 8:31 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Thu, Oct 20 2022, Calvin Wan wrote:
>
> > Add pipe_output_fn as an optionally set function in
> > run_process_parallel_opts. If set, output from each child process is
> > first separately stored in 'out' and then piped to the callback
> > function when the child process finishes to allow for separate parsing.
>
> The "when[...]finish[ed]" here seems a bit odd to me. Why isn't the API
> to just stream this to callbacks as it comes in.
>
> Then if a caller only cares about the output at the very end they can
> manage that state between their streaming callbacks and "finish"
> callback, i.e. buffer it & flush it themselves.

That's a good idea. This also lets me remove the 'out' variable from
parallel_process.children.

>
> > diff --git a/run-command.c b/run-command.c
> > index c772acd743..03787bc7f5 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -1503,6 +1503,7 @@ struct parallel_processes {
> >               enum child_state state;
> >               struct child_process process;
> >               struct strbuf err;
> > +             struct strbuf out;
> >               void *data;
> >       } *children;
> >       /*
> > @@ -1560,6 +1561,9 @@ static void pp_init(struct parallel_processes *pp,
> >
> >       if (!opts->get_next_task)
> >               BUG("you need to specify a get_next_task function");
> > +
> > +     if (opts->pipe_output && opts->ungroup)
> > +             BUG("pipe_output and ungroup are incompatible with each other");
> >
> >       CALLOC_ARRAY(pp->children, n);
> >       if (!opts->ungroup)
> > @@ -1567,6 +1571,8 @@ static void pp_init(struct parallel_processes *pp,
> >
> >       for (size_t i = 0; i < n; i++) {
> >               strbuf_init(&pp->children[i].err, 0);
> > +             if (opts->pipe_output)
> > +                     strbuf_init(&pp->children[i].out, 0);
>
> Even if we're not using this, let's init it for simplicity. We don't use
> the "err" with ungroup and we're init-ing that, and...

ack.

>
> >               child_process_init(&pp->children[i].process);
> >               if (pp->pfd) {
> >                       pp->pfd[i].events = POLLIN | POLLHUP;
> > @@ -1586,6 +1592,7 @@ static void pp_cleanup(struct parallel_processes *pp,
> >       trace_printf("run_processes_parallel: done");
> >       for (size_t i = 0; i < opts->processes; i++) {
> >               strbuf_release(&pp->children[i].err);
> > +             strbuf_release(&pp->children[i].out);
>
> ...here you're strbuf_relese()-ing a string that was never init'd, it's
> not segfaulting because we check sb->alloc, and since we calloc'd this
> whole thing it'll be 0, but let's just init it so it's a proper strbuf
> (with slopbuf). It's cheap.

ack.

> > +/**
> > + * This callback is called on every child process that finished processing.
> > + *
> > + * "struct strbuf *process_out" contains the output from the finished child
> > + * process.
> > + *
> > + * pp_cb is the callback cookie as passed into run_processes_parallel,
> > + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> > + *
> > + * This function is incompatible with "ungroup"
> > + */
> > +typedef void (*pipe_output_fn)(struct strbuf *process_out,
> > +                            void *pp_cb,
> > +                            void *pp_task_cb);
> > +
> >  /**
> >   * This callback is called on every child process that finished processing.
> >   *
> > @@ -493,6 +508,12 @@ struct run_process_parallel_opts
> >        */
> >       start_failure_fn start_failure;
> >
> > +     /**
> > +      * pipe_output: See pipe_output_fn() above. This should be
> > +      * NULL unless process specific output is needed
> > +      */
> > +     pipe_output_fn pipe_output;
> > +
> >       /**
> >        * task_finished: See task_finished_fn() above. This can be
> >        * NULL to omit any special handling.
> > diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> > index 3ecb830f4a..e9b41419a0 100644
> > --- a/t/helper/test-run-command.c
> > +++ b/t/helper/test-run-command.c
> > @@ -52,6 +52,13 @@ static int no_job(struct child_process *cp,
> >       return 0;
> >  }
> >
> > +static void pipe_output(struct strbuf *process_out,
> > +                     void *pp_cb,
> > +                     void *pp_task_cb)
> > +{
> > +     fprintf(stderr, "%s", process_out->buf);
>
> maybe print this with split lines prefixed with something so wour tests
> can see that something actually happened here, & test-cmp it so we can
> see what went where, as opposed to...
>
> > +test_expect_success 'run_command runs in parallel with more jobs available than tasks --pipe-output' '
> > +     test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> > +     test_must_be_empty out &&
> > +     test_line_count = 20 err
> > +'
>
> Just checking the number of lines, which seems to leave a lot of leeway
> for the output being mixed up in all sorts of ways & the test to still
> pass..
>
> (ditto below)

ack.