Re: [RFC PATCH 11/21] parallel-checkout: make it truly parallel

On Wed, Aug 19, 2020 at 6:34 PM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>
> On 8/10/20 5:33 PM, Matheus Tavares wrote:
> >
> > +static struct child_process *setup_workers(struct checkout *state, int num_workers)
> > +{
> > +     struct child_process *workers;
> > +     int i, workers_with_one_extra_item;
> > +     size_t base_batch_size, next_to_assign = 0;
> > +
> > +     base_batch_size = parallel_checkout->nr / num_workers;
> > +     workers_with_one_extra_item = parallel_checkout->nr % num_workers;
> > +     ALLOC_ARRAY(workers, num_workers);
> > +
> > +     for (i = 0; i < num_workers; ++i) {
> > +             struct child_process *cp = &workers[i];
> > +             size_t batch_size = base_batch_size;
> > +
> > +             child_process_init(cp);
> > +             cp->git_cmd = 1;
> > +             cp->in = -1;
> > +             cp->out = -1;
> > +             strvec_push(&cp->args, "checkout--helper");
> > +             if (state->base_dir_len)
> > +                     strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
> > +             if (start_command(cp))
> > +                     die(_("failed to spawn checkout worker"));
>
> We should consider splitting this loop into one to start the helpers
> and another loop to later send them their assignments.  This would
> better hide the process startup costs.
>
> When comparing this version with my pc-p4-core branch on Windows,
> I was seeing a delay of 0.8 seconds between each helper process
> getting started.  With my version, the delay was 0.2 seconds.
>
> I was testing with a huge repo and the batch size was ~200k, so the
> foreground process was stuck in send_batch() for a while before it
> could start the next helper process.
>
> It still takes the same amount of time to send each batch, but
> the 2nd thru nth helpers can be starting while we are sending the
> batch to the 1st helper.  (This might just be a Windows issue because
> of how slow process creation is on Windows....)

Thanks for the explanation. I will split the loop in v2.

> We could maybe also save a little time splitting the batches
> across the helpers, but that's a refinement for later...
>
> > +
> > +             /* distribute the extra work evenly */
> > +             if (i < workers_with_one_extra_item)
> > +                     batch_size++;
> > +
> > +             send_batch(cp->in, next_to_assign, batch_size);
> > +             next_to_assign += batch_size;
> >       }
> >
> > +     return workers;
> > +}
> > +
> > +static void finish_workers(struct child_process *workers, int num_workers)
> > +{
> > +     int i;
> > +     for (i = 0; i < num_workers; ++i) {
> > +             struct child_process *w = &workers[i];
> > +             if (w->in >= 0)
> > +                     close(w->in);
> > +             if (w->out >= 0)
> > +                     close(w->out);
>
> You might also consider splitting this loop too.  The net-net here
> is that the foreground process closes the handle to the child and
> waits for the child to exit -- which it will, because it gets EOF on
> its stdin.
>
> But the foreground process is stuck in a wait() for it to do so.
>
> You could make finish_workers() just call close() on all the child
> handles and then have an atexit() handler to actually wait() and
> reap them.  This would let the children exit asynchronously (while
> the caller here in the foreground process is updating the index
> on disk, for example).

Makes sense, thanks. And I think we could achieve this by setting both
`clean_on_exit` and `wait_after_clean` on the child_process struct,
right? (BTW, I have just noticed that we probably want these flags set
for another reason too: we wouldn't want the workers to keep checking
out files if the main process was killed.)

Maybe the downside of using the atexit() handler, instead of calling
finish_command(), would be that we cannot free the `workers` array
when cleaning up parallel-checkout, right? Also, we wouldn't be able
to report an error if the worker ends with an error code (but at this
point it would have already sent all the results to the foreground
process, anyway).

Do you think that we can mitigate the wait() cost by just splitting
the loop in two, like we are going to do in setup_workers()?

>
> > +             if (finish_command(w))
> > +                     die(_("checkout worker finished with error"));
> > +     }
> > +     free(workers);
> > +}
