Re: [PATCH 6/8] run-command: add an asynchronous parallel child processor

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 30 Sep 2015 11:28:47 -0700

On Tue, Sep 29, 2015 at 8:12 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> +     while (1) {
>> +             int i;
>> +             int output_timeout = 100;
>> +             int spawn_cap = 4;
>> +
>> +             if (!no_more_task) {
>> +                     for (i = 0; i < spawn_cap; i++) {
>> +                             int code;
>> +                             if (pp->nr_processes == pp->max_processes)
>> +                                     break;
>> +
>> +                             code = pp_start_one(pp);
>> +                             if (!code)
>> +                                     continue;
>> +                             if (code < 0) {
>> +                                     pp->shutdown = 1;
>> +                                     kill_children(pp, SIGTERM);
>> +                             }
>> +                             no_more_task = 1;
>> +                             break;
>> +                     }
>> +             }
>> +             if (no_more_task && !pp->nr_processes)
>> +                     break;
>
> I may have comments on other parts of this patch, but I noticed this
> a bit hard to read while reading the end result.
>
> Losing the outer "if (!no_more_task)" and replacing the above with
>
>         for (no_more_task = 0, i = 0;
>              !no_more_task && i < spawn_cap;
>              i++) {
>                 ... do things that may or may not set
>                 ... no_more_task
>         }
>         if (no_more_task && ...)
>                 break;
>
> would make it clear that regardless of spawn_cap, no_more_task is
> honored.
>
> Also I think that having the outer "if (!no_more_task)" and not
> having "no_more_task = 0" after each iteration is buggy.  Even when
> next_task() told start_one() that it does not have more tasks for
> now, as long as there are running processes, it is entirely plausible
> that next call to next_task() can return "now we have some more task
> to do".
>
> Although I think it would make it unsightly, if you want to have the
> large indentation that protects the spawn_cap loop from getting
> entered, the condition would be
>
>         if (!pp->shutdown) {
>                 for (... spawn_cap loop ...) {
>                         ...
>                 }
>         }
>
> That structure could make sense.  But even then I would probably
> write it more like
>
>         ...
>         int spawn_cap = 4;
>
>         pp = pp_init(...);
>         while (1) {
>                 int no_more_task = 0;
>
>                 for (i = 0;
>                      !no_more_task && !pp->shutdown && i < spawn_cap;
>                      i++) {
>                         ...
>                         code = start_one();
>                         ... set no_more_task to 1 as needed
>                         ... set pp->shutdown to 1 as needed
>                 }
>                 if (no_more_task && !pp->nr_processes)
>                         break;
>                 buffer_stderr(...);
>                 output(...);
>                 collect(...);
>         }

That is indeed better to read.
Though if we reset `no_more_task` each iteration of the loop by
having its declaration inside the loop, the condition for exiting the
loop needs to be:

    if ((no_more_task || pp->shutdown) && !pp->nr_processes)
        break;

for correctness.

When looking at that condition, I realized that I implicitly assumed
the workflow, where get_next_task returns 0 intermittently, to be a
second class citizen. I reworded the documentation as well there.

>
> That is, you need to have two independent conditions that tell you
> not to spawn any new task:
>
>  (1) You called start_one() repeatedly and next_task() said "nothing
>      more for now", so you know calling start_one() one more time
>      without changing other conditions (like draining output from
>      running processes and culling finished ones) will not help.
>
>      Letting other parts of the application that uses this scheduler
>      loop (i.e. drain output, cull finished process, etc.) may
>      change the situation and you _do_ need to call start_one() when
>      the next_task() merely said "nothing more for now".
>
>      That is what no_more_task controls.
>
>  (2) The application said "I want the system to be gracefully shut
>      down".  next_task() may also have said "nothing more for now"
>      and you may have set no_more_task in response to it, but unlike
>      (1) above, draining and culling must be done only to shut the
>      system down, the application does not want new processes to be
>      added.  You do not want to enter the spawn_cap loop when it
>      happens.
>
>      That is what pp->shutdown controls.
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html