Re: [PATCH 0/2] Another squash on run-command: add an asynchronous parallel child processor

Stefan Beller <sbeller@xxxxxxxxxx> · Fri, 25 Sep 2015 11:56:11 -0700

On Thu, Sep 24, 2015 at 6:08 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>>  * If you do not die() in start_failure_fn or return_value_fn, you
>>    don't want to write to stderr directly as you would destroy the fine
>>    ordering of the processes output. So make the err strbuf available in
>>    both these functions, and make sure the strbuf is appended to the
>>    buffered output in both cases.
>
> Another thing I noticed after re-reading the above is that we shared
> the thinking that dying in these is _the_ normal thing to do and
> continuing is an advanced and/or weird setting.
>
> And I think it is wrong.  Suppose after spawning 15 tasks and while
> they are still running, you start the 16th one and it fails to start.
> If your start-failure called die() to kill the controller, what
> happens to the 15 tasks that are already running?
>
> I think two sensible choices that start-failure and return-value can
> make are
>
>  (1) This one task failed, but that is OK.  Please let the other
>      tasks run [*1*].
>
>  (2) There is something seriously wrong with the whole world and I
>      declare an emergency.  Please kill the other ones and exit.

  (3) There is something wrong, such that I cannot finish my
      job, but I know the other 15 processes help towards the goal,
      so I want to let them live on until they are done. E.g. fetching
      submodules may want to take this strategy if it fails to start
      another subprocess for fetching.

By having a return value indicating which strategy you want to pursue here,
we're making the design choice to have everything done monolithically
inside the pp machinery.
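
To make the return-value variant concrete, here is a rough sketch of how
it could look. All names in it (pp_fail_action, pp_state, pp_apply_action,
sketch_start_failure) are made up for illustration and are not part of
run-command; killing children is only simulated by resetting a counter.

```c
/* Hypothetical return codes, one per strategy (1)-(3) discussed above. */
enum pp_fail_action {
	PP_CONTINUE,	/* (1) ignore the failure, keep scheduling */
	PP_KILL_ALL,	/* (2) emergency: kill running children and stop */
	PP_DRAIN	/* (3) start nothing new, let running children finish */
};

/* Simplified stand-in for the state the pp machinery would keep. */
struct pp_state {
	int nr_running;
	int accept_new_tasks;
	int aborted;
};

/* Hypothetical caller data selecting a strategy. */
struct my_data {
	int failstrategy;
};

/* The callback only reports what it wants; it never touches pp itself. */
static enum pp_fail_action sketch_start_failure(void *data)
{
	struct my_data *m = data;

	switch (m->failstrategy) {
	case 2:
		return PP_KILL_ALL;
	case 3:
		return PP_DRAIN;
	default:
		return PP_CONTINUE;
	}
}

/* The machinery interprets the return code in one central place. */
static void pp_apply_action(struct pp_state *pp, enum pp_fail_action action)
{
	switch (action) {
	case PP_CONTINUE:
		break;
	case PP_KILL_ALL:
		pp->nr_running = 0;	/* real code would signal the children */
		pp->accept_new_tasks = 0;
		pp->aborted = 1;
		break;
	case PP_DRAIN:
		pp->accept_new_tasks = 0;
		break;
	}
}
```

The point of the sketch is that every new strategy needs both a new enum
value and a new case inside the machinery, which is what I mean by
"monolithic".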

We could also offer more access to the pp machinery, and an implementation
covering all three strategies might look like this:

static void fictitious_start_failure(void *data,
                                     void *pp,
                                     struct child_process *cp,
                                     struct strbuf *err)
{
        struct mydata *m = data;

        if (m->failstrategy == 1)
                ; /* (1): nothing to do, let the other tasks run */
        else if (m->failstrategy == 2)
                killall_children(pp); /* (2): emergency, kill the rest */
        else if (m->failstrategy == 3) {
                /* (3): start nothing new, let running children finish */
                m->stop_scheduling_new_tasks = 1;
                redirect_children_to_dev_null(pp);
        } else
                ...
}

By passing the pointer to the pp struct around, we allow new
functions to be added to the pp machinery later, including ones
whose effect cannot be expressed via a return code.

>
> Dying in these callbacks does not achieve either.  Perhaps make these
> two functions return bool (or enum if you already know a third
> sensible option, but otherwise bool is fine and the person who
> discovers the need for the third will turn it into enum) to signal
> which one of these two behaviours it wants?
>
> And the default handlers should stop dying, of course.
>
>
> [Footnote]
>
> *1* Because start-failure gets pp, it can even leave a note in it to
>     ask the next invocation of get-next to retry it if it chooses
>     to.  At this point in the design cycle, all we need to do is to
>     make sure that kind of advanced usage is possible with this
>     parallel-run-command API.
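
The retry idea from the footnote could be sketched roughly as below. This
is only an illustration under assumptions: retry_state, sketch_start_failure
and sketch_get_next are hypothetical names, the callback signatures are
simplified to a single data pointer, and tasks are plain strings rather
than child processes.

```c
#include <stddef.h>

/* Hypothetical caller-owned state shared by both callbacks. */
struct retry_state {
	const char *tasks[4];
	size_t nr;		/* number of valid entries in tasks[] */
	size_t next;		/* index of the next task to hand out */
	int retry_last;		/* note left by the start-failure callback */
	int retries_left;	/* how many retries are still allowed */
};

/* On a start failure, leave a note asking for the last task to be retried. */
static void sketch_start_failure(void *data)
{
	struct retry_state *s = data;

	if (s->retries_left > 0) {
		s->retries_left--;
		s->retry_last = 1;
	}
}

/* Return the next task to start, or NULL when there is nothing left. */
static const char *sketch_get_next(void *data)
{
	struct retry_state *s = data;

	if (s->retry_last && s->next > 0) {
		s->retry_last = 0;
		return s->tasks[s->next - 1];	/* hand out the failed task again */
	}
	if (s->next >= s->nr)
		return NULL;
	return s->tasks[s->next++];
}
```

Nothing in this needs support from the pp machinery beyond passing the
same data pointer to both callbacks.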