Stefan Beller <sbeller@xxxxxxxxxx> writes:

> We could also offer more access to the pp machinery and an implementation for
> (2) might look like this:
>
> static void fictitious_start_failure(void *data,
>                                      void *pp,
>                                      struct child_process *cp,
>                                      struct strbuf *err)
> {
>     struct mydata *m = data;
>
>     if (m->failstrategy == 1)
>         ; /* nothing here */
>     else if (m->failstrategy == 2)
>         killall_children(pp);

That looks nice and clean in theory, but I highly doubt that it is a
good idea to have killall_children(pp) in client code of the API like
this, especially if you are envisioning that that function merely goes
through the list of children and does kill on the processes.

If killall_children(pp) is supplied by the API, it makes things less
insane (now it can know and keep up with the evolution of the API's
implementation details, such as how buffered output is kept and such),
but such an API still constrains the structure of the overall
scheduling loop and the helper functions that live on the API side.
You need to make absolutely sure that calling killall_children() is
something that can be sanely done from inside the start_failure()
callback, for example.  If you instead signal the "emergency" with a
return value, the callchain on the API side can choose to do the
killing at a place it knows is safe, in a more controlled way.

For example, the main loop on the API side IIRC (I do not bother
checking out 'pu' to read it as my working tree is busy on another
topic right now) is

	while (1) {
		for (cnt = 0; cnt < 4 && we still have slots; cnt++)
			start_one();
		collect_output();
		drain_output_for_foreground();
		wait_and_flush();
	}

Imagine that you have 0 or more processes running and start another
iteration of this loop.  The first task started by start_one() fails
and the above fictitious_start_failure() calls killall_children()
itself.  What happens to the other three iterations of the inner loop?
After the existing ones are killed, does it add three more?  To prevent
that from happening, you also need to tell your fictitious_next_task()
that no more processes are desired.  The client of the API is forced
to coordinate across its multiple callback functions.

And you did that to gain what?  One major thing I can think of is
that, that way, the main scheduling loop does not have to know why,
after a spawn attempt failed, fictitious_next_task() started saying
"no more tasks".  For somebody who is coming from "The main loop is a
dumb bullet-list of things to do.  All smarts are inside the callee",
that might appear to be a good thing.  But I do not think it is a good
thing at all.  It forces the reader of the code to follow not just the
API layer but to descend into the individual clients of the API to
understand what is going on.

If you propagate the return value from the start_failure() callback
upwards, then the main scheduler would *know* that the client
application does not want any more new tasks, and that it wants to
abort even the running tasks, so it will not call
fictitious_next_task() in the first place after the client application
declares an "emergency exit".  And that would make the overall logic a
lot easier to follow.
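
To illustrate, here is a minimal, compilable sketch of that
return-value approach.  Everything in it (struct fake_pp,
kill_all_children(), the callback signatures, the simulated spawn
failure) is made up for illustration and is not the actual
run-command.c code; the point is only that once start_failure()
reports an emergency through its return value, the scheduler is the
one that reaps the children and stops asking for new tasks, at one
place it knows is safe.

	#include <stdio.h>

	#define MAX_SLOTS 4

	struct fake_pp {
		int in_use[MAX_SLOTS];	/* which slots hold a "running" child */
		int shutting_down;	/* set once a callback asks to abort */
		void *data;		/* the client's opaque cookie */

		/* return 1 if a task was put into cmd, 0 when no more work */
		int (*get_next_task)(void *data, char *cmd, size_t len);
		/* return 0 to keep going, non-zero to declare an emergency */
		int (*start_failure)(void *data, const char *cmd);
	};

	static void kill_all_children(struct fake_pp *pp)
	{
		int i;

		/* the API side walks its own bookkeeping, not the client */
		for (i = 0; i < MAX_SLOTS; i++)
			if (pp->in_use[i]) {
				printf("  reaping slot %d\n", i);
				pp->in_use[i] = 0;
			}
	}

	static int start_one(struct fake_pp *pp)
	{
		char cmd[64];

		if (pp->shutting_down ||
		    !pp->get_next_task(pp->data, cmd, sizeof(cmd)))
			return 0;	/* nothing (more) to start */

		/* pretend every spawn fails, to exercise the failure path */
		printf("failed to start '%s'\n", cmd);
		if (pp->start_failure(pp->data, cmd)) {
			/*
			 * The client declared an emergency; the scheduler
			 * does the killing here, at a point it knows is
			 * safe, and remembers never to ask for new tasks.
			 */
			pp->shutting_down = 1;
			kill_all_children(pp);
		}
		return !pp->shutting_down;
	}

	/* --- a client of this made-up API --- */

	struct mydata {
		int remaining;
	};

	static int my_next_task(void *data, char *cmd, size_t len)
	{
		struct mydata *m = data;

		if (m->remaining <= 0)
			return 0;
		snprintf(cmd, len, "task-%d", m->remaining--);
		return 1;
	}

	static int my_start_failure(void *data, const char *cmd)
	{
		printf("  client: abort everything, '%s' failed\n", cmd);
		return 1;	/* non-zero == emergency exit */
	}

	int main(void)
	{
		struct mydata m = { 10 };
		struct fake_pp pp = { { 1, 1, 0, 0 }, 0, &m,
				      my_next_task, my_start_failure };

		while (1) {
			int cnt, started = 0;

			for (cnt = 0; cnt < MAX_SLOTS; cnt++)
				started |= start_one(&pp);
			if (pp.shutting_down || !started)
				break;
			/* collect_output(), wait_and_flush() etc. go here */
		}
		return 0;
	}

Compared with the killall_children(pp)-from-the-callback version, the
client here does not have to coordinate my_start_failure() with
my_next_task() at all; the remaining iterations of the inner loop
simply become no-ops because the scheduler already knows it is
shutting down.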