Stefan Beller <sbeller@xxxxxxxxxx> writes:

> We could also offer more access to the pp machinery and an implementation for
> (2) might look like this:
>
> static void fictitious_start_failure(void *data,
>                                      void *pp,
>                                      struct child_process *cp,
>                                      struct strbuf *err)
> {
>     struct mydata *m = data;
>
>     if (m->failstrategy == 1)
>         ; /* nothing here */
>     else if (m->failstrategy == 2)
>         killall_children(pp);

That looks nice and clean in theory, but I highly doubt that it is a
good idea to have killall_children(pp) in client code of the API like
this, especially if you are envisioning that that function merely goes
through the list of children and does kill on the processes.

If killall_children(pp) is supplied by the API, it makes things less
insane (now it can know and keep up with the evolution of the API's
implementation details, such as how buffered output is kept and such),
but such an API still constrains the structure of the overall
scheduling loop and the helper functions that live on the API side.
You need to make absolutely sure that calling killall_children() is
something that can be sanely done from inside the start_failure()
callback, for example.  If you instead signal the "emergency" with a
return value, the callchain on the API side can choose to do the
killing at a place it knows is safe, in a more controlled way.

For example, the main loop on the API side IIRC (I do not bother
checking out 'pu' to read it as my working tree is busy on another
topic right now) is

	while (1) {
		for (cnt = 0; cnt < 4 && we still have slots; cnt++)
			start_one();
		collect_output();
		drain_output_for_foreground();
		wait_and_flush();
	}

Imagine that you have 0 or more processes running and start another
iteration of this loop.  The first task started by start_one() fails
and the above fictitious_start_failure() calls killall_children()
itself.  What happens to the other three iterations of the inner loop?
After the existing ones are killed, does it add three more?  To prevent
that from happening, you also need to tell your fictitious_next_task()
that no more processes are desired.  The client of the API is forced
to coordinate across its multiple callback functions.

And you did that to gain what?  One major thing I can think of is
that, that way, the main scheduling loop does not have to know why,
after a spawn attempt failed, fictitious_next_task() started saying
"no more tasks".  For somebody who is coming from "The main loop is a
dumb bullet-list of things to do.  All smarts are inside the callee",
that might appear to be a good thing.  But I do not think it is a good
thing at all.  It forces the reader of the code to follow not just the
API layer but to descend into the individual clients of the API to
understand what is going on.

If you propagate the return value from the start_failure() callback
upwards, then the main scheduler would *know* that the client
application does not want any more new tasks, and that it wants to
abort even the running tasks, so it will not call
fictitious_next_task() in the first place after the client application
declares an "emergency exit".  And that would make the overall logic a
lot easier to follow.
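
To illustrate, here is a minimal, compilable sketch of that
return-value approach.  Everything in it (struct fake_pp,
kill_all_children(), the callback signatures, the simulated spawn
failure) is made up for illustration and is not the actual
run-command.c code; the point is only that once start_failure()
reports an emergency through its return value, the scheduler is the
one that reaps the children and stops asking for new tasks, at one
place it knows is safe.

	#include <stdio.h>

	#define MAX_SLOTS 4

	struct fake_pp {
		int in_use[MAX_SLOTS];	/* which slots hold a "running" child */
		int shutting_down;	/* set once a callback asks to abort */
		void *data;		/* the client's opaque cookie */

		/* return 1 if a task was put into cmd, 0 when no more work */
		int (*get_next_task)(void *data, char *cmd, size_t len);
		/* return 0 to keep going, non-zero to declare an emergency */
		int (*start_failure)(void *data, const char *cmd);
	};

	static void kill_all_children(struct fake_pp *pp)
	{
		int i;

		/* the API side walks its own bookkeeping, not the client */
		for (i = 0; i < MAX_SLOTS; i++)
			if (pp->in_use[i]) {
				printf("  reaping slot %d\n", i);
				pp->in_use[i] = 0;
			}
	}

	static int start_one(struct fake_pp *pp)
	{
		char cmd[64];

		if (pp->shutting_down ||
		    !pp->get_next_task(pp->data, cmd, sizeof(cmd)))
			return 0;	/* nothing (more) to start */

		/* pretend every spawn fails, to exercise the failure path */
		printf("failed to start '%s'\n", cmd);
		if (pp->start_failure(pp->data, cmd)) {
			/*
			 * The client declared an emergency; the scheduler
			 * does the killing here, at a point it knows is
			 * safe, and remembers never to ask for new tasks.
			 */
			pp->shutting_down = 1;
			kill_all_children(pp);
		}
		return !pp->shutting_down;
	}

	/* --- a client of this made-up API --- */

	struct mydata {
		int remaining;
	};

	static int my_next_task(void *data, char *cmd, size_t len)
	{
		struct mydata *m = data;

		if (m->remaining <= 0)
			return 0;
		snprintf(cmd, len, "task-%d", m->remaining--);
		return 1;
	}

	static int my_start_failure(void *data, const char *cmd)
	{
		printf("  client: abort everything, '%s' failed\n", cmd);
		return 1;	/* non-zero == emergency exit */
	}

	int main(void)
	{
		struct mydata m = { 10 };
		struct fake_pp pp = { { 1, 1, 0, 0 }, 0, &m,
				      my_next_task, my_start_failure };

		while (1) {
			int cnt, started = 0;

			for (cnt = 0; cnt < MAX_SLOTS; cnt++)
				started |= start_one(&pp);
			if (pp.shutting_down || !started)
				break;
			/* collect_output(), wait_and_flush() etc. go here */
		}
		return 0;
	}

Compared with the killall_children(pp)-from-the-callback version, the
client here does not have to coordinate my_start_failure() with
my_next_task() at all; the remaining iterations of the inner loop
simply become no-ops because the scheduler already knows it is
shutting down.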