On Mon, Sep 21, 2015 at 6:08 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> +void default_start_failure(void *data,
>> +                           struct child_process *cp,
>> +                           struct strbuf *err)
>> +{
>> +        int i;
>> +        struct strbuf sb = STRBUF_INIT;
>> +
>> +        for (i = 0; cp->argv[i]; i++)
>> +                strbuf_addf(&sb, "%s ", cp->argv[i]);
>> +        die_errno("Starting a child failed:\n%s", sb.buf);
>
> Do we want that trailing SP after the last element of argv[]?
> Same question applies to the one in "return-value".

done

>
>> +static void run_processes_parallel_init(struct parallel_processes *pp,
>> +                                        int n, void *data,
>> +                                        get_next_task_fn get_next_task,
>> +                                        start_failure_fn start_failure,
>> +                                        return_value_fn return_value)
>> +{
>> +        int i;
>> +
>> +        if (n < 1)
>> +                n = online_cpus();
>> +
>> +        pp->max_processes = n;
>> +        pp->data = data;
>> +        if (!get_next_task)
>> +                die("BUG: you need to specify a get_next_task function");
>> +        pp->get_next_task = get_next_task;
>> +
>> +        pp->start_failure = start_failure ? start_failure : default_start_failure;
>> +        pp->return_value = return_value ? return_value : default_return_value;
>
> I would actually have expected that leaving these to NULL will just
> skip pp->fn calls, instead of a "default implementation", but a pair
> of very simple default implementations would not hurt.

Ok. I think the provided default implementation is a reasonable
choice, as it gives enough information in case of an error.

>
>> +static void run_processes_parallel_cleanup(struct parallel_processes *pp)
>> +{
>> +        int i;
>
> Have a blank between the decl block and the first stmt here (and
> elsewhere, too---which you got correct in the function above)?

done

>
>> +        for (i = 0; i < pp->max_processes; i++)
>> +                strbuf_release(&pp->children[i].err);
>
>> +static void run_processes_parallel_start_one(struct parallel_processes *pp)
>> +{
>> +        int i;
>> +
>> +        for (i = 0; i < pp->max_processes; i++)
>> +                if (!pp->children[i].in_use)
>> +                        break;
>> +        if (i == pp->max_processes)
>> +                die("BUG: bookkeeping is hard");
>
> Mental note: the caller is responsible for not calling this when all
> slots are taken.
>
>> +        if (!pp->get_next_task(pp->data,
>> +                               &pp->children[i].process,
>> +                               &pp->children[i].err)) {
>> +                pp->all_tasks_started = 1;
>> +                return;
>> +        }
>
> Mental note: but it is OK to call this if get_next_task() previously
> said "no more task".
>
> The above two show a slight discrepancy (nothing earth-breaking).

I see. Maybe this can be improved by having
run_processes_parallel_start_as_needed() call get_next_task() and
pass the result into run_processes_parallel_start_one(), or, as we
had it before, by combining these two functions again.

>
> I have this suspicion that the all-tasks-started bit may turn out to
> be a big mistake that we may later regret.  Don't we want to allow
> pp->more_task() to say "no more task to run at this moment" implying
> "but please do ask me later, because I may have found more to do by
> the time you ask me again"?

And such a task would arise because the currently running children
produce more work to be done? So you would have a more_tasks()
question, and if that returns true, get_next_task() must provide the
next task?

In case we had more work to do, based on the outcome of the children,
we could just wait in get_next_task() for a semaphore/condition
variable signalled from return_value(). Though that would stop
progress reporting and maybe lock up the whole program due to pipe
clogging. It seems to be a better design to come back quickly to the
main loop, which does the polling, although I feel like it is
over-engineered for now.

So how would you find out when we are done?

 * more_tasks() could have different return values in an enum
   (YES_THERE_ARE, NO_BUT_ASK_LATER, NO_NEVER_ASK_AGAIN); see the
   sketch after this list
 * There could be yet another pair of callbacks,
   more_tasks_available() and parallel_processing_should_stop()
 * Hand back a callback ourselves [call
   signal_parallel_processing_done(void *) when more_tasks() will
   never return true again, with a void * we provide to more_tasks()]
 * ...
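To make the first option concrete, a minimal sketch. The enum values
are the ones from the list above; more_tasks_fn, the pp->more_tasks
field and run_processes_parallel_loop() are made-up names, and
run_processes_parallel_start_one() would no longer set
all_tasks_started itself but just take the task it is handed:

enum more_tasks_result {
        YES_THERE_ARE,          /* get_next_task() will hand out a task */
        NO_BUT_ASK_LATER,       /* running children may create more work */
        NO_NEVER_ASK_AGAIN      /* safe to stop once the children exit */
};

typedef enum more_tasks_result (*more_tasks_fn)(void *data);

static void run_processes_parallel_loop(struct parallel_processes *pp)
{
        while (1) {
                enum more_tasks_result r = pp->more_tasks(pp->data);

                /* fill the free slots as long as there is work */
                while (r == YES_THERE_ARE &&
                       pp->nr_processes < pp->max_processes) {
                        run_processes_parallel_start_one(pp);
                        r = pp->more_tasks(pp->data);
                }
                if (r == NO_NEVER_ASK_AGAIN && !pp->nr_processes)
                        break;  /* all work is done */
                /*
                 * NO_BUT_ASK_LATER: keep collecting children; a
                 * finishing child is the natural event that can turn
                 * this into YES_THERE_ARE again.  (The real loop
                 * would also poll the children's output pipes here
                 * instead of spinning on WNOHANG.)
                 */
                run_processes_parallel_collect_finished(pp);
        }
}

That way the "are we done?" decision stays in one place in the main
loop instead of being latched into a flag.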
>
> That is one of the reasons why I do not think the "very top level is
> a bulleted list" organization is a good idea in general.  A good
> scheduling decision can seldom be made in isolation without taking
> the global picture into account.
>
>> +static void run_processes_parallel_collect_finished(struct parallel_processes *pp)
>> +{
>> +        int i = 0;
>> +        pid_t pid;
>> +        int wait_status, code;
>> +        int n = pp->max_processes;
>> +
>> +        while (pp->nr_processes > 0) {
>> +                pid = waitpid(-1, &wait_status, WNOHANG);
>> +                if (pid == 0)
>> +                        return;
>> +
>> +                if (pid < 0)
>> +                        die_errno("wait");
>> +
>> +                for (i = 0; i < pp->max_processes; i++)
>> +                        if (pp->children[i].in_use &&
>> +                            pid == pp->children[i].process.pid)
>> +                                break;
>> +                if (i == pp->max_processes)
>> +                        /*
>> +                         * waitpid returned another process id
>> +                         * which we are not waiting for.
>> +                         */
>> +                        return;
>
> If we culled a child process that this machinery is not in charge
> of, waitpid() in other places that wants to see that child will not
> see it.  Perhaps such a situation might even warrant an error() or
> BUG()?  Do we want a "NEEDSWORK: Is this a bug?" comment here at
> least?
>
>> +                if (strbuf_read_once(&pp->children[i].err,
>> +                                     pp->children[i].process.err, 0) < 0 &&
>> +                    errno != EAGAIN)
>> +                        die_errno("strbuf_read_once");
>
> Don't we want to read thru to the end here?  The reason read_once()
> did not read thru to the end may not have anything to do with
> NONBLOCK (e.g. xread_nonblock() caps len, and it does not loop).

Right; we should loop here, see the sketch below.
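A minimal drain loop replacing the quoted hunk, assuming
strbuf_read_once() returns the number of bytes read, 0 at EOF, and a
negative value with errno set on error:

                ssize_t nread;

                /*
                 * Read everything that is available right now; a
                 * single strbuf_read_once() call can return early
                 * because xread_nonblock() caps the length, not only
                 * because the pipe is drained (EAGAIN).
                 */
                while ((nread = strbuf_read_once(&pp->children[i].err,
                                                 pp->children[i].process.err,
                                                 0)) > 0)
                        ; /* keep reading */
                if (nread < 0 && errno != EAGAIN)
                        die_errno("strbuf_read_once");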