> * Why are we configuring an API behaviour via a global variable in > 21st century? I was mimicking how "ungroup" worked, but now that Avar mentions that pattern was for a quick regression fix, I can fix it to pass it in as a parameter. > * The name "task_finished" is mentioned, but it is unclear what it > is. Is it one of the parameters to run_process_parallel()? It is one of the callback functions passed in as a parameter to run_process_paraller(). I'll go ahead and clarify that. > * Is the effect of the new feature that task_finished callback is > called with the output, in addition to the normal output? I am > not sure why it is called "pipe". The task_finished callback may > be free to fork a child and send the received output from the > task to that child over the pipe, but that is what a client code > could do and is inappropriate to base the name of the mechanism, > isn't it? The output in task_finished callback, before pipe_output, either contains part of the output or the entire output of the child process, since the output is periodically collected into stderr and then reset. The intention of output I believe is for the caller to be able to add anything they would like to the end (this can be seen with functions like fetch_finished() in builtin/fetch.c). My intention with pipe_output is to guarantee that output contains the entire output of the child process so task_finished can utilize it. > > > @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n, > > int output_timeout = 100; > > int spawn_cap = 4; > > int ungroup = run_processes_parallel_ungroup; > > + int pipe_output = run_processes_parallel_pipe_output; > > struct parallel_processes pp; > > > > /* unset for the next API user */ > > run_processes_parallel_ungroup = 0; > > + run_processes_parallel_pipe_output = 0; > > > > pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb, > > ungroup); > > @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n, > > pp.children[i].state = GIT_CP_WAIT_CLEANUP; > > } else { > > pp_buffer_stderr(&pp, output_timeout); > > - pp_output(&pp); > > + if (!pipe_output) > > + pp_output(&pp); > > So, we do not send the output from the child to the regular output > channel when pipe_output is in effect. OK. > > > } > > code = pp_collect_finished(&pp); > > if (code) { > > And no other code changes? This is quite different from what I > expected from reading the proposed log message. > > Am I correct to say that under this new mode, we no longer flush any > output while the child task is running (due to the change in the > above hunk to omit calls to pp_output() during the run) and instead > keep accumulating in the strbuf, until the child task finishes, at > which time pp_collect_finished() will call task_finished callback. > > Even though the callback usually consumes the last piece of the > output since the last pp_output() call made during the normal > execution of the run_processes_parallel() loop, because we omitted > these calls, we have full output from the child task accumulated in > the children[].err strbuf. We may still not output .err for real, > as we may not be the output_owner, in which case we may only append > to .buffered_output member. > > I am puzzled simply because, if the above summary is correct, I do > not see how a word "pipe" have a chance to come into the picture. Ah I see what you mean here -- your summary is correct. Something like "buffer_output" would make much more sense. > I can sort of see that in this mode, we would end up buffering the > entire output from each child task into one strbuf each, and can > avoid stalling the child tasks waiting for their turn to see their > output pipes drained. But is this a reasonable thing to do? How do > we control the memory consumption to avoid having to spool unbounded > amount of output from child tasks in core, or do we have a good > reason to believe that we do not have to bother? You are correct that storing unbounded output doesn't seem like a good idea. One idea I have is to parse output during the periodic collection rather than waiting till the end. The other idea I had was to add another "git status --porcelain" option that would only output the necessary pieces of information so we wouldn't have to bother with worrying about unbounded output. Any other thoughts as to how I can workaround this? Thanks!