Re: [PATCHv3 06/13] run-command: add an asynchronous parallel child processor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Stefan Beller <sbeller@xxxxxxxxxx> writes:

> On Tue, Sep 22, 2015 at 2:41 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>
>>> On Tue, Sep 22, 2015 at 12:53 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>>>
>>>>> So how would you find out when we are done?
>>>>
>>>>         while (1) {
>>>>                 if (we have available slot)
>>>>                         ask to start a new one;
>>>>                 if (nobody is running anymore)
>>>>                         break;
>>>>                 collect the output from running ones;
>>>>                 drain the output from the foreground one;
>>>>                 cull the finished process;
>>>>         }
>>>>
>>>
>>> Personally I do not like the break; in the middle of
>>> the loop, but that's personal preference, I'd guess.
>>> I'll also move the condition for (we have available slot)
>>> back inside the called function.
>>>
>>> So I am thinking about using this in the reroll instead:
>>>
>>>     run_processes_parallel_start_as_needed(&pp);
>>>     while (pp.nr_processes > 0) {
>>>         run_processes_parallel_buffer_stderr(&pp);
>>>         run_processes_parallel_output(&pp);
>>>         run_processes_parallel_collect_finished(&pp);
>>>         run_processes_parallel_start_as_needed(&pp);
>>>     }
>>
>> If you truly think the latter is easier to follow its logic (with
>> the duplicated call to the same function only to avoid break that
>> makes it clear why we are quitting the loop,
>
> I dislike having the call twice, too.
> ...
>> and without the
>> explicit "start only if we can afford to"), then I have to say that
>> our design tastes are fundamentally incompatible.
> ...
> I'll think about that.

Don't waste too much time on it ;-)  This is largely a taste thing,
and taste is very hard to reason about, understand, teach and learn.

Having said that, if I can pick one thing that I foresee to be
problematic the most (aside from these overlong names of the
functions that are private and do not need such long names), it is
that "start as many without giving any control to the caller"
interface.  I wrote "start *a* new one" in the outline above for a
reason.

Even if you want to utilize a moderate number of processes, say 16,
working at the steady state, I suspect that we would find it
suboptimal from perceived latency point of view, if we spin up 16
processes all at once at the very beginning before we start to
collect output from the designated foreground process and relay its
first line of output.  We may want to be able to tweak the caller to
spin up just a few first and let the loop ramp up to the full blast
gradually so that we can give an early feedback to the end user.
That is not something easy to arrange with "start as many without
giving control to the caller" interface.  We probably will discover
other kinds of scheduling issues once we get familiar with this
machinery and would find need for finer grained control.

And I consider such a ramp-up logic shouldn't be hidden inside the
"start_as_needed()" helper function---it is the way how the calling
loop wants its processes started, and I'd prefer to have the logic
visible in the caller's loop.

But that is also largely a "taste" thing.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]