Re: run-command: output owner picking strategy

William Duclot <william.duclot@xxxxxxxxxxxxxxxxxxxxxxx> · Fri, 20 May 2016 20:29:20 +0200 (CEST)

> When running in parallel we already may be out of order
> (relative to serial processing). See the second example in the
> commit message to produce a different order.

Right, I could (should) have understood that by myself.

> Consider we scheduled tasks to be run in 3 parallel processes:
> (As the NEEDSWORK comment only addresses the output selection,
> let's assume this is a fixed schedule, which we cannot alter.
> Which is true if we only change the code you quoted. That picks
> the process to output.)
> 
> [...]

> The output is produced by the current algorithm:
> (1) Start with process 1 (A) whose output will be live
> (2) Once A is done, flush all other done things, (B)
> (3) live output will be round robin, so process 2 (D)
> (4) Once D is done, flush all other done things (C, F, E)
>     in order of who finished first
> 
> 
> (1) is uncontroversial. We have no information about tasks A,B,C,
>     so pick a random candidate. We hardcoded process 1 for now.
> 
> (2) also uncontroversial IMHO. There is not much we can do differently.

Agreed

> (3) is what this NEEDSWORK comment is about. Instead of outputting D
>     we might have chosen C. (for $REASONS, e.g.: C is running longer than
>     D already, so we expect it to finish sooner, by assuming
>     any task takes the same expected time to finish. And as C
>     is expected to finish earlier than D, we may have smoother
>     output. "Less buffered bursts")
> 
> [...]
> 
> This seems to be better than the current behavior as we have more
> different tasks with "live" output, i.e. you see stuff moving.
> I made up the data to make the point though. We would need to use
> live data and experiment with different strategies to find a
> good/better solution.

We should probably settle on the behavior we want before trying to
find a strategy that implements (or approximates) it:
- Do we want to be as close as possible to a serial processing output?
- Do we want to see as much live output as possible?

I do not think that staying close to serial processing is a relevant
goal: we applied an arbitrary order to the tasks when naming them for
the explanation (A, B, C...), but the tasks aren't really sorted in any
way (and that's why the parallelization is relevant). Neither the user
nor git has any interest in getting these outputs in a specific order.

Therefore, an "as much live output as possible" behavior would be more
sensible. But I wonder: is there a worthwhile benefit in optimizing the
output owner strategy? I'm not used to working with submodules, but I
don't think that having a great number of submodules is common.
Basically: we could solve a problem, but is there a problem?
I'm not trying to bury this NEEDSWORK, I'd be happy to look into it if
need be!
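
To make the comparison concrete, here is a rough sketch of the two
picking strategies discussed above: the round-robin pick and the
"longest running first" alternative suggested in (3). It is only an
illustration, not the run-command code: the function names, the
slot model (a list of start times, None for a slot with no running
task) and the timings in the example are all assumptions I made up.

```python
from typing import List, Optional

# Hypothetical model: start_times[i] is the time at which slot i started
# its current task, or None if that slot has no running task.
Slots = List[Optional[float]]


def pick_round_robin(start_times: Slots, current: int) -> int:
    """Pick the first running slot at or after `current`, wrapping around.

    Returns `current` unchanged if no slot is running.
    """
    n = len(start_times)
    for i in range(n):
        idx = (current + i) % n
        if start_times[idx] is not None:
            return idx
    return current


def pick_longest_running(start_times: Slots, current: int) -> int:
    """Pick the running slot with the earliest start time.

    Ties go to the lowest slot index. Returns `current` unchanged if no
    slot is running.
    """
    running = [i for i, t in enumerate(start_times) if t is not None]
    if not running:
        return current
    return min(running, key=lambda i: start_times[i])


if __name__ == "__main__":
    # Made-up situation: slot 0 just finished, slot 1 started at t=5,
    # slot 2 started earlier, at t=2.
    slots = [None, 5.0, 2.0]
    print(pick_round_robin(slots, 0))      # slot index 1
    print(pick_longest_running(slots, 0))  # slot index 2
```

With these made-up timings the two strategies disagree, which is the
case where a change could matter at all. Whether it helps in practice
would still need real data.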