Re: [WIP/PATCH 3/3] submodule: helper to run foreach in parallel

On Fri, Aug 21, 2015 at 12:23 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> +struct submodule_args {
>> +     const char *name;
>> +     const char *path;
>> +     const char *sha1;
>> +     const char *toplevel;
>> +     const char *prefix;
>> +     const char **cmd;
>> +     struct submodule_output *out;
>> +     sem_t *mutex;
>> +};
>
> I do not see what submodule_output looks like in the patch, but I
> guess you are capturing _all_ output from a task and emitting
> everything at the end, after the task is done?

It was a leftover and will be removed; sorry for wasting your time.
jrnieder and I had a discussion on how to get the output right.

We considered two designs:

1) The worker thread acquires a mutex before it is allowed to write
  to stdout. This makes early output of error messages easy for each
  thread, requires fewer resources than 2), and the expected code
  complexity is lower because the decisions are made locally.

2) Have a single handler which deals with the output of all tasks.
  This handler would `select` on a huge number of pipes, one from
  each task (so we would require a lot of pipes). This central
  handler (which would have been the submodule_output in this case)
  would take care that the output of different tasks never ends up
  racily intermingled, with prefixes and all bells and whistles.
  It could also show a progress meter (300 out of 500 submodules
  updated already) for all threads, or, once only the last task is
  running, switch to piping that task's output to stdout in real
  time, so you would see the progress of the last task as you are
  already used to. A rough sketch of this idea follows below.
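
To illustrate what 2) would roughly look like, here is a sketch; all
names in it (handle_output, task_fd, the prefix format) are made up
for illustration and this is not code from the patch:

/*
 * Rough sketch of design 2): a single handler poll()s one pipe per
 * task and forwards whatever arrives to stdout with a per-task
 * prefix.  Real code would also buffer partial lines per task so a
 * prefix never lands in the middle of a line.
 */
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void handle_output(const int *task_fd, int ntasks)
{
	struct pollfd *pfd = calloc(ntasks, sizeof(*pfd));
	char buf[4096];
	int open_fds = ntasks, i;

	for (i = 0; i < ntasks; i++) {
		pfd[i].fd = task_fd[i];
		pfd[i].events = POLLIN;
	}

	while (open_fds > 0) {
		if (poll(pfd, ntasks, -1) < 0)
			break;
		for (i = 0; i < ntasks; i++) {
			ssize_t n;

			if (!(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			n = read(pfd[i].fd, buf, sizeof(buf) - 1);
			if (n <= 0) {
				/* task is done; poll() ignores negative fds */
				close(pfd[i].fd);
				pfd[i].fd = -1;
				open_fds--;
				continue;
			}
			buf[n] = '\0';
			printf("[%03d/%03d] %s", i + 1, ntasks, buf);
		}
	}
	free(pfd);
}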

The outcome of the discussion was to split the worker pool / load
distribution from the output handling in any case, as there is no
need to couple them. In my very first designs the output handling
was part of the asynchronous worker pool.

This RFC implements 1), which is why there is only the mutex.
I plan to enhance it so that the last running task writes its output
in real time (no buffering), and to print a "task n/m completed"
counter after each task finishes.
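
To make 1) concrete, here is a minimal sketch of how a worker might
use the semaphore from struct submodule_args; worker_ctx and
flush_task_output() are hypothetical names, not code from the patch:

/*
 * Minimal sketch of design 1): each worker collects its output in a
 * private buffer and only holds the shared semaphore while flushing,
 * so lines from different tasks never interleave on stdout.
 */
#include <semaphore.h>
#include <stdio.h>

struct worker_ctx {
	const char *path;   /* submodule path, e.g. "foo/bar" */
	sem_t *mutex;       /* semaphore shared by all workers */
};

static void flush_task_output(struct worker_ctx *ctx, const char *buf)
{
	sem_wait(ctx->mutex);                   /* serialize access to stdout */
	printf("Entering '%s'\n", ctx->path);   /* header line, as foreach prints */
	fputs(buf, stdout);                     /* the task's buffered output */
	fflush(stdout);
	sem_post(ctx->mutex);                   /* let the next worker print */
}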

>
> I have to wonder if that is what people would expect and more
> importantly if that is the most useful.  I am sympathetic to the
> desire to avoid the output totally mixed up from different processes
> emitting things in an uncoordinated manner, and "slurp everything
> and then show everything in one go" is certainly _one_ way to do so,
> but early feedback also counts.  Besides, because the order in which
> tasks are dispatched and completed is unpredictable, you cannot
> expect a machine parseable output _without_ assuming some help from
> the command invoked by each task (e.g. by prefixing the task's output
> with some string that identifies which submodule the output is about).

I was very focused on the "submodule foreach" output, which (when
no -q is given) displays an

    "Entering %s"

line as the first line for each finished task. I suspect that is not
enough to make the output machine parseable, so I will prefix each
line with a running number of the task. Maybe like this:

  [001/500] Entering 'foo/bar'
  [001/500] Here goes the text for task foo/bar
  [001/500] Here goes another text for task foo/bar
  [002/500] Entering 'foo/baz'
  [002/500] Here goes the text for task foo/baz
  [002/500] Here goes another text for task foo/baz
  [003/500] Entering 'foo/bla'
  ...

This will make the output of each task distinguishable from that of
the other tasks, while also offering some sort of progress meter.
(The fewer submodules you have, the less fine-grained the progress
meter becomes.)
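
For illustration, the per-line prefixing could look roughly like the
following; emit_with_prefix() is a hypothetical helper, not code from
the patch:

/*
 * Split a task's buffered output at newlines and print a
 * "[NNN/MMM] " prefix in front of each line.
 */
#include <stdio.h>
#include <string.h>

static void emit_with_prefix(int task_nr, int total, const char *buf)
{
	const char *line = buf;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line + 1) : strlen(line);

		printf("[%03d/%03d] ", task_nr, total);
		fwrite(line, 1, len, stdout);
		if (!eol) {
			putchar('\n');  /* terminate an unterminated last line */
			break;
		}
		line = eol + 1;
	}
}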

Jonathan suggested adding a capability to the git protocol for a
machine-readable progress meter on a separate channel, so we would
not need to parse the current Counting/Compressing/Transfer/etc.
output.

>
> Once you assume that the command is _aware_ that it needs to help
> the foreach-parallel infrastructure so that the user can sift their
> collective outputs to make sense of them, wouldn't a line-buffered
> intermixing also be acceptable, and wouldn't it be a much lower impact
> approach to solve the same problem?