Stefan Beller wrote:
> On Fri, Feb 19, 2016 at 3:07 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
[...]
>>> +			strbuf_addf(err, "Skipping submodule '%s'\n",
>>> +				    displaypath);
>>
>> Does the caller expect a newline at the end of err?
>>
>> In the refs code that uses an err strbuf, the convention is to
>> not end the message with a newline.  That way, a function like
>> die() can insert a newline and messages are guaranteed to be
>> newline-terminated even if someone is sloppy and does the wrong thing
>> when generating an error.
>
> Oh! So we need to add new lines ourselves here.

Now I'm confused.  The code in this patch is inconsistent.  I was
recommending consistently leaving out the \n.

It looks like pp_start_one takes the content of err without adding a
\n.  That's a bug in pp_start_one and pp_collect_finished, IMHO.

For example, default_task_finished and default_start_failure don't put
a newline at the end of the message.  I don't think that's a bug ---
they're doing the right thing; pp_start_one and pp_collect_finished
are just not handling it correctly.

>>> +		if (pp->reference)
>>> +			argv_array_push(&cp->args, pp->reference);
>>> +		if (pp->depth)
>>> +			argv_array_push(&cp->args, pp->depth);
>>
>> What does 'cp' stand for?  Would a name like 'child' work?
>
> child process ? (any code which had a struct child_process referred to
> it as *cp)

Output from 'git grep -F -e "child_process *"' finds variables with
various names, corresponding to what kind of child it is.

[...]
>> Is this the 'list' subcommand?
>
> no. :(

No problem --- that's what review is for.

[...]
>>> +	if (pp.print_unmatched) {
>>> +		printf("#unmatched\n");
>>
>> I'm still confused about this.  I think a comment where
>> 'print_unmatched' is declared would probably clear it up.
>
> Probably this is a too literal translation from shell to C.
> By printing #unmatched the shell on the other end of the pipe
> of this helper program knows to just stop and error out.
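For readers following along, the sentinel protocol described above can
be sketched roughly like this; the 'producer' function merely stands
in for 'git submodule--helper', and its output lines are made up for
illustration:

```shell
# Sketch of the '#unmatched' sentinel protocol.  'producer' stands in
# for 'git submodule--helper'; its output here is invented.
producer() {
	printf '%s\n' submodule-a submodule-b '#unmatched'
}

producer |
while read -r line
do
	if test "$line" = "#unmatched"
	then
		# The helper reported an error; stop consuming and bail.
		echo "aborting"
		break
	fi
	echo "processing $line"
done
```

The point of the sentinel is that the consumer learns about the
failure in-band, since the helper's exit status is not directly
visible on the downstream side of the pipe.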
>
> So this is telling the downstream program to stop.
>
> Maybe we can name the variable 'quickstop' ?
> 'abort_and_exit' ?

Interesting.

Would it work for the helper to exit at that point with nonzero
status?  Lacking "set -o pipefail", it's a little messy to handle in
the shell, but it's possible:

	{
		git submodule--helper \
			--foo \
			--bar \
			--baz ||
		... handle error, e.g. by printing something that
		    the other end of the pipe wants to see ...
	} | ...

[...]
>> (just curious) why are these saved up and printed all at once instead
>> of being printed as it goes?
>
> To have a clean abort path, i.e. we need to be done with all the
> parallel stuff before we start on doing local on-disk stuff.

Interesting.  That's subtle.  Probably worth a comment so people know
not to break it.  (E.g.

	/*
	 * We save the output and splat it all at once now.
	 * That means:
	 *  - the listener does not have to interleave their (checkout)
	 *    work with our fetching.  The writes involved in a
	 *    checkout involve more straightforward sequential I/O.
	 *  - the listener can avoid doing any work if fetching failed.
	 */

)

Thinking out loud: the other side could write their output to a
temporary file (e.g. under .git) and save us the trouble of buffering
it.  But the list of submodules and statuses is not likely to be huge
--- buffering doesn't hurt much.

[...]
> In a next step, we can do the checkout in parallel if there is
> nothing we expect to lead to trouble, i.e. the update strategy is
> checkout and the submodule is in a clean state at the tip of a
> branch.

Somebody tried parallelizing file output during checkout and found it
sped up the checkout on some systems by a small amount.  But then
later operations reading the files back were much slower: it seems
the output pattern affected the layout of the files on disk in a bad
way.

I'm not too afraid.  Just saying that the benefit of parallel checkout
would be something to measure.
A bigger worry is that checkout might be I/O bound and not benefit
much from parallelism.

> All problems which need to be solved by the user should be presented
> in a sequential way, i.e. present one problem and then full stop.
> That seems to be more encouraging as if we'd throw a "Here are a
> dozen broken submodule repositories which we expect the user fixes
> up".

It depends on the problem --- sometimes people want to see a list of
errors and fix them all (hence tools like "make -k").  I agree that
stop-on-first-error is a good default and that it's not worth worrying
about introducing --keep-going until someone asks for it.

Thanks for your thoughtfulness,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html