Re: [PATCH 05/12] git submodule update: Use its own list implementation.

Stefan Beller <sbeller@xxxxxxxxxx> · Fri, 16 Oct 2015 14:08:22 -0700

On Fri, Oct 16, 2015 at 2:02 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> Discussions turned out that we cannot parallelize the whole loop below
>> `git submodule--helper list` in `git submodule update`, because some
>> changes should be done only one at a time, such as messing up a submodule
>> and leave it up to the user to cleanup the conflicted rebase or merge.
>>
>> The submodules which are need to be cloned however do not expect to create
>> problems which require attention by the user one at a time, so we want to
>> parallelize that first.
>>
>> To do so we will start with a literal copy of `git submodule--helper list`
>> and port over features gradually.
>
> I am not sure what you mean by this.
>
> Surely, the current implementation of "update" does the fetching and
> updating as a single unit of task and iterate over these tasks, and
> we would rather want to instead have one iteration of submodules to
> do the fetching part (without doing other things that can fail and
> have to get attention of the end user), followed by another
> iteration that does the "other things", in order to get closer to
> the end goal of doing the fetch in parallel and then doing the
> remainder one-module-at-a-time sequencially.

I differentiated a bit more and moved the clone parts only.
Fetching should also be no problem. I initially assumed that to be a
problem too.

>
> I would imagine that the logical first step towards the end goal, if
> I understood you correctly, would be to split that single large loop
> that does a fetch and other things for a single module in each
> iteration into two, one that iterates and fetches all, followed by a
> new one that does the checkout/merge.

That was also one of the patch series I wrote (not sent to list)
1) split up into 2 phases
2) rewrite first phase in C
3) parallelize the first phase.

This series merges 1 and 2, so you don't have to review
the same functionality two times.

>
> What I do not understand is why that requires a different kind of
> enumerator (unless this is a kind of premature optimization, knowing
> that the set of modules iterated by these two loops are slightly
> different or something).

It is just moving all code before the clone step into the C part, so
we can call the clone in C.

>
>>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index bb8b2c7..d2d80e2 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -656,7 +656,7 @@ cmd_update()
>>       fi
>>
>>       cloned_modules=
>> -     git submodule--helper list --prefix "$wt_prefix" "$@" | {
>> +     git submodule--helper list-or-clone --prefix "$wt_prefix" "$@" | {
>>       err=
>>       while read mode sha1 stage sm_path
>>       do
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html