Glen Choo <chooglen@xxxxxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>>> +static struct fetch_task *
>>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>>> +			    const char **default_argv, struct strbuf *err)
>>> +{
>>
>>> @@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>>  {
>>>  	struct submodule_parallel_fetch *spf = data;
>>>  	const char *default_argv = NULL;
>>> -	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
>>> +	struct fetch_task *task =
>>> +		get_fetch_task_from_index(spf, &default_argv, err);
>>> +	if (!task)
>>> +		task = get_fetch_task_from_changed(spf, &default_argv, err);
>>
>> Hmph, interesting.  So if "from index" grabbed some submodules
>> already, then the "from the changes in the superproject, we know
>> these submodules need refreshing" part does not happen at all?  I
>> am afraid that I am still not following this...
>
> Hm, perhaps the following will help:
>
> - get_next_submodule() is an iterator, specifically, it is a
>   get_next_task_fn passed to run_processes_parallel_tr2(). It gets
>   called until it is exhausted.

Ahh, yeah, I totally forgot how we designed these things to work.

Even though these functions have a loop, (1) they start iterating at
the point where they left off in the last call, and (2) they return
as soon as they find the first item in the loop, which should have
stood out as a typical generator pattern, but somehow I missed these
signs.

> So in practice:
> ...
> - get_next_submodule() has now been exhausted and we are done.

But my original question (based on my misunderstanding that a single
call to these would grab all submodules that need fetching by
inspecting either the index or the history) still stands, doesn't it?

Presumably the "history scan" part is there because we assume that we
already had all the submodule commits necessary to check out any
superproject commit before this recursive fetch started.
That is the reason why we do not scan the history behind the "old
tips".  We inspect only the history newer than them, leading to the
"new tips", and try to grab all submodule commits that newly appear,
to ensure that we can check out all the superproject commits we just
obtained without any of the submodule commits necessary to do so
going missing.  Combined with the assumption about the state before
this fetch, namely that we had all the submodule commits necessary
to check out the superproject commits up to the "old tips", we
maintain the invariant that we can check out any superproject commit
recursively, no?

If we are doing so, especially with this series where we do the
"history scan" to complete the submodule commits necessary for all
new commits in the superproject, regardless of the branch being
checked out in the superproject, why do we still need to scan the
index to ensure that the current checkout can recurse down into the
submodules without hitting a missing commit?  The only case where
the "index scan" may make a difference is when that assumption, the
invariant that we can check out any superproject commit recursively,
did not hold before we started the fetch, no?