Glen Choo <chooglen@xxxxxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>>> +static struct fetch_task *
>>> +get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
>>> +			    const char **default_argv, struct strbuf *err)
>>> +{
>>
>>> @@ -1553,7 +1628,10 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
>>>  {
>>>  	struct submodule_parallel_fetch *spf = data;
>>>  	const char *default_argv = NULL;
>>> -	struct fetch_task *task = get_fetch_task(spf, &default_argv, err);
>>> +	struct fetch_task *task =
>>> +		get_fetch_task_from_index(spf, &default_argv, err);
>>> +	if (!task)
>>> +		task = get_fetch_task_from_changed(spf, &default_argv, err);
>>
>> Hmph, interesting.  So if "from index" grabbed some submodules
>> already, then the "from the changes in the superproject, we know
>> these submodules need refreshing" part does not happen at all?  I
>> am afraid that I am still not following this...
>
> Hm, perhaps the following will help:
>
> - get_next_submodule() is an iterator, specifically, it is a
>   get_next_task_fn passed to run_processes_parallel_tr2(). It gets
>   called until it is exhausted.

Ahh, yeah, I totally forgot how we designed these things to work.

Even though these functions have a loop, (1) they start iterating at
the point where they left off in the last call, and (2) they return
as soon as they find the first item in the loop, which should have
stood out as a typical generator pattern, but somehow I missed these
signs.

> So in practice:
> ...
> - get_next_submodule() has now been exhausted and we are done.

But my original question (based on my misunderstanding that a single
call to these would grab all submodules that need fetching by
inspecting either the index or the history) still stands, doesn't it?

Presumably the "history scan" part is there because we assume that we
already had all the submodule commits necessary to check out any
superproject commit before this recursive fetch started.
That is the reason why we do not scan the history behind the "old
tips".  We inspect only the history newer than them, leading to the
"new tips", and try to grab all submodule commits that newly appear,
to ensure that we can check out all the superproject commits we just
obtained without any of the submodule commits necessary to do so
going missing.  Combined with the assumption about the state before
this fetch, namely that we had all the submodule commits necessary
to check out the superproject commits up to the "old tips", we
maintain the invariant that we can check out any superproject commit
recursively, no?

If we are doing so, especially with this series where we do the
"history scan" to complete the submodule commits necessary for all
new commits in the superproject, regardless of the branch being
checked out in the superproject, why do we still need to scan the
index to ensure that the current checkout can recurse down into the
submodules without hitting a missing commit?  The only case where
the "index scan" may make a difference is when that assumption, the
invariant that we can check out any superproject commit recursively,
did not hold before we started the fetch, no?