Re: [PATCH 7/8] fetch: fetch unpopulated, changed submodules

Glen Choo <chooglen@xxxxxxxxxx> · Mon, 14 Feb 2022 12:24:35 +0800

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

>> @@ -1495,6 +1499,15 @@ get_fetch_task(struct submodule_parallel_fetch *spf,
>>  		if (!task)
>>  			continue;
>>  
>> +		/*
>> +		 * We might have already considered this submodule
>> +		 * because we saw it when iterating the changed
>> +		 * submodule names.
>> +		 */
>> +		if (string_list_lookup(&spf->seen_submodule_names,
>> +				       task->sub->name))
>> +			continue;
>
> [snip]
>> +		/*
>> +		 * We might have already considered this submodule
>> +		 * because we saw it in the index.
>> +		 */
>> +		if (string_list_lookup(&spf->seen_submodule_names, item.string))
>> +			continue;
>
> Hmm...it's odd that the checks happen in both places, when theoretically
> we would do one after the other, so this check would only need to be in
> one place. Maybe this is because of how we had to implement it (looping
> over everything every time when we get the next fetch task) but if it's
> easy to avoid, that would be great.

Yes, in order for the code to be correct, we only need this check once,
but I chose to check twice for defensiveness. That is, we avoid creating
implicit dependencies between the functions like "function A does not
consider whether a submodule has already been fetched, so it must always
be called before function B".

Perhaps there is another concern that overrides this? e.g. performance.