Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 03 Mar 2022 18:37:52 -0800

Glen Choo <chooglen@xxxxxxxxxx> writes:

>  		item = string_list_insert(changed, name);
> -		if (!item->util)
> +		if (item->util)
> +			cs_data = item->util;
> +		else {
>  			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
> -		cs_data = item->util;
> +			cs_data = item->util;
> +			cs_data->super_oid = commit_oid;
> +			cs_data->path = xstrdup(p->two->path);
> +		}

I do not quite get this change.

collect_changed_submodules() walks a range of revisions in the
superproject, doing an equivalent of "git log --raw" and feeding the
differences to this callback function.  The above code looks at the
path and uses the "changed" string list to record which submodule
was modified, what commit in the submodule is needed, etc.

What happens when the range has more than one change to the same
submodule?  cs_data has only one room for recording .super_oid
(which commit in the superproject touches the submodule) and .path
(where in the superproject's tree the submodule exists).  "git mv"
of a submodule might be rare and it may not hurt too much that only
a single .path can be kept, but it looks somewhat iffy.

>  		oid_array_append(&cs_data->new_commits, &p->two->oid);

At least, we are not losing any submodule commit even when the same
submodule is touched more than once by the superproject, but it is
dubious why we have cs_data.super_oid and cs_data.path in the first
place.

How are they used, or are they something that seemed useful when the
code was first written but it turned out that they weren't and left
unused?

Or do we need to make cs_data an array of 3-tuple { .super_oid,
.submodule_oid, .path } for each submodule name?