Junio C Hamano <gitster@xxxxxxxxx> writes: > Glen Choo <chooglen@xxxxxxxxxx> writes: > >> item = string_list_insert(changed, name); >> - if (!item->util) >> + if (item->util) >> + cs_data = item->util; >> + else { >> item->util = xcalloc(1, sizeof(struct changed_submodule_data)); >> - cs_data = item->util; >> + cs_data = item->util; >> + cs_data->super_oid = commit_oid; >> + cs_data->path = xstrdup(p->two->path); >> + } > > I do not quite get this change. > > collect_changed_submodules() walks a range of revisions in the > superproject, doing an equivalent of "git log --raw" and feeding the > differences to this callback function. The above code looks at the > path and uses the "changed" string list to record which submodule > was modified, what commit in the submodule is needed, etc. > > What happens when the range has more than one change to the same > submodule? cs_data has only one room for recording .super_oid > (which commit in the superproject touches the submodule) and .path > (where in the superproject's tree the submodule exists). "git mv" > of a submodule might be rare and it may not hurt too much that only > a single .path can be kept, but it looks somewhat iffy. Yes, I agree that it looks odd, which is why I added this comment to hopefully make it less opaque: + * (super_oid, path) allows the submodule config to be read from _some_ + * .gitmodules file. We store this information the first time we find a + * superproject commit that points to the submodule, but this is + * arbitrary - we can choose any (super_oid, path) that matches the + * submodule's name. I guess this only says that it is ok to store .super_oid and .path from any commit, but doesn't go in depth into _why_. It's ok because we only need (.super_oid, .path) because repo_submodule_init(..., path, treeish_name) maps these args to the submodule's name and gitdir (i.e. .git/modules/<name>). This means we don't worry about 'git mv' (super_oid's .gitmodules will tell us the correct name even if the path changed relative to some other commit), nor seeing the submodule more than once (it doesn't matter whose .gitmodules we look at so long as repo_submodule_init() derives the correct gitdir). And now that you've pointed this out, I realize that we could do away with (.super_oid, .path) altogether if we had a variant of repo_submodule_init() that takes the submodule name instead of (path, treeish_name). (We have a similar submodule_from_name(), but that only reads the submodule config, not a struct repository.) I would prefer not to introduce such a function so late into the review cycle, but I could clean this up later. >> oid_array_append(&cs_data->new_commits, &p->two->oid); > > At least, we are not losing any submodule commit even when the same > submodule is touched more than once by the superproject, but it is > dubious why we have cs_data.super_oid and cs_data.path in the first > place. On the hand, we actually need to record every submodule commit, so yes. > How are they used, or are they something that seemed useful when the > code was first written but it turned out that they weren't and left > unused? > > Or do we need to make cs_data an array of 3-tuple { .super_oid, > .submodule_oid, .path } for each submodule name? To conclude: - The changed_submodules string_list is basically a map that tells us, for a given submodule _name_, which commits we need to fetch and where repo_submodule_init() can read the submodule name from. - We only use cs_data as a string_list_item.util, and the string_list_item.string is the submodule name itself. - .new_commits tells us which commits to fetch. - .super_oid and .path tells repo_submodule_init() how to get the name of the submodule. So we don't need to make this a 3-tuple.