Glen Choo <chooglen@xxxxxxxxxx> writes: > item = string_list_insert(changed, name); > - if (!item->util) > + if (item->util) > + cs_data = item->util; > + else { > item->util = xcalloc(1, sizeof(struct changed_submodule_data)); > - cs_data = item->util; > + cs_data = item->util; > + cs_data->super_oid = commit_oid; > + cs_data->path = xstrdup(p->two->path); > + } I do not quite get this change. collect_changed_submodules() walks a range of revisions in the superproject, doing an equivalent of "git log --raw" and feeding the differences to this callback function. The above code looks at the path and uses the "changed" string list to record which submodule was modified, what commit in the submodule is needed, etc. What happens when the range has more than one change to the same submodule? cs_data has only one room for recording .super_oid (which commit in the superproject touches the submodule) and .path (where in the superproject's tree the submodule exists). "git mv" of a submodule might be rare and it may not hurt too much that only a single .path can be kept, but it looks somewhat iffy. > oid_array_append(&cs_data->new_commits, &p->two->oid); At least, we are not losing any submodule commit even when the same submodule is touched more than once by the superproject, but it is dubious why we have cs_data.super_oid and cs_data.path in the first place. How are they used, or are they something that seemed useful when the code was first written but it turned out that they weren't and left unused? Or do we need to make cs_data an array of 3-tuple { .super_oid, .submodule_oid, .path } for each submodule name?