Glen Choo <chooglen@xxxxxxxxxx> writes: > It would be good if we could avoid mixing unrelated information sources > in "struct submodule_status_util", since a) this makes it very tightly > coupled to run_diff_files() and b) it causes us to repeat ourselves in > the same function (.changed = changed, record_file_diff()). > > The only reason why the code looks this way right now is that > match_stat_with_submodule() sets defer_submodule_status based on whether > or not we should ignore the submodule, and this eventually tells > get_submodule_status() what submodules it needs to care about. But, > deciding whether to spawn a subprocess for which submodule is exactly > what the .get_next_task member is for. > >> diff --git a/submodule.c b/submodule.c >> index 426074cebb..6f6e150a3f 100644 >> --- a/submodule.c >> +++ b/submodule.c >> @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) >> return dirty_submodule; >> } >> >> +static struct status_task * >> +get_status_task_from_index(struct submodule_parallel_status *sps, >> + struct strbuf *err) >> +{ >> + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { >> + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; >> + struct status_task *task; >> + >> + if (!verify_submodule_git_directory(util->path)) >> + continue; > > So right here, we could use the "check if this submodule should be > ignored" logic form match_stat_with_submodule() to decide whether or not > to spawn the subprocess. IOW, I am advocating for > get_submodules_status() to be a parallel version of > match_stat_with_submodule() (not a parallel version of > is_submodule_modified() that shuttles extra information). It turns out to be quite difficult to implement a parallel match_stat_with_submodule(): a) we can't remove it because it still has another caller b) its internals are quite hard to refactor: one conditional arm depends on "changed", which is set by calling ie_match_stat(), which in turn requires the "struct stat" to have already been lstat()-ed... So even though this series adds a lot, it is just about as minimally invasive as possible. I suspect that there are some possible cleanups down the line, e.g. is_submodule_modified() is rightfully only called by diff-lib.c , so I think it should be a static function there. And once we move that, we can make our parallel function static, and then we don't have to worry about tight coupling to run_diff_files(). To keep the range-diff manageable, that can be left for a future cleanup though.