On Thu, May 21, 2020 at 2:52 PM Elijah Newren <newren@xxxxxxxxx> wrote: > > On Thu, May 21, 2020 at 10:36 AM Matheus Tavares Bernardino > <matheus.bernardino@xxxxxx> wrote: > > > > On Thu, May 21, 2020 at 4:26 AM Elijah Newren <newren@xxxxxxxxx> wrote: > > > > > > On Tue, May 12, 2020 at 5:17 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > > > > > > > Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> writes: > > > > > > > > > The idea behind not skipping gitlinks here was to be compliant with > > > > > what we have in the working tree. In 4fd683b ("sparse-checkout: > > > > > document interactions with submodules"), we decided that, if the > > > > > sparse-checkout patterns exclude a submodule, the submodule would > > > > > still appear in the working tree. The purpose was to keep these > > > > > features (submodules and sparse-checkout) independent. Along the same > > > > > lines, I think we should always recurse into initialized submodules in > > > > > > Sorry if I missed it in the code, but do you check whether the > > > submodule is initialized before descending into it, or do you descend > > > into it based on it just being a submodule? > > > > We only descend if the submodule is initialized. The new code in this > > patch doesn't do this check, but it is already implemented in > > grep_submodule() (which is called by grep_tree() and grep_cache() when > > a submodule is found). > > Good to know. To up the ante a bit: What if another branch has > directory that doesn't exist in HEAD or the current checkout, and > within that directory is a submodule. Would it be recursed into? In this case, `git grep --recurse-submodules <pattern> $branch` will recurse into the submodule, but only if it has already been initialized. I.e. if we have checked out to $branch, ran `git submodule init` and then checked out back. > What if it matched the sparsity paths? (Is it even possible to > recurse into it?) That's a great question. The idea that I tried to implement is to always recurse into _initialized_ submodules (even the ones excluded by the superproject's sparsity patterns) and, then, follow their own sparsity patterns inside. I'm not necessarily in favor (or against) this behavior, but this seemed to be the most compatible way with the design we describe in our docs: "If your sparse-checkout patterns exclude an initialized submodule, then that submodule will still appear in your working directory." (in git-sparse-checkout.txt) So, back to the original question, if you run `git grep --recurse-submodules <pattern> $branch` and $branch contains a submodule which was previously initialized, git-grep _would_ recurse into it, even if it (or its parent dir) was excluded. However, your question helped me notice an inconsistency in my patch: the behavior I just described is working for the full pattern set, but not in cone mode. That's because, in cone mode, we can mark the whole submodule's parent dir as excluded. Then, path_matches_pattern_list() will return NOT_MATCHED for the parent dir and we won't recurse into it, so we won't even get to the submodule's path to discover that it refers to a gitlink. Therefore, if we decide to keep the behavior of always recursing into submodules, we will need some extra work for the cone mode. I.e. grep_tree() will have to check if NOT_MATCHED directories contain submodules before discarding them, and recurse only into the submodules if so. As for the implementation, the first idea that came to my mind was to list the submodules' pathnames and do prefix matching for each submodule and NOT_MATCHED dir. But the places I've seen such submodule listings in the code base so far [1] seem to work only in the current branch. My second idea was to continue the tree walk when we hit NOT_MATCHED dir entries, but not doing any work, just looking for possible gitlinks to recurse into. I'm not sure if that could negatively affect the execution time, though. Does this seem like a good approach? Or is there another solution that I have not considered? Or even further, should we choose to skip the submodules in excluded paths? My only concern in this case is that it would be contrary to the design in git-sparse-checkout.txt. And the working tree grep and cached grep would differ even on a clean working tree. [1]: builtin/submodule--helper.c:module_list_compute() and submodule-config.c:config_from_gitmodules()