On Sat, May 9, 2020 at 9:23 PM Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> wrote: > > On Sat, May 9, 2020 at 9:42 PM Matheus Tavares > <matheus.bernardino@xxxxxx> wrote: > > > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh > > index 3bd67082eb..8509694bf1 100755 > > --- a/t/t7817-grep-sparse-checkout.sh > > +++ b/t/t7817-grep-sparse-checkout.sh > > @@ -63,12 +63,28 @@ test_expect_success 'setup' ' > > test_path_is_file sub/B/b > > ' > > > > +# The two tests bellow check a special case: the sparsity patterns exclude '/b' > > +# and sparse checkout is enable, but the path exists on the working tree (e.g. > > +# manually created after `git sparse-checkout init`). In this case, grep should > > +# honor --restrict-to-sparse-paths. > > I just want to highlight a small thing that I forgot to comment on: > Elijah and I had already discussed about --restrict-to-sparse-paths > being relevant in grep only with --cached or when a commit-ish is > given. But it had not occurred to me, before, the possibility of the > special case mentioned above. I.e. when searching in the working tree > and a path that should be excluded by the sparsity patterns is > present. In this patch, I let --restrict-to-sparse-paths control the > desired behavior for grep in this case too. But please, let me know if > that doesn't seem like a good idea. Wow, that is an interesting edge case. But it can come up during a merge or rebase or checkout -m, could be manually changed by various plumbing commands, and might just not be enforced well in various areas of the system (see e.g. [1]). Perhaps the most interesting case, given recent discussion, is submodules -- those might be left in the working tree despite not matching sparsity paths. So, should `git -c sparse.restrictCmds=true grep PATTERN` look at these paths or not? Currently, you've chosen contradictory answers -- yes to submodules, and no to other entries. I'm not certain here, but I've given it a little thought and think there's a few things to take into consideration: Users are used to the fact that grep -r PATTERN * searches existing files for PATTERN. If you delete a file, then a subsequent grep isn't going to search through it. Similarly, git grep is billed as a grep which limits searches to tracked files, thus they expect git grep PATTERN to search for files in their working copy but limiting it to files which are tracked. From this angle, I think users would be surprised if `git grep` searched through deleted files, and they would also be surprised if it ignored tracked and present files. That is a basic answer, but let's go a bit further. Since git grep also has history at its disposal, it has more options. For example: git grep REVISION PATTERN means to search through all tracked files (those are the only kinds that are recorded in revisions anyway) as of REVISION for the given PATTERN, without checking it out. Users probably expect this to behave the same as: git checkout REVISION git grep PATTERN and since checkout pays attention to sparsity rules, this is why we'd want to have both "git grep PATTERN" and "git grep REVISION PATTERN" pay attention to sparsity rules. When we think in terms of "git grep REVISION PATTERN" as an optimized version of "git checkout REVISION && git grep PATTERN" it puts us in the frame of mind of asking the following question: For each path, would it be marked as SKIP_WORKTREE if we were to check it out right now? If so, we should skip it for the grepping. Usually, the SKIP_WORKTREE bit is set for files if and only if they don't match the sparsity patterns. Also, we can't use the SKIP_WORKTREE bit of the current index to decide whether to grep through an old REVISION, because there are paths that exists in the old revision that don't exist in the current index. The sparsity rules are the only things that can tell us whether such a path would be marked as SKIP_WORKTREE if we were to check it out. So it makes sense to use the sparsity patterns when looking at REVISIONS. When dealing with the current worktree, we can check SKIP_WORKTREE directly. Usually that'll give the same answer as asking the sparsity rules but as per [1] the two aren't always identical. Rather than asking "Would we mark this as SKIP_WORKTREE if we were to checkout this version right now?", perhaps we should ask "Since we have this version checked out right now, let's just check the path directly. Is it marked as SKIP_WORKTREE?". Does that sound reasonable? [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/