On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 3/24/2020 3:15 AM, Elijah Newren wrote: > > Hi Matheus, > > > > On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares ... > >> Something I'm not entirely sure in this patch is how we implement the > >> mechanism to honor sparsity for the `git grep <commit-ish>` case (which > >> is treated in the grep_tree() function). Currently, the patch looks for > >> an index entry that matches the path, and then checks its skip_worktree > > > > As you discuss below, checking the index is both wrong _and_ costly. > > I'm not sure why checking the index is _wrong_, but I agree about the > performance cost. Let's say there are two directories, dir1 and dir2. Over time, there have existed a total of six files: dir1/{a,b,c} dir2/{d,e,f} At the current time, there are only four files in the index: dir1/{a,b} dir2/{d,e} And the user has done a `git sparse-checkout set dir2` and then at some point later run `git grep OTHERCOMMIT foobar`. What happens? Well, since we're in a sparse checkout, we should only search the relevant paths within OTHERCOMMIT for "foobar". Let's say we attempt to figure out the "relevant paths" using the index. We can tell that dir1/a and dir2/a are marked as SKIP_WORKTREE so we don't search them. dir1/c is untracked -- what do we do with it? Include it? Exclude it? Carrying on with the other files, dir2/d and dir2/e are tracked and !SKIP_WORKTREE so we search them. dir2/f is untracked -- what do we do with it? Include it? Exclude it? We're left without the necessary information to tell whether we should search OTHERCOMMIT's dir1/c and dir2/f if we consult the index. Any decision we make is going to be wrong for one of the two paths. If we instead do not attempt to consult the index (which corresponds to a version close to HEAD) in order to ask questions about the completely different OTHERCOMMIT, but instead use the sparsity patterns to query whether those files/directories are interesting, then we get the right answer. The index can only be consulted for the right answer in the case of --cached; in all other cases (including OTHERCOMMIT == HEAD), we should use the sparsity patterns. In fact, we could also use the sparsity patterns in the case of --cached, it's just that for that one particular case consulting the index will also give the right answer.