On 3/24/2020 12:16 PM, Elijah Newren wrote: > On Tue, Mar 24, 2020 at 8:12 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: >> >> On 3/24/2020 3:15 AM, Elijah Newren wrote: >>> Hi Matheus, >>> >>> On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares > ... >>>> Something I'm not entirely sure in this patch is how we implement the >>>> mechanism to honor sparsity for the `git grep <commit-ish>` case (which >>>> is treated in the grep_tree() function). Currently, the patch looks for >>>> an index entry that matches the path, and then checks its skip_worktree >>> >>> As you discuss below, checking the index is both wrong _and_ costly. >> >> I'm not sure why checking the index is _wrong_, but I agree about the >> performance cost. > > Let's say there are two directories, dir1 and dir2. Over time, there > have existed a total of six files: > dir1/{a,b,c} > dir2/{d,e,f} > At the current time, there are only four files in the index: > dir1/{a,b} > dir2/{d,e} > And the user has done a `git sparse-checkout set dir2` and then at > some point later run `git grep OTHERCOMMIT foobar`. What happens? > > Well, since we're in a sparse checkout, we should only search the > relevant paths within OTHERCOMMIT for "foobar". Let's say we attempt > to figure out the "relevant paths" using the index. We can tell that > dir1/a and dir2/a are marked as SKIP_WORKTREE so we don't search them. > dir1/c is untracked -- what do we do with it? Include it? Exclude > it? Carrying on with the other files, dir2/d and dir2/e are tracked > and !SKIP_WORKTREE so we search them. dir2/f is untracked -- what do > we do with it? Include it? Exclude it? > > We're left without the necessary information to tell whether we should > search OTHERCOMMIT's dir1/c and dir2/f if we consult the index. Any > decision we make is going to be wrong for one of the two paths. > > If we instead do not attempt to consult the index (which corresponds > to a version close to HEAD) in order to ask questions about the > completely different OTHERCOMMIT, but instead use the sparsity > patterns to query whether those files/directories are interesting, > then we get the right answer. The index can only be consulted for the > right answer in the case of --cached; in all other cases (including > OTHERCOMMIT == HEAD), we should use the sparsity patterns. In fact, > we could also use the sparsity patterns in the case of --cached, it's > just that for that one particular case consulting the index will also > give the right answer. Thanks! This helps a lot. -Stolee