On Tue, Mar 24, 2020 at 4:15 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Mon, Mar 23, 2020 at 11:12 PM Matheus Tavares
> <matheus.bernardino@xxxxxx> wrote:
> >
> > Something I'm not entirely sure in this patch is how we implement the
> > mechanism to honor sparsity for the `git grep <commit-ish>` case (which
> > is treated in the grep_tree() function). Currently, the patch looks for
> > an index entry that matches the path, and then checks its skip_worktree
>
> As you discuss below, checking the index is both wrong _and_ costly.
> You should use the sparsity patterns; Stolee did a lot of work to make
> those correspond to simple hashes you could check to determine whether
> to even walk into a subdirectory. So, O(1). Yeah, that's "only" cone
> mode but the non-cone sparsity patterns were a performance nightmare
> waiting to rear its ugly head. We should just try to encourage
> everyone to move to cone mode, or accept the slowness they get without
> it.

OK, makes sense. And your reply to Stolee, later in this thread, made it
clearer for me why checking the index is not only costly but also wrong.
Thanks for the great explanation! I will use the sparsity patterns
directly, in the next iteration.

> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 99e2685090..52ec72a036 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt,
> > const struct pathspec *pathspec, int cached);
> > static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> > struct tree_desc *tree, struct strbuf *base, int tn_len,
> > - int check_attr);
> > + int from_commit);
>
> I'm not familiar with grep.c and have to admit I don't know what
> "check_attr" means. Slightly surprised to see you replace it, but
> maybe reading the rest will explain...

...

>> if (S_ISREG(entry.mode)) {
>> hit |= grep_oid(opt, &entry.oid, base->buf, tn_len,
>> - check_attr ? base->buf + tn_len : NULL);
>> + from_commit ?
base->buf + tn_len : NULL);
>
> Sadly, this doesn't help me understand check_attr or from_commit.
> Could you clue me in a bit?

Sure! The grep machinery can optionally look at the .gitattributes file,
to see if a given path has a "diff" attribute assigned to it. This
attribute points to a diff driver in .gitconfig, which can specify many
things, such as whether the path should be treated as a binary or not.
The "check_attr" flag passed to grep_tree() tells the grep machinery
whether it should perform this attribute lookup for the paths in the
given tree.

I decided to replace it with "from_commit" because the only time we want
an attribute lookup when grepping a tree is when the tree comes from a
commit, i.e., when the tree is the root. (The reasoning goes along the
same lines as for why we only check sparsity patterns in git-grep for
commit-ish objects: we cannot match patterns against trees that we are
not sure are rooted.)

Since "knowing if the tree is a root or not" is useful in grep_tree()
for both sparsity checks and attribute checks, I thought we could use a
single "from_commit" variable instead of "check_attr" and
"check_sparsity", which would always have matching values. But on second
thought, I could maybe rename the variable to something like
"is_root_tree" or add a comment explaining the usage of "from_commit".
(I'm not a big fan of "is_root_tree", though, because we could give a
root tree to grep_tree() but not really know it.)

> > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> > new file mode 100755
> > index 0000000000..fccf44e829
> > --- /dev/null
> > +++ b/t/t7817-grep-sparse-checkout.sh
...
> > +test_expect_success 'setup' '
> > + echo "text" >a &&
> > + echo "text" >b &&
> > + mkdir dir &&
> > + echo "text" >dir/c &&
> > + git add a b dir &&
> > + git commit -m "initial commit" &&
> > + git tag -am t-commit t-commit HEAD &&
> > + tree=$(git rev-parse HEAD^{tree}) &&
> > + git tag -am t-tree t-tree $tree &&
> > + cat >.git/info/sparse-checkout <<-EOF &&
> > + /*
> > + !/b
> > + !/dir
> > + EOF
> > + git sparse-checkout init &&
>
> Using `git sparse-checkout init` but then manually writing to
> .git/info/sparse-checkout? Seems like it'd make more sense to use
> `git sparse-checkout set` than writing the patterns directly yourself.
> Also, would prefer to have the examples use cone mode (even if you
> have to add subdirectories), as it makes the testcase a bit easier to
> read and more performant, though neither is a big deal.

OK, I will make use of the builtin here. I will also use the cone mode
(and leave one test without it, as Stolee suggested later in this
thread).

> > +test_expect_success 'grep <tree-ish> should search outside sparse checkout' '
>
> I think the test is fine but the title seems misleading. "outside"
> and "inside" aren't defined because <tree-ish> isn't known to be
> rooted, meaning we have no way to apply the sparsity patterns. So
> perhaps just 'grep <tree-ish> should ignore sparsity patterns'?

Right! "should ignore sparsity patterns" is a much better name, thanks.

Thanks a lot for the thoughtful review and comments!
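
P.S. The "from_commit" reasoning above can be sketched as a small,
self-contained toy model. This is not the patch itself: struct
entry_plan, plan_entry() and toy_path_is_excluded() are hypothetical
names used only for illustration, and the hard-coded "dir/" rule merely
stands in for a real sparsity-pattern match. The sketch also assumes
that base holds an object-name prefix such as "HEAD:" followed by the
path, with tn_len being the length of that prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what to do with one blob entry found while walking a tree. */
struct entry_plan {
	const char *attr_path; /* path to use for attribute lookup, NULL = skip lookup */
	int skip;              /* nonzero if the sparsity rule excludes the path */
};

/* Toy stand-in (assumption) for a sparsity match: exclude everything under dir/. */
static int toy_path_is_excluded(const char *path)
{
	return strncmp(path, "dir/", 4) == 0;
}

/*
 * A single flag drives both decisions. Only when the tree is known to
 * come from a commit (and is therefore rooted) does "base + tn_len"
 * name a real repository path that attributes and sparsity patterns
 * can be matched against.
 */
static struct entry_plan plan_entry(const char *base, size_t tn_len,
				    int from_commit)
{
	struct entry_plan plan = { NULL, 0 };

	assert(base != NULL);
	if (from_commit) {
		const char *path = base + tn_len;

		plan.attr_path = path;
		plan.skip = toy_path_is_excluded(path);
	}
	return plan;
}
```

With from_commit set, plan_entry("HEAD:dir/c", 5, 1) yields the path
"dir/c" for the attribute lookup and marks the entry as excluded; with
from_commit unset, the same entry gets neither an attribute lookup nor a
sparsity check, which matches the "ignore sparsity patterns" behavior
for plain tree-ish arguments.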