This series is based on the discussions in [1]. The idea is to make git-grep (and other commands, in the future) be able to restrict their output to the sparsity patterns, when requested by the user. Main changes since v1: In patch 1: - Remove two unnecessary references in git-grep.txt, as they are in the same document. Added patch 2. In patch 3: - Match paths directly against the sparsity patterns, when grepping a given tree, instead of checking the index. - Better handle searches with --recurse-submodules when the superproject and/or the submodule have sparse checkout enabled. And add tests for cases like these. - In tests, use the builtin git-sparse-checkout instead of manually writting to the sparse-checkout file. - Add tests for grepping in cone mode. - Rename the previous 'from_commit' parameter and a test name, to be more meaningful. Note: it was suggested to change some of the tests in this patch to use cone mode. I ended up using both cone mode and full patterns, so that we could check that grep behaves correctly when submodules have different pattern rules than the superproject. I tried to leave the testing repo's structure simple, though, so that the tests remain well readable. In patch 4: - Move the configuration that restrict cmds' behavior based on the sparse checkout from the 'grep' namespace to 'sparse', as the idea is to have the same setting affecting multiple cmds. - Add the --[no]-restrict-to-sparse-paths global option - Add more tests for the setting and CLI option in grep. - Add tests to ensure the submodules' values for the setting are respected when running grep with --recurse-submodules. Note: in this patch, I used the 'sparse' namespace, instead of 'core', following the idea we discussed in [2], to have the sparse checkout settings in their own namespace. We also talked about moving core.sparseCheckout and core.sparseCheckoutCone to the new namespace. I tried implementing this change in this same patchset (although, on second thought, it is probably better to do it in another one), but I still haven't managed to come up with a rename implementation that keeps good compatibility. The problems are the ones Elijah listed in [3]. So, for now, sparse.restrictCmds is the only setting in the 'sparse' namespace. But it won't be the only one for too long, as Stolee is already implementing other ones [4]. [1]: https://lore.kernel.org/git/CAHd-oW7e5qCuxZLBeVDq+Th3E+E4+P8=WzJfK8WcG2yz=n_nag@xxxxxxxxxxxxxx/t/#u [2]: https://lore.kernel.org/git/49c1e9a5-b234-1696-03cc-95bf95f4663c@xxxxxxxxx/ [3]: https://lore.kernel.org/git/CABPp-BGytfCugK0S99nLPH4_VXmcYPHWdVyLO59BZc4__4CT9w@xxxxxxxxxxxxxx/ [4]: https://lore.kernel.org/git/2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@xxxxxxxxx/ Matheus Tavares (4): doc: grep: unify info on configuration variables config: load the correct config.worktree file grep: honor sparse checkout patterns config: add setting to ignore sparsity patterns in some cmds Documentation/config.txt | 2 + Documentation/config/grep.txt | 10 +- Documentation/config/sparse.txt | 22 +++ Documentation/git-grep.txt | 37 +---- Documentation/git.txt | 4 + Makefile | 1 + builtin/grep.c | 137 +++++++++++++++- config.c | 5 +- contrib/completion/git-completion.bash | 2 + git.c | 6 + sparse-checkout.c | 16 ++ sparse-checkout.h | 11 ++ t/t7011-skip-worktree-reading.sh | 9 -- t/t7817-grep-sparse-checkout.sh | 216 +++++++++++++++++++++++++ t/t9902-completion.sh | 4 +- 15 files changed, 431 insertions(+), 51 deletions(-) create mode 100644 Documentation/config/sparse.txt create mode 100644 sparse-checkout.c create mode 100644 sparse-checkout.h create mode 100755 t/t7817-grep-sparse-checkout.sh Range-diff against v1: 1: 7ba5caf10d ! 1: c344d22313 doc: grep: unify info on configuration variables @@ Commit message Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and - "Documentation/config/grep.txt". Let's unify the information in the - second file and include it in the first. + "Documentation/config/grep.txt", which can make maintenance difficult. + The first also contains a definition not present in the latter + (grep.fullName). To avoid problems like this, let's unify the + information in the second file and include it in the first. Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx> @@ Documentation/config/grep.txt: grep.extendedRegexp:: grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. -+ Number of grep worker threads to use. See `--threads` in -+ linkgit:git-grep[1] for more information. ++ Number of grep worker threads to use. See `--threads` ++ifndef::git-grep[] ++ in linkgit:git-grep[1] ++endif::git-grep[] ++ for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. @@ Documentation/git-grep.txt: characters. An empty string as search expression ma - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - ++:git-grep: 1 +include::config/grep.txt[] OPTIONS @@ Documentation/git-grep.txt: providing this option will cause it to die. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the -+ `grep.threads` configuration (see linkgit:git-config[1]). ++ `grep.threads` configuration. -f <file>:: Read patterns from <file>, one per line. -: ---------- > 2: 882310b69f config: load the correct config.worktree file 2: 0b9b4c4b41 ! 3: e00674c727 grep: honor sparse checkout patterns @@ Commit message One of the main uses for a sparse checkout is to allow users to focus on the subset of files in a repository in which they are interested. But git-grep currently ignores the sparsity patterns and report all matches - found outside this subset, which kind of goes in the oposity direction. + found outside this subset, which kind of goes in the opposite direction. Let's fix that, making it honor the sparsity boundaries for every grepping case: @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr); -+ int from_commit); ++ int is_root_tree); static int grep_submodule(struct grep_opt *opt, const struct pathspec *pathspec, @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; + -+ if (ce_skip_worktree(ce)) ++ if (ce_skip_worktree(ce) && !S_ISGITLINK(ce->ce_mode)) + continue; + strbuf_setlen(&name, name_base_len); @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, + return hit; + } - static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - struct tree_desc *tree, struct strbuf *base, int tn_len, +-static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, +- struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) -+ int from_commit) ++static struct pattern_list *get_sparsity_patterns(struct repository *repo) ++{ ++ struct pattern_list *patterns; ++ char *sparse_file; ++ int sparse_config, cone_config; ++ ++ if (repo_config_get_bool(repo, "core.sparsecheckout", &sparse_config) || ++ !sparse_config) { ++ return NULL; ++ } ++ ++ sparse_file = repo_git_path(repo, "info/sparse-checkout"); ++ patterns = xcalloc(1, sizeof(*patterns)); ++ ++ if (repo_config_get_bool(repo, "core.sparsecheckoutcone", &cone_config)) ++ cone_config = 0; ++ patterns->use_cone_patterns = cone_config; ++ ++ if (add_patterns_from_file_to_list(sparse_file, "", 0, patterns, NULL)) { ++ if (file_exists(sparse_file)) { ++ warning(_("failed to load sparse-checkout file: '%s'"), ++ sparse_file); ++ } ++ free(sparse_file); ++ free(patterns); ++ return NULL; ++ } ++ ++ free(sparse_file); ++ return patterns; ++} ++ ++static int in_sparse_checkout(struct strbuf *path, int prefix_len, ++ unsigned int entry_mode, ++ struct index_state *istate, ++ struct pattern_list *sparsity, ++ enum pattern_match_result parent_match, ++ enum pattern_match_result *match) ++{ ++ int dtype = DT_UNKNOWN; ++ ++ if (S_ISGITLINK(entry_mode)) ++ return 1; ++ ++ if (parent_match == MATCHED_RECURSIVE) { ++ *match = parent_match; ++ return 1; ++ } ++ ++ if (S_ISDIR(entry_mode) && !is_dir_sep(path->buf[path->len - 1])) ++ strbuf_addch(path, '/'); ++ ++ *match = path_matches_pattern_list(path->buf, path->len, ++ path->buf + prefix_len, &dtype, ++ sparsity, istate); ++ if (*match == UNDECIDED) ++ *match = parent_match; ++ ++ if (S_ISDIR(entry_mode)) ++ strbuf_trim_trailing_dir_sep(path); ++ ++ if (*match == NOT_MATCHED && (S_ISREG(entry_mode) || ++ (S_ISDIR(entry_mode) && sparsity->use_cone_patterns))) ++ return 0; ++ ++ return 1; ++} ++ ++static int do_grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, ++ struct tree_desc *tree, struct strbuf *base, int tn_len, ++ int check_attr, struct pattern_list *sparsity, ++ enum pattern_match_result default_sparsity_match) { struct repository *repo = opt->repo; int hit = 0; @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, - name_base_len = name.len; - } -+ if (from_commit && repo_read_index(repo) < 0) -+ die(_("index file corrupt")); -+ while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); ++ enum pattern_match_result sparsity_match = 0; + if (match != all_entries_interesting) { + strbuf_addstr(&name, base->buf + tn_len); @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); -+ if (from_commit) { -+ int pos = index_name_pos(repo->index, -+ base->buf + tn_len, -+ base->len - tn_len); -+ if (pos >= 0 && -+ ce_skip_worktree(repo->index->cache[pos])) { ++ if (sparsity) { ++ struct strbuf path = STRBUF_INIT; ++ strbuf_addstr(&path, base->buf + tn_len); ++ ++ if (!in_sparse_checkout(&path, old_baselen - tn_len, ++ entry.mode, repo->index, ++ sparsity, default_sparsity_match, ++ &sparsity_match)) { + strbuf_setlen(base, old_baselen); + continue; + } @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, -- check_attr ? base->buf + tn_len : NULL); -+ from_commit ? base->buf + tn_len : NULL); - } else if (S_ISDIR(entry.mode)) { - enum object_type type; - struct tree_desc sub; + check_attr ? base->buf + tn_len : NULL); @@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); - hit |= grep_tree(opt, pathspec, &sub, base, tn_len, +- hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); -+ from_commit); ++ hit |= do_grep_tree(opt, pathspec, &sub, base, tn_len, ++ check_attr, sparsity, sparsity_match); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, +@@ builtin/grep.c: static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, + return hit; + } + ++/* ++ * Note: sparsity patterns and paths' attributes will only be considered if ++ * is_root_tree has true value. (Otherwise, we cannot properly perform pattern ++ * matching on paths.) ++ */ ++static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, ++ struct tree_desc *tree, struct strbuf *base, int tn_len, ++ int is_root_tree) ++{ ++ struct pattern_list *patterns = NULL; ++ int ret; ++ ++ if (is_root_tree) ++ patterns = get_sparsity_patterns(opt->repo); ++ ++ ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree, ++ patterns, 0); ++ ++ if (patterns) { ++ clear_pattern_list(patterns); ++ free(patterns); ++ } ++ return ret; ++} ++ + static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec, + struct object *obj, const char *name, const char *path) + { ## t/t7011-skip-worktree-reading.sh ## @@ t/t7011-skip-worktree-reading.sh: test_expect_success 'ls-files --modified' ' @@ t/t7817-grep-sparse-checkout.sh (new) + +test_description='grep in sparse checkout + -+This test creates the following dir structure: ++This test creates a repo with the following structure: ++ +. -+| - a -+| - b -+| - dir -+ | - c ++|-- a ++|-- b ++|-- dir ++| `-- c ++`-- sub ++ |-- A ++ | `-- a ++ `-- B ++ `-- b + -+Only "a" should be present due to the sparse checkout patterns: -+"/*", "!/b" and "!/dir". ++Where . has non-cone mode sparsity patterns and sub is a submodule with cone ++mode sparsity patterns. The resulting sparse-checkout should leave the following ++structure: ++ ++. ++|-- a ++`-- sub ++ `-- B ++ `-- b +' + +. ./test-lib.sh @@ t/t7817-grep-sparse-checkout.sh (new) + echo "text" >b && + mkdir dir && + echo "text" >dir/c && ++ ++ git init sub && ++ ( ++ cd sub && ++ mkdir A B && ++ echo "text" >A/a && ++ echo "text" >B/b && ++ git add A B && ++ git commit -m sub && ++ git sparse-checkout init --cone && ++ git sparse-checkout set B ++ ) && ++ ++ git submodule add ./sub && + git add a b dir && -+ git commit -m "initial commit" && ++ git commit -m super && ++ git sparse-checkout init --no-cone && ++ git sparse-checkout set "/*" "!b" "!/*/" && ++ + git tag -am t-commit t-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am t-tree t-tree $tree && -+ cat >.git/info/sparse-checkout <<-EOF && -+ /* -+ !/b -+ !/dir -+ EOF -+ git sparse-checkout init && ++ + test_path_is_missing b && + test_path_is_missing dir && -+ test_path_is_file a ++ test_path_is_missing sub/A && ++ test_path_is_file a && ++ test_path_is_file sub/B/b +' + +test_expect_success 'grep in working tree should honor sparse checkout' ' @@ t/t7817-grep-sparse-checkout.sh (new) + test_cmp expect_t-commit actual_t-commit +' + -+test_expect_success 'grep <tree-ish> should search outside sparse checkout' ' ++test_expect_success 'grep <tree-ish> should ignore sparsity patterns' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && @@ t/t7817-grep-sparse-checkout.sh (new) + test_cmp expect_t-tree actual_t-tree +' + ++test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' ' ++ cat >expect <<-EOF && ++ a:text ++ sub/B/b:text ++ EOF ++ git grep --recurse-submodules --cached "text" >actual && ++ test_cmp expect actual ++' ++ ++test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse checkout in submodule' ' ++ commit=$(git rev-parse HEAD) && ++ cat >expect_commit <<-EOF && ++ $commit:a:text ++ $commit:sub/B/b:text ++ EOF ++ cat >expect_t-commit <<-EOF && ++ t-commit:a:text ++ t-commit:sub/B/b:text ++ EOF ++ git grep --recurse-submodules "text" $commit >actual_commit && ++ test_cmp expect_commit actual_commit && ++ git grep --recurse-submodules "text" t-commit >actual_t-commit && ++ test_cmp expect_t-commit actual_t-commit ++' ++ +test_done 3: a76242ecfa < -: ---------- grep: add option to ignore sparsity patterns -: ---------- > 4: 3e9e906249 config: add setting to ignore sparsity patterns in some cmds -- 2.26.2