Integrate `git-grep` with sparse-index and test the performance improvement. Changes since v4 ---------------- * Reset the length of `struct strbuf name` back to `name_base_len`, instead of 0, after `grep_tree()` returns. * Add test cases in t1092 for `grep` recursing into submodules. * Add a few NEEDSWORK to explain the current problem with submodules. Changes since v3 ---------------- * Shorten the perf result tables in commit message. * Update the commit message to reflect the changes in the commit. * Update the commit message to indicate the performance improvement is dependent on the pathspec. * Stop passing `ce_mode` through `check_attr`. Instead, set the `base_len` to 0 to make the code more reasonable and less abuse of `check_attr`. * Remove another invention of `base`. Use the existing `name` as the argument for `grep_tree()`, and reset it back to `ce->name` after `grep_tree()` returns. * Update the p2000 test to use a more general pathspec for better compatibility (i.e. do not use git repository specific pathspec). * Add tests to t1092 'grep is not expanded' to verify the change brought by "builtin/grep.c: walking tree instead of expanding index with --sparse": the index *never* expands. Changes since v2 ---------------- * Modify the commit message for "builtin/grep.c: integrate with sparse index" to make it obvious that the perf test results are not from p2000 tests, but from manual perf runs. * Add tree-walking logic as an extra (the third) patch to improve the performance when --sparse is used. This resolved the left-over-bit in v2 [1]. [1] https://lore.kernel.org/git/20220829232843.183711-1-shaoxuan.yuan02@xxxxxxxxx/ Changes since v1 ---------------- * Rewrite the commit message for "builtin/grep.c: add --sparse option" to be clearer. * Update the documentation (both in-code and man page) for --sparse. * Add a few tests to test the new behavior (when _only_ --cached is supplied). * Reformat the perf test results to not look like directly from p2000 tests. * Put the "command_requires_full_index" lines right after parse_options(). * Add a pathspec test in t1092, and reword a few test documentations. Shaoxuan Yuan (3): builtin/grep.c: add --sparse option builtin/grep.c: integrate with sparse index builtin/grep.c: walking tree instead of expanding index with --sparse Documentation/git-grep.txt | 5 +- builtin/grep.c | 58 +++++++++++++++++-- t/perf/p2000-sparse-operations.sh | 1 + t/t1092-sparse-checkout-compatibility.sh | 72 ++++++++++++++++++++++++ t/t7817-grep-sparse-checkout.sh | 34 +++++++++-- 5 files changed, 159 insertions(+), 11 deletions(-) Range-diff against v4: 1: 00a8b3a68e = 1: c3d33e487c builtin/grep.c: add --sparse option 2: 3e0786722c = 2: c5366f51b8 builtin/grep.c: integrate with sparse index 3: 81afe2fcb3 ! 3: 52bb802eae builtin/grep.c: walking tree instead of expanding index with --sparse @@ Commit message 2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈) 2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈) + -------------------------- + NEEDSWORK about submodules + + There are a few NEEDSWORKs that belong to improvements beyond this + topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for + more context. The other two NEEDSWORKs in t1092 are also relative. + Suggested-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> Helped-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> Helped-by: Victoria Dye <vdye@xxxxxxxxxx> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx> ## builtin/grep.c ## +@@ builtin/grep.c: static int grep_submodule(struct grep_opt *opt, + * subrepo's odbs to the in-memory alternates list. + */ + obj_read_lock(); ++ ++ /* ++ * NEEDSWORK: when reading a submodule, the sparsity settings in the ++ * superproject are incorrectly forgotten or misused. For example: ++ * ++ * 1. "command_requires_full_index" ++ * When this setting is turned on for `grep`, only the superproject ++ * knows it. All the submodules are read with their own configs ++ * and get prepare_repo_settings()'d. Therefore, these submodules ++ * "forget" the sparse-index feature switch. As a result, the index ++ * of these submodules are expanded unexpectedly. ++ * ++ * 2. "core_apply_sparse_checkout" ++ * When running `grep` in the superproject, this setting is ++ * populated using the superproject's configs. However, once ++ * initialized, this config is globally accessible and is read by ++ * prepare_repo_settings() for the submodules. For instance, if a ++ * submodule is using a sparse-checkout, however, the superproject ++ * is not, the result is that the config from the superproject will ++ * dictate the behavior for the submodule, making it "forget" its ++ * sparse-checkout state. ++ * ++ * 3. "core_sparse_checkout_cone" ++ * ditto. ++ * ++ * Note that this list is not exhaustive. ++ */ + repo_read_gitmodules(subrepo, 0); + + /* @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, if (repo_read_index(repo) < 0) die(_("index file corrupt")); @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, - if (S_ISREG(ce->ce_mode) && + hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0); -+ strbuf_reset(&name); ++ strbuf_setlen(&name, name_base_len); + strbuf_addstr(&name, ce->name); + free(data); + } else if (S_ISREG(ce->ce_mode) && @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git read-tree -mu HEAD test_done ## t/t1092-sparse-checkout-compatibility.sh ## +@@ t/t1092-sparse-checkout-compatibility.sh: init_repos () { + git -C sparse-index sparse-checkout set deep + } + ++init_repos_as_submodules () { ++ git reset --hard && ++ init_repos && ++ git submodule add ./full-checkout && ++ git submodule add ./sparse-checkout && ++ git submodule add ./sparse-index && ++ ++ git submodule status >actual && ++ grep full-checkout actual && ++ grep sparse-checkout actual && ++ grep sparse-index actual ++} ++ + run_on_sparse () { + ( + cd sparse-checkout && @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'grep is not expanded' ' # All files within the folder1/* pathspec are sparse, @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'grep is not expan + # test in-cone pathspec with or without wildcard + ensure_not_expanded grep --sparse --cached a -- "deep/a" && + ensure_not_expanded grep --sparse --cached a -- "deep/*" ++' ++ ++# NEEDSWORK: when running `grep` in the superproject with --recurse-submodules, ++# Git expands the index of the submodules unexpectedly. Even though `grep` ++# builtin is marked as "command_requires_full_index = 0", this config is only ++# useful for the superproject. Namely, the submodules have their own configs, ++# which are _not_ populated by the one-time sparse-index feature switch. ++test_expect_failure 'grep within submodules is not expanded' ' ++ init_repos_as_submodules && ++ ++ # do not use ensure_not_expanded() here, becasue `grep` should be ++ # run in the superproject, not in "./sparse-index" ++ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ ++ git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" && ++ test_region ! index ensure_full_index trace2.txt ++' ++ ++# NEEDSWORK: this test is not actually testing the code. The design purpose ++# of this test is to verify the grep result when the submodules are using a ++# sparse-index. Namely, we want "folder1/" as a tree (a sparse directory); but ++# because of the index expansion, we are now grepping the "folder1/a" blob. ++# Because of the problem stated above 'grep within submodules is not expanded', ++# we don't have the ideal test environment yet. ++test_expect_success 'grep sparse directory within submodules' ' ++ init_repos_as_submodules && ++ ++ cat >expect <<-\EOF && ++ full-checkout/folder1/a:a ++ sparse-checkout/folder1/a:a ++ sparse-index/folder1/a:a ++ EOF ++ git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" >actual && ++ test_cmp actual expect ' test_done base-commit: 79f2338b3746d23454308648b2491e5beba4beff -- 2.37.0