This series is based on vd/sparse-reset. It integrates the sparse index with git diff and git blame and includes: 1. tests added to t1092 and p2000 to establish the baseline functionality of the commands 2. repository settings to enable the sparse index The p2000 tests demonstrate a ~44% execution time reduction for 'git diff' and a ~86% execution time reduction for 'git diff --staged' using a sparse index. For 'git blame', the reduction time was ~60% for a file two levels deep and ~30% for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.30: git diff (full-v3) 0.33 0.34 +3.0% 2000.31: git diff (full-v4) 0.33 0.35 +6.1% 2000.32: git diff (sparse-v3) 0.53 0.31 -41.5% 2000.33: git diff (sparse-v4) 0.54 0.29 -46.3% 2000.34: git diff --cached (full-v3) 0.07 0.07 +0.0% 2000.35: git diff --cached (full-v4) 0.07 0.08 +14.3% 2000.36: git diff --cached (sparse-v3) 0.28 0.04 -85.7% 2000.37: git diff --cached (sparse-v4) 0.23 0.03 -87.0% 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% Changes since V1 ================ * Fix failing diff partially-staged test in t1092-sparse-checkout-compatibility.sh, which was breaking in seen. Changes since V2 ================ * Update diff commit description to include patches that make the checkout and status commands work with the sparse index for readers to reference. * Add new test case to verify diff behaves as expected when run against files outside the sparse checkout cone. * Indent error message in blame commit * Check error message in blame with pathspec outside sparse definition test matches expectations. * Loop blame tests (instead of running the same command multiple time against different files). Changes since V3 ================ * Update diff p2000 tests to use --cached instead of --staged. Execute new run and update results in commit description and cover letter. * Update comment on blame with pathspec outside sparse definition test in t1092-sparse-checkout-compatibility.sh to clarify that it tests the current state and could be improved in the future. * Ensure sparse index is only activated when diff is running against files in a Git repo. * BUG if prepare_repo_settings() is called outside a repository. * Ensure sparse index is not activated for calls to blame, checkout, or pack-object with -h. * Ensure commit-graph is only loaded if a git directory exists. Changes since V4 ================ * Remove startup_info->have_repository check from checkout, pack-objects, and blame. Update git.c to no longer bypass setup when -h is passed instead. * Move commit-graph, test-read-cache, and repo-settings changes into their own patches with details in commit description of why the changes are being made. * Update t1092-sparse-checkout-compatibility.sh tests to use --cached instead of --staged. * Use 10-character hash abbreviations for commits referenced in diff commit message. * Clarify that being unable to blame files outside the working directory is not supported in either sparse or non-sparse checkouts both in comment on blame with pathspec outside sparse definition test in t1092-sparse-checkout-compatibility.sh and blame commit message. Thanks, Lessley Lessley Dennington (7): git: esnure correct git directory setup with -h commit-graph: return if there is no git directory test-read-cache: set up repo after git directory repo-settings: prepare_repo_settings only in git repos diff: replace --staged with --cached in t1092 tests diff: enable and test the sparse index blame: enable and test the sparse index builtin/blame.c | 3 + builtin/diff.c | 5 ++ commit-graph.c | 5 +- git.c | 37 ++++---- repo-settings.c | 3 + t/helper/test-read-cache.c | 5 +- t/perf/p2000-sparse-operations.sh | 4 + t/t1092-sparse-checkout-compatibility.sh | 109 +++++++++++++++++++---- 8 files changed, 132 insertions(+), 39 deletions(-) base-commit: f2a454e0a5e26c0f7b840970f69d195c37b16565 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1050 Range-diff vs v4: -: ----------- > 1: 09c2ff9f898 git: esnure correct git directory setup with -h 1: 81e208cf454 ! 2: 9e53a6435e4 sparse index: enable only for git repos @@ Metadata Author: Lessley Dennington <lessleydennington@xxxxxxxxx> ## Commit message ## - sparse index: enable only for git repos + commit-graph: return if there is no git directory - Check whether git dir exists before adding any repo settings. If it - does not exist, BUG with the message that one cannot add settings for an - uninitialized repository. If it does exist, proceed with adding repo - settings. - - Additionally, ensure the above BUG is not triggered when users pass the -h - flag by adding a check for the repository to the checkout and pack-objects - builtins. - - Finally, ensure the above BUG is not triggered for commit-graph by - returning early if the git directory does not exist. + Return early if git directory does not exist. This will protect against + test failures in the upcoming change to BUG in prepare_repo_settings if no + git directory exists. Signed-off-by: Lessley Dennington <lessleydennington@xxxxxxxxx> - ## builtin/checkout.c ## -@@ builtin/checkout.c: static int checkout_main(int argc, const char **argv, const char *prefix, - - git_config(git_checkout_config, opts); - -- prepare_repo_settings(the_repository); -- the_repository->settings.command_requires_full_index = 0; -+ if (startup_info->have_repository) { -+ prepare_repo_settings(the_repository); -+ the_repository->settings.command_requires_full_index = 0; -+ } - - opts->track = BRANCH_TRACK_UNSPECIFIED; - - - ## builtin/pack-objects.c ## -@@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const char *prefix) - read_replace_refs = 0; - - sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1); -- prepare_repo_settings(the_repository); -- if (sparse < 0) -- sparse = the_repository->settings.pack_use_sparse; -+ -+ if (startup_info->have_repository) { -+ prepare_repo_settings(the_repository); -+ if (sparse < 0) -+ sparse = the_repository->settings.pack_use_sparse; -+ } - - reset_pack_idx_option(&pack_idx_opts); - git_config(git_pack_config, NULL); - ## commit-graph.c ## @@ commit-graph.c: static int prepare_commit_graph(struct repository *r) struct object_directory *odb; @@ commit-graph.c: static int prepare_commit_graph(struct repository *r) return 0; if (r->objects->commit_graph_attempted) - - ## repo-settings.c ## -@@ repo-settings.c: void prepare_repo_settings(struct repository *r) - char *strval; - int manyfiles; - -+ if (!r->gitdir) -+ BUG("Cannot add settings for uninitialized repository"); -+ - if (r->settings.initialized++) - return; - 2: 5bc5e8465ab ! 3: 219a4158b6a test-read-cache: set up repo after git directory @@ Metadata ## Commit message ## test-read-cache: set up repo after git directory - Move repo setup to occur after git directory is set up. This will ensure - enabling the sparse index for `diff` (and guarding against the nongit - scenario) will not cause tests to start failing, since that change will include - adding a check to prepare_repo_settings() with the new BUG. + Move repo setup to occur after git directory is set up. This will protect + against test failures in the upcoming change to BUG in + prepare_repo_settings if no git directory exists. Signed-off-by: Lessley Dennington <lessleydennington@xxxxxxxxx> -: ----------- > 4: 4d8d58c473b repo-settings: prepare_repo_settings only in git repos -: ----------- > 5: 85e3e5c78e7 diff: replace --staged with --cached in t1092 tests 3: 273ee16b74e ! 6: 4f16366e5ad diff: enable and test the sparse index @@ Commit message with the 'git status' and 'git checkout' commands that were already integrated. For more details see: - d76723e (status: use sparse-index throughout, 2021-07-14) - 1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29) + d76723ee53 (status: use sparse-index throughout, 2021-07-14) + 1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29) The most interesting thing to do is to add tests that verify that 'git diff' behaves correctly when the sparse index is enabled. These cases are: @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n + run_on_all ../edit-contents deep/testfile && + + test_all_match git diff && -+ test_all_match git diff --staged && ++ test_all_match git diff --cached && + ensure_not_expanded diff && -+ ensure_not_expanded diff --staged && ++ ensure_not_expanded diff --cached && + + # Add file outside cone + test_all_match git reset --hard && @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n + test_sparse_match git sparse-checkout set && + + test_all_match git diff && -+ test_all_match git diff --staged && ++ test_all_match git diff --cached && + ensure_not_expanded diff && -+ ensure_not_expanded diff --staged && ++ ensure_not_expanded diff --cached && + + # Merge conflict outside cone + # The sparse checkout will report a warning that is not in the @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n + test_all_match test_must_fail git merge merge-right && + + test_all_match git diff && -+ test_all_match git diff --staged && ++ test_all_match git diff --cached && + ensure_not_expanded diff && -+ ensure_not_expanded diff --staged ++ ensure_not_expanded diff --cached +' + # NEEDSWORK: a sparse-checkout behaves differently from a full checkout 4: 7acf5118bf5 ! 7: 04532378734 blame: enable and test the sparse index @@ Commit message 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame - currently does not support blaming files outside of the sparse definition. - Attempting to do so fails with the following error: - - fatal: no such path '<path outside sparse definition>' in HEAD + does not support blaming files that are not present in the working + directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@xxxxxxxxx> ## builtin/blame.c ## -@@ builtin/blame.c: int cmd_blame(int argc, const char **argv, const char *prefix) - long anchor; - const int hexsz = the_hash_algo->hexsz; +@@ builtin/blame.c: parse_done: + revs.diffopt.flags.follow_renames = 0; + argc = parse_options_end(&ctx); -+ if (startup_info->have_repository) { -+ prepare_repo_settings(the_repository); -+ the_repository->settings.command_requires_full_index = 0; -+ } ++ prepare_repo_settings(the_repository); ++ the_repository->settings.command_requires_full_index = 0; + - setup_default_color_by_age(); - git_config(git_blame_config, &output_option); - repo_init_revisions(the_repository, &revs, NULL); + if (incremental || (output_option & OUTPUT_PORCELAIN)) { + if (show_progress > 0) + die(_("--progress can't be used with --incremental or porcelain formats")); ## t/perf/p2000-sparse-operations.sh ## -@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset - test_perf_on_all git reset --hard +@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset --hard test_perf_on_all git reset -- does-not-exist test_perf_on_all git diff --test_perf_on_all git diff --cached -+test_perf_on_all git diff --staged + test_perf_on_all git diff --cached +test_perf_on_all git blame $SPARSE_CONE/a +test_perf_on_all git blame $SPARSE_CONE/f3/a @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec -# TODO: blame currently does not support blaming files outside of the -# sparse definition. It complains that the file doesn't exist locally. -test_expect_failure 'blame with pathspec outside sparse definition' ' -+# NEEDSWORK: This test documents the current behavior, but this could -+# change in the future if we decide to support blaming files outside -+# the sparse definition. ++# Without a revision specified, blame will error if passed any file that ++# is not present in the working directory (even if the file is tracked). ++# Here we just verify that this is also true with sparse checkouts. +test_expect_success 'blame with pathspec outside sparse definition' ' init_repos && + test_sparse_match git sparse-checkout set && @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec test_expect_success 'checkout and reset (mixed)' ' @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is not expanded: diff' ' - ensure_not_expanded diff --staged + ensure_not_expanded diff --cached ' +test_expect_success 'sparse index is not expanded: blame' ' -- gitgitgadget