On Tue, Mar 30, 2021 at 6:11 AM Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > Here is the first full patch series submission coming out of the > sparse-index RFC [1]. > > [1] > https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@xxxxxxxxx/ > > I won't waste too much space here, because PATCH 1 includes a sizeable > design document that describes the feature, the reasoning behind it, and my > plan for getting this implemented widely throughout the codebase. > > There are some new things here that were not in the RFC: > > * Design doc and format updates. (Patch 1) > * Performance test script. (Patches 2 and 20) > > Notably missing in this series from the RFC: > > * The mega-patch inserting ensure_full_index() throughout the codebase. > That will be a follow-up series to this one. > * The integrations with git status and git add to demonstrate the improved > performance. Those will also appear in their own series later. > > I plan to keep my latest work in this area in my 'sparse-index/wip' branch > [2]. It includes all of the work from the RFC right now, updated with the > work from this series. > > [2] https://github.com/derrickstolee/git/tree/sparse-index/wip > > > Updates in V5 > ============= > > This version is updated to use an index extension instead of a repository > format extension. Thanks, Szeder! This one change affects the range-diff > quite a bit, so please review those changes carefully. > > In particular: git sparse-checkout init --cone --sparse-index now sets a new > index.sparse config option as an indicator that we should attempt writing > the index in sparse form. > > > Updates in V4 > ============= > > * Rebased onto the latest copy of ab/read-tree. > * Updated the design document as per Junio's comments. > * Updated the submodule handling in the performance test. > * Followed up on some other review from Ævar, mostly style or commit > message things. > > > Updates in V3 > ============= > > For this version, I took Ævar's latest patches and applied them to v2.31.0 > and rebased this series on top. It uses his new "read_tree_at()" helper and > the associated changes to the function pointer type. > > * Fixed more typos. Thanks Martin and Elijah! > * Updated the test_sparse_match() macro to use "$@" instead of $* > * Added a test that git sparse-checkout init --no-sparse-index rewrites the > index to be full. > > > Updates in V2 > ============= > > * Various typos and awkward grammar is fixed. > * Cleaned up unnecessary commands in p2000-sparse-operations.sh > * Added a comment to the sparse_index member of struct index_state. > * Used tree_type, commit_type, and blob_type in test-read-cache.c. > > Thanks, -Stolee > > Derrick Stolee (21): > sparse-index: design doc and format update > t/perf: add performance test for sparse operations > t1092: clean up script quoting > sparse-index: add guard to ensure full index > sparse-index: implement ensure_full_index() > t1092: compare sparse-checkout to sparse-index > test-read-cache: print cache entries with --table > test-tool: don't force full index > unpack-trees: ensure full index > sparse-checkout: hold pattern list in index > sparse-index: add 'sdir' index extension > sparse-index: convert from full to sparse > submodule: sparse-index should not collapse links > unpack-trees: allow sparse directories > sparse-index: check index conversion happens > sparse-index: add index.sparse config option > sparse-checkout: toggle sparse index from builtin > sparse-checkout: disable sparse-index > cache-tree: integrate with sparse directory entries > sparse-index: loose integration with cache_tree_verify() > p2000: add sparse-index repos > > Documentation/config/index.txt | 5 + > Documentation/git-sparse-checkout.txt | 14 ++ > Documentation/technical/index-format.txt | 19 ++ > Documentation/technical/sparse-index.txt | 175 ++++++++++++++ > Makefile | 1 + > builtin/sparse-checkout.c | 44 +++- > cache-tree.c | 40 ++++ > cache.h | 18 +- > read-cache.c | 44 +++- > repo-settings.c | 15 ++ > repository.c | 11 +- > repository.h | 3 + > sparse-index.c | 285 +++++++++++++++++++++++ > sparse-index.h | 11 + > t/README | 3 + > t/helper/test-read-cache.c | 66 +++++- > t/perf/p2000-sparse-operations.sh | 101 ++++++++ > t/t1091-sparse-checkout-builtin.sh | 13 ++ > t/t1092-sparse-checkout-compatibility.sh | 143 ++++++++++-- > unpack-trees.c | 17 +- > 20 files changed, 988 insertions(+), 40 deletions(-) > create mode 100644 Documentation/technical/sparse-index.txt > create mode 100644 sparse-index.c > create mode 100644 sparse-index.h > create mode 100755 t/perf/p2000-sparse-operations.sh > > > base-commit: 47957485b3b731a7860e0554d2bd12c0dce1c75a > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v5 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v5 > Pull-Request: https://github.com/gitgitgadget/git/pull/883 > > Range-diff vs v4: > > 1: 6426a5c60e53 ! 1: 7b600d536c6e sparse-index: design doc and format update > @@ Documentation/technical/sparse-index.txt (new) > +The only noticeable change in behavior will be that the serialized index > +file contains sparse-directory entries. > + > -+To start, we use a new repository extension, `extensions.sparseIndex`, to > -+allow inserting sparse-directory entries into indexes with file format > ++To start, we use a new required index extension, `sdir`, to allow > ++inserting sparse-directory entries into indexes with file format > +versions 2, 3, and 4. This prevents Git versions that do not understand > -+the sparse-index from operating on one, but it also prevents other > -+operations that do not use the index at all. A new format, index v5, will > -+be introduced that includes sparse-directory entries by default. It might > -+also introduce other features that have been considered for improving the > ++the sparse-index from operating on one, while allowing tools that do not > ++understand the sparse-index to operate on repositories as long as they do > ++not interact with the index. A new format, index v5, will be introduced > ++that includes sparse-directory entries by default. It might also > ++introduce other features that have been considered for improving the > +index, as well. > + > +Next, consumers of the index will be guarded against operating on a > 2: 7eabc1d0586c = 2: 202253ec82f3 t/perf: add performance test for sparse operations > 3: c9e21d78ecba = 3: 437a0f144e57 t1092: clean up script quoting > 4: 03cdde756563 = 4: b7e1bf5c55a7 sparse-index: add guard to ensure full index > 5: 6b3b6d86385d = 5: e41d55d2cca9 sparse-index: implement ensure_full_index() > 6: 7f67adba0498 = 6: 7bfbfbd17321 t1092: compare sparse-checkout to sparse-index > 7: 7ebd9570b1ad = 7: a1b8135c0fc8 test-read-cache: print cache entries with --table > 8: db7bbd06dbcc = 8: dd84a2a9121b test-tool: don't force full index > 9: 3ddd5e794b5e = 9: b276d2ed5323 unpack-trees: ensure full index > 10: 7308c87697f1 = 10: c3651e26dc3a sparse-checkout: hold pattern list in index > -: ------------ > 11: f926cf8b2e01 sparse-index: add 'sdir' index extension > 11: 7c10d653ca6b = 12: c870ae5e8749 sparse-index: convert from full to sparse > 12: 6db36f33e960 = 13: bcf0da959ef3 submodule: sparse-index should not collapse links > 13: d24bd3348d98 = 14: 7191b48237de unpack-trees: allow sparse directories > 14: 08d9f5f3c0d1 = 15: 57be9b4a728b sparse-index: check index conversion happens > 15: 6f38cef196b0 ! 16: c22b4111e49e sparse-index: create extension for compatibility > @@ Metadata > Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > ## Commit message ## > - sparse-index: create extension for compatibility > + sparse-index: add index.sparse config option > > - Previously, we enabled the sparse index format only using > - GIT_TEST_SPARSE_INDEX=1. This is not a feasible direction for users to > - actually select this mode. Further, sparse directory entries are not > - understood by the index formats as advertised. > - > - We _could_ add a new index version that explicitly adds these > - capabilities, but there are nuances to index formats 2, 3, and 4 that > - are still valuable to select as options. Until we add index format > - version 5, create a repo extension, "extensions.sparseIndex", that > - specifies that the tool reading this repository must understand sparse > - directory entries. > - > - This change only encodes the extension and enables it when > - GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI > - mechanism. > + When enabled, this config option signals that index writes should > + attempt to use sparse-directory entries. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > - ## Documentation/config/extensions.txt ## > -@@ Documentation/config/extensions.txt: extensions.objectFormat:: > - Note that this setting should only be set by linkgit:git-init[1] or > - linkgit:git-clone[1]. Trying to change it after initialization will not > - work and will produce hard-to-diagnose issues. > + ## Documentation/config/index.txt ## > +@@ Documentation/config/index.txt: index.recordOffsetTable:: > + Defaults to 'true' if index.threads has been explicitly enabled, > + 'false' otherwise. > + > ++index.sparse:: > ++ When enabled, write the index using sparse-directory entries. This > ++ has no effect unless `core.sparseCheckout` and > ++ `core.sparseCheckoutCone` are both enabled. Defaults to 'false'. > + > -+extensions.sparseIndex:: > -+ When combined with `core.sparseCheckout=true` and > -+ `core.sparseCheckoutCone=true`, the index may contain entries > -+ corresponding to directories outside of the sparse-checkout > -+ definition in lieu of containing each path under such directories. > -+ Versions of Git that do not understand this extension do not > -+ expect directory entries in the index. > + index.threads:: > + Specifies the number of threads to spawn when loading the index. > + This is meant to reduce index load time on multiprocessor machines. > > ## cache.h ## > @@ cache.h: struct repository_format { > @@ repo-settings.c: void prepare_repo_settings(struct repository *r) > + * Initialize this as off. > + */ > + r->settings.sparse_index = 0; > -+ if (!repo_config_get_bool(r, "extensions.sparseindex", &value) && value) > ++ if (!repo_config_get_bool(r, "index.sparse", &value) && value) > + r->settings.sparse_index = 1; > } > > @@ repository.h: struct repo_settings { > > struct repository { > > - ## setup.c ## > -@@ setup.c: static enum extension_result handle_extension(const char *var, > - return error("invalid value for 'extensions.objectformat'"); > - data->hash_algo = format; > - return EXTENSION_OK; > -+ } else if (!strcmp(ext, "sparseindex")) { > -+ data->sparse_index = 1; > -+ return EXTENSION_OK; > - } > - return EXTENSION_UNKNOWN; > - } > - > ## sparse-index.c ## > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > return num_converted - start_converted; > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > +{ > + const char *config_path = repo_git_path(repo, "config.worktree"); > + > -+ if (upgrade_repository_format(1) < 0) { > -+ warning(_("unable to upgrade repository format to enable sparse-index")); > -+ return -1; > -+ } > + git_config_set_in_file_gently(config_path, > -+ "extensions.sparseIndex", > ++ "index.sparse", > + "true"); > + > + prepare_repo_settings(repo); > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + > + /* > + * The GIT_TEST_SPARSE_INDEX environment variable triggers the > -+ * extensions.sparseIndex config variable to be on. > ++ * index.sparse config variable to be on. > + */ > + if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { > + int err = enable_sparse_index(istate->repo); > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > - * GIT_TEST_SPARSE_INDEX environment variable. We will relax > - * this once we have a proper way to opt-in (and later still, > - * opt-out). > -+ * Only convert to sparse if extensions.sparseIndex is set. > ++ * Only convert to sparse if index.sparse is set. > */ > - if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) > + prepare_repo_settings(istate->repo); > 16: 923081e7e079 ! 17: 75fe9b0f57da sparse-checkout: toggle sparse index from builtin > @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees > +that is not completely understood by external tools. If you have trouble > +with this compatibility, then run `git sparse-checkout init --no-sparse-index` > +to rewrite your index to not be sparse. Older versions of Git will not > -+understand the `sparseIndex` repository extension and may fail to interact > -+with your repository until it is disabled. > ++understand the sparse directory entries index extension and may fail to > ++interact with your repository until it is disabled. > > 'set':: > Write a set of patterns to the sparse-checkout file, as given as > @@ builtin/sparse-checkout.c: static int sparse_checkout_init(int argc, const char > > ## sparse-index.c ## > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + return num_converted - start_converted; > + } > > - static int enable_sparse_index(struct repository *repo) > +-static int enable_sparse_index(struct repository *repo) > ++static int set_index_sparse_config(struct repository *repo, int enable) > { > - const char *config_path = repo_git_path(repo, "config.worktree"); > -+ int res; > - > - if (upgrade_repository_format(1) < 0) { > - warning(_("unable to upgrade repository format to enable sparse-index")); > - return -1; > - } > +- > - git_config_set_in_file_gently(config_path, > -- "extensions.sparseIndex", > +- "index.sparse", > - "true"); > -+ res = git_config_set_gently("extensions.sparseindex", "true"); > ++ int res; > ++ char *config_path = repo_git_path(repo, "config.worktree"); > ++ res = git_config_set_in_file_gently(config_path, > ++ "index.sparse", > ++ enable ? "true" : NULL); > ++ free(config_path); > > prepare_repo_settings(repo); > repo->settings.sparse_index = 1; > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > + > +int set_sparse_index_config(struct repository *repo, int enable) > +{ > -+ int res; > -+ > -+ if (enable) > -+ return enable_sparse_index(repo); > -+ > -+ /* Don't downgrade repository format, just remove the extension. */ > -+ res = git_config_set_gently("extensions.sparseindex", NULL); > ++ int res = set_index_sparse_config(repo, enable); > + > + prepare_repo_settings(repo); > -+ repo->settings.sparse_index = 0; > ++ repo->settings.sparse_index = enable; > + return res; > } > > @@ sparse-index.c: static int convert_to_sparse_rec(struct index_state *istate, > !core_apply_sparse_checkout || !core_sparse_checkout_cone) > return 0; > @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) > - istate->repo = the_repository; > - > - /* > -- * The GIT_TEST_SPARSE_INDEX environment variable triggers the > -- * extensions.sparseIndex config variable to be on. > -+ * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex > -+ * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), > -+ * then purposefully disable the setting. > + * The GIT_TEST_SPARSE_INDEX environment variable triggers the > + * index.sparse config variable to be on. > */ > - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { > - int err = enable_sparse_index(istate->repo); > @@ sparse-index.c: int convert_to_sparse(struct index_state *istate) > + set_sparse_index_config(istate->repo, test_env); > > /* > - * Only convert to sparse if extensions.sparseIndex is set. > + * Only convert to sparse if index.sparse is set. > > ## sparse-index.h ## > @@ sparse-index.h: struct index_state; > @@ t/t1092-sparse-checkout-compatibility.sh: init_repos () { > - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && > - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep > + git -C sparse-index sparse-checkout init --cone --sparse-index && > -+ test_cmp_config -C sparse-index true extensions.sparseindex && > ++ test_cmp_config -C sparse-index true index.sparse && > + git -C sparse-index sparse-checkout set deep > } > > 17: 6f1ad72c390d ! 18: 7f55a232e647 sparse-checkout: disable sparse-index > @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable > > +test_expect_success 'sparse-index enabled and disabled' ' > + git -C repo sparse-checkout init --cone --sparse-index && > -+ test_cmp_config -C repo true extensions.sparseIndex && > ++ test_cmp_config -C repo true index.sparse && > + test-tool -C repo read-cache --table >cache && > + grep " tree " cache && > + > @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'sparse-checkout disable > + test-tool -C repo read-cache --table >cache && > + ! grep " tree " cache && > + git -C repo config --list >config && > -+ ! grep extensions.sparseindex config > ++ ! grep index.sparse config > +' > + > test_expect_success 'cone mode: init and set' ' > 18: bd94e6b7d089 = 19: 365901809d9d cache-tree: integrate with sparse directory entries > 19: e7190376b806 = 20: 9b068c458898 sparse-index: loose integration with cache_tree_verify() > 20: bcf0a58eb38c = 21: 66602733cc95 p2000: add sparse-index repos I've read through the range-diff and individually read through the new patch 11. Perhaps unsurprisingly since you addressed all my feedback by about round 3, I didn't find any problems with this new version. Looks good to me.