On Wed, Mar 10, 2021 at 11:31 AM Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > Here is the first full patch series submission coming out of the > sparse-index RFC [1]. > > [1] > https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@xxxxxxxxx/ > > I won't waste too much space here, because PATCH 1 includes a sizeable > design document that describes the feature, the reasoning behind it, and my > plan for getting this implemented widely throughout the codebase. > > There are some new things here that were not in the RFC: > > * Design doc and format updates. (Patch 1) > * Performance test script. (Patches 2 and 20) > > Notably missing in this series from the RFC: > > * The mega-patch inserting ensure_full_index() throughout the codebase. > That will be a follow-up series to this one. > * The integrations with git status and git add to demonstrate the improved > performance. Those will also appear in their own series later. > > I plan to keep my latest work in this area in my 'sparse-index/wip' branch > [2]. It includes all of the work from the RFC right now, updated with the > work from this series. > > [2] https://github.com/derrickstolee/git/tree/sparse-index/wip > > > Updates in V2 > ============= > > * Various typos and awkward grammar is fixed. > * Cleaned up unnecessary commands in p2000-sparse-operations.sh > * Added a comment to the sparse_index member of struct index_state. > * Used tree_type, commit_type, and blob_type in test-read-cache.c. I read through the range-diff and comments from the previous series. There's only a few things left (as I noted in comments), but they're all pretty trivial so this one is: Reviewed-by: Elijah Newren <newren@xxxxxxxxx> > > Thanks, -Stolee > > Derrick Stolee (20): > sparse-index: design doc and format update > t/perf: add performance test for sparse operations > t1092: clean up script quoting > sparse-index: add guard to ensure full index > sparse-index: implement ensure_full_index() > t1092: compare sparse-checkout to sparse-index > test-read-cache: print cache entries with --table > test-tool: don't force full index > unpack-trees: ensure full index > sparse-checkout: hold pattern list in index > sparse-index: convert from full to sparse > submodule: sparse-index should not collapse links > unpack-trees: allow sparse directories > sparse-index: check index conversion happens > sparse-index: create extension for compatibility > sparse-checkout: toggle sparse index from builtin > sparse-checkout: disable sparse-index > cache-tree: integrate with sparse directory entries > sparse-index: loose integration with cache_tree_verify() > p2000: add sparse-index repos > > Documentation/config/extensions.txt | 8 + > Documentation/git-sparse-checkout.txt | 14 ++ > Documentation/technical/index-format.txt | 7 + > Documentation/technical/sparse-index.txt | 173 ++++++++++++++ > Makefile | 1 + > builtin/sparse-checkout.c | 44 +++- > cache-tree.c | 40 ++++ > cache.h | 18 +- > read-cache.c | 35 ++- > repo-settings.c | 15 ++ > repository.c | 11 +- > repository.h | 3 + > setup.c | 3 + > sparse-index.c | 290 +++++++++++++++++++++++ > sparse-index.h | 11 + > t/README | 3 + > t/helper/test-read-cache.c | 66 +++++- > t/perf/p2000-sparse-operations.sh | 102 ++++++++ > t/t1091-sparse-checkout-builtin.sh | 13 + > t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++-- > unpack-trees.c | 16 +- > 21 files changed, 969 insertions(+), 40 deletions(-) > create mode 100644 Documentation/technical/sparse-index.txt > create mode 100644 sparse-index.c > create mode 100644 sparse-index.h > create mode 100755 t/perf/p2000-sparse-operations.sh > > > base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/883 > > Range-diff vs v1: > > 1: daa9a6bcefbc ! 1: 2fe413fdac80 sparse-index: design doc and format update > @@ Documentation/technical/sparse-index.txt (new) > +If we need to discover the details for paths within that directory, we > +can parse trees to find that list. > + > -+This addition of sparse-directory entries violates expectations about the > ++At time of writing, sparse-directory entries violate expectations about the > +index format and its in-memory data structure. There are many consumers in > +the codebase that expect to iterate through all of the index entries and > +see only files. In addition, they expect to see all files at `HEAD`. One > @@ Documentation/technical/sparse-index.txt (new) > +* `git merge` > +* `git rebase` > + > ++Hopefully, commands such as `git merge` and `git rebase` can benefit > ++instead from merge algorithms that do not use the index as a data > ++structure, such as the merge-ORT strategy. As these topics mature, we > ++may enalbe the ORT strategy by default for repositories using the > ++sparse-index feature. > ++ > +Along with `git status` and `git add`, these commands cover the majority > +of users' interactions with the working directory. In addition, we can > +integrate with these commands: > 2: a8c6322a3dbe ! 2: 540ab5495065 t/perf: add performance test for sparse operations > @@ t/perf/p2000-sparse-operations.sh (new) > + # Remove submodules from the example repo, because our > + # duplication of the entire repo creates an unlikly data shape. > + git config --file .gitmodules --get-regexp "submodule.*.path" >modules && > -+ rm -f .gitmodules && > -+ git add .gitmodules && > ++ git rm -f .gitmodules && > + for module in $(awk "{print \$2}" modules) > + do > + git rm $module || return 1 > + done && > -+ git add . && > + git commit -m "remove submodules" && > + > + echo bogus >a && > 3: 6e783c88821e = 3: 5cbedb377b37 t1092: clean up script quoting > 4: 01da4c48a1fa = 4: 6e21f776e883 sparse-index: add guard to ensure full index > 5: 2b83989fbcd3 ! 5: 399ddb0bad56 sparse-index: implement ensure_full_index() > @@ cache.h: struct index_state { > updated_skipworktree : 1, > - fsmonitor_has_run_once : 1; > + fsmonitor_has_run_once : 1, > ++ > ++ /* > ++ * sparse_index == 1 when sparse-directory > ++ * entries exist. Requires sparse-checkout > ++ * in cone mode. > ++ */ > + sparse_index : 1; > struct hashmap name_hash; > struct hashmap dir_hash; > 6: c9910a37579c = 6: eac2db5efc22 t1092: compare sparse-checkout to sparse-index > 7: 3d92df7a0cf9 ! 7: e9c82d2eda82 test-read-cache: print cache entries with --table > @@ Commit message > > ## t/helper/test-read-cache.c ## > @@ > + #include "test-tool.h" > #include "cache.h" > #include "config.h" > - > ++#include "blob.h" > ++#include "commit.h" > ++#include "tree.h" > ++ > +static void print_cache_entry(struct cache_entry *ce) > +{ > -+ printf("%06o ", ce->ce_mode & 0777777); > ++ const char *type; > ++ printf("%06o ", ce->ce_mode & 0177777); > + > + if (S_ISSPARSEDIR(ce->ce_mode)) > -+ printf("tree "); > ++ type = tree_type; > + else if (S_ISGITLINK(ce->ce_mode)) > -+ printf("commit "); > ++ type = commit_type; > + else > -+ printf("blob "); > ++ type = blob_type; > + > -+ printf("%s\t%s\n", > ++ printf("%s %s\t%s\n", > ++ type, > + oid_to_hex(&ce->oid), > + ce->name); > +} > + > -+static void print_cache(struct index_state *cache) > ++static void print_cache(struct index_state *istate) > +{ > + int i; > -+ for (i = 0; i < the_index.cache_nr; i++) > -+ print_cache_entry(the_index.cache[i]); > ++ for (i = 0; i < istate->cache_nr; i++) > ++ print_cache_entry(istate->cache[i]); > +} > -+ > + > int cmd__read_cache(int argc, const char **argv) > { > + struct repository *r = the_repository; > 8: 94373e2bfbbc ! 8: 243541fc5820 test-tool: don't force full index > @@ Commit message > > ## t/helper/test-read-cache.c ## > @@ > - #include "test-tool.h" > - #include "cache.h" > - #include "config.h" > + #include "blob.h" > + #include "commit.h" > + #include "tree.h" > +#include "sparse-index.h" > > static void print_cache_entry(struct cache_entry *ce) > 9: e71f033c2871 = 9: 48f65093b3da unpack-trees: ensure full index > 10: f86d3dc154d1 ! 10: 83aac8b7a1ec sparse-checkout: hold pattern list in index > @@ Commit message > pattern set, we need access to that in-memory copy. Place a pointer to > a 'struct pattern_list' in the index so we can access this on-demand. > This will be used in the next change which uses the sparse-checkout > - definition to filter out directories that are outsie the sparse cone. > + definition to filter out directories that are outside the sparse cone. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > 11: a2d77c23a0cb ! 11: f6db0c27a285 sparse-index: convert from full to sparse > @@ read-cache.c: int verify_path(const char *path, unsigned mode) > return 0; > + /* > + * allow terminating directory separators for > -+ * sparse directory enries. > ++ * sparse directory entries. > + */ > + if (c == '\0') > + return S_ISDIR(mode); > @@ sparse-index.c > + struct cache_entry *ce = istate->cache[i]; > + > + /* > -+ * Detect if this is a normal entry oustide of any subtree > ++ * Detect if this is a normal entry outside of any subtree > + * entry. > + */ > + base = ce->name + ct_pathlen; > 12: 4405a9115c3b = 12: f2a3e7298798 submodule: sparse-index should not collapse links > 13: fda23f07e6a2 ! 13: 6f1ebe6ccc08 unpack-trees: allow sparse directories > @@ Commit message > is possible to have a directory in a sparse index as long as that entry > is itself marked with the skip-worktree bit. > > - The negation of the 'pos' variable must be conditioned to only when it > - starts as negative. This is identical behavior as before when the index > - is full. > + The 'pos' variable is assigned a negative value if an exact match is not > + found. Since a directory name can be an exact match, it is no longer an > + error to have a nonnegative 'pos' value. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > 14: 7d4627574bb8 = 14: 3fa684b315fb sparse-index: check index conversion happens > 15: 564503f78784 ! 15: d74576d677f6 sparse-index: create extension for compatibility > @@ Commit message > > We _could_ add a new index version that explicitly adds these > capabilities, but there are nuances to index formats 2, 3, and 4 that > - are still valuable to select as options. For now, create a repo > - extension, "extensions.sparseIndex", that specifies that the tool > - reading this repository must understand sparse directory entries. > + are still valuable to select as options. Until we add index format > + version 5, create a repo extension, "extensions.sparseIndex", that > + specifies that the tool reading this repository must understand sparse > + directory entries. > > This change only encodes the extension and enables it when > GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI > @@ Documentation/config/extensions.txt: extensions.objectFormat:: > + When combined with `core.sparseCheckout=true` and > + `core.sparseCheckoutCone=true`, the index may contain entries > + corresponding to directories outside of the sparse-checkout > -+ definition. Versions of Git that do not understand this extension > -+ do not expect directory entries in the index. > ++ definition in lieu of containing each path under such directories. > ++ Versions of Git that do not understand this extension do not > ++ expect directory entries in the index. > > ## cache.h ## > @@ cache.h: struct repository_format { > 16: 6d6b230e3318 ! 16: e530ca5f668d sparse-checkout: toggle sparse index from builtin > @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees > +a sparse index until they are properly integrated with the feature. > ++ > +**WARNING:** Using a sparse index requires modifying the index in a way > -+that is not completely understood by other tools. Enabling sparse index > -+enables the `extensions.spareseIndex` config value, which might cause > -+other tools to stop working with your repository. If you have trouble with > -+this compatibility, then run `git sparse-checkout sparse-index disable` to > -+remove this config and rewrite your index to not be sparse. > ++that is not completely understood by external tools. If you have trouble > ++with this compatibility, then run `git sparse-checkout sparse-index disable` > ++to rewrite your index to not be sparse. Older versions of Git will not > ++understand the `sparseIndex` repository extension and may fail to interact > ++with your repository until it is disabled. > > 'set':: > Write a set of patterns to the sparse-checkout file, as given as > 17: bcf960ef2362 = 17: 42d0da9c5def sparse-checkout: disable sparse-index > 18: e6afec58674e = 18: 6bb0976a6295 cache-tree: integrate with sparse directory entries > 19: 2be4981fe698 = 19: 07f34e80609a sparse-index: loose integration with cache_tree_verify() > 20: a738b0ba8ab4 = 20: 41e3b56b9c17 p2000: add sparse-index repos > > -- > gitgitgadget