On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > Here is the second patch series submission coming out of the sparse-index > RFC [1]. > > [1] > https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@xxxxxxxxx/ > > This is based on v3 of the format series [2]. > > [2] > https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@xxxxxxxxx/ > > The point of this series is to insert protections for the consumers of the > in-memory index to avoid unintended behavior change when using a sparse > index versus a full one. > > We mark certain regions of code as needing a full index, so we call > ensure_full_index() to expand a sparse index to a full one, if necessary. > These protections are inserted file-by-file in every loop over all cache > entries. Well, "most" loops, because some are going to be handled in the > very next series so I leave them out. > > Many callers use index_name_pos() to find a path by name. In these cases, we > can check if that position resolves to a sparse directory instance. In those > cases, we just expand to a full index and run the search again. > > The last few patches deal with the name-hash hashtable for doing O(1) > lookups. > > These protections don't do much right now, since the previous series created > the_repository->settings.command_requires_full_index to guard all index > reads and writes to ensure the in-memory copy is full for commands that have > not been tested with the sparse index yet. > > However, after this series is complete, we now have a straight-forward plan > for making commands "sparse aware" one-by-one: > > 1. Disable settings.command_requires_full_index to allow an in-memory > sparse-index. > 2. Run versions of that command under a debugger, breaking on > ensure_full_index(). > 3. Examine the call stack to determine the context of that expansion, then > implement the proper behavior in those locations. > 4. Add tests to ensure we are checking this logic in the presence of sparse > directory entries. > > I will admit that mostly it is the writing of the test cases that takes the > most time in the conversions I've done so far. > > > Updates in v2 > ============= > > * Rebased onto v5 of ds/sparse-index > * Updated the technical doc to describe how these protections are guards to > keep behavior consistent between a sparse-index and a full index. Whether > or not that behavior is "correct" can be interrogated later. > * Calls to ensure_full_index() are marked with a TODO comment saying these > calls should be audited later (with tests). > * Fixed an incorrectly squashed commit message. > * Dropped the diff-lib.c commit because it was erroneously included in v2. > * Dropped the merge-ort.c commit because of conflicts with work in flight > and a quick audit that it is not needed. > * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and > found it equivalent to what I would have done. You addressed most my concerns moving from the RFC to v1, and this version addresses most of my remaining comments/questions/suggestions. I only had two minor comments on the last two patches. This series is looking pretty good to me. > Thanks, -Stolee > > Derrick Stolee (25): > sparse-index: API protection strategy > *: remove 'const' qualifier for struct index_state > read-cache: expand on query into sparse-directory entry > cache: move ensure_full_index() to cache.h > add: ensure full index > checkout-index: ensure full index > checkout: ensure full index > commit: ensure full index > difftool: ensure full index > fsck: ensure full index > grep: ensure full index > ls-files: ensure full index > merge-index: ensure full index > rm: ensure full index > stash: ensure full index > update-index: ensure full index > dir: ensure full index > entry: ensure full index > merge-recursive: ensure full index > pathspec: ensure full index > read-cache: ensure full index > resolve-undo: ensure full index > revision: ensure full index > sparse-index: expand_to_path() > name-hash: use expand_to_path() > > Documentation/technical/sparse-index.txt | 37 +++++++++++- > attr.c | 14 ++--- > attr.h | 4 +- > builtin/add.c | 2 + > builtin/checkout-index.c | 2 + > builtin/checkout.c | 5 ++ > builtin/commit.c | 4 ++ > builtin/difftool.c | 3 + > builtin/fsck.c | 2 + > builtin/grep.c | 2 + > builtin/ls-files.c | 14 +++-- > builtin/merge-index.c | 5 ++ > builtin/rm.c | 2 + > builtin/stash.c | 2 + > builtin/update-index.c | 2 + > cache.h | 7 ++- > convert.c | 26 ++++----- > convert.h | 20 +++---- > dir.c | 14 +++-- > dir.h | 8 +-- > entry.c | 2 + > merge-recursive.c | 4 +- > name-hash.c | 10 ++++ > pathspec.c | 8 ++- > pathspec.h | 6 +- > read-cache.c | 35 ++++++++++-- > resolve-undo.c | 4 ++ > revision.c | 2 + > sparse-index.c | 72 ++++++++++++++++++++++++ > sparse-index.h | 14 ++++- > submodule.c | 6 +- > submodule.h | 6 +- > 32 files changed, 273 insertions(+), 71 deletions(-) > > > base-commit: 66602733cc95d9a53594520cd8b28d3338e258ea > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/906 > > Range-diff vs v1: > > 3: bbf19f8a2be5 ! 1: 7484e085e342 sparse-index: API protection strategy > @@ Commit message > Looking for non-zero stage: > * builtin/add.c:chmod_pathspec() > * builtin/merge.c:count_unmerged_entries() > + * merge-ort.c:record_conflicted_index_entries() > * read-cache.c:unmerged_index() > * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining() > * revision.c:prepare_show_merge() > @@ Commit message > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > ## Documentation/technical/sparse-index.txt ## > -@@ Documentation/technical/sparse-index.txt: also introduce other features that have been considered for improving the > - index, as well. > +@@ Documentation/technical/sparse-index.txt: index, as well. > > Next, consumers of the index will be guarded against operating on a > --sparse-index by inserting calls to `ensure_full_index()` or > + sparse-index by inserting calls to `ensure_full_index()` or > -`expand_index_to_path()`. After these guards are in place, we can begin > -leaving sparse-directory entries in the in-memory index structure. > -+sparse-index by inserting calls to `ensure_full_index()` before iterating > -+over all cache entries. If a specific path is requested, then those will > ++`expand_index_to_path()`. If a specific path is requested, then those will > +be protected from within the `index_file_exists()` and `index_name_pos()` > -+API calls: they will call `ensure_full_index()` if necessary. > ++API calls: they will call `ensure_full_index()` if necessary. The > ++intention here is to preserve existing behavior when interacting with a > ++sparse-checkout. We don't want a change to happen by accident, without > ++tests. Many of these locations may not need any change before removing the > ++guards, but we should not do so without tests to ensure the expected > ++behavior happens. > ++ > ++It may be desirable to _change_ the behavior of some commands in the > ++presence of a sparse index or more generally in any sparse-checkout > ++scenario. In such cases, these should be carefully communicated and > ++tested. No such behavior changes are intended during this phase. > + > +During a scan of the codebase, not every iteration of the cache entries > +needs an `ensure_full_index()` check. The basic reasons include: > @@ Documentation/technical/sparse-index.txt: also introduce other features that hav > + > +6. The sparse-index is disabled at this point when using the split-index > + feature, so no effort is made to protect the split-index API. > -+ > -+After these guards are in place, we can begin leaving sparse-directory > -+entries in the in-memory index structure. > > Even after inserting these guards, we will keep expanding sparse-indexes > for most Git commands using the `command_requires_full_index` repository > 1: 628e16dd3fc7 = 2: 098b2c9ef352 *: remove 'const' qualifier for struct index_state > 2: 8e11e8917019 = 3: 737d27e18d64 read-cache: expand on query into sparse-directory entry > 4: 7e4079c48eb2 = 4: db5c100f3e2b cache: move ensure_full_index() to cache.h > 5: 142df1ab8e3e ! 5: 4a5fc2eb5a9f add: ensure full index > @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths > { > int i, retval = 0; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > struct cache_entry *ce = active_cache[i]; > 6: bfa0164cc3c1 ! 6: 11c38f7277c5 checkout-index: ensure full index > @@ builtin/checkout-index.c: static void checkout_all(const char *prefix, int prefi > int i, errs = 0; > struct cache_entry *last_ce = NULL; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr ; i++) { > struct cache_entry *ce = active_cache[i]; > 7: 1bb7b6117e41 ! 7: fd04adbb3f79 checkout: ensure full index > @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt > NULL); > > enable_delayed_checkout(&state); > ++ > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (pos = 0; pos < active_nr; pos++) { > struct cache_entry *ce = active_cache[pos]; > @@ builtin/checkout.c: static int checkout_paths(const struct checkout_opts *opts, > * Make sure all pathspecs participated in locating the paths > * to be checked out. > */ > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (pos = 0; pos < active_nr; pos++) > if (opts->overlay_mode) > 8: a59e45c4ae8f ! 8: 65704f39edc9 commit: ensure full index > @@ builtin/commit.c: static int list_paths(struct string_list *list, const char *wi > free(max_prefix); > } > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > const struct cache_entry *ce = active_cache[i]; > @@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const cha > if (get_oid(parent, &oid)) { > int i, ita_nr = 0; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) > if (ce_intent_to_add(active_cache[i])) > 9: 83040e7558f3 ! 9: 739f3fe9edf2 difftool: ensure full index > @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co > setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1); > rc = run_command_v_opt(helper_argv, flags); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&wtindex); > + > /* > 10: 988f7bd2d736 ! 10: 779a86ad1ec4 fsck: ensure full index > @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix) > verify_index_checksum = 1; > verify_ce_order = 1; > read_cache(); > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > unsigned int mode; > 11: abe189051a0c ! 11: 8c0d377054fa grep: ensure full index > @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt, > if (repo_read_index(repo) < 0) > die(_("index file corrupt")); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(repo->index); > for (nr = 0; nr < repo->index->cache_nr; nr++) { > const struct cache_entry *ce = repo->index->cache[nr]; > 12: 00c8a7e1d119 ! 12: beaa1467cabb ls-files: ensure full index > @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s > > if (!(show_cached || show_stage || show_deleted || show_modified)) > return; > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(repo->index); > for (i = 0; i < repo->index->cache_nr; i++) { > const struct cache_entry *ce = repo->index->cache[i]; > @@ builtin/ls-files.c: void overlay_tree_on_index(struct index_state *istate, > die("bad tree-ish %s", tree_name); > > /* Hoist the unmerged entries up to stage #3 to make room */ > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > 13: 1288f02177d2 ! 13: 73684141fcff merge-index: ensure full index > @@ builtin/merge-index.c: static void merge_one_path(const char *path) > static void merge_all(void) > { > int i; > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > const struct cache_entry *ce = active_cache[i]; > @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch > > read_cache(); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > + > i = 1; > 14: d6816560b32f ! 14: 6ea81a49c6b5 rm: ensure full index > @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix) > > seen = xcalloc(pathspec.nr, 1); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > const struct cache_entry *ce = active_cache[i]; > 15: 92ccd31dd343 ! 15: 49ca5ed05c8d sparse-checkout: ensure full index > @@ Metadata > Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > ## Commit message ## > - sparse-checkout: ensure full index > + stash: ensure full index > > - When adjusting the sparse-checkout definition, ensure that a sparse > - index is expanded to a full one to avoid unexpected behavior. This is a > - top candidate for later integration with the sparse-index, but requires > - careful testing. > + Before iterating over all cache entries, ensure that a sparse index is > + expanded to a full index to avoid unexpected behavior. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char > int i; > char *ps_matched = xcalloc(ps->nr, 1); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) > ce_path_match(&the_index, active_cache[i], ps, > 16: 796449758a08 ! 16: 9c4bb187c15d update-index: ensure full index > @@ builtin/update-index.c: static int do_reupdate(int ac, const char **av, > */ > has_head = 0; > redo: > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(&the_index); > for (pos = 0; pos < active_nr; pos++) { > const struct cache_entry *ce = active_cache[pos]; > 17: 45bbed6150a2 < -: ------------ diff-lib: ensure full index > 18: 8ce1f80985a4 ! 17: fae4c078c3ef dir: ensure full index > @@ dir.c: static void connect_wt_gitdir_in_nested(const char *sub_worktree, > if (repo_read_index(&subrepo) < 0) > die(_("index file corrupt in repo %s"), subrepo.gitdir); > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(subrepo.index); > for (i = 0; i < subrepo.index->cache_nr; i++) { > const struct cache_entry *ce = subrepo.index->cache[i]; > 19: 6e0b452f44c1 ! 18: 2b9180ee77d3 entry: ensure full index > @@ entry.c: static void mark_colliding_entries(const struct checkout *state, > > ce->ce_flags |= CE_MATCHED; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(state->istate); > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > 20: fb4c7038b746 < -: ------------ merge-ort: ensure full index > 21: 57d59825627f ! 19: 1e3f6085a405 merge-recursive: ensure full index > @@ merge-recursive.c: static struct string_list *get_unmerged(struct index_state *i > > unmerged->strdup_strings = 1; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > struct string_list_item *item; > 22: f4c470686b27 ! 20: e62a597a9725 pathspec: ensure full index > @@ pathspec.c: void add_pathspec_matches_against_index(const struct pathspec *paths > num_unmatched++; > if (!num_unmatched) > return; > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > const struct cache_entry *ce = istate->cache[i]; > 23: 81519782d4b2 ! 21: ebfffdbdd6ad read-cache: ensure full index > @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags, > */ > preload_index(istate, pathspec, 0); > trace2_region_enter("index", "refresh", NULL); > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce, *new_entry; > @@ read-cache.c: int repo_index_has_changes(struct repository *repo, > diff_flush(&opt); > return opt.flags.has_changes != 0; > } else { > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; sb && i < istate->cache_nr; i++) { > if (i) > 24: 39e8bea0c1ca ! 22: 495b07a87973 resolve-undo: ensure full index > @@ resolve-undo.c: void unmerge_marked_index(struct index_state *istate) > if (!istate->resolve_undo) > return; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > const struct cache_entry *ce = istate->cache[i]; > @@ resolve-undo.c: void unmerge_index(struct index_state *istate, const struct path > if (!istate->resolve_undo) > return; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > const struct cache_entry *ce = istate->cache[i]; > 25: 6abb1813ae10 ! 23: 3144114d1a75 revision: ensure full index > @@ revision.c: static void do_add_index_objects_to_pending(struct rev_info *revs, > { > int i; > > ++ /* TODO: audit for interaction with sparse-index. */ > + ensure_full_index(istate); > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > 26: 1690d33c11e0 = 24: d52c72b4a7b9 sparse-index: expand_to_path() > 27: 40cfc59f695a = 25: 7e2d3fae9a2a name-hash: use expand_to_path() > > -- > gitgitgadget