[PATCH v2 00/25] Sparse Index: API protections

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is the second patch series submission coming out of the sparse-index
RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@xxxxxxxxx/

This is based on v3 of the format series [2].

[2]
https://lore.kernel.org/git/pull.883.v3.git.1615912983.gitgitgadget@xxxxxxxxx/

The point of this series is to insert protections for the consumers of the
in-memory index to avoid unintended behavior change when using a sparse
index versus a full one.

We mark certain regions of code as needing a full index, so we call
ensure_full_index() to expand a sparse index to a full one, if necessary.
These protections are inserted file-by-file in every loop over all cache
entries. Well, "most" loops, because some are going to be handled in the
very next series so I leave them out.

Many callers use index_name_pos() to find a path by name. In these cases, we
can check if that position resolves to a sparse directory instance. In those
cases, we just expand to a full index and run the search again.

The last few patches deal with the name-hash hashtable for doing O(1)
lookups.

These protections don't do much right now, since the previous series created
the_repository->settings.command_requires_full_index to guard all index
reads and writes to ensure the in-memory copy is full for commands that have
not been tested with the sparse index yet.

However, after this series is complete, we now have a straight-forward plan
for making commands "sparse aware" one-by-one:

 1. Disable settings.command_requires_full_index to allow an in-memory
    sparse-index.
 2. Run versions of that command under a debugger, breaking on
    ensure_full_index().
 3. Examine the call stack to determine the context of that expansion, then
    implement the proper behavior in those locations.
 4. Add tests to ensure we are checking this logic in the presence of sparse
    directory entries.

I will admit that mostly it is the writing of the test cases that takes the
most time in the conversions I've done so far.


Updates in v2
=============

 * Rebased onto v5 of ds/sparse-index
 * Updated the technical doc to describe how these protections are guards to
   keep behavior consistent between a sparse-index and a full index. Whether
   or not that behavior is "correct" can be interrogated later.
 * Calls to ensure_full_index() are marked with a TODO comment saying these
   calls should be audited later (with tests).
 * Fixed an incorrectly squashed commit message.
 * Dropped the diff-lib.c commit because it was erroneously included in v2.
 * Dropped the merge-ort.c commit because of conflicts with work in flight
   and a quick audit that it is not needed.
 * I reviewed the merge of this topic with mt/add-rm-in-sparse-checkout and
   found it equivalent to what I would have done.

Thanks, -Stolee

Derrick Stolee (25):
  sparse-index: API protection strategy
  *: remove 'const' qualifier for struct index_state
  read-cache: expand on query into sparse-directory entry
  cache: move ensure_full_index() to cache.h
  add: ensure full index
  checkout-index: ensure full index
  checkout: ensure full index
  commit: ensure full index
  difftool: ensure full index
  fsck: ensure full index
  grep: ensure full index
  ls-files: ensure full index
  merge-index: ensure full index
  rm: ensure full index
  stash: ensure full index
  update-index: ensure full index
  dir: ensure full index
  entry: ensure full index
  merge-recursive: ensure full index
  pathspec: ensure full index
  read-cache: ensure full index
  resolve-undo: ensure full index
  revision: ensure full index
  sparse-index: expand_to_path()
  name-hash: use expand_to_path()

 Documentation/technical/sparse-index.txt | 37 +++++++++++-
 attr.c                                   | 14 ++---
 attr.h                                   |  4 +-
 builtin/add.c                            |  2 +
 builtin/checkout-index.c                 |  2 +
 builtin/checkout.c                       |  5 ++
 builtin/commit.c                         |  4 ++
 builtin/difftool.c                       |  3 +
 builtin/fsck.c                           |  2 +
 builtin/grep.c                           |  2 +
 builtin/ls-files.c                       | 14 +++--
 builtin/merge-index.c                    |  5 ++
 builtin/rm.c                             |  2 +
 builtin/stash.c                          |  2 +
 builtin/update-index.c                   |  2 +
 cache.h                                  |  7 ++-
 convert.c                                | 26 ++++-----
 convert.h                                | 20 +++----
 dir.c                                    | 14 +++--
 dir.h                                    |  8 +--
 entry.c                                  |  2 +
 merge-recursive.c                        |  4 +-
 name-hash.c                              | 10 ++++
 pathspec.c                               |  8 ++-
 pathspec.h                               |  6 +-
 read-cache.c                             | 35 ++++++++++--
 resolve-undo.c                           |  4 ++
 revision.c                               |  2 +
 sparse-index.c                           | 72 ++++++++++++++++++++++++
 sparse-index.h                           | 14 ++++-
 submodule.c                              |  6 +-
 submodule.h                              |  6 +-
 32 files changed, 273 insertions(+), 71 deletions(-)


base-commit: 66602733cc95d9a53594520cd8b28d3338e258ea
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-906%2Fderrickstolee%2Fsparse-index%2Fprotections-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-906/derrickstolee/sparse-index/protections-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/906

Range-diff vs v1:

  3:  bbf19f8a2be5 !  1:  7484e085e342 sparse-index: API protection strategy
     @@ Commit message
          Looking for non-zero stage:
          * builtin/add.c:chmod_pathspec()
          * builtin/merge.c:count_unmerged_entries()
     +    * merge-ort.c:record_conflicted_index_entries()
          * read-cache.c:unmerged_index()
          * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining()
          * revision.c:prepare_show_merge()
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Documentation/technical/sparse-index.txt ##
     -@@ Documentation/technical/sparse-index.txt: also introduce other features that have been considered for improving the
     - index, as well.
     +@@ Documentation/technical/sparse-index.txt: index, as well.
       
       Next, consumers of the index will be guarded against operating on a
     --sparse-index by inserting calls to `ensure_full_index()` or
     + sparse-index by inserting calls to `ensure_full_index()` or
      -`expand_index_to_path()`. After these guards are in place, we can begin
      -leaving sparse-directory entries in the in-memory index structure.
     -+sparse-index by inserting calls to `ensure_full_index()` before iterating
     -+over all cache entries. If a specific path is requested, then those will
     ++`expand_index_to_path()`. If a specific path is requested, then those will
      +be protected from within the `index_file_exists()` and `index_name_pos()`
     -+API calls: they will call `ensure_full_index()` if necessary.
     ++API calls: they will call `ensure_full_index()` if necessary. The
     ++intention here is to preserve existing behavior when interacting with a
     ++sparse-checkout. We don't want a change to happen by accident, without
     ++tests. Many of these locations may not need any change before removing the
     ++guards, but we should not do so without tests to ensure the expected
     ++behavior happens.
     ++
     ++It may be desirable to _change_ the behavior of some commands in the
     ++presence of a sparse index or more generally in any sparse-checkout
     ++scenario. In such cases, these should be carefully communicated and
     ++tested. No such behavior changes are intended during this phase.
      +
      +During a scan of the codebase, not every iteration of the cache entries
      +needs an `ensure_full_index()` check. The basic reasons include:
     @@ Documentation/technical/sparse-index.txt: also introduce other features that hav
      +
      +6. The sparse-index is disabled at this point when using the split-index
      +   feature, so no effort is made to protect the split-index API.
     -+
     -+After these guards are in place, we can begin leaving sparse-directory
     -+entries in the in-memory index structure.
       
       Even after inserting these guards, we will keep expanding sparse-indexes
       for most Git commands using the `command_requires_full_index` repository
  1:  628e16dd3fc7 =  2:  098b2c9ef352 *: remove 'const' qualifier for struct index_state
  2:  8e11e8917019 =  3:  737d27e18d64 read-cache: expand on query into sparse-directory entry
  4:  7e4079c48eb2 =  4:  db5c100f3e2b cache: move ensure_full_index() to cache.h
  5:  142df1ab8e3e !  5:  4a5fc2eb5a9f add: ensure full index
     @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
       {
       	int i, retval = 0;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		struct cache_entry *ce = active_cache[i];
  6:  bfa0164cc3c1 !  6:  11c38f7277c5 checkout-index: ensure full index
     @@ builtin/checkout-index.c: static void checkout_all(const char *prefix, int prefi
       	int i, errs = 0;
       	struct cache_entry *last_ce = NULL;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr ; i++) {
       		struct cache_entry *ce = active_cache[i];
  7:  1bb7b6117e41 !  7:  fd04adbb3f79 checkout: ensure full index
     @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
       			       NULL);
       
       	enable_delayed_checkout(&state);
     ++
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++) {
       		struct cache_entry *ce = active_cache[pos];
     @@ builtin/checkout.c: static int checkout_paths(const struct checkout_opts *opts,
       	 * Make sure all pathspecs participated in locating the paths
       	 * to be checked out.
       	 */
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++)
       		if (opts->overlay_mode)
  8:  a59e45c4ae8f !  8:  65704f39edc9 commit: ensure full index
     @@ builtin/commit.c: static int list_paths(struct string_list *list, const char *wi
       		free(max_prefix);
       	}
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
     @@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const cha
       		if (get_oid(parent, &oid)) {
       			int i, ita_nr = 0;
       
     ++			/* TODO: audit for interaction with sparse-index. */
      +			ensure_full_index(&the_index);
       			for (i = 0; i < active_nr; i++)
       				if (ce_intent_to_add(active_cache[i]))
  9:  83040e7558f3 !  9:  739f3fe9edf2 difftool: ensure full index
     @@ builtin/difftool.c: static int run_dir_diff(const char *extcmd, int symlinks, co
       		setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
       	rc = run_command_v_opt(helper_argv, flags);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&wtindex);
      +
       	/*
 10:  988f7bd2d736 ! 10:  779a86ad1ec4 fsck: ensure full index
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
       		verify_index_checksum = 1;
       		verify_ce_order = 1;
       		read_cache();
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(&the_index);
       		for (i = 0; i < active_nr; i++) {
       			unsigned int mode;
 11:  abe189051a0c ! 11:  8c0d377054fa grep: ensure full index
     @@ builtin/grep.c: static int grep_cache(struct grep_opt *opt,
       	if (repo_read_index(repo) < 0)
       		die(_("index file corrupt"));
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(repo->index);
       	for (nr = 0; nr < repo->index->cache_nr; nr++) {
       		const struct cache_entry *ce = repo->index->cache[nr];
 12:  00c8a7e1d119 ! 12:  beaa1467cabb ls-files: ensure full index
     @@ builtin/ls-files.c: static void show_files(struct repository *repo, struct dir_s
       
       	if (!(show_cached || show_stage || show_deleted || show_modified))
       		return;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(repo->index);
       	for (i = 0; i < repo->index->cache_nr; i++) {
       		const struct cache_entry *ce = repo->index->cache[i];
     @@ builtin/ls-files.c: void overlay_tree_on_index(struct index_state *istate,
       		die("bad tree-ish %s", tree_name);
       
       	/* Hoist the unmerged entries up to stage #3 to make room */
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
 13:  1288f02177d2 ! 13:  73684141fcff merge-index: ensure full index
     @@ builtin/merge-index.c: static void merge_one_path(const char *path)
       static void merge_all(void)
       {
       	int i;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
     @@ builtin/merge-index.c: int cmd_merge_index(int argc, const char **argv, const ch
       
       	read_cache();
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
      +
       	i = 1;
 14:  d6816560b32f ! 14:  6ea81a49c6b5 rm: ensure full index
     @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
       
       	seen = xcalloc(pathspec.nr, 1);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (i = 0; i < active_nr; i++) {
       		const struct cache_entry *ce = active_cache[i];
 15:  92ccd31dd343 ! 15:  49ca5ed05c8d sparse-checkout: ensure full index
     @@ Metadata
      Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    sparse-checkout: ensure full index
     +    stash: ensure full index
      
     -    When adjusting the sparse-checkout definition, ensure that a sparse
     -    index is expanded to a full one to avoid unexpected behavior. This is a
     -    top candidate for later integration with the sparse-index, but requires
     -    careful testing.
     +    Before iterating over all cache entries, ensure that a sparse index is
     +    expanded to a full index to avoid unexpected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
     @@ builtin/stash.c: static int do_push_stash(const struct pathspec *ps, const char
       		int i;
       		char *ps_matched = xcalloc(ps->nr, 1);
       
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(&the_index);
       		for (i = 0; i < active_nr; i++)
       			ce_path_match(&the_index, active_cache[i], ps,
 16:  796449758a08 ! 16:  9c4bb187c15d update-index: ensure full index
     @@ builtin/update-index.c: static int do_reupdate(int ac, const char **av,
       		 */
       		has_head = 0;
        redo:
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(&the_index);
       	for (pos = 0; pos < active_nr; pos++) {
       		const struct cache_entry *ce = active_cache[pos];
 17:  45bbed6150a2 <  -:  ------------ diff-lib: ensure full index
 18:  8ce1f80985a4 ! 17:  fae4c078c3ef dir: ensure full index
     @@ dir.c: static void connect_wt_gitdir_in_nested(const char *sub_worktree,
       	if (repo_read_index(&subrepo) < 0)
       		die(_("index file corrupt in repo %s"), subrepo.gitdir);
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(subrepo.index);
       	for (i = 0; i < subrepo.index->cache_nr; i++) {
       		const struct cache_entry *ce = subrepo.index->cache[i];
 19:  6e0b452f44c1 ! 18:  2b9180ee77d3 entry: ensure full index
     @@ entry.c: static void mark_colliding_entries(const struct checkout *state,
       
       	ce->ce_flags |= CE_MATCHED;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(state->istate);
       	for (i = 0; i < state->istate->cache_nr; i++) {
       		struct cache_entry *dup = state->istate->cache[i];
 20:  fb4c7038b746 <  -:  ------------ merge-ort: ensure full index
 21:  57d59825627f ! 19:  1e3f6085a405 merge-recursive: ensure full index
     @@ merge-recursive.c: static struct string_list *get_unmerged(struct index_state *i
       
       	unmerged->strdup_strings = 1;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct string_list_item *item;
 22:  f4c470686b27 ! 20:  e62a597a9725 pathspec: ensure full index
     @@ pathspec.c: void add_pathspec_matches_against_index(const struct pathspec *paths
       			num_unmatched++;
       	if (!num_unmatched)
       		return;
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
 23:  81519782d4b2 ! 21:  ebfffdbdd6ad read-cache: ensure full index
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	 */
       	preload_index(istate, pathspec, 0);
       	trace2_region_enter("index", "refresh", NULL);
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
     @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
       		diff_flush(&opt);
       		return opt.flags.has_changes != 0;
       	} else {
     ++		/* TODO: audit for interaction with sparse-index. */
      +		ensure_full_index(istate);
       		for (i = 0; sb && i < istate->cache_nr; i++) {
       			if (i)
 24:  39e8bea0c1ca ! 22:  495b07a87973 resolve-undo: ensure full index
     @@ resolve-undo.c: void unmerge_marked_index(struct index_state *istate)
       	if (!istate->resolve_undo)
       		return;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
     @@ resolve-undo.c: void unmerge_index(struct index_state *istate, const struct path
       	if (!istate->resolve_undo)
       		return;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		const struct cache_entry *ce = istate->cache[i];
 25:  6abb1813ae10 ! 23:  3144114d1a75 revision: ensure full index
     @@ revision.c: static void do_add_index_objects_to_pending(struct rev_info *revs,
       {
       	int i;
       
     ++	/* TODO: audit for interaction with sparse-index. */
      +	ensure_full_index(istate);
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
 26:  1690d33c11e0 = 24:  d52c72b4a7b9 sparse-index: expand_to_path()
 27:  40cfc59f695a = 25:  7e2d3fae9a2a name-hash: use expand_to_path()

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux