[PATCH v2 0/8] Sparse-index: integrate with status

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is the first "payoff" series in the sparse-index work. It makes 'git
status' very fast when a sparse-index is enabled on a repository with
cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee


Updates in V2
=============

 * Based on the feedback, it is clear that 'git add' will require much more
   careful testing and thought. I'm splitting it out of this series and it
   will return with a follow-up.
 * Test cases are improved, both in coverage and organization.
 * The previous "unpack-trees: make sparse aware" patch is split into three
   now.
 * Stale messages based on an old implementation of the "protections" topic
   are now fixed.
 * Performance tests were re-run.

Derrick Stolee (8):
  t1092: add tests for status/add and sparse files
  unpack-trees: preserve cache_bottom
  unpack-trees: compare sparse directories correctly
  unpack-trees: stop recursing into sparse directories
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  fsmonitor: test with sparse index

 builtin/commit.c                         |  3 ++
 dir.c                                    | 11 +++++
 read-cache.c                             | 10 +++-
 t/t1092-sparse-checkout-compatibility.sh | 61 ++++++++++++++++++++++--
 t/t7519-status-fsmonitor.sh              | 48 +++++++++++++++++++
 unpack-trees.c                           | 25 ++++++++--
 wt-status.c                              | 14 ++++--
 wt-status.h                              |  1 +
 8 files changed, 161 insertions(+), 12 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Range-diff vs v1:

  1:  b2cb5401eff8 !  1:  3bac9edae7d8 t1092: add tests for status/add and sparse files
     @@ Commit message
          Untracked files are fine: adding new files both with 'git add .' and
          'git add folder1/' works just as in a full checkout. This may not be
          entirely desirable, but we are not intending to change behavior at the
     -    moment, only document it.
     +    moment, only document it. A future change could alter the behavior to
     +    be more sensible, and this test could be modified to satisfy the new
     +    expected behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'add, commit, chec
      +	# This "git add folder1/a" is completely ignored
      +	# by the sparse-checkout repos. It causes the
      +	# full repo to have a different staged environment.
     -+	test_must_fail git -C sparse-checkout add folder1/a &&
     -+	test_must_fail git -C sparse-index add folder1/a &&
     ++	#
     ++	# This is not a desirable behavior, but this test
     ++	# ensures that the sparse-index is not the cause
     ++	# of a behavior change.
     ++	test_sparse_match test_must_fail git add folder1/a &&
     ++	test_sparse_match test_must_fail git add --refresh folder1/a &&
      +	git -C full-checkout checkout HEAD -- folder1/a &&
     -+	test_sparse_match git status --porcelain=v2 &&
     ++	test_all_match git status --porcelain=v2 &&
      +
      +	test_all_match git add . &&
      +	test_all_match git status --porcelain=v2 &&
  -:  ------------ >  2:  19344394379d unpack-trees: preserve cache_bottom
  -:  ------------ >  3:  24e71d8c0622 unpack-trees: compare sparse directories correctly
  2:  0a3892d2ec9e !  4:  d3c8948d0a33 unpack-trees: make sparse aware
     @@ Metadata
      Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    unpack-trees: make sparse aware
     +    unpack-trees: stop recursing into sparse directories
      
     -    As a first step to integrate 'git status' and 'git add' with the sparse
     -    index, we must start integrating unpack_trees() with sparse directory
     -    entries. These changes are currently impossible to trigger because
     -    unpack_trees() calls ensure_full_index() if command_requires_full_index
     -    is true. This is the case for all commands at the moment. As we expand
     -    more commands to be sparse-aware, we might find that more changes are
     -    required to unpack_trees(). The current changes will suffice for
     -    'status' and 'add'.
     +    When walking trees using traverse_trees_recursive() and
     +    unpack_callback(), we must not attempt to walk into a sparse directory
     +    entry. There are no index entries within that directory to compare to
     +    the tree object at that position, so skip over the entries of that tree.
      
     -    unpack_trees() calls the traverse_trees() API using unpack_callback()
     -    to decide if we should recurse into a subtree. We must add new abilities
     -    to skip a subtree if it corresponds to a sparse directory entry.
     -
     -    It is important to be careful about the trailing directory separator
     -    that exists in the sparse directory entries but not in the subtree
     -    paths.
     +    This code is used in many places, so the only way to test it is to start
     +    removing the command_requres_full_index option from one builtin at a
     +    time and carefully test that its use of unpack_trees() behaves correctly
     +    with a sparse-index. Such tests will be added by later changes.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
     - ## dir.h ##
     -@@ dir.h: static inline int ce_path_match(struct index_state *istate,
     - 				char *seen)
     - {
     - 	return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
     --			      S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     -+			      S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
     - }
     - 
     - static inline int dir_path_match(struct index_state *istate,
     -
     - ## preload-index.c ##
     -@@ preload-index.c: static void *preload_thread(void *_data)
     - 			continue;
     - 		if (S_ISGITLINK(ce->ce_mode))
     - 			continue;
     -+		if (S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     - 		if (ce_uptodate(ce))
     - 			continue;
     - 		if (ce_skip_worktree(ce))
     -
     - ## read-cache.c ##
     -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     - 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     - 			continue;
     - 
     -+		if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode))
     -+			continue;
     -+
     - 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     - 			filtered = 1;
     - 
     -
       ## unpack-trees.c ##
     -@@ unpack-trees.c: static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o)
     - {
     - 	ce->ce_flags |= CE_UNPACKED;
     - 
     -+	/*
     -+	 * If this is a sparse directory, don't advance cache_bottom.
     -+	 * That will be advanced later using the cache-tree data.
     -+	 */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return;
     -+
     - 	if (o->cache_bottom < o->src_index->cache_nr &&
     - 	    o->src_index->cache[o->cache_bottom] == ce) {
     - 		int bottom = o->cache_bottom;
     -@@ unpack-trees.c: static int do_compare_entry(const struct cache_entry *ce,
     - 	ce_len -= pathlen;
     - 	ce_name = ce->name + pathlen;
     - 
     -+	/* remove directory separator if a sparse directory entry */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		ce_len--;
     - 	return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode);
     - }
     - 
     -@@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const struct traverse_inf
     - 	if (cmp)
     - 		return cmp;
     - 
     -+	/* If ce is a sparse directory, then allow equality here. */
     -+	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		return 0;
     -+
     - 	/*
     - 	 * Even if the beginning compared identically, the ce should
     - 	 * compare as bigger than a directory leading up to it!
      @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
       	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
       	struct unpack_trees_options *o = info->data;
       	const struct name_entry *p = names;
     -+	unsigned recurse = 1;
     ++	unsigned unpack_tree = 1;
       
       	/* Find first entry with a real name (we could use "mask" too) */
       	while (!p->mode)
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       				src[0] = ce;
      +
      +				if (S_ISSPARSEDIR(ce->ce_mode))
     -+					recurse = 0;
     ++					unpack_tree = 0;
       			}
       			break;
       		}
       	}
       
      -	if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
     -+	if (recurse &&
     ++	if (unpack_tree &&
      +	    unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
       		return -1;
       
     @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned l
       		}
       
      -		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
     -+		if (recurse &&
     ++		if (unpack_tree &&
      +		    traverse_trees_recursive(n, dirmask, mask & ~dirmask,
       					     names, info) < 0)
       			return -1;
  3:  28ca717e6526 !  5:  fd96b71968b6 dir.c: accept a directory as part of cone-mode patterns
     @@ dir.c: enum pattern_match_result path_matches_pattern_list(
       	strbuf_addch(&parent_pathname, '/');
       	strbuf_add(&parent_pathname, pathname, pathlen);
       
     -+	/* Directory requests should be added as if they are a file */
     -+	if (parent_pathname.len > 1 &&
     ++	/*
     ++	 * Directory entries are matched if and only if a file
     ++	 * contained immediately within them is matched. For the
     ++	 * case of a directory entry, modify the path to create
     ++	 * a fake filename within this directory, allowing us to
     ++	 * use the file-base matching logic in an equivalent way.
     ++	 */
     ++	if (parent_pathname.len > 0 &&
      +	    parent_pathname.buf[parent_pathname.len - 1] == '/')
      +		strbuf_add(&parent_pathname, "-", 1);
      +
  4:  e86f874dd412 =  6:  1f4ba56e7416 status: skip sparse-checkout percentage with sparse-index
  5:  d7d4cad8be0b !  7:  3d09368c0541 status: use sparse-index throughout
     @@ Commit message
          implementation details are already integrated with sparse-checkout, so
          modify command_requires_full_index to be zero for cmd_status().
      
     -    By running the debugger for 'git status -uno' after that change, we find
     -    two instances of ensure_full_index() that were added for extra safety,
     -    but can be removed without issue.
     +    In refresh_index(), we loop through the index entries to refresh their
     +    stat() information. However, sparse directories have no stat()
     +    information to populate. Ignore these entries.
      
     -    In refresh_index(), we loop through the index entries. The
     -    refresh_cache_ent() method copies the sparse directories into the
     -    refreshed index without issue.
     +    This allows 'git status' to no longer expand a sparse index to a full
     +    one. This is further tested by dropping the "-uno" option and adding an
     +    untracked file into the worktree.
      
     -    The loop within run_diff_files() skips things that are in stage 0 and
     -    have skip-worktree enabled, so seems safe to disable ensure_full_index()
     -    here.
     -
     -    This allows some cases of 'git status' to no longer expand a sparse
     -    index to a full one, giving the following performance improvements for
     -    p2000-sparse-checkout-operations.sh:
     +    The performance test p2000-sparse-checkout-operations.sh demonstrates
     +    these improvements:
      
          Test                                  HEAD~1           HEAD
          -----------------------------------------------------------------------------
     -    2000.2: git status (full-index-v3)    0.38(0.36+0.07)  0.37(0.31+0.10) -2.6%
     -    2000.3: git status (full-index-v4)    0.38(0.29+0.12)  0.37(0.30+0.11) -2.6%
     -    2000.4: git status (sparse-index-v3)  2.43(2.33+0.14)  0.04(0.05+0.04) -98.4%
     -    2000.5: git status (sparse-index-v4)  2.44(2.35+0.13)  0.05(0.04+0.07) -98.0%
     +    2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
     +    2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
     +    2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
     +    2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%
      
          Note that since HEAD~1 was expanding the sparse index by parsing trees,
          it was artificially slower than the full index case. Thus, the 98%
     -    improvement is misleading, and instead we should celebrate the 0.37s to
     -    0.05s improvement of 82%. This is more indicative of the peformance
     +    improvement is misleading, and instead we should celebrate the 0.34s to
     +    0.05s improvement of 85%. This is more indicative of the peformance
          gains we are expecting by using a sparse index.
      
          Note: we are dropping the assignment of core.fsmonitor here. This is not
     @@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
       	trace2_region_enter("index", "refresh", NULL);
      -	/* TODO: audit for interaction with sparse-index. */
      -	ensure_full_index(istate);
     ++
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce, *new_entry;
       		int cache_errno = 0;
     +@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
     + 		if (ignore_skip_worktree && ce_skip_worktree(ce))
     + 			continue;
     + 
     ++		/*
     ++		 * If this entry is a sparse directory, then there isn't
     ++		 * any stat() information to update. Ignore the entry.
     ++		 */
     ++		if (S_ISSPARSEDIR(ce->ce_mode))
     ++			continue;
     ++
     + 		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
     + 			filtered = 1;
     + 
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is expanded and converted back' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
      +	init_repos &&
      +
      +	rm -f trace2.txt &&
     ++	echo >>sparse-index/untracked.txt &&
       	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      -		git -C sparse-index -c core.fsmonitor="" status -uno &&
      -	test_region index ensure_full_index trace2.txt
     -+		git -C sparse-index status -uno &&
     ++		git -C sparse-index status &&
      +	test_region ! index ensure_full_index trace2.txt
       '
       
  6:  434306541613 <  -:  ------------ dir: use expand_to_path() for sparse directories
  7:  f1a9ce4ef0e5 <  -:  ------------ add: allow operating on a sparse-only index
  8:  6d7f30f2b90a <  -:  ------------ pathspec: stop calling ensure_full_index
  9:  75199bbe8ca1 <  -:  ------------ t7519: add sparse directories to FS monitor tests
 10:  9d1183ddd280 !  8:  1fd033a6ebb2 fsmonitor: test with sparse index
     @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
       	)
       '
       
     -+test_expect_success 'status succeeds with sparse index' '
     -+	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++# Usage:
     ++# check_sparse_index_behavior [!]
     ++# If "!" is supplied, then we verify that we do not call ensure_full_index
     ++# during a call to 'git status'. Otherwise, we verify that we _do_ call it.
     ++check_sparse_index_behavior () {
      +	git status --porcelain=v2 >expect &&
      +	git sparse-checkout init --cone --sparse-index &&
     ++	git sparse-checkout set dir1 dir2 &&
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     ++	test_region $1 index ensure_full_index trace2.txt &&
      +	test_cmp expect actual &&
      +	rm trace2.txt &&
     ++	git sparse-checkout disable
     ++}
     ++
     ++test_expect_success 'status succeeds with sparse index' '
     ++	git reset --hard &&
     ++
     ++	test_config core.fsmonitor "$TEST_DIRECTORY/t7519/fsmonitor-all" &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +	EOF
      +	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     -+	rm trace2.txt &&
     ++	check_sparse_index_behavior ! &&
      +
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region ! index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual &&
     ++	check_sparse_index_behavior ! &&
      +
     ++	cp -r dir1 dir1a &&
     ++	git add dir1a &&
     ++	git commit -m "add dir1a" &&
     ++
     ++	# This one modifies outside the sparse-checkout definition
     ++	# and hence we expect to expand the sparse-index.
      +	write_script .git/hooks/fsmonitor-test<<-\EOF &&
      +		printf "last_update_token\0"
      +		printf "dir1a/modified\0"
      +	EOF
     -+	git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+	git status --porcelain=v2 >expect &&
     -+	git sparse-checkout init --cone --sparse-index &&
     -+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     -+		git status --porcelain=v2 >actual &&
     -+	test_region index ensure_full_index trace2.txt &&
     -+	test_cmp expect actual
     ++	check_sparse_index_behavior
      +'
      +
       test_done

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux