[PATCH v2 00/20] Sparse Index: Design, Format, Tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is the first full patch series submission coming out of the
sparse-index RFC [1].

[1]
https://lore.kernel.org/git/pull.847.git.1611596533.gitgitgadget@xxxxxxxxx/

I won't waste too much space here, because PATCH 1 includes a sizeable
design document that describes the feature, the reasoning behind it, and my
plan for getting this implemented widely throughout the codebase.

There are some new things here that were not in the RFC:

 * Design doc and format updates. (Patch 1)
 * Performance test script. (Patches 2 and 20)

Notably missing in this series from the RFC:

 * The mega-patch inserting ensure_full_index() throughout the codebase.
   That will be a follow-up series to this one.
 * The integrations with git status and git add to demonstrate the improved
   performance. Those will also appear in their own series later.

I plan to keep my latest work in this area in my 'sparse-index/wip' branch
[2]. It includes all of the work from the RFC right now, updated with the
work from this series.

[2] https://github.com/derrickstolee/git/tree/sparse-index/wip


Updates in V2
=============

 * Various typos and awkward grammar is fixed.
 * Cleaned up unnecessary commands in p2000-sparse-operations.sh
 * Added a comment to the sparse_index member of struct index_state.
 * Used tree_type, commit_type, and blob_type in test-read-cache.c.

Thanks, -Stolee

Derrick Stolee (20):
  sparse-index: design doc and format update
  t/perf: add performance test for sparse operations
  t1092: clean up script quoting
  sparse-index: add guard to ensure full index
  sparse-index: implement ensure_full_index()
  t1092: compare sparse-checkout to sparse-index
  test-read-cache: print cache entries with --table
  test-tool: don't force full index
  unpack-trees: ensure full index
  sparse-checkout: hold pattern list in index
  sparse-index: convert from full to sparse
  submodule: sparse-index should not collapse links
  unpack-trees: allow sparse directories
  sparse-index: check index conversion happens
  sparse-index: create extension for compatibility
  sparse-checkout: toggle sparse index from builtin
  sparse-checkout: disable sparse-index
  cache-tree: integrate with sparse directory entries
  sparse-index: loose integration with cache_tree_verify()
  p2000: add sparse-index repos

 Documentation/config/extensions.txt      |   8 +
 Documentation/git-sparse-checkout.txt    |  14 ++
 Documentation/technical/index-format.txt |   7 +
 Documentation/technical/sparse-index.txt | 173 ++++++++++++++
 Makefile                                 |   1 +
 builtin/sparse-checkout.c                |  44 +++-
 cache-tree.c                             |  40 ++++
 cache.h                                  |  18 +-
 read-cache.c                             |  35 ++-
 repo-settings.c                          |  15 ++
 repository.c                             |  11 +-
 repository.h                             |   3 +
 setup.c                                  |   3 +
 sparse-index.c                           | 290 +++++++++++++++++++++++
 sparse-index.h                           |  11 +
 t/README                                 |   3 +
 t/helper/test-read-cache.c               |  66 +++++-
 t/perf/p2000-sparse-operations.sh        | 102 ++++++++
 t/t1091-sparse-checkout-builtin.sh       |  13 +
 t/t1092-sparse-checkout-compatibility.sh | 136 +++++++++--
 unpack-trees.c                           |  16 +-
 21 files changed, 969 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/technical/sparse-index.txt
 create mode 100644 sparse-index.c
 create mode 100644 sparse-index.h
 create mode 100755 t/perf/p2000-sparse-operations.sh


base-commit: 966e671106b2fd38301e7c344c754fd118d0bb07
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-883%2Fderrickstolee%2Fsparse-index%2Fformat-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-883/derrickstolee/sparse-index/format-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/883

Range-diff vs v1:

  1:  daa9a6bcefbc !  1:  2fe413fdac80 sparse-index: design doc and format update
     @@ Documentation/technical/sparse-index.txt (new)
      +If we need to discover the details for paths within that directory, we
      +can parse trees to find that list.
      +
     -+This addition of sparse-directory entries violates expectations about the
     ++At time of writing, sparse-directory entries violate expectations about the
      +index format and its in-memory data structure. There are many consumers in
      +the codebase that expect to iterate through all of the index entries and
      +see only files. In addition, they expect to see all files at `HEAD`. One
     @@ Documentation/technical/sparse-index.txt (new)
      +* `git merge`
      +* `git rebase`
      +
     ++Hopefully, commands such as `git merge` and `git rebase` can benefit
     ++instead from merge algorithms that do not use the index as a data
     ++structure, such as the merge-ORT strategy. As these topics mature, we
     ++may enalbe the ORT strategy by default for repositories using the
     ++sparse-index feature.
     ++
      +Along with `git status` and `git add`, these commands cover the majority
      +of users' interactions with the working directory. In addition, we can
      +integrate with these commands:
  2:  a8c6322a3dbe !  2:  540ab5495065 t/perf: add performance test for sparse operations
     @@ t/perf/p2000-sparse-operations.sh (new)
      +	# Remove submodules from the example repo, because our
      +	# duplication of the entire repo creates an unlikly data shape.
      +	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
     -+	rm -f .gitmodules &&
     -+	git add .gitmodules &&
     ++	git rm -f .gitmodules &&
      +	for module in $(awk "{print \$2}" modules)
      +	do
      +		git rm $module || return 1
      +	done &&
     -+	git add . &&
      +	git commit -m "remove submodules" &&
      +
      +	echo bogus >a &&
  3:  6e783c88821e =  3:  5cbedb377b37 t1092: clean up script quoting
  4:  01da4c48a1fa =  4:  6e21f776e883 sparse-index: add guard to ensure full index
  5:  2b83989fbcd3 !  5:  399ddb0bad56 sparse-index: implement ensure_full_index()
     @@ cache.h: struct index_state {
       		 updated_skipworktree : 1,
      -		 fsmonitor_has_run_once : 1;
      +		 fsmonitor_has_run_once : 1,
     ++
     ++		 /*
     ++		  * sparse_index == 1 when sparse-directory
     ++		  * entries exist. Requires sparse-checkout
     ++		  * in cone mode.
     ++		  */
      +		 sparse_index : 1;
       	struct hashmap name_hash;
       	struct hashmap dir_hash;
  6:  c9910a37579c =  6:  eac2db5efc22 t1092: compare sparse-checkout to sparse-index
  7:  3d92df7a0cf9 !  7:  e9c82d2eda82 test-read-cache: print cache entries with --table
     @@ Commit message
      
       ## t/helper/test-read-cache.c ##
      @@
     + #include "test-tool.h"
       #include "cache.h"
       #include "config.h"
     - 
     ++#include "blob.h"
     ++#include "commit.h"
     ++#include "tree.h"
     ++
      +static void print_cache_entry(struct cache_entry *ce)
      +{
     -+	printf("%06o ", ce->ce_mode & 0777777);
     ++	const char *type;
     ++	printf("%06o ", ce->ce_mode & 0177777);
      +
      +	if (S_ISSPARSEDIR(ce->ce_mode))
     -+		printf("tree ");
     ++		type = tree_type;
      +	else if (S_ISGITLINK(ce->ce_mode))
     -+		printf("commit ");
     ++		type = commit_type;
      +	else
     -+		printf("blob ");
     ++		type = blob_type;
      +
     -+	printf("%s\t%s\n",
     ++	printf("%s %s\t%s\n",
     ++	       type,
      +	       oid_to_hex(&ce->oid),
      +	       ce->name);
      +}
      +
     -+static void print_cache(struct index_state *cache)
     ++static void print_cache(struct index_state *istate)
      +{
      +	int i;
     -+	for (i = 0; i < the_index.cache_nr; i++)
     -+		print_cache_entry(the_index.cache[i]);
     ++	for (i = 0; i < istate->cache_nr; i++)
     ++		print_cache_entry(istate->cache[i]);
      +}
     -+
     + 
       int cmd__read_cache(int argc, const char **argv)
       {
      +	struct repository *r = the_repository;
  8:  94373e2bfbbc !  8:  243541fc5820 test-tool: don't force full index
     @@ Commit message
      
       ## t/helper/test-read-cache.c ##
      @@
     - #include "test-tool.h"
     - #include "cache.h"
     - #include "config.h"
     + #include "blob.h"
     + #include "commit.h"
     + #include "tree.h"
      +#include "sparse-index.h"
       
       static void print_cache_entry(struct cache_entry *ce)
  9:  e71f033c2871 =  9:  48f65093b3da unpack-trees: ensure full index
 10:  f86d3dc154d1 ! 10:  83aac8b7a1ec sparse-checkout: hold pattern list in index
     @@ Commit message
          pattern set, we need access to that in-memory copy. Place a pointer to
          a 'struct pattern_list' in the index so we can access this on-demand.
          This will be used in the next change which uses the sparse-checkout
     -    definition to filter out directories that are outsie the sparse cone.
     +    definition to filter out directories that are outside the sparse cone.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
 11:  a2d77c23a0cb ! 11:  f6db0c27a285 sparse-index: convert from full to sparse
     @@ read-cache.c: int verify_path(const char *path, unsigned mode)
       				return 0;
      +			/*
      +			 * allow terminating directory separators for
     -+			 * sparse directory enries.
     ++			 * sparse directory entries.
      +			 */
      +			if (c == '\0')
      +				return S_ISDIR(mode);
     @@ sparse-index.c
      +		struct cache_entry *ce = istate->cache[i];
      +
      +		/*
     -+		 * Detect if this is a normal entry oustide of any subtree
     ++		 * Detect if this is a normal entry outside of any subtree
      +		 * entry.
      +		 */
      +		base = ce->name + ct_pathlen;
 12:  4405a9115c3b = 12:  f2a3e7298798 submodule: sparse-index should not collapse links
 13:  fda23f07e6a2 ! 13:  6f1ebe6ccc08 unpack-trees: allow sparse directories
     @@ Commit message
          is possible to have a directory in a sparse index as long as that entry
          is itself marked with the skip-worktree bit.
      
     -    The negation of the 'pos' variable must be conditioned to only when it
     -    starts as negative. This is identical behavior as before when the index
     -    is full.
     +    The 'pos' variable is assigned a negative value if an exact match is not
     +    found. Since a directory name can be an exact match, it is no longer an
     +    error to have a nonnegative 'pos' value.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
 14:  7d4627574bb8 = 14:  3fa684b315fb sparse-index: check index conversion happens
 15:  564503f78784 ! 15:  d74576d677f6 sparse-index: create extension for compatibility
     @@ Commit message
      
          We _could_ add a new index version that explicitly adds these
          capabilities, but there are nuances to index formats 2, 3, and 4 that
     -    are still valuable to select as options. For now, create a repo
     -    extension, "extensions.sparseIndex", that specifies that the tool
     -    reading this repository must understand sparse directory entries.
     +    are still valuable to select as options. Until we add index format
     +    version 5, create a repo extension, "extensions.sparseIndex", that
     +    specifies that the tool reading this repository must understand sparse
     +    directory entries.
      
          This change only encodes the extension and enables it when
          GIT_TEST_SPARSE_INDEX=1. Later, we will add a more user-friendly CLI
     @@ Documentation/config/extensions.txt: extensions.objectFormat::
      +	When combined with `core.sparseCheckout=true` and
      +	`core.sparseCheckoutCone=true`, the index may contain entries
      +	corresponding to directories outside of the sparse-checkout
     -+	definition. Versions of Git that do not understand this extension
     -+	do not expect directory entries in the index.
     ++	definition in lieu of containing each path under such directories.
     ++	Versions of Git that do not understand this extension do not
     ++	expect directory entries in the index.
      
       ## cache.h ##
      @@ cache.h: struct repository_format {
 16:  6d6b230e3318 ! 16:  e530ca5f668d sparse-checkout: toggle sparse index from builtin
     @@ Documentation/git-sparse-checkout.txt: To avoid interfering with other worktrees
      +a sparse index until they are properly integrated with the feature.
      ++
      +**WARNING:** Using a sparse index requires modifying the index in a way
     -+that is not completely understood by other tools. Enabling sparse index
     -+enables the `extensions.spareseIndex` config value, which might cause
     -+other tools to stop working with your repository. If you have trouble with
     -+this compatibility, then run `git sparse-checkout sparse-index disable` to
     -+remove this config and rewrite your index to not be sparse.
     ++that is not completely understood by external tools. If you have trouble
     ++with this compatibility, then run `git sparse-checkout sparse-index disable`
     ++to rewrite your index to not be sparse. Older versions of Git will not
     ++understand the `sparseIndex` repository extension and may fail to interact
     ++with your repository until it is disabled.
       
       'set'::
       	Write a set of patterns to the sparse-checkout file, as given as
 17:  bcf960ef2362 = 17:  42d0da9c5def sparse-checkout: disable sparse-index
 18:  e6afec58674e = 18:  6bb0976a6295 cache-tree: integrate with sparse directory entries
 19:  2be4981fe698 = 19:  07f34e80609a sparse-index: loose integration with cache_tree_verify()
 20:  a738b0ba8ab4 = 20:  41e3b56b9c17 p2000: add sparse-index repos

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux