Re: [PATCH 11/20] sparse-index: convert from full to sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> If we have a full index, then we can convert it to a sparse index by
> replacing directories outside of the sparse cone with sparse directory
> entries. The convert_to_sparse() method does this, when the situation is
> appropriate.
>
> For now, we avoid converting the index to a sparse index if:
>
>  1. the index is split.
>  2. the index is already sparse.
>  3. sparse-checkout is disabled.
>  4. sparse-checkout does not use cone mode.
>
> Finally, we currently limit the conversion to when the
> GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
> config will be added in a later change.
>
> The trickiest thing about this conversion is that we might not be able
> to mark a directory as a sparse directory just because it is outside the
> sparse cone. There might be unmerged files within that directory, so we
> need to look for those. Also, if there is some strange reason why a file
> is not marked with CE_SKIP_WORKTREE, then we should give up on
> converting that directory. There is still hope that some of its
> subdirectories might be able to convert to sparse, so we keep looking
> deeper.
>
> The conversion process is assisted by the cache-tree extension. This is
> calculated from the full index if it does not already exist. We then
> abandon the cache-tree as it no longer applies to the newly-sparse
> index. Thus, this cache-tree will be recalculated in every
> sparse-full-sparse round-trip until we integrate the cache-tree
> extension with the sparse index.
>
> Some Git commands use the index after writing it. For example, 'git add'
> will update the index, then write it to disk, then read its entries to
> report information. To keep the in-memory index in a full state after
> writing, we re-expand it to a full one after the write. This is wasteful
> for commands that only write the index and do not read from it again,
> but that is only the case until we make those commands "sparse aware."
>
> We can compare the behavior of the sparse-index in
> t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
> when operating on the 'sparse-index' repo. We can also compare the two
> sparse repos directly, such as comparing their indexes (when expanded to
> full in the case of the 'sparse-index' repo). We also verify that the
> index is actually populated with sparse directory entries.
>
> The 'checkout and reset (mixed)' test is marked for failure when
> comparing a sparse repo to a full repo, but we can compare the two
> sparse-checkout cases directly to ensure that we are not changing the
> behavior when using a sparse index.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  cache-tree.c                             |   3 +
>  cache.h                                  |   2 +
>  read-cache.c                             |  26 ++++-
>  sparse-index.c                           | 139 +++++++++++++++++++++++
>  sparse-index.h                           |   1 +
>  t/t1092-sparse-checkout-compatibility.sh |  61 +++++++++-
>  6 files changed, 227 insertions(+), 5 deletions(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2fb483d3c083..5f07a39e501e 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -6,6 +6,7 @@
>  #include "object-store.h"
>  #include "replace-object.h"
>  #include "promisor-remote.h"
> +#include "sparse-index.h"
>
>  #ifndef DEBUG_CACHE_TREE
>  #define DEBUG_CACHE_TREE 0
> @@ -442,6 +443,8 @@ int cache_tree_update(struct index_state *istate, int flags)
>         if (i)
>                 return i;
>
> +       ensure_full_index(istate);
> +
>         if (!istate->cache_tree)
>                 istate->cache_tree = cache_tree();
>
> diff --git a/cache.h b/cache.h
> index d75b352f38d3..e8b7d3b4fb33 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -251,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode)
>  {
>         if (S_ISLNK(mode))
>                 return S_IFLNK;
> +       if (mode == S_IFDIR)
> +               return S_IFDIR;
>         if (S_ISDIR(mode) || S_ISGITLINK(mode))
>                 return S_IFGITLINK;
>         return S_IFREG | ce_permissions(mode);
> diff --git a/read-cache.c b/read-cache.c
> index 97dbf2434f30..67acbf202f4e 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -25,6 +25,7 @@
>  #include "fsmonitor.h"
>  #include "thread-utils.h"
>  #include "progress.h"
> +#include "sparse-index.h"
>
>  /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -1002,8 +1003,14 @@ int verify_path(const char *path, unsigned mode)
>
>                         c = *path++;
>                         if ((c == '.' && !verify_dotfile(path, mode)) ||
> -                           is_dir_sep(c) || c == '\0')
> +                           is_dir_sep(c))
>                                 return 0;
> +                       /*
> +                        * allow terminating directory separators for
> +                        * sparse directory enries.

enries -> entries

> +                        */
> +                       if (c == '\0')
> +                               return S_ISDIR(mode);

Yaay, much simpler (than the RFC version).

>                 } else if (c == '\\' && protect_ntfs) {
>                         if (is_ntfs_dotgit(path))
>                                 return 0;
> @@ -3061,6 +3068,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>                                  unsigned flags)
>  {
>         int ret;
> +       int was_full = !istate->sparse_index;
> +
> +       ret = convert_to_sparse(istate);
> +
> +       if (ret) {
> +               warning(_("failed to convert to a sparse-index"));
> +               return ret;
> +       }
>
>         /*
>          * TODO trace2: replace "the_repository" with the actual repo instance
> @@ -3072,6 +3087,9 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
>         trace2_region_leave_printf("index", "do_write_index", the_repository,
>                                    "%s", get_lock_file_path(lock));
>
> +       if (was_full)
> +               ensure_full_index(istate);
> +
>         if (ret)
>                 return ret;
>         if (flags & COMMIT_LOCK)
> @@ -3162,9 +3180,10 @@ static int write_shared_index(struct index_state *istate,
>                               struct tempfile **temp)
>  {
>         struct split_index *si = istate->split_index;
> -       int ret;
> +       int ret, was_full = !istate->sparse_index;
>
>         move_cache_to_base_index(istate);
> +       convert_to_sparse(istate);
>
>         trace2_region_enter_printf("index", "shared/do_write_index",
>                                    the_repository, "%s", get_tempfile_path(*temp));
> @@ -3172,6 +3191,9 @@ static int write_shared_index(struct index_state *istate,
>         trace2_region_leave_printf("index", "shared/do_write_index",
>                                    the_repository, "%s", get_tempfile_path(*temp));
>
> +       if (was_full)
> +               ensure_full_index(istate);
> +
>         if (ret)
>                 return ret;
>         ret = adjust_shared_perm(get_tempfile_path(*temp));
> diff --git a/sparse-index.c b/sparse-index.c
> index 316cb949b74b..cb1f85635fbc 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -4,6 +4,145 @@
>  #include "tree.h"
>  #include "pathspec.h"
>  #include "trace2.h"
> +#include "cache-tree.h"
> +#include "config.h"
> +#include "dir.h"
> +#include "fsmonitor.h"
> +
> +static struct cache_entry *construct_sparse_dir_entry(
> +                               struct index_state *istate,
> +                               const char *sparse_dir,
> +                               struct cache_tree *tree)
> +{
> +       struct cache_entry *de;
> +
> +       de = make_cache_entry(istate, S_IFDIR, &tree->oid, sparse_dir, 0, 0);
> +
> +       de->ce_flags |= CE_SKIP_WORKTREE;
> +       return de;
> +}
> +
> +/*
> + * Returns the number of entries "inserted" into the index.
> + */
> +static int convert_to_sparse_rec(struct index_state *istate,
> +                                int num_converted,
> +                                int start, int end,
> +                                const char *ct_path, size_t ct_pathlen,
> +                                struct cache_tree *ct)
> +{
> +       int i, can_convert = 1;
> +       int start_converted = num_converted;
> +       enum pattern_match_result match;
> +       int dtype;
> +       struct strbuf child_path = STRBUF_INIT;
> +       struct pattern_list *pl = istate->sparse_checkout_patterns;
> +
> +       /*
> +        * Is the current path outside of the sparse cone?
> +        * Then check if the region can be replaced by a sparse
> +        * directory entry (everything is sparse and merged).
> +        */
> +       match = path_matches_pattern_list(ct_path, ct_pathlen,
> +                                         NULL, &dtype, pl, istate);
> +       if (match != NOT_MATCHED)
> +               can_convert = 0;

Not sure if you saw my comments on the flow control at
https://lore.kernel.org/git/CABPp-BE9wPwmC0=pA4p1_QSRDHrO8RzqfJQdE2NxYZsYL_Rcig@xxxxxxxxxxxxxx/
(the typos elsewhere seem to still be present).  If you saw it and
decided against it, that's fine, just wanted the idea to at least be
floated.

> +
> +       for (i = start; can_convert && i < end; i++) {
> +               struct cache_entry *ce = istate->cache[i];
> +
> +               if (ce_stage(ce) ||
> +                   !(ce->ce_flags & CE_SKIP_WORKTREE))
> +                       can_convert = 0;
> +       }
> +
> +       if (can_convert) {
> +               struct cache_entry *se;
> +               se = construct_sparse_dir_entry(istate, ct_path, ct);
> +
> +               istate->cache[num_converted++] = se;
> +               return 1;
> +       }
> +
> +       for (i = start; i < end; ) {
> +               int count, span, pos = -1;
> +               const char *base, *slash;
> +               struct cache_entry *ce = istate->cache[i];
> +
> +               /*
> +                * Detect if this is a normal entry oustide of any subtree

s/oustide/outside/

> +                * entry.
> +                */
> +               base = ce->name + ct_pathlen;
> +               slash = strchr(base, '/');
> +
> +               if (slash)
> +                       pos = cache_tree_subtree_pos(ct, base, slash - base);
> +
> +               if (pos < 0) {
> +                       istate->cache[num_converted++] = ce;
> +                       i++;
> +                       continue;
> +               }
> +
> +               strbuf_setlen(&child_path, 0);
> +               strbuf_add(&child_path, ce->name, slash - ce->name + 1);
> +
> +               span = ct->down[pos]->cache_tree->entry_count;
> +               count = convert_to_sparse_rec(istate,
> +                                             num_converted, i, i + span,
> +                                             child_path.buf, child_path.len,
> +                                             ct->down[pos]->cache_tree);
> +               num_converted += count;
> +               i += span;
> +       }
> +
> +       strbuf_release(&child_path);
> +       return num_converted - start_converted;
> +}
> +
> +int convert_to_sparse(struct index_state *istate)
> +{
> +       if (istate->split_index || istate->sparse_index ||
> +           !core_apply_sparse_checkout || !core_sparse_checkout_cone)
> +               return 0;
> +
> +       /*
> +        * For now, only create a sparse index with the
> +        * GIT_TEST_SPARSE_INDEX environment variable. We will relax
> +        * this once we have a proper way to opt-in (and later still,
> +        * opt-out).
> +        */
> +       if (!git_env_bool("GIT_TEST_SPARSE_INDEX", 0))
> +               return 0;
> +
> +       if (!istate->sparse_checkout_patterns) {
> +               istate->sparse_checkout_patterns = xcalloc(1, sizeof(struct pattern_list));
> +               if (get_sparse_checkout_patterns(istate->sparse_checkout_patterns) < 0)
> +                       return 0;
> +       }
> +
> +       if (!istate->sparse_checkout_patterns->use_cone_patterns) {
> +               warning(_("attempting to use sparse-index without cone mode"));
> +               return -1;
> +       }
> +
> +       if (cache_tree_update(istate, 0)) {
> +               warning(_("unable to update cache-tree, staying full"));
> +               return -1;
> +       }
> +
> +       remove_fsmonitor(istate);
> +
> +       trace2_region_enter("index", "convert_to_sparse", istate->repo);
> +       istate->cache_nr = convert_to_sparse_rec(istate,
> +                                                0, 0, istate->cache_nr,
> +                                                "", 0, istate->cache_tree);
> +       istate->drop_cache_tree = 1;
> +       istate->sparse_index = 1;
> +       trace2_region_leave("index", "convert_to_sparse", istate->repo);
> +       return 0;
> +}
>
>  static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
>  {
> diff --git a/sparse-index.h b/sparse-index.h
> index 09a20d036c46..64380e121d80 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -3,5 +3,6 @@
>
>  struct index_state;
>  void ensure_full_index(struct index_state *istate);
> +int convert_to_sparse(struct index_state *istate);
>
>  #endif
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 4d789fe86b9d..ca87033d30b0 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -2,6 +2,9 @@
>
>  test_description='compare full workdir to sparse workdir'
>
> +GIT_TEST_CHECK_CACHE_TREE=0

Same question as I posted for the RFC series:

Why do you need to set this?  I vaguely remember needing to mess with
this when working with sparse checkouts because it did weird stuff but
I don't remember details.  But since your patch touches cache_trees, it
seems weird to show up without explanation.

> +GIT_TEST_SPLIT_INDEX=0
> +
>  . ./test-lib.sh
>
>  test_expect_success 'setup' '
> @@ -121,15 +124,49 @@ run_on_all () {
>  test_all_match () {
>         run_on_all "$@" &&
>         test_cmp full-checkout-out sparse-checkout-out &&
> -       test_cmp full-checkout-err sparse-checkout-err
> +       test_cmp full-checkout-out sparse-index-out &&
> +       test_cmp full-checkout-err sparse-checkout-err &&
> +       test_cmp full-checkout-err sparse-index-err
>  }
>
>  test_sparse_match () {
> -       run_on_sparse $* &&
> +       run_on_sparse "$@" &&
>         test_cmp sparse-checkout-out sparse-index-out &&
>         test_cmp sparse-checkout-err sparse-index-err
>  }
>
> +test_expect_success 'sparse-index contents' '
> +       init_repos &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in folder1 folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done &&

Thanks for making the output look more like ls-tree output; it's
easier to parse that way, at least for me.

> +
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in deep folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done &&
> +
> +       GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 &&
> +
> +       test-tool -C sparse-index read-cache --table >cache &&
> +       for dir in deep/deeper2 folder1 folder2 x
> +       do
> +               TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
> +               grep "040000 tree $TREE $dir/" cache \
> +                       || return 1
> +       done
> +'
> +
>  test_expect_success 'expanded in-memory index matches full index' '
>         init_repos &&
>         test_sparse_match test-tool read-cache --expand --table
> @@ -137,6 +174,7 @@ test_expect_success 'expanded in-memory index matches full index' '
>
>  test_expect_success 'status with options' '
>         init_repos &&
> +       test_sparse_match ls &&
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git status --porcelain=v2 -z -u &&
>         test_all_match git status --porcelain=v2 -uno &&
> @@ -273,6 +311,17 @@ test_expect_failure 'checkout and reset (mixed)' '
>         test_all_match git reset update-folder2
>  '
>
> +# Ensure that sparse-index behaves identically to
> +# sparse-checkout with a full index.
> +test_expect_success 'checkout and reset (mixed) [sparse]' '
> +       init_repos &&
> +
> +       test_sparse_match git checkout -b reset-test update-deep &&
> +       test_sparse_match git reset deepest &&
> +       test_sparse_match git reset update-folder1 &&
> +       test_sparse_match git reset update-folder2
> +'
> +
>  test_expect_success 'merge' '
>         init_repos &&
>
> @@ -309,14 +358,20 @@ test_expect_success 'clean' '
>         test_all_match git status --porcelain=v2 &&
>         test_all_match git clean -f &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
>         test_all_match git clean -xf &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
>         test_all_match git clean -xdf &&
>         test_all_match git status --porcelain=v2 &&
> +       test_sparse_match ls &&
> +       test_sparse_match ls folder1 &&
>
> -       test_path_is_dir sparse-checkout/folder1
> +       test_sparse_match test_path_is_dir folder1
>  '
>
>  test_done
> --
> gitgitgadget

I mostly read over the range-diff since it was much shorter.  You've
addressed a number of questions/comments I had on the RFC version, but
there's still some I didn't see a response to so I reposted them here.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux