Re: [PATCH v2 24/25] sparse-index: expand_to_path()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 31, 2021 at 6:50 PM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> Some users of the index API have a specific path they are looking for,
> but choose to use index_file_exists() to rely on the name-hash hashtable
> instead of doing binary search with index_name_pos(). These users only
> need to know a yes/no answer, not a position within the cache array.
>
> When the index is sparse, the name-hash hash table does not contain the
> full list of paths within sparse directories. It _does_ contain the
> directory names for the sparse-directory entries.
>
> Create a helper function, expand_to_path(), for intended use with the
> name-hash hashtable functions. The integration with name-hash.c will
> follow in a later change.
>
> The solution here is to use ensure_full_index() when we determine that
> the requested path is within a sparse directory entry. This will
> populate the name-hash hashtable as the index is recomputed from
> scratch.
>
> There may be cases where the caller is trying to find an untracked path
> that is not in the index but also is not within a sparse directory
> entry. We want to minimize the overhead for these requests. If we used
> index_name_pos() to find the insertion order of the path, then we could
> determine from that position if a sparse-directory exists. (In fact,
> just calling index_name_pos() in that case would lead to expanding the
> index to a full index.) However, this takes O(log N) time where N is the
> number of cache entries.
>
> To keep the performance of this call based mostly on the input string,
> use index_file_exists() to look for the ancestors of the path. Using the
> heuristic that a sparse directory is likely to have a small number of
> parent directories, we start from the bottom and build up. Use a string
> buffer to allow mutating the path name to terminate after each slash for
> each hashset test.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  sparse-index.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  sparse-index.h | 13 +++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index 95ea17174da3..8a1223041296 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -283,3 +283,75 @@ void ensure_full_index(struct index_state *istate)
>
>         trace2_region_leave("index", "ensure_full_index", istate->repo);
>  }
> +
> +/*
> + * This static global helps avoid infinite recursion between
> + * expand_to_path() and index_file_exists().
> + */
> +static int in_expand_to_path = 0;
> +
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase)
> +{
> +       struct strbuf path_mutable = STRBUF_INIT;
> +       size_t substr_len;
> +
> +       /* prevent extra recursion */
> +       if (in_expand_to_path)
> +               return;
> +
> +       if (!istate || !istate->sparse_index)
> +               return;
> +
> +       if (!istate->repo)
> +               istate->repo = the_repository;
> +
> +       in_expand_to_path = 1;
> +
> +       /*
> +        * We only need to actually expand a region if the
> +        * following are both true:
> +        *
> +        * 1. 'path' is not already in the index.
> +        * 2. Some parent directory of 'path' is a sparse directory.
> +        */
> +
> +       if (index_file_exists(istate, path, pathlen, icase))
> +               goto cleanup;
> +
> +       strbuf_add(&path_mutable, path, pathlen);
> +       strbuf_addch(&path_mutable, '/');
> +
> +       /* Check the name hash for all parent directories */
> +       substr_len = 0;
> +       while (substr_len < pathlen) {
> +               char temp;
> +               char *replace = strchr(path_mutable.buf + substr_len, '/');
> +
> +               if (!replace)
> +                       break;
> +
> +               /* replace the character _after_ the slash */
> +               replace++;
> +               temp = *replace;
> +               *replace = '\0';
> +               if (index_file_exists(istate, path_mutable.buf,
> +                                     path_mutable.len, icase)) {
> +                       /*
> +                        * We found a parent directory in the name-hash
> +                        * hashtable, which means that this entry could
> +                        * exist within a sparse-directory entry. Expand
> +                        * accordingly.

"this entry" is a bit unclear; it might be worth referring to the
"path" variable instead.  I think it'd also help to explain the
reasoning for using the '/' character, because while it's clear when
looking at the series, just running across it in the code it might be
easy to forget or miss.  Perhaps:

                    * We found a parent directory in the name-hash
                    * hashtable, because only sparse directory entries
                    * have a trailing '/' character.  Since "path" wasn't
                    * in the index, perhaps it exists within this
                    * sparse-directory.  Expand accordingly.

> +                        */
> +                       ensure_full_index(istate);
> +                       break;
> +               }
> +
> +               *replace = temp;
> +               substr_len = replace - path_mutable.buf;
> +       }
> +
> +cleanup:
> +       strbuf_release(&path_mutable);
> +       in_expand_to_path = 0;
> +}
> diff --git a/sparse-index.h b/sparse-index.h
> index 0268f38753c0..1115a0d7dd98 100644
> --- a/sparse-index.h
> +++ b/sparse-index.h
> @@ -4,6 +4,19 @@
>  struct index_state;
>  int convert_to_sparse(struct index_state *istate);
>
> +/*
> + * Some places in the codebase expect to search for a specific path.
> + * This path might be outside of the sparse-checkout definition, in
> + * which case a sparse-index may not contain a path for that index.
> + *
> + * Given an index and a path, check to see if a leading directory for
> + * 'path' exists in the index as a sparse directory. In that case,
> + * expand that sparse directory to a full range of cache entries and
> + * populate the index accordingly.
> + */
> +void expand_to_path(struct index_state *istate,
> +                   const char *path, size_t pathlen, int icase);
> +
>  struct repository;
>  int set_sparse_index_config(struct repository *repo, int enable);
>
> --
> gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux