Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 16, 2022 at 1:37 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@xxxxxxxxx>
> > [...]
> > +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> > +                   int *dir_found)
> > +{
> > +     struct stat st;
> > +     char *newdir;
> > +     char *tmp;
> > +
> > +     /*
> > +      * If dirname corresponds to a directory that doesn't exist, and this
> > +      * path starts with dirname, then path can't exist.
> > +      */
> > +     if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> > +             return 0;
> > +
> > +     /*
> > +      * If path itself exists, return 1.
> > +      */
> > +     if (!lstat(path, &st))
> > +             return 1;
> > +
> > +     /*
> > +      * Otherwise, path does not exist so we'll return 0...but we'll first
> > +      * determine some info about its parent directory so we can avoid
> > +      * lstat calls for future cache entries.
> > +      */
> > +     newdir = strrchr(path, '/');
> > +     if (!newdir)
> > +             return 0; /* Didn't find a parent dir; just return 0 now. */
> > +
> > +     /*
> > +      * If path starts with directory (which we already lstat'ed and found),
> > +      * then no need to lstat parent directory again.
> > +      */
> > +     if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> > +             return 0;
>
> I really don't care/just asking, but there was a discussion on another
> topic about guarding calls to the mem*() family when n=0:
> https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/
>
> Is this the same sort of redundancy where we could lose the "&&
> *dirname" part, or is it still important because a "\0" dirname would
> have corresponding non-0 *dir_len?

No, dirname is a char**, not a char*.  I need to make sure *dirname is
non-NULL before passing to memcmp or we get segfaults (and *dirname
will be NULL the first time it gets to this line, so the check is
critical).

> More generally ... (see below)...
>
> > +
> > +     /* Free previous dirname, and cache path's dirname */
> > +     *dirname = path;
> > +     *dir_len = newdir - path + 1;
> > +
> > +     tmp = xstrndup(path, *dir_len);
> > +     *dir_found = !lstat(tmp, &st);
>
> In most other places we're a bit more careful about lstat() error handling, e.g.:
>
>     builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
>     builtin/init-db.c-                      if (errno != ENOENT)
>     builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
>     builtin/init-db.c-              }
>
> Shouldn't we do the same here and at least error() on return values of
> -1 with an accompanying errno that isn't ENOENT?

If we should do that everywhere, should we have an xlstat in wrapper.[ch]?

> > +     free(tmp);
> > +
> > +     return 0;
> > +}
> > +
> >  void clear_skip_worktree_from_present_files(struct index_state *istate)
> >  {
> > +     const char *last_dirname = NULL;
> > +     size_t dir_len = 0;
> > +     int dir_found = 1;
> > +
> >       int i;
> > +
> >       if (!core_apply_sparse_checkout)
> >               return;
> >
> >  restart:
> >       for (i = 0; i < istate->cache_nr; i++) {
> >               struct cache_entry *ce = istate->cache[i];
> > -             struct stat st;
> >
> > -             if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> > +             if (ce_skip_worktree(ce) &&
> > +                 path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>
> ...(continued from above) is the "path is zero" part of this even
> reachable? I tried with this on top and ran your tests (and the rest of
> t*sparse*.sh) successfully:
>
>         diff --git a/sparse-index.c b/sparse-index.c
>         index eed170cd8f7..f89c944d8cd 100644
>         --- a/sparse-index.c
>         +++ b/sparse-index.c
>         @@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>                 for (i = 0; i < istate->cache_nr; i++) {
>                         struct cache_entry *ce = istate->cache[i];
>
>         +               assert(*ce->name);
>                         if (ce_skip_worktree(ce) &&
>                             path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>                                 if (S_ISSPARSEDIR(ce->ce_mode)) {
>
> I.e. isn't this undue paranoia about the cache API giving us zero-length
> paths?

Nope, not related at all, for two reasons: the code above was checking
for NULL pointers rather than NUL characters, and the argument I was
checking was last_dirname, not ce->name.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux