On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@xxxxxxxxx> > [...] > +static int path_found(const char *path, const char **dirname, size_t *dir_len, > + int *dir_found) > +{ > + struct stat st; > + char *newdir; > + char *tmp; > + > + /* > + * If dirname corresponds to a directory that doesn't exist, and this > + * path starts with dirname, then path can't exist. > + */ > + if (!*dir_found && !memcmp(path, *dirname, *dir_len)) > + return 0; > + > + /* > + * If path itself exists, return 1. > + */ > + if (!lstat(path, &st)) > + return 1; > + > + /* > + * Otherwise, path does not exist so we'll return 0...but we'll first > + * determine some info about its parent directory so we can avoid > + * lstat calls for future cache entries. > + */ > + newdir = strrchr(path, '/'); > + if (!newdir) > + return 0; /* Didn't find a parent dir; just return 0 now. */ > + > + /* > + * If path starts with directory (which we already lstat'ed and found), > + * then no need to lstat parent directory again. > + */ > + if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len)) > + return 0; I really don't care/just asking, but there was a discussion on another topic about guarding calls to the mem*() family when n=0: https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/ Is this the same sort of redundancy where we could lose the "&& *dirname" part, or is it still important because a "\0" dirname would have corresponding non-0 *dir_len? More generally ... (see below)... > + > + /* Free previous dirname, and cache path's dirname */ > + *dirname = path; > + *dir_len = newdir - path + 1; > + > + tmp = xstrndup(path, *dir_len); > + *dir_found = !lstat(tmp, &st); In most other places we're a bit more careful about lstat() error handling, e.g.: builtin/init-db.c: if (lstat(path->buf, &st_git)) { builtin/init-db.c- if (errno != ENOENT) builtin/init-db.c- die_errno(_("cannot stat '%s'"), path->buf); builtin/init-db.c- } Shouldn't we do the same here and at least error() on return values of -1 with an accompanying errno that isn't ENOENT? > + free(tmp); > + > + return 0; > +} > + > void clear_skip_worktree_from_present_files(struct index_state *istate) > { > + const char *last_dirname = NULL; > + size_t dir_len = 0; > + int dir_found = 1; > + > int i; > + > if (!core_apply_sparse_checkout) > return; > > restart: > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > - struct stat st; > > - if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) { > + if (ce_skip_worktree(ce) && > + path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { ...(continued from above) is the "path is zero" part of this even reachable? I tried with this on top and ran your tests (and the rest of t*sparse*.sh) successfully: diff --git a/sparse-index.c b/sparse-index.c index eed170cd8f7..f89c944d8cd 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate) for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; + assert(*ce->name); if (ce_skip_worktree(ce) && path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { if (S_ISSPARSEDIR(ce->ce_mode)) { I.e. isn't this undue paranoia about the cache API giving us zero-length paths?