On 4/20/2021 7:00 PM, Elijah Newren wrote: > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> diff --git a/dir.h b/dir.h >> index 51cb0e217247..9d6666f520f3 100644 >> --- a/dir.h >> +++ b/dir.h >> @@ -503,7 +503,7 @@ static inline int ce_path_match(struct index_state *istate, >> char *seen) >> { >> return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen, >> - S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)); >> + S_ISSPARSEDIR(ce->ce_mode) || S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)); > > I'm confused why this change would be needed, or why it'd semantically > be meaningful here either. Doesn't S_ISSPARSEDIR() being true imply > S_ISDIR() is true (and perhaps even vice versa?). > > By chance, was this a leftover from your early RFC changes from a few > series ago when you had an entirely different mode for sparse > directory entries? I will double-check on this with additional testing and debugging. Your comments below make it clear that this patch would benefit from some additional splitting. >> } >> >> static inline int dir_path_match(struct index_state *istate, >> diff --git a/preload-index.c b/preload-index.c >> index e5529a586366..35e67057ca9b 100644 >> --- a/preload-index.c >> +++ b/preload-index.c >> @@ -55,6 +55,8 @@ static void *preload_thread(void *_data) >> continue; >> if (S_ISGITLINK(ce->ce_mode)) >> continue; >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + continue; >> if (ce_uptodate(ce)) >> continue; >> if (ce_skip_worktree(ce)) > > Don't we have S_ISSPARSEDIR(ce->ce_mode) implies ce_skip_worktree(ce)? > Is this a duplicate check? If so, is it still desirable for > future-proofing or code clarity, or is it strictly redundant? You're right, we could skip this one because the ce_skip_worktree(ce) is enough to cover this case. I think I created this one because I was auditing uses of S_ISGITLINK(). >> diff --git a/read-cache.c b/read-cache.c >> index 29ffa9ac5db9..6308234b4838 100644 >> --- a/read-cache.c >> +++ b/read-cache.c >> @@ -1594,6 +1594,9 @@ int refresh_index(struct index_state *istate, unsigned int flags, >> if (ignore_skip_worktree && ce_skip_worktree(ce)) >> continue; >> >> + if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode)) >> + continue; >> + > > I'm a bit confused about what could trigger ce_skip_worktree(ce) && > !ignore_skip_worktree and why it'd be desirable to refresh > skip-worktree entries. However, this is tangential to your patch and > has apparently been around since 2009 (in particular, from 56cac48c35 > ("ie_match_stat(): do not ignore skip-worktree bit with > CE_MATCH_IGNORE_VALID", 2009-12-14)). This is probably better served with a statement like this earlier in the method: if (ignore_skip_worktree) ensure_full_index(istate); It seems like ignoring the skip worktree bits is a rare occasion and it will be worth expanding the index for that case. >> if (pathspec && !ce_path_match(istate, ce, pathspec, seen)) >> filtered = 1; >> >> diff --git a/unpack-trees.c b/unpack-trees.c >> index dddf106d5bd4..9a62e823928a 100644 >> --- a/unpack-trees.c >> +++ b/unpack-trees.c >> @@ -586,6 +586,13 @@ static void mark_ce_used(struct cache_entry *ce, struct unpack_trees_options *o) >> { >> ce->ce_flags |= CE_UNPACKED; >> >> + /* >> + * If this is a sparse directory, don't advance cache_bottom. >> + * That will be advanced later using the cache-tree data. >> + */ >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + return; >> + > > I don't understand cache_bottom stuff; we might want to get Junio to > look over it. Or maybe I just need to dig a bit further and attempt > to understand it. I remember looking very careful at this when I created this (and found it worth a comment) but I don't recall enough off the top of my head. This is worth splitting out with a careful message, which will force me to reexamine the cache_bottom member. >> if (o->cache_bottom < o->src_index->cache_nr && >> o->src_index->cache[o->cache_bottom] == ce) { >> int bottom = o->cache_bottom; >> @@ -984,6 +991,9 @@ static int do_compare_entry(const struct cache_entry *ce, >> ce_len -= pathlen; >> ce_name = ce->name + pathlen; >> >> + /* remove directory separator if a sparse directory entry */ >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + ce_len--; >> return df_name_compare(ce_name, ce_len, S_IFREG, name, namelen, mode); > > Shouldn't we be passing ce->ce_mode instead of S_IFREG here as well? > > Note the following sort order: > foo > foo.txt > foo/ > foo/bar > > You've trimmed off the '/', so 'foo/' would be ordered where 'foo' is, > but df_name_compare() exists to make "foo" sort exactly where "foo/" > would when "foo" is a directory. Will your df_name_compare() call > here result in foo.txt being placed after all the "foo/<subpath>" > entries in the index and perhaps cause other problems down the line? > (Are there issues, e.g. with cache-trees getting wrong ordering from > this, or even writing out indexes or tree objects with the wrong > ordering? I've written out trees to disk with wrong ordering before > and git usually survives but gets really confused with diffs.) > > Since at least one caller of compare_entry() takes the return result > and does a "if (cmp < 0)", this order is going to matter in some > cases. Perhaps we need some testcases where there is a sparse > directory entry named "foo/" and a file recorded in some relevant tree > with the name "foo.txt" to be able to trigger these lines of code? I will do some testing to find out why removing the separator here was necessary or valuable. >> } >> >> @@ -993,6 +1003,10 @@ static int compare_entry(const struct cache_entry *ce, const struct traverse_inf >> if (cmp) >> return cmp; >> >> + /* If ce is a sparse directory, then allow equality here. */ >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + return 0; >> + > > Um...so a sparse directory compares equal to _anything_ at all? I'm > really confused why this would be desirable. Am I missing something > here? The context is that is removed from the patch is that "cmp" is the response from do_compare_entry(), which does a length-limited comparison. If cmp is non-zero, then we've already returned the difference. The rest of the method is checking if the 'info' input is actually a parent directory of the _path_ given at this cache entry. >> /* >> * Even if the beginning compared identically, the ce should >> * compare as bigger than a directory leading up to it! The line after this is: return ce_namelen(ce) > traverse_path_len(info, tree_entry_len(n)); This comparison is saying "these paths match up to the directory specified by info and n, but we need 'ce' to be a file within that directory." But in the case of a sparse directory entry, we can skip this comparison. >> @@ -1243,6 +1257,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str >> struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, }; >> struct unpack_trees_options *o = info->data; >> const struct name_entry *p = names; >> + unsigned recurse = 1; > > "recurse" sent my mind off into questions about safety checks, base > cases, etc., instead of just the simple "we don't want to read in > directories corresponding to sparse entries". I think this would be > clearer either if the variable had the sparsity concept embedded in > its name somewhere (e.g. "unsigned sparse_entry = 0", and check for > (!sparse_entry) instead of (recurse) below), or with a comment about > why there are cases where you want to avoid recursion. I can understand that. This callback is confusing because it _does_ recurse, but through a sequence of methods instead of actually calling itself. It would be better to say something like "unpack_subdirectories = 1" and disabling it when we are in a sparse directory. >> >> /* Find first entry with a real name (we could use "mask" too) */ >> while (!p->mode) >> @@ -1284,12 +1299,16 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str >> } >> } >> src[0] = ce; >> + >> + if (S_ISSPARSEDIR(ce->ce_mode)) >> + recurse = 0; > > Ah, the context here doesn't show it but this is in the "if (!cmp)" > block, i.e. if we found a match for the sparse directory. This makes > sense, to me, _if_ we ignore the above question about sparse > directories matching equal to anything and everything. I believe that "anything and everything" concern has been resolved. >> @@ -1319,7 +1338,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str >> } >> } >> >> - if (traverse_trees_recursive(n, dirmask, mask & ~dirmask, >> + if (recurse && >> + traverse_trees_recursive(n, dirmask, mask & ~dirmask, >> names, info) < 0) >> return -1; >> return mask; > > Nice. :-) > > > I think your patch was mostly about the recurse stuff, which other > than the name or a comment about it look good to me. However, all the > other preparatory small tweaks brought up a lot of questions or > confusion for me. I'm worried there might be a bug or two, though I > may have just misunderstood some of the code bits. This patch could probably be split up a little to make these things clearer. Thanks for bringing up the tricky bits. -Stolee