Hi Jeff, On Fri, 22 Apr 2022, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > Never set CE_FSMONITOR_VALID on the cache-entry of submodule > directories. > > During a client command like 'git status', we may need to recurse > into each submodule to compute a status summary for the submodule. > Since the purpose of the ce_flag is to let Git avoid scanning a > cache-entry, setting the flag causes the recursive call to be > avoided and we report incorrect (no status) for the submodule. > > We created an OS watch on the root directory of our working > directory and we receive events for everything in the cone > under it. When submodules are present inside our working > directory, we receive events for both our repo (the super) and > any subs within it. Since our index doesn't have any information > for items within the submodules, we can't use those events. > > We could try to truncate the paths of those events back to the > submodule boundary and mark the GITLINK as dirty, but that > feels expensive since we would have to prefix compare every FS > event that we receive against a list of submodule roots. And > it still wouldn't be sufficient to correctly report status on > the submodule, since we don't have any space in the cache-entry > to cache the submodule's status (the 'SCMU' bits in porcelain > V2 speak). That is, the CE_FSMONITOR_VALID bit just says that > we don't need to scan/inspect it because we already know the > answer -- it doesn't say that the item is clean -- and we > don't have space in the cache-entry to store those answers. > So we should always do the recursive scan. > > Therefore, we should never set the flag on GITLINK cache-entries. > > Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > --- > fsmonitor.c | 2 + > fsmonitor.h | 11 ++++ > t/t7527-builtin-fsmonitor.sh | 111 +++++++++++++++++++++++++++++++++++ > 3 files changed, 124 insertions(+) > > diff --git a/fsmonitor.c b/fsmonitor.c > index e1229c289cf..57d6a483bee 100644 > --- a/fsmonitor.c > +++ b/fsmonitor.c > @@ -580,6 +580,8 @@ void tweak_fsmonitor(struct index_state *istate) > if (fsmonitor_enabled) { > /* Mark all entries valid */ > for (i = 0; i < istate->cache_nr; i++) { > + if (S_ISGITLINK(istate->cache[i]->ce_mode)) > + continue; > istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID; > } > > diff --git a/fsmonitor.h b/fsmonitor.h > index 3f41f653691..edf7ce5203b 100644 > --- a/fsmonitor.h > +++ b/fsmonitor.h > @@ -68,6 +68,15 @@ static inline int is_fsmonitor_refreshed(const struct index_state *istate) > * Set the given cache entries CE_FSMONITOR_VALID bit. This should be > * called any time the cache entry has been updated to reflect the > * current state of the file on disk. > + * > + * However, never mark submodules as valid. When commands like "git > + * status" run they might need to recurse into the submodule (using a > + * child process) to get a summary of the submodule state. We don't > + * have (and don't want to create) the facility to translate every > + * FS event that we receive and that happens to be deep inside of a > + * submodule back to the submodule root, so we cannot correctly keep > + * track of this bit on the gitlink directory. Therefore, we never > + * set it on submodules. > */ > static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache_entry *ce) > { > @@ -75,6 +84,8 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache > > if (fsm_mode > FSMONITOR_MODE_DISABLED && > !(ce->ce_flags & CE_FSMONITOR_VALID)) { > + if (S_ISGITLINK(ce->ce_mode)) > + return; > istate->cache_changed = 1; > ce->ce_flags |= CE_FSMONITOR_VALID; > trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name); > diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh > index d0e681d008f..4c49ae5a684 100755 > --- a/t/t7527-builtin-fsmonitor.sh > +++ b/t/t7527-builtin-fsmonitor.sh > @@ -721,4 +721,115 @@ do > ' > done > > +# Test fsmonitor interaction with submodules. > +# > +# If we start the daemon in the super, it will see FS events for > +# everything in the working directory cone and this includes any > +# files/directories contained *within* the submodules. > +# > +# A `git status` at top level will get events for items within the > +# submodule and ignore them, since they aren't named in the index > +# of the super repo. This makes the fsmonitor response a little > +# noisy, but it doesn't alter the correctness of the state of the > +# super-proper. > +# > +# When we have submodules, `git status` normally does a recursive > +# status on each of the submodules and adds a summary row for any > +# dirty submodules. (See the "S..." bits in porcelain V2 output.) > +# > +# It is therefore important that the top level status not be tricked > +# by the FSMonitor response to skip those recursive calls. That is, > +# even if FSMonitor says that the mtime of the submodule directory > +# hasn't changed and it could be implicitly marked valid, we must > +# not take that shortcut. We need to force the recusion into the > +# submodule so that we get a summary of the status *within* the > +# submodule. > + > +create_super () { > + super="$1" && > + > + git init "$super" && > + echo x >"$super/file_1" && > + echo y >"$super/file_2" && > + echo z >"$super/file_3" && > + mkdir "$super/dir_1" && > + echo a >"$super/dir_1/file_11" && > + echo b >"$super/dir_1/file_12" && > + mkdir "$super/dir_1/dir_2" && > + echo a >"$super/dir_1/dir_2/file_21" && > + echo b >"$super/dir_1/dir_2/file_22" && > + git -C "$super" add . && > + git -C "$super" commit -m "initial $super commit" > +} > + > +create_sub () { > + sub="$1" && > + > + git init "$sub" && > + echo x >"$sub/file_x" && > + echo y >"$sub/file_y" && > + echo z >"$sub/file_z" && > + mkdir "$sub/dir_x" && > + echo a >"$sub/dir_x/file_a" && > + echo b >"$sub/dir_x/file_b" && > + mkdir "$sub/dir_x/dir_y" && > + echo a >"$sub/dir_x/dir_y/file_a" && > + echo b >"$sub/dir_x/dir_y/file_b" && > + git -C "$sub" add . && > + git -C "$sub" commit -m "initial $sub commit" > +} > + > +my_match_and_clean () { > + git -C super --no-optional-locks status --porcelain=v2 >actual.with && > + git -C super --no-optional-locks -c core.fsmonitor=false \ > + status --porcelain=v2 >actual.without && > + test_cmp actual.with actual.without && > + > + git -C super/dir_1/dir_2/sub reset --hard && > + git -C super/dir_1/dir_2/sub clean -d -f > +} > + > +test_expect_success "Submodule always visited" ' I almost feel bad offering this nit: could you use single-quotes, and start with a lower-case `s`? Thanks, Dscho > + test_when_finished "git -C super fsmonitor--daemon stop; \ > + rm -rf super; \ > + rm -rf sub" && > + > + create_super super && > + create_sub sub && > + > + git -C super submodule add ../sub ./dir_1/dir_2/sub && > + git -C super commit -m "add sub" && > + > + start_daemon -C super && > + git -C super config core.fsmonitor true && > + git -C super update-index --fsmonitor && > + git -C super status && > + > + # Now run pairs of commands w/ and w/o FSMonitor while we make > + # some dirt in the submodule and confirm matching output. > + > + # Completely clean status. > + my_match_and_clean && > + > + # .M S..U > + echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_u && > + my_match_and_clean && > + > + # .M S.M. > + echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_m && > + git -C super/dir_1/dir_2/sub add . && > + my_match_and_clean && > + > + # .M S.M. > + echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a && > + git -C super/dir_1/dir_2/sub add . && > + my_match_and_clean && > + > + # .M SC.. > + echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a && > + git -C super/dir_1/dir_2/sub add . && > + git -C super/dir_1/dir_2/sub commit -m "SC.." && > + my_match_and_clean > +' > + > test_done > -- > gitgitgadget > >