Re: [PATCH v6 23/28] fsmonitor: never set CE_FSMONITOR_VALID on submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jeff,

On Fri, 22 Apr 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>
> Never set CE_FSMONITOR_VALID on the cache-entry of submodule
> directories.
>
> During a client command like 'git status', we may need to recurse
> into each submodule to compute a status summary for the submodule.
> Since the purpose of the ce_flag is to let Git avoid scanning a
> cache-entry, setting the flag causes the recursive call to be
> avoided and we report incorrect (no status) for the submodule.
>
> We created an OS watch on the root directory of our working
> directory and we receive events for everything in the cone
> under it.  When submodules are present inside our working
> directory, we receive events for both our repo (the super) and
> any subs within it.  Since our index doesn't have any information
> for items within the submodules, we can't use those events.
>
> We could try to truncate the paths of those events back to the
> submodule boundary and mark the GITLINK as dirty, but that
> feels expensive since we would have to prefix compare every FS
> event that we receive against a list of submodule roots.  And
> it still wouldn't be sufficient to correctly report status on
> the submodule, since we don't have any space in the cache-entry
> to cache the submodule's status (the 'SCMU' bits in porcelain
> V2 speak).  That is, the CE_FSMONITOR_VALID bit just says that
> we don't need to scan/inspect it because we already know the
> answer -- it doesn't say that the item is clean -- and we
> don't have space in the cache-entry to store those answers.
> So we should always do the recursive scan.
>
> Therefore, we should never set the flag on GITLINK cache-entries.
>
> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> ---
>  fsmonitor.c                  |   2 +
>  fsmonitor.h                  |  11 ++++
>  t/t7527-builtin-fsmonitor.sh | 111 +++++++++++++++++++++++++++++++++++
>  3 files changed, 124 insertions(+)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index e1229c289cf..57d6a483bee 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -580,6 +580,8 @@ void tweak_fsmonitor(struct index_state *istate)
>  		if (fsmonitor_enabled) {
>  			/* Mark all entries valid */
>  			for (i = 0; i < istate->cache_nr; i++) {
> +				if (S_ISGITLINK(istate->cache[i]->ce_mode))
> +					continue;
>  				istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
>  			}
>
> diff --git a/fsmonitor.h b/fsmonitor.h
> index 3f41f653691..edf7ce5203b 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -68,6 +68,15 @@ static inline int is_fsmonitor_refreshed(const struct index_state *istate)
>   * Set the given cache entries CE_FSMONITOR_VALID bit. This should be
>   * called any time the cache entry has been updated to reflect the
>   * current state of the file on disk.
> + *
> + * However, never mark submodules as valid.  When commands like "git
> + * status" run they might need to recurse into the submodule (using a
> + * child process) to get a summary of the submodule state.  We don't
> + * have (and don't want to create) the facility to translate every
> + * FS event that we receive and that happens to be deep inside of a
> + * submodule back to the submodule root, so we cannot correctly keep
> + * track of this bit on the gitlink directory.  Therefore, we never
> + * set it on submodules.
>   */
>  static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache_entry *ce)
>  {
> @@ -75,6 +84,8 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
>
>  	if (fsm_mode > FSMONITOR_MODE_DISABLED &&
>  	    !(ce->ce_flags & CE_FSMONITOR_VALID)) {
> +		if (S_ISGITLINK(ce->ce_mode))
> +			return;
>  		istate->cache_changed = 1;
>  		ce->ce_flags |= CE_FSMONITOR_VALID;
>  		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d0e681d008f..4c49ae5a684 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -721,4 +721,115 @@ do
>  	'
>  done
>
> +# Test fsmonitor interaction with submodules.
> +#
> +# If we start the daemon in the super, it will see FS events for
> +# everything in the working directory cone and this includes any
> +# files/directories contained *within* the submodules.
> +#
> +# A `git status` at top level will get events for items within the
> +# submodule and ignore them, since they aren't named in the index
> +# of the super repo.  This makes the fsmonitor response a little
> +# noisy, but it doesn't alter the correctness of the state of the
> +# super-proper.
> +#
> +# When we have submodules, `git status` normally does a recursive
> +# status on each of the submodules and adds a summary row for any
> +# dirty submodules.  (See the "S..." bits in porcelain V2 output.)
> +#
> +# It is therefore important that the top level status not be tricked
> +# by the FSMonitor response to skip those recursive calls.  That is,
> +# even if FSMonitor says that the mtime of the submodule directory
> +# hasn't changed and it could be implicitly marked valid, we must
> +# not take that shortcut.  We need to force the recusion into the
> +# submodule so that we get a summary of the status *within* the
> +# submodule.
> +
> +create_super () {
> +	super="$1" &&
> +
> +	git init "$super" &&
> +	echo x >"$super/file_1" &&
> +	echo y >"$super/file_2" &&
> +	echo z >"$super/file_3" &&
> +	mkdir "$super/dir_1" &&
> +	echo a >"$super/dir_1/file_11" &&
> +	echo b >"$super/dir_1/file_12" &&
> +	mkdir "$super/dir_1/dir_2" &&
> +	echo a >"$super/dir_1/dir_2/file_21" &&
> +	echo b >"$super/dir_1/dir_2/file_22" &&
> +	git -C "$super" add . &&
> +	git -C "$super" commit -m "initial $super commit"
> +}
> +
> +create_sub () {
> +	sub="$1" &&
> +
> +	git init "$sub" &&
> +	echo x >"$sub/file_x" &&
> +	echo y >"$sub/file_y" &&
> +	echo z >"$sub/file_z" &&
> +	mkdir "$sub/dir_x" &&
> +	echo a >"$sub/dir_x/file_a" &&
> +	echo b >"$sub/dir_x/file_b" &&
> +	mkdir "$sub/dir_x/dir_y" &&
> +	echo a >"$sub/dir_x/dir_y/file_a" &&
> +	echo b >"$sub/dir_x/dir_y/file_b" &&
> +	git -C "$sub" add . &&
> +	git -C "$sub" commit -m "initial $sub commit"
> +}
> +
> +my_match_and_clean () {
> +	git -C super --no-optional-locks status --porcelain=v2 >actual.with &&
> +	git -C super --no-optional-locks -c core.fsmonitor=false \
> +		status --porcelain=v2 >actual.without &&
> +	test_cmp actual.with actual.without &&
> +
> +	git -C super/dir_1/dir_2/sub reset --hard &&
> +	git -C super/dir_1/dir_2/sub clean -d -f
> +}
> +
> +test_expect_success "Submodule always visited" '

I almost feel bad offering this nit: could you use single-quotes, and
start with a lower-case `s`?

Thanks,
Dscho

> +	test_when_finished "git -C super fsmonitor--daemon stop; \
> +			    rm -rf super; \
> +			    rm -rf sub" &&
> +
> +	create_super super &&
> +	create_sub sub &&
> +
> +	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> +	git -C super commit -m "add sub" &&
> +
> +	start_daemon -C super &&
> +	git -C super config core.fsmonitor true &&
> +	git -C super update-index --fsmonitor &&
> +	git -C super status &&
> +
> +	# Now run pairs of commands w/ and w/o FSMonitor while we make
> +	# some dirt in the submodule and confirm matching output.
> +
> +	# Completely clean status.
> +	my_match_and_clean &&
> +
> +	# .M S..U
> +	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_u &&
> +	my_match_and_clean &&
> +
> +	# .M S.M.
> +	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_m &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	my_match_and_clean &&
> +
> +	# .M S.M.
> +	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	my_match_and_clean &&
> +
> +	# .M SC..
> +	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	git -C super/dir_1/dir_2/sub commit -m "SC.." &&
> +	my_match_and_clean
> +'
> +
>  test_done
> --
> gitgitgadget
>
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux