Re: [PATCH v2] add, rm, mv: fix bug that prevents the update of non-sparse dirs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 25.10.21 um 23:07 schrieb Matheus Tavares:
> These three commands recently learned to avoid updating paths outside
> the sparse checkout even if they are missing the SKIP_WORKTREE bit. This
> is done using path_in_sparse_checkout(), which checks whether a given
> path matches the current list of sparsity rules, similar to what
> clear_ce_flags() does when we run "git sparse checkout init" or "git
> sparse-checkout reapply". However, clear_ce_flags() uses a recursive
> approach, applying the match results from parent directories on paths
> that get the UNDECIDED result, whereas path_in_sparse_checkout() only
> attempts to match the full path and immediately considers UNDECIDED as
> NOT_MATCHED. This makes the function miss matches with leading
> directories. For example, if the user has the sparsity patterns "!/a"
> and "b/", add, rm, and mv will fail to update the path "a/b/c" and end
> up displaying a warning about it being outside the sparse checkout even
> though it isn't. This problem only occurs in full pattern mode as the
> pattern matching functions never return UNDECIDED for cone mode.
>
> To fix this, replicate the recursive behavior of clear_ce_flags() in
> path_in_sparse_checkout(), falling back to the parent directory match
> when a path gets the UNDECIDED result. (If this turns out to be too
> expensive in some cases, we may want to later add some form of caching
> to accelerate multiple queries within the same directory. This is not
> implemented in this patch, though.) Also add two tests for each affected
> command (add, rm, and mv) to check that they behave correctly with the
> recursive pattern matching. The first test would previously fail without
> this patch while the second already succeeded. It is added mostly to
> make sure that we are not breaking the existing pattern matching for
> directories that are really sparse, and also as a protection against any
> future regressions.
>
> Two other existing tests had to be changed as well: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it didn't
> allow the non-sparse entries inside <dir> to be removed. The other one,
> in t7002, tested that "git mv" would correctly display a warning message
> for sparse paths, but it accidentally expected the message to include
> two non-sparse paths as well.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>
> ---
>
> Changes since RFC/v1 [1]:
>
> - Inverted the loop direction to start from the full path and go backwards in
>   the parent dirs. This way we can stop early when we find the first
>   non-UNDECIDED match result.
>
> - Simplified the implementation by unifing the code path for cone mode and
>   full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED
>   for cone mode, it will always execute only one iteration of the loop and then
>   find the final answer. There is no need to handle this case in a separate
>   block.
>
> - Inside the loop, made sure to change dtype to DT_DIR when going to parent
>   directories. Without this, the pattern match would fail if we had a path
>   like "a/b/c" and the pattern "b/" (with trailing slash).
>
> - Changed the tests to use trailing slash to make sure they cover the corner
>   case described above.
>
> - Improved commit message.
>
> [1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@xxxxxx
> (The RFC was deep down another thread, so I separated v2 to help
> readers. Please, let me know if that is not a good approach and I will
> avoid it in the future.)
>
>  dir.c                          | 25 +++++++++++++++++------
>  t/t3602-rm-sparse-checkout.sh  | 37 +++++++++++++++++++++++++++++++---
>  t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++
>  t/t7002-mv-sparse-checkout.sh  | 24 ++++++++++++++++++++--
>  4 files changed, 93 insertions(+), 11 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index a4306ab874..248f72e732 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result match = UNDECIDED;
> +	const char *end, *slash;
>
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	/*
> +	 * If UNDECIDED, use the match from the parent dir (recursively),
> +	 * or fall back to NOT_MATCHED at the topmost level.
> +	 */
> +	for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) {
> +
> +		for (slash = end - 1; slash >= path && *slash != '/'; slash--)
> +			; /* do nothing */

slash can end up one less than path.  If path points to the first char
of a string object this would be undefined if I read 6.5.6 of C99
correctly.  (A pointer to the array element just after the last one is
specified as fine as long as it's not dereferenced, but a pointer to
the element before the first one is not mentioned and thus undefined.)

Do you really need the ">=" instead of ">"?

> +
> +		match = path_matches_pattern_list(path, end - path,
> +				slash >= path ? slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		/* We are going to match the parent dir now */
> +		dtype = DT_DIR;
> +	}
> +	return match > 0;
>  }
>
>  int path_in_sparse_checkout(const char *path,




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux