Re: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/21/2021 10:28 PM, Matheus Tavares wrote:
> On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>
>> $ cat .git/info/sparse-checkout
>> !arch/*
>> !tools/arch/*
>> !virt/kvm/arm/*
>> /*
>> arch/.gitignore
>> arch/Kconfig
>> arch/x86
>> tools/arch/x86
>> tools/include/uapi/linux/kvm.h
>> !Documentation
>> !drivers
>>
>> $ git read-tree -mu HEAD
>>
>> $ rm arch/x86/kvm/x86.c
> [...]
>> $ git add arch/x86
>> The following paths and/or pathspecs matched paths that exist
>> outside of your sparse-checkout definition, so will not be
>> updated in the index:
>> arch/x86
> 
> I think the problem may be that we are performing pattern matching
> slightly different in add, mv, and rm, in comparison to "git
> sparse-checkout". On "git sparse-checkout init" (or reapply), we call
> clear_ce_flags() which calls path_matches_pattern_list() for each
> component of the working tree paths. If the full path gives a match
> result of UNDECIDED, we recursively try to use the match result from
> the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED).

Yes! I think this is absolutely the problem. Thanks for pointing
this out!
 
> In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but
> "arch/x86" gives MATCHED, so we end up using that for the full path.
> 
> However, in add|mv|rm we only call path_matches_pattern_list() for the
> full path and get UNDECIDED, which we consider the same as NOT_MATCHED,
> and end up disallowing the path update operation with a warning message.
> 
> The commands do work if we replace the sparsity pattern "arch/x86" with
> "arch/x86/" (with a trailing slash), but note that it only works
> because the pattern is relative to the root (see dir.c:1297). If we
> change it to "x86/", it would no longer work.
> 
> So far, the only way I could think of to fix this would be to perform
> pattern matching for the leading components of the paths too. That
> doesn't seem very nice, though, as it can probably be quite expensive...
> But here is a patch for discussion:

I agree that it is expensive, but that's already the case for the
non-cone sparse-checkout patterns. Hopefully it is sufficient that
these cases are restricted to modified files (in the case of `git add .`)
or specific pathspecs (in the case of `git mv` and `git rm`).

> -- >8 --
> Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs
> 
> These three commands recently learned to avoid updating paths that do
> not match the sparse-checkout patterns even if they are missing the
> SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which
> tries to match the path with the current set of sparsity rules using
> path_matches_pattern_list(). This is similar to what clear_ce_flags()
> does when we run "git sparse-checkout init" or "git sparse-checkout
> reapply". But note that clear_ce_flags() has a recursive behavior,
> calling path_matches_pattern_list() for each component in a path,
> whereas path_in_sparse_checkout() only calls it for the full path. This
> makes the function miss matches such as the one between path "a/b/c" and
> the pattern "b/". So if the user has the sparsity rules "!/a" and "b/",
> for example, add, rm, and mv will fail to update the path "a/b/c" and
> end up displaying a warning about "a/b/c" being outside the sparse
> checkout even though it isn't. Note that this problem only occurs with
> non-cone mode.
> 
> Fix this by making path_in_sparse_checkout() perform pattern matching
> for every component in the given path when cone mode is disabled. (This
> can be expensive, and we might want to do some form of caching for the
> match results of the leading components. However, this is not
> implemented in this patch.) Also add two tests for each command (add,
> rm, and mv) to check that they behave correctly with the said pattern
> matching. The first test would previously fail without this patch, while
> the second already succeeded. It is added mostly to make sure that we
> are not breaking the existing pattern matching for directories that are
> really sparse, and also as a protection against any future
> regressions.
> 
> Note that two other existing tests had to be changed: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it
> didn't allow the non-sparse entries inside <dir> to be removed. The
> other one, in t7002, tested that "git mv" would correctly display a
> warning message for sparse paths, but it accidentally expected the
> message to include two non-sparse paths as well.


> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result ret = NOT_MATCHED;
> +	const char *p, *last_slash = NULL;
>  
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>  
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	if (istate->sparse_checkout_patterns->use_cone_patterns) {
> +		const char *base = strrchr(path, '/');
> +		return path_matches_pattern_list(path, strlen(path),
> +				base ? base + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate) > 0;
> +	}
> +
> +	for (p = path; ; p++) {
> +		enum pattern_match_result match;
> +
> +		if (*p && *p != '/')
> +			continue;
> +
> +		match  = path_matches_pattern_list(path, p - path,
> +				last_slash ? last_slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		if (match != UNDECIDED)
> +			ret = match;
> +		if (!*p)
> +			break;
> +		last_slash = p;
> +	}
> +
> +	return ret;

This implementation makes sense to me.

>  test_expect_success 'recursive rm does not remove sparse entries' '
>  	git reset --hard &&
>  	git sparse-checkout set sub/dir &&
> -	test_must_fail git rm -r sub &&
> -	git rm --sparse -r sub &&
> +	git rm -r sub &&

Interesting that the new pattern-matching already presents a change of
behavior in this test case.

>  	git status --porcelain -uno >actual &&
>  	cat >expected <<-\EOF &&
> +	D  sub/dir/e
> +	EOF
> +	test_cmp expected actual &&

And here is why. Excellent. I suppose that setting the pattern to be
"sub/dir/" would have shown this behavior before.

> +
> +	git rm --sparse -r sub &&
> +	git status --porcelain -uno >actual2 &&
> +	cat >expected2 <<-\EOF &&
>  	D  sub/d
>  	D  sub/dir/e
>  	EOF
> -	test_cmp expected actual
> +	test_cmp expected2 actual2
>  '

The rest of the test cases add new checks that are very valuable.

I love this idea and I agree that it would be better to change the
loop direction to match the full path first (as you mention in your
response).

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux