Re: [PATCH v4 04/13] dir: fix pattern matching on dirs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 02 2021, Derrick Stolee wrote:

> On 11/2/2021 9:42 AM, Derrick Stolee wrote:
>> On 11/1/2021 8:34 PM, Junio C Hamano wrote:
>>> Glen Choo <chooglen@xxxxxxxxxx> writes:
>>>
>>>> This patch changes the behavior of .gitignore such that directories are
>>>> now matched by prefix instead of matching exactly.
>> 
>> Thank you for pointing out an unintended consequence.
>> 
>>>> The failure that we observed is something like the following:
>>>>
>>>> In "a/.gitignore", we have the pattern "git/". We should expect that
>>>> "a/git/foo" to be ignored because "git/" should be matched exactly.
>>>> However, "a/git-foo/bar" is also ignored because "git-foo" matches the
>>>> prefix.
>>>>
>>>> I'll prepare a test case for this as soon as I figure out how to write
>>>> it..
> ...
>> In the meantime, I'll try to create a Git test that demonstrates a
>> problem one way or another.
>
> I created a test, but had some trouble reproducing it due to some
> subtleties higher in the call stack. Here is a patch that reverts
> the change and adds some tests.
>
> The Scalar functional tests passed with the revert, so the original
> patch was worthless to begin with. I don't recall what motivated the
> change, but clearly it was a mistake. Sorry.
>
> ---- >8 ----
>
> From d1cfc8efeab015273bfebd6cd93465e6f38dc40f Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> Date: Tue, 2 Nov 2021 10:40:06 -0400
> Subject: [PATCH] dir: fix directory-matching bug
>
> This reverts the change from ed49584 (dir: fix pattern matching on dirs,
> 2021-09-24), which claimed to fix a directory-matching problem without a
> test case. It turns out to _create_ a bug, but it is a bit subtle.
>
> The bug would have been revealed by the first of two tests being added to
> t0008-ignores.sh. The first uses a pattern "/git/" inside the a/.gitignores
> file, which matches against 'a/git/foo' but not 'a/git-foo/bar'. This test
> would fail before the revert.
>
> The second test shows what happens if the test instead uses a pattern "git/"
> and this test passes both before and after the revert.
>
> The difference in these two cases are due to how
> last_matching_pattern_from_list() checks patterns both if they have the
> PATTERN_FLAG_MUSTBEDIR and PATTERN_FLAG_NODIR flags. In the case of "git/",
> the PATTERN_FLAG_NODIR is also provided, making the change in behavior in
> match_pathname() not affect the end result of
> last_matching_pattern_from_list().
>
> Reported-by: Glen Choo <chooglen@xxxxxxxxxx>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  dir.c              |  2 +-
>  t/t0008-ignores.sh | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index c6d7a8647b9..94489298f4c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1294,7 +1294,7 @@ int match_pathname(const char *pathname, int pathlen,
>  		 * then our prefix match is all we need; we
>  		 * do not need to call fnmatch at all.
>  		 */
> -		if (!patternlen && (!namelen || (flags & PATTERN_FLAG_MUSTBEDIR)))
> +		if (!patternlen && !namelen)
>  			return 1;
>  	}
>  
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 532637de882..1889cfc60e0 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -803,6 +803,32 @@ test_expect_success 'existing directory and file' '
>  	grep top-level-dir actual
>  '
>  
> +test_expect_success 'exact prefix matching (with root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo /git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'exact prefix matching (without root)' '
> +	test_when_finished rm -r a &&
> +	mkdir -p a/git a/git-foo &&
> +	touch a/git/foo a/git-foo/bar &&
> +	echo git/ >a/.gitignore &&
> +	git check-ignore a/git a/git/foo a/git-foo a/git-foo/bar >actual &&
> +	cat >expect <<-\EOF &&
> +	a/git
> +	a/git/foo
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  ############################################################################
>  #
>  # test whitespace handling

We have t3070-wildmatch.sh testing various combinations of these, and
indeed this code resolves back to wildmatch().

But I think in this case this is due to dir.c's particular behavior of
splitting paths before feeding them to wildmatch, as it needs to match
things relative to the subdirectory.

Still, we've got a matrix of these in t3070-wildmatch.sh, which already
tests some (but apparently not all) cases where we need to create an
actual file on disk. These sorts of test blindspots are why I added that
in de8bada2bf6 (wildmatch test: create & test files on disk in addition
to in-memory, 2018-01-30).

Wouldn't it be better & more exhaustive here to change its test lines
like:


    match 0 0 1 1 foo/bar/baz 'bar'

To say:

    match 0 0 1 1 ? ? foo/bar/baz 'bar'

And just add to its match() function so that if we have a subject with a
slash, we mkdir up to that first slash (here: "mkdir foo"), and create a
.gitignore file therein with the "foo" directory with the "bar" content,
perhaps adding "/bar", "bar/" and "/bar/" variants implicitly.

Then create a "bar.txt" in the directory as well, and a
bar-otherdir/somefile.txt or whatever.

And fill in the "? ?" depending on whether those variants matched or
not...

Anyway, just an idea, but if you pursue that you should be able to get
much more exhaustive testing in this area that we've apparently had a
blindspot in.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux