Re: [PATCH v2] ls-files: add pathspec matching for submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Brandon Williams <bmwill@xxxxxxxxxx> writes:

> diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> index a623ebf..09e4449 100644
> --- a/Documentation/git-ls-files.txt
> +++ b/Documentation/git-ls-files.txt
> @@ -19,7 +19,7 @@ SYNOPSIS
> -		[--output-path-prefix=<path>]
> +		[--submodule-prefix=<path>]
> ...
> ---output-path-prefix=<path>::
> +--submodule-prefix=<path>::
> ...
> -static const char *output_path_prefix;
> +static const char *submodule_prefix;
> ...
> -	if (output_path_prefix && *output_path_prefix) {
> +	if (submodule_prefix && *submodule_prefix) {
>  		strbuf_reset(&full_name);
> -		strbuf_addstr(&full_name, output_path_prefix);
> +		strbuf_addstr(&full_name, submodule_prefix);
> ...
> -	argv_array_pushf(&cp.args, "--output-path-prefix=%s%s/",
> -			 output_path_prefix ? output_path_prefix : "",
> +	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
> +			 submodule_prefix ? submodule_prefix : "",
>  			 ce->name);

As the previous one that used a wrong (sorry) argument is not even
in 'next' yet, let's pretend that it never happened.  It is OK to
still keep it and this patch as two separate steps, i.e. a topic
with two patches in it.

That means that this patch will become 2/2 of a series, and 1/2 is
rerolled to use submodule_prefix from the get-go, without ever
introducing output_path_prefix variable, so that many of the above
lines we won't have to review in 2/2.

> +	/*
> +	 * Pass in the original pathspec args.  The submodule will be
> +	 * responsible for prepending the 'submodule_prefix' prior to comparing
> +	 * against the pathspec for matches.
> +	 */

Good.

> +	argv_array_push(&cp.args, "--");
> +	for (i = 0; i < pathspec.nr; ++i)
> +		argv_array_push(&cp.args, pathspec.items[i].original);
> +

Please prefer post-increment i++ over pre-increment ++i when writing
a for(;;) loop, unless there is a strong reason not to (familiarlity
in C++ is not a good reason).

> diff --git a/dir.c b/dir.c
> index 0ea235f..1afc3ff 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -63,6 +63,35 @@ int fspathncmp(const char *a, const char *b, size_t count)
>  	return ignore_case ? strncasecmp(a, b, count) : strncmp(a, b, count);
>  }
>  
> +/**
> + * Used to perform prefix matching against a pathspec_item for determining if we
> + * should decend into a submodule.  This function can result in false positives
> + * since we are only trying to match the 'string' to a prefix of the 'pattern'
> + */

Perhaps s/can/is allowed to/.  Implicit but equally important is
that it is not OK to produce false negative.

> +static int prefix_fnmatch(const struct pathspec_item *item,
> +		   const char *pattern, const char *string,
> +		   int prefix)
> +{
> +	if (prefix > 0) {
> +		if (ps_strncmp(item, pattern, string, prefix))
> +			return WM_NOMATCH;
> +		pattern += prefix;
> +		string += prefix;
> +	}
> +
> +	if (item->flags & PATHSPEC_ONESTAR) {
> +		return WM_MATCH;
> +	} else if (item->magic & PATHSPEC_GLOB) {
> +		return wildmatch(pattern, string,
> +				 WM_PATHNAME |
> +				 (item->magic & PATHSPEC_ICASE ?
> +				  WM_CASEFOLD : 0),
> +				 NULL);

Isn't this last one overly tight?  I am wondering about a scenario
where you have a submodule at "sub/" in the superproject, and "sub/"
has a "file" at the top of its working tree.  And you do:

	git ls-files --recurse-submodules ':(glob)??b/fi?e'

at the top of the superproject.  The "pattern" would be '??b/fi?e"
while string would be 'sub', and wildmatch() would not like it, but
there is no way for this caller to append anything to 'sub' before
making this call, as it hasn't looked into what paths appear in the
submodule repository (and it should not want to).  And I think we
would want it to recurse to find sub/file.  IOW, this looks like a
false negative we must avoid in this function.  As we cannot afford
to check if anything that matches 'fi?e' is in the index file of the
submodule repository, we shouldn't try to match 'fi?e' portion of
the given pathspec pattern.

Extending the scenario, if I also create a directory "sib/" in the
superproject next to "sub/" and add a "file" in it, the same
pathspec:

	git ls-files [--recurse-submodules] ':(glob)??b/fi?e'

does find "sib/file" (with or without the recursion option).

    Duy cc'ed because I am not quite sure why it is a good thing
    that I need to add an exlicit ':(glob)' in these examples to
    illustrate the breakage.  This comes from v1.8.3.2-849-gbd30c2e
    ("pathspec: support :(glob) syntax", 2013-07-14) and I do not
    think we want to change its behaviour. I just want to be
    reminded why we did this.  I am guessing that it was because of
    an ancient design decision that we would use fnmatch without
    FNM_PATHNAME by default, i.e. it would be too disruptive if we
    made :(glob), i.e. honor directory boundaries, the default.

With :(glob) that follows FNM_PATHNAME behaviour, ":(glob)s*" would
be rejected as a match with "sub/file" or "sib/file", and that is
OK, but I think our ':(glob)??b/fi?e' example should find both of
these paths as matches to the pattern.

> +
> +	return WM_NOMATCH;
> +}
> +

And of course, if it is not PATHSPEC_GLOB, returning NOMATCH does
not sound like erring on the side to prevent false negatives.  For
example, what does

	git ls-files --recurse-submodule '??b/fi?e'

do in the above scenario with this patch?  Doesn't the if/else
cascade fall through and hit this return that says "Nah, 'sub'
cannot possibly match, do not bother descending into it"?

One thing I was originally confused about was that there are at
least three kinds we need to worry about.

 - PATHSPEC_LITERAL, where we no-wildcard-len should cover
   everything in the pattern string, e.g. "sub/file" may be the
   pattern that would want to produce a match when trying to see if
   we want to descend into "sub".

 - PATHSPEC_GLOB, as discussed above.

 - Not having either LITERAL or GLOB.  's*' would want to say that
   it is worth descending into "sub".  If there were another
   submodule at "sob/dir", that would match the pattern 's*' as
   well, because an asterisk can match 'ob/dir' part unlike in
   PATHSPEC_GLOB case.

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]