Re: [PATCH 2/3] wildmatch: support "no FNM_PATHNAME" mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> By default wildmatch(,, 0) is equivalent with fnmatch(,, FNM_PATHNAME).

Is this stating a fact before or after the patch?

I think it is more like:

    So far, wildmatch() has always honoured directory boundary and
    there was no way to turn it off.  Make it behave more like
    fnmatch() by requiring all callers that want the FNM_PATHNAME
    behaviour to pass that in the flags parameter.  Callers that do
    not specify FNM_PATHNAME will get wildcards like ? and * in
    their patterns matched against '/' and ...

>  The choice of name "pathspec" is not good. I couldn't think of
>  anything appropriate and just did not care enough at this point.

Well, you should, before this series leaves the WIP state.  It seems
that all operating modes supported by test-wildmatch are named as
somethingmatch, so "pathmatch" may be a better candidate.

> diff --git a/test-wildmatch.c b/test-wildmatch.c
> index e384c8e..7fefa4f 100644
> --- a/test-wildmatch.c
> +++ b/test-wildmatch.c
> @@ -12,9 +12,11 @@ int main(int argc, char **argv)
>  			argv[i] += 3;
>  	}
>  	if (!strcmp(argv[1], "wildmatch"))
> +		return !!wildmatch(argv[3], argv[2], FNM_PATHNAME);
> +	else if (!strcmp(argv[1], "pathspec"))
>  		return !!wildmatch(argv[3], argv[2], 0);

"ipathmatch" to pass only FNM_CASEFOLD may be a natural extension,
but I doubt we use that combination in the real life (yet).

I am probably two step ahead of what is being done (read: this will
be a Git 2.0 topic, if not later), but I am wondering how we'd
integrate this new machinery well with the pathspec limited
traversal.

Traditionally, when traversing a tree limited with a pathspec, say,
"Documentation/*.txt", we used the simple part of the prefix and
noticed that any subdirectory whose name is not "Documentation" can
never match the pathspec and avoided descending into it.  As the
user can use "**" to expand to "any levels of subdirectory", I think
the pathspec limited traversal eventually can safely (and should)
use FNM_PATHNAME by default.  

That will allow people to say "Documentation/**/*.txt" to match both
"git.txt" and "howto/maintain-git.txt", without resorting to the
more traditional !FNM_PATHNAME semantics "Documentation/*.txt" to
match the latter (i.e. "*" matches "howto/maintain-git").

When that happens, we should want to retain the same "do not bother
to descend into subdirectories that will never match" optimization
for a pattern like "Doc*tion/**/*.txt".  Because of FNM_PATHNAME, we
can tell if a subdirectory is worth descending into by looking at
the not-so-simple prefix "Doc*tion/"; "Documentation" will match,
"Doc" will not (because '*' won't match '/').

Which tells me that integrating this _well_ into the rest of the
system is not just a matter of replacing fnmatch() with wildmatch().

I also expect that wildmatch() would be much slower than fnmatch()
especially when doing its "**" magic, but I personally do not think
it will be a showstopper.  If the user asks for a more powerful but
expensive operation, we are saving time for the user by doing a more
powerful thing (reducing the need to postprocess the results) and
can afford to spend extra cycles.

As long as simpler patterns fnmatch() groks (namely, '?', '*', and
'[class]' wildcards only) are not slowed down by replacing it with
wildmatch(), that is, of course.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]