Re: [PATCH 3/4] pathspec: apply "*.c" optimization from exclude

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> -O2 build on linux-2.6, without the patch:

Before the result, can you briefly explain what '"*.c" optimization
from exclude' the title talks about is?

    When a pattern contains only a single asterisk, e.g. "foo*bar",
    after literally comparing the leading part "foo" with the
    string, we can compare the tail of the string and make sure it
    matches "bar", instead of running fnmatch() on "*bar" against
    the remainder of the string.

The funny thing was that trying to explain what the logic should do
makes one realize potential issues in the implementation of that
logic ;-)  See below.

> $ time git rev-list --quiet HEAD -- '*.c'
>
> real    0m40.770s
> user    0m40.290s
> sys     0m0.256s
>
> With the patch
>
> $ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
>
> real    0m34.288s
> user    0m33.997s
> sys     0m0.205s
>
> The above command is not supposed to be widely popular.

Hrm, perhaps.  I use "git grep <pattern> -- \*.c" quite a lot, but
haven't seen use case for \*.c in the context of the "log" family.

> diff --git a/cache.h b/cache.h
> index bf031f1..d18f584 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -473,6 +473,8 @@ extern int index_name_is_other(const struct index_state *, const char *, int);
>  extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
>  extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
>  
> +#define PSF_ONESTAR 1

Together with the GF_ prefix in the previous, PSF_ prefix needs a
bit of in-code explanation.  Is it just an RC3L (random combination
of 3 letters?)

> @@ -46,6 +46,12 @@ inline int git_fnmatch(const char *pattern, const char *string,
>  		pattern += prefix;
>  		string += prefix;
>  	}
> +	if (flags & GF_ONESTAR) {
> +		int pattern_len = strlen(++pattern);
> +		int string_len = strlen(string);
> +		return strcmp(pattern,
> +			      string + string_len - pattern_len);
> +	}

What happens when pattern="foo*oob" and string="foob"?

The prefix match before this code determines that the literal prefix
in the pattern matches with the early part of the string, and makes
pattern="*oob" and string="b".

When you come to strcmp(), you see that string_len is 1, pattern_len
is 3, and pattern is "oob".  string+string_len-pattern_len = "oob",
one past the beginning of the original string "foob".  They match.

Oops?

	return (string_len < pattern_len) ||
        	strcmp(pattern, string + string_len - pattern_len);

perhaps?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]