Re: [PATCH v6 7/7] grep: simplify config parsing and option parsing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

> @@ -143,7 +142,6 @@ struct grep_opt {
>  	int unmatch_name_only;
>  	int count;
>  	int word_regexp;
> -	int fixed;
>  	int all_match;
>  #define GREP_BINARY_DEFAULT	0
>  #define GREP_BINARY_NOMATCH	1
> @@ -152,7 +150,6 @@ struct grep_opt {
>  	int allow_textconv;
>  	int extended;
>  	int use_reflog_filter;
> -	int pcre2;
>  	int relative;
>  	int pathname;
>  	int null_following_name;
> @@ -162,7 +159,7 @@ struct grep_opt {
>  	int funcname;
>  	int funcbody;
>  	int extended_regexp_option;
> -	int pattern_type_option;
> +	enum grep_pattern_type pattern_type_option;
>  	int ignore_locale;
>  	char colors[NR_GREP_COLORS][COLOR_MAXLEN];
>  	unsigned pre_context;
> @@ -181,7 +178,6 @@ struct grep_opt {
>  	.relative = 1, \
>  	.pathname = 1, \
>  	.max_depth = -1, \
> -	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
>  	.colors = { \
>  		[GREP_COLOR_CONTEXT] = "", \
>  		[GREP_COLOR_FILENAME] = "", \

I very much like the lossage of redundant fixed and pcre2 members.

As I kept telling you, we still need a separate bit to keep track of
the last value of grep.extendedRegexp, but the primary mechanism to
determine what pattern type to use should be a single enum that is
pattern_type.  When we see "fixed", "pcre", "-G", etc. from
grep.patternType config or from command line, we can stuff their
enum values in pattern_type member of this struct, and when we see
"default", we need to leave "default" in pattern_type member until
we see the last definition of grep.extendedRegexp, at which time
we can turn it into either "basic" or "extended".

So having only two members is absolutely the right thing to do.

But this part convinces me that whatever this patch does, it will
not possible be capable of doing the right thing.  You cannot
implement "we have to remember that the last grep.patternType we saw
was DEFAULT and in that case we cannot decide the real pattern type
until we see the last definition of grep.extendedRegexp, which may
be well after we saw the last grep.patternType definition" without a
value in this enum to express that the last value we saw was DEFAULT.

>  enum grep_pattern_type {
> -	GREP_PATTERN_TYPE_UNSPECIFIED = 0,
> -	GREP_PATTERN_TYPE_BRE,
> +	GREP_PATTERN_TYPE_BRE = 0,
>  	GREP_PATTERN_TYPE_ERE,
>  	GREP_PATTERN_TYPE_FIXED,
>  	GREP_PATTERN_TYPE_PCRE

> @@ -982,7 +981,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	argc = parse_options(argc, argv, prefix, options, grep_usage,
>  			     PARSE_OPT_KEEP_DASHDASH |
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
> -	grep_commit_pattern_type(pattern_type_arg, &opt);

In other words, this lossage is likely wrong.  Let's keep reading
and see how well the config reader in this patch does. 

> @@ -61,11 +59,25 @@ int grep_config(const char *var, const char *value, void *cb)
>  		return -1;
>  
>  	if (!strcmp(var, "grep.extendedregexp")) {
> +		if (opt->extended_regexp_option == -1)
> +			return 0;
>  		opt->extended_regexp_option = git_config_bool(var, value);
> +		if (opt->extended_regexp_option)
> +			opt->pattern_type_option = GREP_PATTERN_TYPE_ERE;
> +		else
> +			opt->pattern_type_option = GREP_PATTERN_TYPE_BRE;
> +		return 0;
> +	}
> +	if (!strcmp(var, "grep.patterntype") &&
> +	    !strcmp(value, "default")) {
> +		opt->pattern_type_option = opt->extended_regexp_option == 1
> +			? GREP_PATTERN_TYPE_ERE : GREP_PATTERN_TYPE_BRE;
>  		return 0;
>  	}
>  
>  	if (!strcmp(var, "grep.patterntype")) {
> +		opt->extended_regexp_option = -1; /* ignore */
>  		opt->pattern_type_option = parse_pattern_type_arg(var, value);
>  		return 0;
>  	}

The above does not look correct at all.

What happens when the configuration parser sees these configuration
variables in this sequence:

 - grep.patternType set to say "pcre" (or anything not "default").
 - grep.extendedRegexp set to "true".
 - grep.patternType set to "default".

After these three variable definitions with the usual "last one
wins" (for each variable independently), the last value for the
grep.patternType variable is "default", and the last value for
the grep.extendedRegexp variable is "true".  The user wants to use
the ERE patterns.

The way the above code would work on this three variable definition
sequence, as far as I read it, would however not give us the desired
behaviour.  First we drop extended_regexp_option member to -1 while
setting PCRE to attern_type_option member, and then grep.extendedRegexp
is totally ignored, and then we see patterntype set to default and
notice extended_regexp_option is *NOT* 1 (because you ignored it and
left it to -1), and end up using BRE, no?

I agree 100% with the direction that .fixed and .pcre2 members that
were added over time to the struct are redundant and it is a very
good idea to get rid of them.  But we need to keep track of two
configuration variables separately to allow them the "last one wins"
semantics independently, and for that, you cannot lose the "default"
value from the enum.  It is impossible not to store the fact that
"default" was the last value so far we saw for grep.patternType
because you do not know, at the point of seeing "default", what the
final value for grep.extendedRegexp will be.  If you want to correctly
implement the interaction between two variables without regression,
that is.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux