Re: [PATCH v2 00/12] nd/icase updates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> v2 fixes Junio's and Jeff's comments (both good). The sharing "!icase
> || ascii_only" is made a separate commit (6/12) because I think it
> takes some seconds to realize that the conversion is correct and it's
> technically not needed in 5/12 (and it's sort of the opposite of 1/12)
>
> Interdiff

OK.  regcomp_or_die() does make the code simpler.

> diff --git a/grep.c b/grep.c
> index cb058a5..92587a8 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -432,15 +432,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
>  	ascii_only     = !has_non_ascii(p->pattern);
>  
> +	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
>  		p->fixed = !icase || ascii_only;
>  	else
>  		p->fixed = 0;
>  
> @@ -449,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  		kwsincr(p->kws, p->pattern, p->patternlen);
>  		kwsprep(p->kws);
>  		return;
> +	} else if (opt->fixed) {
> +		compile_fixed_regexp(p, opt);
> +		return;
>  	}

This if/elseif/else cascade made a lot simpler and while the
discussion is fresh in my brain it makes sense, but it may deserve a
bit of commenting.

And while attempting to do so, I found one possible issue there.

Can't p->ignore_case be true even when opt->regflags does not have
REG_ICASE?  The user never asked us to do a regexp match in such a
case, and the logical place to compensate for that case would be
inside compile_fixed_regexp(), where we use regexp engine behind
user's back for our convenience, I would think.

In the current code, compile_fixed_regexp() is only called when we
want ICASE, but hardcoding that assumption to it unnecessarily robs
flexibility (and the function name does not tell us it is only for
icase in the first place), so I taught it to do the REG_ICASE thing
only when opt->ignore_case is set.

How does this look?


 grep.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 92587a8..3a3a9f4 100644
--- a/grep.c
+++ b/grep.c
@@ -407,17 +407,21 @@ static int is_fixed(const char *s, size_t len)
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
+	int regflags;
 
 	basic_regex_quote_buf(&sb, p->pattern);
-	err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
+	regflags = opt->regflags & ~REG_EXTENDED;
+	if (opt->ignore_case)
+		regflags |= REG_ICASE;
+	err = regcomp(&p->regexp, sb.buf, regflags);
 	if (opt->debug)
 		fprintf(stderr, "fixed %s\n", sb.buf);
 	strbuf_release(&sb);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, sizeof(errbuf));
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }
@@ -425,38 +429,55 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int icase, ascii_only;
 	int err;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
+	/*
+	 * Even when -F (fixed) asks us to do a non-regexp search, we
+	 * may not be able to correctly case-fold when -i
+	 * (ignore-case) is asked (in which case, we'll synthesize a
+	 * regexp to match the pattern that matches regexp special
+	 * characters literally, while ignoring case differences).  On
+	 * the other hand, even without -F, if the pattern does not
+	 * have any regexp special characters and there is no need for
+	 * case-folding search, we can internally turn it into a
+	 * simple string match using kws.  p->fixed tells us if we
+	 * want to use kws.
+	 */
 	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
 	else
 		p->fixed = 0;
 
 	if (p->fixed) {
 		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
 	} else if (opt->fixed) {
+		/*
+		 * We only come here when the pattern has the regexp
+		 * special characters in it, which need to be matched
+		 * literally, while ignoring case.
+		 */
 		compile_fixed_regexp(p, opt);
 		return;
 	}
 
 	if (opt->pcre) {
 		compile_pcre_regexp(p, opt);
 		return;
 	}
 
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, 1024);
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]