Re: [PATCH v13 3/3] grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data

Hamza Mahfooz <someguy@xxxxxxxxxxxxxxxxxxx> writes:

> If we attempt to grep non-ascii log message text with an ascii pattern, we

"with an ascii pattern, when Git is built with and told to use pcre2, we"

> run into the following issue:
>
>     $ git log --color --author='.var.*Bjar' -1 origin/master | grep ^Author
>     grep: (standard input): binary file matches
>
> So, to fix this teach the grep code to use PCRE2_UTF, as long as the log
> output is encoded in UTF-8.

> -	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
> +	if ((!opt->ignore_locale && !has_non_ascii(p->pattern)) ||
> +	    (!opt->ignore_locale && is_utf8_locale() &&
> +	     has_non_ascii(p->pattern) && !(!opt->ignore_case &&
> +					    (p->fixed || p->is_fixed))))

That's a mouthful.  It is not obvious what new condition is being
added.  I had to flip the order to see that the only difference is that

> -	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
> +	if ((!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
> +	    !(!opt->ignore_case && (p->fixed || p->is_fixed))) ||
> +	    (!opt->ignore_locale && !has_non_ascii(p->pattern)))

... in addition to the case where the original condition holds, if
we do not say "ignore locale" and the pattern is ascii-only, we
apply these two option flags.  And that matches what the proposed
log message explained as the condition under which the problem appears.

So,... looks good, I guess.

Thanks, will queue.


Addendum.

If we were reordering pieces in the condition, I wonder if there is
a better way to reorganize it, though.  The original is already
barely explainable with words, and with this new condition added, I
am not sure if anybody can phrase the condition in simple words to
others after staring at it for a few minutes.  I can't.

But straightening it out is best left as a future clean-up patch,
separate from this series.
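
To illustrate what such a clean-up might look like (a sketch only,
not something proposed for this series): the patched condition seems
to reduce to "unless told to ignore the locale, use the flags when
the pattern is ascii-only, or when the locale is UTF-8 and the search
is not a case-sensitive fixed-string one".  The struct and the
function names below are hypothetical stand-ins, not Git's actual
code; each boolean mirrors one call or field in the quoted hunk.  The
check walks all 32 input combinations to confirm that the two forms
agree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inputs; not Git's real structs. */
struct cond_inputs {
	bool ignore_locale; /* opt->ignore_locale */
	bool utf8_locale;   /* result of is_utf8_locale() */
	bool non_ascii;     /* result of has_non_ascii(p->pattern) */
	bool ignore_case;   /* opt->ignore_case */
	bool fixed;         /* p->fixed || p->is_fixed */
};

/* The condition as it reads in the reordered form of the patch. */
static bool as_patched(const struct cond_inputs *in)
{
	return (!in->ignore_locale && in->utf8_locale && in->non_ascii &&
		!(!in->ignore_case && in->fixed)) ||
	       (!in->ignore_locale && !in->non_ascii);
}

/* One possible factoring (an assumption, not the author's code). */
static bool factored(const struct cond_inputs *in)
{
	if (in->ignore_locale)
		return false;
	if (!in->non_ascii)
		return true; /* ascii-only pattern: the newly added case */
	/* non-ascii pattern: the original condition */
	return in->utf8_locale && (in->ignore_case || !in->fixed);
}

/* Compare both forms on every input; returns the number checked. */
static int check_all_inputs(void)
{
	int checked = 0;

	for (unsigned bits = 0; bits < 32; bits++) {
		struct cond_inputs in = {
			.ignore_locale = (bits & 1) != 0,
			.utf8_locale = (bits & 2) != 0,
			.non_ascii = (bits & 4) != 0,
			.ignore_case = (bits & 8) != 0,
			.fixed = (bits & 16) != 0,
		};
		assert(as_patched(&in) == factored(&in));
		checked++;
	}
	return checked;
}
```

Whether early returns like these read better than a single
expression is a matter of taste; the point is only that the
ascii-only case and the original case can be stated separately.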



