On 17.10.21 at 08:00, Junio C Hamano wrote:
> René Scharfe <l.s.r@xxxxxx> writes:
>
>>>> Literal patterns are those that don't use any wildcards or case-folding.
>>>> If the text is encoded in UTF-8 then we enable PCRE2_UTF either if the
>>>> pattern only consists of ASCII characters, or if the pattern is encoded
>>>> in UTF-8 and is not just a literal pattern.
>>>>
>>>> Hmm. Why enable PCRE2_UTF for literal patterns that consist of only
>>>> ASCII chars?
>>>> ...
>>> echo 'René Scharfe' >f &&
>>> $ git -P grep --no-index -P '^(?:You are (?:wrong|correct), )?Ren. S' f; echo $?
>>> 1
>>> $ git -P grep --no-index -P '^(?:You are (?:wrong|correct), )?R[eé]n. S' f; echo $?
>>> f:René Scharfe
>>> 0
>>>
>>> So it's a choose-your-own adventure where you can pick if you're
>>> right. I.e. do you want the "." metacharacter to match your "é" or not?
>>
>> Yes, I do, and it's what Hamza's patch is fixing.
>
> That may be correct but is this discussion still about "Why enable
> ... for literal patterns that consist of only ASCII"? Calling "." a
> "metacharacter" and wanting it to match anything other than a single
> dot would mean the pattern we are discussing is no longer "literal",
> isn't it? I am puzzled.

Right, Ævar's comment is not about my question, but it highlights an
inconsistency in master that Hamza's patch fixes.

I'll repeat and extend my question: Hamza's patch enables PCRE2_UTF for
non-ASCII patterns even if they are literal or our locale is not UTF-8.
The following change would fix the edge case mentioned in its commit
message without these side effects.  Am I correct?
diff --git a/grep.c b/grep.c
index fe847a0111..5badb6d851 100644
--- a/grep.c
+++ b/grep.c
@@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
+	if (!opt->ignore_locale && is_utf8_locale() &&
 	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);