Re: [PATCH v3] grep: correctly identify utf-8 characters with \{b,w} in -P

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> To argue with myself here, I'm not so sure that just making this the
> default isn't the right move, especially as the GNU grep maintainer
> seems to be convinced that that's the right thing for grep(1).

OK.

> I think calling this e.g.:
>
> 	grep.perl.Unicode=<bool>
> 	grep.patternTypePerl.Unicode=<bool>
>
> Or even:
>
> 	grep.patternTypePerl.Flags=u
>
> Would be better, i.e. PCRE's C API is really just mapping to the flags
> you can find in "perldoc perlre" (https://perldoc.perl.org/perlre). In
> this case the /u flag maps to the "PCRE2_UCP" API flag.
>
> That we happen to use PCRE to give ourselves "Perl" semantics is an
> implementation detail we should avoid exposing, so we could either give
> our config generic names, or literally map to the perl /flags/.
>
> For now we could just die on any "Flags" value that isn't "u".
>
> Of course all of this is predicated on us wanting to leave this as an
> opt-in, which I'm not so sure about. If it's opt-out we'll avoid this
> entire question,

Making it opt-out would also require a similar knob to turn the
"flag" off, be it a configuration variable or a command line option,
wouldn't it?  I tend to agree with you that it makes sense to make
it a goal to take us closer to "grep -P" from GNU---do they have
such an opt-out knob?  If not, let's turn it always on, which would
be the simplest ;-)

Again, thanks for a careful review with concrete points.
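
For readers less familiar with what the flag under discussion changes,
here is a rough illustration of ASCII-only versus Unicode-aware \w and
\b.  It uses Python's "re" module purely as an analogy (an assumption
of this sketch; it is neither PCRE nor what git does internally):
re.ASCII plays roughly the role of compiling without PCRE2_UCP, and
the default str behaviour roughly the role of compiling with it.

```python
import re

# Sample text containing non-ASCII letters (a name from this thread).
text = "Ævar Arnfjörð"

# Whole "words" delimited by word boundaries.
pattern = r"\b\w+\b"

# Unicode-aware classes: Æ, ö and ð count as word characters.
unicode_words = re.findall(pattern, text)

# ASCII-only classes: \w is [a-zA-Z0-9_], so non-ASCII letters
# act as word boundaries and split the names apart.
ascii_words = re.findall(pattern, text, flags=re.ASCII)

print(unicode_words)  # ['Ævar', 'Arnfjörð']
print(ascii_words)    # ['var', 'Arnfj', 'r']
```

The second result is the kind of surprise the patch is meant to avoid
for UTF-8 input: a word-boundary search silently stops at every
non-ASCII letter.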


