Re: [PATCH v11 3/3] grep: fix an edge case concerning ascii patterns and UTF-8 data

Hamza Mahfooz <someguy@xxxxxxxxxxxxxxxxxxx> writes:

> If we attempt to grep non-ascii log message text with an ascii pattern, we
> run into the following issue:
>
>     $ git log --color --author='.var.*Bjar' -1 origin/master | grep ^Author
>     grep: (standard input): binary file matches
>
> So, to fix this teach the grep code to mark the pattern as UTF-8 (even if
> the pattern is composed of only ascii characters), so long as the log
> output is encoded using UTF-8.

We'd need this only if we are using the pcre2 backend, no?  If that
is the case, that fact needs to be recorded in the proposed log
message to help later developers when they wonder why this
"all-the-things" knob exists.

And if this bit is needed only to work around a glitch in the pcre2
backend, I'd rather see a solution that does not contaminate the
more generic "struct grep_opt" data and the "setup_revisions()"
codepath.

In other words, can't the function compile_pcre2_pattern() make the
"is log output encoding utf8?" decision locally and act accordingly?
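
To illustrate the shape of what I mean, here is a rough standalone
sketch, not actual grep.c code.  Every name in it (the sketch_*
helpers and SKETCH_FLAG_UTF, which stands in for the real pcre2 UTF
option) is hypothetical, and the log output encoding is passed in as
a plain string only to keep the example self-contained; in the real
function it would be looked up locally.  The non-ASCII check is
included just to show the new condition sitting next to a
pattern-based one, and other considerations such as the locale are
left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pcre2 "treat as UTF" compile option. */
#define SKETCH_FLAG_UTF 1u

/* Case-insensitive string equality, kept local so the sketch is portable. */
static bool sketch_streq_icase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/* Assumption: only these two spellings count as UTF-8; NULL does not. */
static bool sketch_is_utf8_encoding(const char *name)
{
	return name &&
	       (sketch_streq_icase(name, "utf-8") ||
		sketch_streq_icase(name, "utf8"));
}

/* True if any byte in the pattern has its high bit set. */
static bool sketch_has_non_ascii(const char *s)
{
	for (; *s; s++)
		if ((unsigned char)*s & 0x80)
			return true;
	return false;
}

/*
 * Decide the UTF flag inside the pcre2-specific compile step, so that
 * no extra bit has to travel through the generic option structure.
 */
unsigned sketch_pcre2_compile_flags(const char *pattern,
				    const char *log_output_encoding)
{
	unsigned flags = 0;

	if (sketch_has_non_ascii(pattern) ||
	    sketch_is_utf8_encoding(log_output_encoding))
		flags |= SKETCH_FLAG_UTF;
	return flags;
}
```

With this, an all-ASCII pattern such as "Bjar" gets the flag when the
encoding is "UTF-8" and does not get it when the encoding is
"ISO-8859-1", and the callers never need to know about it.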

Thanks.


