Re: [PATCH 1/2] grep/pcre2: limit the instances in which UTF mode is enabled

On Thu, Nov 18, 2021 at 12:42 AM Hamza Mahfooz
<someguy@xxxxxxxxxxxxxxxxxxx> wrote:
>
> UTF mode is enabled for cases that cause older versions of PCRE to break.

Not really; what is broken is our implementation of how PCRE gets
called. It ignores the fact that giving PCRE invalid UTF-8 (which
might be valid LATIN-1 text, for example) and telling it to do a match
in UTF mode will fail (with an error, if we are lucky). If we also
tell it to skip the validation, which is something we do when JIT is
enabled, it might even crash (and obviously won't match).
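
To illustrate the mismatch, here is a minimal sketch of the kind of
well-formedness check that UTF validation implies. It is not code from
grep.c or PCRE; is_valid_utf8() is a made-up helper name used only for
this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of git or PCRE): returns 1 if
 * buf[0..len) is well-formed UTF-8, 0 otherwise.
 */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		unsigned long cp;
		size_t need, k;

		if (c < 0x80) {			/* plain ASCII */
			i++;
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			need = 1; cp = c & 0x1F;	/* 2-byte sequence */
		} else if (c >= 0xE0 && c <= 0xEF) {
			need = 2; cp = c & 0x0F;	/* 3-byte sequence */
		} else if (c >= 0xF0 && c <= 0xF4) {
			need = 3; cp = c & 0x07;	/* 4-byte sequence */
		} else {
			return 0;		/* stray continuation or bad lead byte */
		}

		if (len - i <= need)
			return 0;		/* sequence is truncated */

		for (k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (buf[i + k] & 0x3F);
		}

		if (need == 2 && cp < 0x800)
			return 0;		/* overlong encoding */
		if (need == 3 && cp < 0x10000)
			return 0;		/* overlong encoding */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;		/* UTF-16 surrogate */
		if (cp > 0x10FFFF)
			return 0;		/* beyond Unicode range */

		i += need + 1;
	}
	return 1;
}
```

With this, the LATIN-1 bytes "caf\xE9" are rejected (0xE9 looks like
the start of a 3-byte sequence that never completes), while the UTF-8
bytes "caf\xC3\xA9" are accepted.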

> This is primarily due to the fact that we can't make as many assumptions on
> the kind of data that is fed to "git grep." So, limit when UTF mode can be
> enabled by introducing "is_log" to struct grep_opt, checking to see if it's
> a non-zero value in compile_pcre2_pattern() and only mutating it in
> cmd_log() so that we know "git log" was invoked if it's set to a non-zero
> value.

I haven't tested it, but I think that for this to work with the log,
we also need to make sure that all log entries that might not be UTF-8
first go through iconv(), which is probably why Ævar mentioned[1]
i18n.commitEncoding in his old email.
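
A sketch of what that re-encoding step amounts to for the LATIN-1
case follows. It is a hand-rolled stand-in for an iconv() conversion
from ISO-8859-1 to UTF-8, not the code path git uses, and
latin1_to_utf8() is a name made up for the example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an ISO-8859-1 -> UTF-8 iconv() call.
 * Writes a NUL-terminated result into dst and returns the number of
 * bytes written (excluding the NUL), or (size_t)-1 if dst is too small.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
		      unsigned char *dst, size_t dstsize)
{
	size_t i, o = 0;

	for (i = 0; i < len; i++) {
		unsigned char c = src[i];
		size_t need = c < 0x80 ? 1 : 2;

		if (o + need >= dstsize)
			return (size_t)-1;	/* keep room for the NUL */

		if (c < 0x80) {
			dst[o++] = c;		/* ASCII is unchanged */
		} else {
			/* U+0080..U+00FF become two bytes in UTF-8 */
			dst[o++] = 0xC0 | (c >> 6);
			dst[o++] = 0x80 | (c & 0x3F);
		}
	}
	if (o >= dstsize)
		return (size_t)-1;
	dst[o] = '\0';
	return o;
}
```

Only after a step like this would a LATIN-1 commit message be safe to
hand to a matcher running in UTF mode.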

Of course, doing that translation only makes sense if the log output
is meant to be UTF-8, which is why there is all that logic about being
in a UTF-8 locale or not; that logic probably needs to be adjusted as
well.
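
For reference, a locale check of that sort can be sketched with the
POSIX setlocale() and nl_langinfo(CODESET) calls. This is only an
illustration under that assumption, not git's actual logic;
codeset_is_utf8() and locale_is_utf8() are hypothetical names.

```c
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name mean UTF-8? */
int codeset_is_utf8(const char *name)
{
	return name && (!strcasecmp(name, "UTF-8") ||
			!strcasecmp(name, "UTF8"));
}

/* Hypothetical helper: is the user's LC_CTYPE locale a UTF-8 one? */
int locale_is_utf8(void)
{
	/* Pick up LC_ALL / LC_CTYPE / LANG from the environment. */
	setlocale(LC_CTYPE, "");
	return codeset_is_utf8(nl_langinfo(CODESET));
}
```

The result of locale_is_utf8() depends on the environment, so it
answers "is the terminal expecting UTF-8?" and says nothing about how
the data being searched is encoded.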

Carlo

[1] https://lore.kernel.org/git/87v92bju64.fsf@xxxxxxxxxxxxxxxxxxx/
