Re: Can we just get rid of kwset & obstack in favor of optimistically using PCRE v2 JIT?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jun 15 2019, brian m. carlson wrote:

> On 2019-06-14 at 22:55:17, Ævar Arnfjörð Bjarmason wrote:
>> It seems most packagers build with PCRE v2 now (CC:
>> git-packagers@). I.e. we'd keep something like compile_fixed_regexp()
>> (and as an aside just use PCRE's \Q...\E instead of our own escaping).
>>
>> We'd have performance regression for platforms that use kwset() now but
>> don't build pcre2, or where pcre2 jit doesn't work. Does anyone care?
>
> I know that there are people shipping newer versions of Git using CentOS
> 6, which IIRC doesn't ship PCRE 2[0]. Since having to ship your own PCRE
> is a security maintenance nightmare, it's probably best to leave this at
> least compatible with non-PCRE 2 systems until November 2020. At that
> point, I'm happy to drop support for it.
>
> If it would work but just be slower with PCRE 1, I'm not too terribly
> concerned. Let that be an incentive to users to upgrade.

Not just PCRE, but if you don't have PCRE at all things would still work
perfectly fine.

I.e. all we're talking about is how to treat this internal
optimization. If we'd never imported kwset we'd still be perfectly
capable of searching for fixed strings with grep/pickaxe, it would have
just been slower.

So platforms that don't have PCRE at all would be slowed down by
something like what the benchmark in my
https://public-inbox.org/git/87v9x793qi.fsf@xxxxxxxxxxxxxxxxxxx/
upthread shows.

Or not, maybe their C library POSIX regcomp()/regexec() is faster.

Platforms that do have PCRE would be faster than they are now, and we
could stop shipping this kwset code.

The *only* case where what I've outlined above isn't true is cases where
the pattern being matched has a \0. See my 966be95549 ("grep: add tests
to fix blind spots with \0 patterns", 2017-05-20) for how that behaves.

There our current behavior is IMNSHO insane, and is certainly
undocumented and unreliable (i.e. it behaves differently if you
e.g. have non-ASCII along with \0 in the pattern, none of this is
documented).

Having poked a bit at that I think the only sane thing there is to just
outright die unless you have PCRE v2, which is the only backend we have
that has any hope of handling that sanely.

> Also, as Carlos pointed out, not all platforms will have the JIT support
> functional, such as OpenBSD, NetBSD, and PaX Linux systems. That may be
> more of a blocker than the CentOS issue, especially since people run PaX
> kernels with standard distros.
>
> [0] I'm not certain because CentOS 6 Docker images segfault on newer
> kernels and I'm too lazy to download a live CD image for testing.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux