Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 9, 2017 at 1:32 AM, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, May 08, 2017 at 04:10:41PM +0900, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>>
>> > This won't be in my next PCRE submission, but I have a path locally to
>> > simply import PCRE into git.git as compat/pcre2, so it can be compiled
>> > with NO_PCRE=Y similar to how NO_REGEX=Y works.
>> >
>> > This will hopefully address your concerns partially, i.e. when you do
>> > want to try it out it'll be easier.
>>
>> Eek, please don't.
>>
>> Until pcre2 becomes _so_ stable that all reasonable distros give
>> choice to the end-users to install it easily in a packaged form,
>> such a "not a fork, but a copy" will be a moving target that I'd
>> rather not to have in compat/.  IOW, our compat/$pkg should be a
>> last resort to help those on distros that are so hard to convince to
>> carry the version/variant of $pkg we would like to use.

The reason I'm looking into this is because the WIP part of my PCRE
branch has changes which start to use PCRE as a general matching
engine in git, even. I.e.:

* git grep -F will be powered by it rather than kwset (which'll be git rm'd)
* Long standing limitations with \0s in regexes go away.
* grep -G and -E will use PCRE via a WIP POSIX -> PCRE  pattern
translator (https://bugs.exim.org/show_bug.cgi?id=2106)
* Perhaps we can remove compat/regex/ entirely & use PCRE via its
POSIX emulation mode + pattern translator (we use regcomp/regexec a
lot for non-grep/log), I'm not sure yet.

I have messy but working code for this in a WIP branch, here's the
performance improvement against linux.git:

Test                                           v2.13.0-rc2       HEAD
---------------------------------------------------------------------------------------
7820.1: basic grep how.to                      0.31(1.20+0.46)
0.19(0.33+0.55) -38.7%
7820.2: extended grep how.to                   0.31(1.19+0.46)
0.19(0.33+0.55) -38.7%
7820.3: perl grep how.to                       0.30(1.12+0.46)
0.19(0.28+0.62) -36.7%
7820.5: basic grep ^how to                     0.31(1.24+0.39)
0.19(0.32+0.56) -38.7%
7820.6: extended grep ^how to                  0.30(1.18+0.44)
0.19(0.22+0.66) -36.7%
7820.7: perl grep ^how to                      0.55(2.68+0.41)
0.19(0.32+0.56) -65.5%
7820.9: basic grep [how] to                    0.47(2.17+0.44)
0.22(0.39+0.54) -53.2%
7820.10: extended grep [how] to                0.47(2.21+0.40)
0.22(0.39+0.55) -53.2%
7820.11: perl grep [how] to                    0.53(2.64+0.39)
0.21(0.37+0.58) -60.4%
7820.13: basic grep \(e.t[^ ]*\|v.ry\) rare    0.63(3.16+0.48)
0.21(0.48+0.53) -66.7%
7820.14: extended grep (e.t[^ ]*|v.ry) rare    0.64(3.28+0.38)
0.21(0.45+0.57) -67.2%
7820.15: perl grep (e.t[^ ]*|v.ry) rare        1.00(5.77+0.37)
0.21(0.50+0.53) -79.0%
7820.17: basic grep m\(ú\|u\)ult.b\(æ\|y\)te   0.31(1.23+0.51)
0.19(0.30+0.58) -38.7%
7820.18: extended grep m(ú|u)ult.b(æ|y)te      0.32(1.26+0.47)
0.19(0.27+0.61) -40.6%
7820.19: perl grep m(ú|u)ult.b(æ|y)te          0.36(1.61+0.37)
0.19(0.30+0.57) -47.2%
7821.1: fixed grep int                         0.52(1.71+0.64)
0.43(1.11+0.72) -17.3%
7821.2: basic grep int                         0.54(1.62+0.70)
0.42(1.14+0.62) -22.2%
7821.3: extended grep int                      0.53(1.67+0.64)
0.51(1.17+0.62) -3.8%
7821.4: perl grep int                          0.53(1.71+0.59)
0.72(1.13+0.63) +35.8%
7821.6: fixed grep -i int                      0.58(1.86+0.67)
0.47(1.32+0.62) -19.0%
7821.7: basic grep -i int                      0.62(1.94+0.61)
0.57(1.25+0.72) -8.1%
7821.8: extended grep -i int                   0.82(1.86+0.68)
0.50(1.41+0.56) -39.0%
7821.9: perl grep -i int                       0.70(1.88+0.68)
0.56(1.25+0.70) -20.0%
7821.11: fixed grep æ                          0.33(1.30+0.43)
0.19(0.22+0.64) -42.4%
7821.12: basic grep æ                          0.33(1.35+0.38)
0.18(0.26+0.59) -45.5%
7821.13: extended grep æ                       0.33(1.20+0.52)
0.18(0.32+0.53) -45.5%
7821.14: perl grep æ                           0.33(1.31+0.40)
0.18(0.28+0.56) -45.5%
7821.16: fixed grep -i æ                       0.25(0.87+0.50)
0.18(0.24+0.60) -28.0%
7821.17: basic grep -i æ                       0.26(0.88+0.48)
0.18(0.24+0.60) -30.8%
7821.18: extended grep -i æ                    0.26(0.92+0.44)
0.18(0.24+0.61) -30.8%
7821.19: perl grep -i æ                        0.25(0.79+0.45)
0.19(0.32+0.56) -24.0%

In case that comes out misformatted it's also available at
https://github.com/avar/git/commit/ee5b2040e2c697e22a73c7b5f07f1b1e591f07e3

I.e. grepping is sped up by 50% and more in many cases, even for -G
and -E patterns which now get translated internally into PCRE
patterns.

> PCRE and PCRE2 also tend to have a lot of security updates, so I would
> prefer if we didn't import them into the tree.  It is far better for
> users to use their distro's packages for PCRE, as it means they get
> automatic security updates even if they're using an old Git.
>
> We shouldn't consider shipping anything with a remotely frequent history
> of security updates in our tree, since people very frequently run old or
> ancient versions of Git.

I'm aware of its security record[1], but I wonder what threat model
you have in mind here. I'm not aware of any parts of git (except maybe
gitweb?) where we take regexes from untrusted sources.

I.e. yes there have been DoS's & even some overflow bugs leading code
execution in PCRE, but in the context of powering git-grep & git-log
with PCRE this falls into the "stop hitting yourself" category.

1. https://www.cvedetails.com/vendor/3265/Pcre.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]