On Tue, May 9, 2017 at 1:32 AM, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, May 08, 2017 at 04:10:41PM +0900, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>>
>> > This won't be in my next PCRE submission, but I have a path locally to
>> > simply import PCRE into git.git as compat/pcre2, so it can be compiled
>> > with NO_PCRE=Y similar to how NO_REGEX=Y works.
>> >
>> > This will hopefully address your concerns partially, i.e. when you do
>> > want to try it out it'll be easier.
>>
>> Eek, please don't.
>>
>> Until pcre2 becomes _so_ stable that all reasonable distros give
>> choice to the end-users to install it easily in a packaged form,
>> such a "not a fork, but a copy" will be a moving target that I'd
>> rather not to have in compat/.  IOW, our compat/$pkg should be a
>> last resort to help those on distros that are so hard to convince to
>> carry the version/variant of $pkg we would like to use.

The reason I'm looking into this is that the WIP part of my PCRE
branch has changes which start to use PCRE as a general matching
engine in git. I.e.:

 * git grep -F will be powered by it rather than kwset (which'll be
   git rm'd)

 * Long-standing limitations with \0s in regexes go away.

 * grep -G and -E will use PCRE via a WIP POSIX -> PCRE pattern
   translator (https://bugs.exim.org/show_bug.cgi?id=2106)

 * Perhaps we can remove compat/regex/ entirely & use PCRE via its
   POSIX emulation mode + pattern translator (we use regcomp/regexec
   a lot for non-grep/log), I'm not sure yet.
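
To give a rough idea of what a POSIX -> PCRE pattern translator has to
do for -G patterns, here is a minimal Python sketch. It is not the WIP
translator linked above; the function name bre_to_pcre is hypothetical,
and it only covers the simplest differences (which of "(){}|+?" need a
backslash, and backslashes inside bracket expressions). It assumes the
GNU extensions \|, \+ and \? are wanted, and it ignores the BRE rules
for a leading "*" and for "^" / "$" in the middle of a pattern.

```python
# Characters that are literal in a POSIX basic regex (BRE) but are
# operators in PCRE. With a backslash the roles are swapped
# (\| \+ \? are GNU extensions to BRE, assumed here).
SWAPPED = set("(){}|+?")


def bre_to_pcre(pattern: str) -> str:
    """Translate a simple BRE into an equivalent PCRE-style pattern.

    Illustrative sketch only: no special handling of a leading '*',
    of '^'/'$' in the middle of a pattern, or of malformed input.
    """
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # \( becomes (, \| becomes |, etc.; other escapes pass through.
            out.append(nxt if nxt in SWAPPED else c + nxt)
            i += 2
        elif c == "[":
            # Copy a bracket expression up to its closing ']'. A ']'
            # directly after '[' or '[^' is a literal member.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            # A backslash is literal inside a POSIX bracket expression
            # but an escape character in PCRE, so double it.
            out.append(pattern[i:j + 1].replace("\\", "\\\\"))
            i = j + 1
        elif c in SWAPPED:
            # Literal in BRE, operator in PCRE: escape it.
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for bre in (r"\(e.t[^ ]*\|v.ry\) rare", r"a+b(c)", r"x\{2,3\}"):
        print(bre, "->", bre_to_pcre(bre))
```

Fed the -G pattern from the benchmarks, \(e.t[^ ]*\|v.ry\) rare, this
yields the same pattern that is used there for -E and -P,
(e.t[^ ]*|v.ry) rare.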

I have messy but working code for this in a WIP branch. Here's the
performance improvement against linux.git:

Test                                          v2.13.0-rc2       HEAD
---------------------------------------------------------------------------------------
7820.1: basic grep how.to                     0.31(1.20+0.46)   0.19(0.33+0.55) -38.7%
7820.2: extended grep how.to                  0.31(1.19+0.46)   0.19(0.33+0.55) -38.7%
7820.3: perl grep how.to                      0.30(1.12+0.46)   0.19(0.28+0.62) -36.7%
7820.5: basic grep ^how to                    0.31(1.24+0.39)   0.19(0.32+0.56) -38.7%
7820.6: extended grep ^how to                 0.30(1.18+0.44)   0.19(0.22+0.66) -36.7%
7820.7: perl grep ^how to                     0.55(2.68+0.41)   0.19(0.32+0.56) -65.5%
7820.9: basic grep [how] to                   0.47(2.17+0.44)   0.22(0.39+0.54) -53.2%
7820.10: extended grep [how] to               0.47(2.21+0.40)   0.22(0.39+0.55) -53.2%
7820.11: perl grep [how] to                   0.53(2.64+0.39)   0.21(0.37+0.58) -60.4%
7820.13: basic grep \(e.t[^ ]*\|v.ry\) rare   0.63(3.16+0.48)   0.21(0.48+0.53) -66.7%
7820.14: extended grep (e.t[^ ]*|v.ry) rare   0.64(3.28+0.38)   0.21(0.45+0.57) -67.2%
7820.15: perl grep (e.t[^ ]*|v.ry) rare       1.00(5.77+0.37)   0.21(0.50+0.53) -79.0%
7820.17: basic grep m\(ú\|u\)ult.b\(æ\|y\)te  0.31(1.23+0.51)   0.19(0.30+0.58) -38.7%
7820.18: extended grep m(ú|u)ult.b(æ|y)te     0.32(1.26+0.47)   0.19(0.27+0.61) -40.6%
7820.19: perl grep m(ú|u)ult.b(æ|y)te         0.36(1.61+0.37)   0.19(0.30+0.57) -47.2%
7821.1: fixed grep int                        0.52(1.71+0.64)   0.43(1.11+0.72) -17.3%
7821.2: basic grep int                        0.54(1.62+0.70)   0.42(1.14+0.62) -22.2%
7821.3: extended grep int                     0.53(1.67+0.64)   0.51(1.17+0.62) -3.8%
7821.4: perl grep int                         0.53(1.71+0.59)   0.72(1.13+0.63) +35.8%
7821.6: fixed grep -i int                     0.58(1.86+0.67)   0.47(1.32+0.62) -19.0%
7821.7: basic grep -i int                     0.62(1.94+0.61)   0.57(1.25+0.72) -8.1%
7821.8: extended grep -i int                  0.82(1.86+0.68)   0.50(1.41+0.56) -39.0%
7821.9: perl grep -i int                      0.70(1.88+0.68)   0.56(1.25+0.70) -20.0%
7821.11: fixed grep æ                         0.33(1.30+0.43)   0.19(0.22+0.64) -42.4%
7821.12: basic grep æ                         0.33(1.35+0.38)   0.18(0.26+0.59) -45.5%
7821.13: extended grep æ                      0.33(1.20+0.52)   0.18(0.32+0.53) -45.5%
7821.14: perl grep æ                          0.33(1.31+0.40)   0.18(0.28+0.56) -45.5%
7821.16: fixed grep -i æ                      0.25(0.87+0.50)   0.18(0.24+0.60) -28.0%
7821.17: basic grep -i æ                      0.26(0.88+0.48)   0.18(0.24+0.60) -30.8%
7821.18: extended grep -i æ                   0.26(0.92+0.44)   0.18(0.24+0.61) -30.8%
7821.19: perl grep -i æ                       0.25(0.79+0.45)   0.19(0.32+0.56) -24.0%

In case that comes out misformatted, it's also available at
https://github.com/avar/git/commit/ee5b2040e2c697e22a73c7b5f07f1b1e591f07e3

I.e. grepping is sped up by 50% or more in many cases, even for -G
and -E patterns, which now get translated internally into PCRE
patterns.

> PCRE and PCRE2 also tend to have a lot of security updates, so I would
> prefer if we didn't import them into the tree.  It is far better for
> users to use their distro's packages for PCRE, as it means they get
> automatic security updates even if they're using an old Git.
>
> We shouldn't consider shipping anything with a remotely frequent history
> of security updates in our tree, since people very frequently run old or
> ancient versions of Git.

I'm aware of its security record[1], but I wonder what threat model
you have in mind here. I'm not aware of any parts of git (except
maybe gitweb?) where we take regexes from untrusted sources.

I.e. yes, there have been DoS's & even some overflow bugs leading to
code execution in PCRE, but in the context of powering git-grep &
git-log with PCRE this falls into the "stop hitting yourself"
category.

1. https://www.cvedetails.com/vendor/3265/Pcre.html