The recent libpcre2 got me interested in seeing what the difference
between v1 and v2 was. So I hacked up a *very basic* patch for libpcre2
that passes all tests, but obviously isn't ready for inclusion (I
searched/replaced all the v1 usage with v2). I'm not even bothering
sending this to the list since discussing the patch itself isn't the
point:

    https://github.com/avar/git/commit/414647d88dd9c5

Before that patch, running a test[1] on linux.git where I grep the
whole tree for a fixed string / simple regex with POSIX regexes & PCRE
(all the greps match the same few lines) gives:

          s/iter    rx   prx fixed
    rx      2.20    --   -2%  -34%
    prx     2.17    2%    --  -32%
    fixed   1.46   51%   48%    --

I.e. fixed string is fastest, and both POSIX regcomp() & pcre v1 are
~30% slower than that, with no difference in performance between the
two.

Now with my patch above with pcre v2 there's a notable performance
difference:

          s/iter    rx   prx fixed
    rx      2.18    --  -16%  -33%
    prx     1.84   19%    --  -20%
    fixed   1.47   48%   25%    --

We've gone from ~30% slower to ~20% slower for PCRE with v2. But now
let's test that with this patch:

    https://github.com/avar/git/commit/4b7e5da3606c0b9b12025437de8005f5fa07ff54

That enables the new JIT support in pcre v2:

          s/iter    rx fixed   prx
    rx      2.19    --  -33%  -44%
    fixed   1.47   49%    --  -17%
    prx     1.22   79%   20%    --

Now it's PCRE that's 20% faster than our currently fastest grep
codepath that searches for a fixed string, and in absolute terms it's
around 50% faster than the current PCRE implementation.

This is on Debian testing with both PCRE libraries installed via
packages, 8.35 & 10.22 for v1 and v2, respectively. Both are the
second-latest[2] point releases for their respective versions.

As far as turning this into a patch goes there are a few open
questions:

 * PCRE itself supports linking to v1 and v2 in the same program just
   fine. Should we provide the possibility to link to both, or just
   make the user choose? If these performance numbers hold up,
   preferring v2 is definitely better.
 * The JIT is supposedly a bit slower if you're not doing a lot of
   matching, although I doubt this matters in practice. Whether to use
   it & a few other options could be controlled by some config/CLI
   option. I think it probably makes sense just to always use it if
   it's there, until we find cases where it makes performance worse in
   practice.

As an aside, I started looking into this because I'm interested in
eventually hacking up something that makes every user-facing
regcomp()/regexec() we have now (e.g. log -G) accept PCRE as well.

How to do this in all cases isn't very obvious. We could just have
some global config option, but there's lots of stuff like e.g.
"<rev>^{/<regex>}" & "git show :/<regex>" that takes regexes, and e.g.
how to pass a /i flag to some of these isn't obvious at all.

The solution I'm leaning towards is to just make stuff like that only
work under PCRE, via the native (?<flags>:<pattern>) facility. E.g.
this now works:

    git grep -P '(?xi: h e l l o)'

1. PF=~/g/git/ perl -MBenchmark=cmpthese -wE 'cmpthese(20, {
       fixed => sub { system "$ENV{PF}git grep -F avarasu >/dev/null" },
       rx    => sub { system "$ENV{PF}git grep avara?su >/dev/null" },
       prx   => sub { system "$ENV{PF}git grep -P avara?su >/dev/null" }
   })'

2. https://ftp.pcre.org/pub/pcre/
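
A note on reading the cmpthese tables above: as far as I understand
Benchmark.pm, each cell compares the row's rate (iterations per second)
against the column's rate, so with times given in s/iter the cell is
roughly (column time / row time - 1) * 100. Below is a minimal Python
sketch of that arithmetic. The function name is made up for
illustration, and since it starts from the rounded s/iter figures its
results can differ from the printed tables by a point or so.

```python
def relative_pct(row_s_per_iter, col_s_per_iter):
    """Percent by which the row's rate exceeds the column's rate.

    Assumption: this mirrors the rate-based comparison that cmpthese
    prints, where rate is 1 / (seconds per iteration).
    """
    row_rate = 1.0 / row_s_per_iter
    col_rate = 1.0 / col_s_per_iter
    return 100.0 * row_rate / col_rate - 100.0


# Rounded s/iter values taken from the pcre v2 + JIT run above.
times = {"rx": 2.19, "fixed": 1.47, "prx": 1.22}

for row, row_t in times.items():
    cells = []
    for col, col_t in times.items():
        # The diagonal compares an entry with itself, shown as "--".
        cells.append("--" if row == col else f"{relative_pct(row_t, col_t):.0f}%")
    print(f"{row:<6}{row_t:>6.2f}  " + "  ".join(f"{c:>5}" for c in cells))
```

This is why "20%" for prx against fixed and "-17%" for fixed against
prx describe the same pair of timings: the first divides by the slower
rate, the second by the faster one.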