On Thu, Jun 13 2019, Johannes Schindelin via GitGitGadget wrote: > The kwset functionality makes use of the obstack code, which expects to > be handed a function that can allocate large chunks of data. It expects > that function to accept a `size` parameter of type `long`. > > This upsets GCC 8 on Windows, because `long` does not have the same > bit size as `size_t` there. > > Now, the proper thing to do would be to switch to `size_t`. But this > would make us deviate from the "upstream" code even further, making it > hard to synchronize with newer versions, and also it would be quite > involved because that `long` type is so invasive in that code. > > Let's punt, and instead provide a super small wrapper around > `xmalloc()`. I have a WIP patches from 2017 that do $subject that I can dig up, but thought I'd gauge interest first. Right now the grep code & pickaxe machinery will detect fixed strings and use kwset() as an optimization. Back when kwset was added in 9eceddeec6 ("Use kwset in grep", 2011-08-21) this helped, but now doing this for grep with a PCRE pattern is actually counterproductive for performance. On top of current `master`: @@ -368 +368 @@ static int is_fixed(const char *s, size_t len) - return 1; + return 0; And running p7821-grep-engines-fixed.sh[1] (which is in git.git, and is as far as I got with this) we get: Test HEAD~ HEAD ------------------------------------------------------------------------- 7821.1: fixed grep int 0.48(1.59+0.63) 0.48(1.53+0.68) +0.0% 7821.2: basic grep int 0.55(1.64+0.51) 0.72(2.97+0.54) +30.9% 7821.3: extended grep int 0.65(1.63+0.54) 0.77(2.92+0.60) +18.5% 7821.4: perl grep int 1.01(1.62+0.55) 0.36(0.97+0.58) -64.4% 7821.6: fixed grep uncommon 0.18(0.51+0.45) 0.18(0.51+0.46) +0.0% 7821.7: basic grep uncommon 0.18(0.50+0.46) 0.30(1.36+0.33) +66.7% 7821.8: extended grep uncommon 0.18(0.45+0.52) 0.28(1.37+0.37) +55.6% 7821.9: perl grep uncommon 0.18(0.52+0.45) 0.16(0.28+0.54) -11.1% 7821.11: fixed grep æ 0.31(1.28+0.39) 0.31(1.24+0.43) +0.0% 7821.12: basic grep æ 0.30(1.29+0.38) 0.22(0.85+0.36) -26.7% 7821.13: extended grep æ 0.30(1.26+0.40) 0.22(0.78+0.45) -26.7% 7821.14: perl grep æ 0.30(1.33+0.34) 0.16(0.25+0.56) -46.7% So what this means on my Debian box is that when we use PCRE with JIT and just get out of its way and let it do its own fixed string matching it's up to ~65% faster than the kwset() path. The usual case of just feeding the fixed pattern to glibc's regex function is slower, although as seen there when you grep for a rarely-occurring non-ASCII string glibc does better now (the perils of using ancient last-version-to-use-GPLv2 snapshots...). So my plan for this (which I partially implemented) was to have a series where if we have a fixed string and have PCRE v2 we'd use it instead of kwset() for fixed strings. It seems most packagers build with PCRE v2 now (CC: git-packagers@). I.e. we'd keep something like compile_fixed_regexp() (and as an aside just use PCRE's \Q...\E instead of our own escaping). We'd have performance regression for platforms that use kwset() now but don't build pcre2, or where pcre2 jit doesn't work. Does anyone care? This would allow us to just "git rm" kwset.[ch] compat/obstack.[ch], which is ~2k lines of tricky code, 1/2 of which we're currently doomed to maintain a bitrotting version of due to license incompatibilities with upstream[2]. As an aside there's other code in grep.c that we could similarly remove in favor of optimistic PCRE v2 use, e.g. the -w case can be replaced by \b<word>\b, but I found that less promising[3]. We can also get a huge performance win for BRE and ERE patterns by using PCRE v2 with a translation layer for those under the hood[4], but various solvable backwards compatible headaches[5] related to that are why I got lost in the weeds back in 2017 and didn't finish this. But just the s/kwset/pcre2/ case is easy enough. 1. Via: GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7821_GREP_OPTS='' GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE2=YesPlease' ./run HEAD~ HEAD -- p7821-grep-engines-fixed.sh 2. https://public-inbox.org/git/87wohn95vb.fsf@xxxxxxxxxxxxxxxxxxx/ 3. https://github.com/avar/git/commit/49ca92e799 4. https://github.com/avar/git/commit/a3cc090344 5. https://github.com/avar/git/commit/7dd367eb37