On 08.08.19 at 04:35, Carlo Arenas wrote:
> On Wed, Aug 7, 2019 at 6:03 AM René Scharfe <l.s.r@xxxxxx> wrote:
>>
>> On 07.08.19 at 11:49, Carlo Arenas wrote:
>>> was hoping it would perform better, but it seems that testing can
>>> only be done on Windows
>>
>> nedmalloc works on other platforms as well.
>
> I meant[1] it works reliably enough to be useful for performance testing.

You mentioned being concerned about performance several times, and I
wondered why each time. I'd expect no measurable difference between
using a custom global context and PCRE2's internal one: setting two
function pointers surely can't take very long, can it? But measuring
is better than guessing, of course.

> It goes without saying that the fact that I am using a VirtualBox VM
> with 2 CPUs running Debian 10 on top of macOS (a MacBook Pro with 4
> cores), while the test uses 8 threads by default, doesn't help,

nedmalloc is supposed to run on macOS as well.

> but to share my
> pain, here is the result of running p7820 with my last reroll on top of
> pu, comparing a build of the same code without NED (this tree) with
> one with it (HEAD):
>
> Test                                              this tree         HEAD
> -------------------------------------------------------------------------------------------
> 7820.1: basic grep -i 'how.to'                    0.89(1.12+0.46)   0.95(1.23+0.49) +6.7%
> 7820.2: extended grep -i 'how.to'                 0.90(1.12+0.49)   0.92(1.19+0.46) +2.2%
> 7820.3: perl grep -i 'how.to'                     0.54(0.30+0.52)   0.53(0.39+0.52) -1.9%
> 7820.5: basic grep -i '^how to'                   0.89(1.13+0.47)   0.91(1.13+0.49) +2.2%
> 7820.6: extended grep -i '^how to'                0.84(1.04+0.49)   0.94(1.27+0.47) +11.9%
> 7820.7: perl grep -i '^how to'                    0.49(0.34+0.47)   0.51(0.36+0.49) +4.1%
> 7820.9: basic grep -i '[how] to'                  1.51(2.31+0.51)   1.55(2.38+0.51) +2.6%
> 7820.10: extended grep -i '[how] to'              1.50(2.20+0.59)   1.56(2.30+0.62) +4.0%
> 7820.11: perl grep -i '[how] to'                  0.67(0.50+0.52)   0.62(0.50+0.55) -7.5%
> 7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare'  2.58(4.39+0.56)   2.64(4.45+0.60) +2.3%
> 7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare'  2.60(4.41+0.56)   2.66(4.58+0.56) +2.3%
> 7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare'      1.17(1.66+0.53)   1.23(1.84+0.45) +5.1%
> 7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te'  1.12(1.54+0.51)   1.14(1.70+0.44) +1.8%
> 7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te'     1.09(1.54+0.48)   1.14(1.62+0.49) +4.6%
> 7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te'         0.87(1.09+0.46)   0.90(1.20+0.43) +3.4%
>
> and here is one comparing two builds (both with NED):
>
> Test                                              origin/pu         HEAD
> -------------------------------------------------------------------------------------------
> 7820.1: basic grep -i 'how.to'                    1.00(1.24+0.55)   0.94(1.19+0.52) -6.0%
> 7820.2: extended grep -i 'how.to'                 0.90(1.15+0.49)   0.93(1.23+0.44) +3.3%
> 7820.3: perl grep -i 'how.to'                     0.52(0.37+0.51)   0.59(0.34+0.53) +13.5%
> 7820.5: basic grep -i '^how to'                   0.89(1.16+0.48)   0.90(1.17+0.47) +1.1%
> 7820.6: extended grep -i '^how to'                0.92(1.17+0.50)   0.92(1.20+0.45) +0.0%
> 7820.7: perl grep -i '^how to'                    0.45(0.33+0.42)   0.54(0.29+0.57) +20.0%
> 7820.9: basic grep -i '[how] to'                  1.60(2.46+0.52)   1.61(2.39+0.62) +0.6%
> 7820.10: extended grep -i '[how] to'              1.71(2.67+0.56)   1.57(2.41+0.54) -8.2%
> 7820.11: perl grep -i '[how] to'                  0.66(0.61+0.51)   0.59(0.44+0.51) -10.6%
> 7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare'  2.69(4.49+0.66)   2.67(4.49+0.60) -0.7%
> 7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare'  2.67(4.49+0.64)   2.64(4.54+0.54) -1.1%
> 7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare'      1.23(1.80+0.47)   1.25(1.89+0.46) +1.6%
> 7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te'  1.13(1.64+0.47)   1.14(1.64+0.48) +0.9%
> 7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te'     1.16(1.68+0.46)   1.20(1.60+0.60) +3.4%
> 7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te'         0.90(1.16+0.48)   0.88(1.17+0.45) -2.2%
>
> with the only relevant line (for my code) being 7820.19, where it would
> seem it performs almost the same (even though just adding NED made it
> initially worse)
>
> Note, though, that the fact that there are 20% swings in parts of the
> code that haven't changed, or that were explicitly #ifdef'd out of my
> code changes, doesn't give me much confidence; but since the Windows
> guys seem to be using NED by default, I am hoping it works better there.

These measurement results are quite noisy (swings of up to 20% in tests
whose code didn't change, as you noted), so I wouldn't trust them too
much.

That nedmalloc is slower than the allocator of a recent glibc version
is not very surprising, given this statement from its home page,
https://www.nedprod.com/programs/portable/nedmalloc/:

   "Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain
   state-of-the-art allocators and no third party allocator is likely
   to significantly improve on them in real world results"

In particular, I don't think that these results justify coupling the
use of nedmalloc to the choice of using a custom global context for
PCRE2. I'd expect:

 - Without USE_NED_ALLOCATOR: xmalloc() should be used for all
   allocations, including those made by PCRE2. Some special exceptions
   use malloc(3) directly, but for most uses we want the consistent
   out-of-memory handling that xmalloc() brings.

 - With USE_NED_ALLOCATOR: malloc() and xmalloc() use nedmalloc behind
   the scenes, and free() is similarly overridden, so all allocations
   are affected.

 - If USE_NED_ALLOCATOR performs worse than the system allocator on
   some system, then that's a problem for those who turn on that flag.

Makes sense?
René
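A minimal C sketch of the first expectation above, assuming a build without USE_NED_ALLOCATOR: the "two function pointers" of a custom global context just route allocations through an xmalloc()-style wrapper that never returns NULL. xmalloc_sketch(), struct alloc_context, ctx_xmalloc(), ctx_free() and alloc_context_init() are hypothetical names for illustration only. The callback signatures are modeled on the private_malloc/private_free arguments of PCRE2's pcre2_general_context_create(), with size_t standing in for PCRE2_SIZE. This is not git's actual xmalloc() or its PCRE2 glue code.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for git's xmalloc(): allocate or die, so every
 * caller gets the same out-of-memory handling. The real xmalloc() in
 * git's wrapper.c does more than this; the sketch keeps only the
 * "never returns NULL" contract.
 */
static void *xmalloc_sketch(size_t size)
{
	/* Ask for at least one byte so a zero-size request cannot yield NULL. */
	void *ret = malloc(size ? size : 1);

	if (!ret) {
		fprintf(stderr, "fatal: out of memory (tried to allocate %lu bytes)\n",
			(unsigned long)size);
		exit(128);
	}
	return ret;
}

/*
 * Callback shapes modeled on PCRE2's private_malloc/private_free
 * parameters; memory_data is an opaque pointer passed back unchanged.
 */
typedef void *(*ctx_malloc_fn)(size_t size, void *memory_data);
typedef void (*ctx_free_fn)(void *ptr, void *memory_data);

/* Hypothetical context holding the two function pointers. */
struct alloc_context {
	ctx_malloc_fn malloc_fn;
	ctx_free_fn free_fn;
	void *memory_data;
};

static void *ctx_xmalloc(size_t size, void *memory_data)
{
	(void)memory_data;	/* unused in this sketch */
	return xmalloc_sketch(size);
}

static void ctx_free(void *ptr, void *memory_data)
{
	(void)memory_data;	/* unused in this sketch */
	free(ptr);
}

/*
 * One-time setup: storing two pointers is the whole cost. Whether
 * malloc()/free() resolve to the system allocator or to nedmalloc is
 * a build-time decision (USE_NED_ALLOCATOR), not something made here.
 */
static void alloc_context_init(struct alloc_context *ctx)
{
	ctx->malloc_fn = ctx_xmalloc;
	ctx->free_fn = ctx_free;
	ctx->memory_data = NULL;
}
```

With PCRE2 itself, the same pair could be passed as pcre2_general_context_create(ctx_xmalloc, ctx_free, NULL), and the resulting context handed to functions such as pcre2_compile_context_create(). Because the wrapper calls plain malloc()/free(), a USE_NED_ALLOCATOR build would pick up nedmalloc for PCRE2 as well, without any separate coupling.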