On Tue, 12 Nov 2024 10:19:34 +0000
Ryan Roberts <ryan.roberts@xxxxxxx> wrote:

> On 12/11/2024 09:45, Petr Tesarik wrote:
> > On Mon, 11 Nov 2024 12:25:35 +0000
> > Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >
> >> Hi Petr,
> >>
> >> On 11/11/2024 12:14, Petr Tesarik wrote:
> >>> Hi Ryan,
> >>>
> >>> On Thu, 17 Oct 2024 13:32:43 +0100
> >>> Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
> >> [...]
> >>> Third, a few micro-benchmarks saw a significant regression.
> >>>
> >>> Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
> >>> slower with variable page size. I don't know why, but I'm looking into
> >>> it. The system() library call was also about 18% slower, but that might
> >>> be related.
> >>
> >> OK, ouch. I think there are some things we can try to optimize the
> >> implementation further. But I'll wait for your analysis before digging myself.
> >
> > This turned out to be a false positive. The way this microbenchmark was
> > invoked did not get enough samples, so it was mostly dependent on
> > whether caches were hot or cold, and the timing on this specific system
> > with the specific sequence of benchmarks in the suite happens to favour
> > my baseline kernel.
> >
> > After increasing the batch count, I'm getting pretty much the same
> > performance for 6.11 vanilla and patched kernels:
> >
> >                     prc thr   usecs/call      samples   errors cnt/samp
> > getenv (baseline)     1   1      0.14975           99        0   100000
> > getenv (patched)      1   1      0.14981           92        0   100000
>
> Oh that's good news! Does this account for all 3 of the above tests (getenv,
> getenvT2 and system())?

It does for getenvT2 (a variant of the test with 2 threads), but not
for system. Thanks for asking, I forgot about that one. I'm getting
a substantial difference there (+29% on average over 100 runs):

                    prc thr   usecs/call      samples   errors cnt/samp  command
system (baseline)     1   1   6937.18016          102        0      100     A=$$
system (patched)      1   1   8959.48032          102        0      100     A=$$

So, yeah, this should in fact be my priority #1.

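For anyone who wants to reproduce this outside libMicro, here is a rough
standalone sketch of timing system("A=$$") in a loop. It is not the
libMicro code: the helper name time_system_calls() and its iteration
parameter are made up for illustration, and it only reports a mean,
with no warm-up or outlier handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper (not part of libMicro): returns the mean wall-clock
 * time in microseconds per system(cmd) call over `iterations` calls,
 * or a negative value on failure.
 */
static double time_system_calls(const char *cmd, int iterations)
{
    struct timespec start, end;

    if (iterations <= 0)
        return -1.0;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (int i = 0; i < iterations; i++) {
        /*
         * system() hands the command to the shell (as if by "sh -c"),
         * so every call pays for process creation and shell startup,
         * not just for the command itself.
         */
        if (system(cmd) == -1)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    /* Convert the elapsed seconds and nanoseconds to microseconds. */
    double elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                        (double)(end.tv_nsec - start.tv_nsec) / 1e3;

    return elapsed_us / iterations;
}
```

Calling time_system_calls("A=$$", 100) on both kernels and printing the
result should give numbers comparable in spirit to the usecs/call column
above, though not identical to what libMicro reports.
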
The "system" benchmark measures the duration of system("A=$$"), which
involves starting the system shell (in my case bash-4.4.23), so this is
not really a microbenchmark. I hope perf can help match the difference
to a kernel API.

Petr T