On 12/11/2024 09:45, Petr Tesarik wrote:
> On Mon, 11 Nov 2024 12:25:35 +0000
> Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
>> Hi Petr,
>>
>> On 11/11/2024 12:14, Petr Tesarik wrote:
>>> Hi Ryan,
>>>
>>> On Thu, 17 Oct 2024 13:32:43 +0100
>>> Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>> [...]
>>> Third, a few micro-benchmarks saw a significant regression.
>>>
>>> Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
>>> slower with variable page size. I don't know why, but I'm looking into
>>> it. The system() library call was also about 18% slower, but that might
>>> be related.
>>
>> OK, ouch. I think there are some things we can try to optimize the
>> implementation further. But I'll wait for your analysis before digging myself.
>
> This turned out to be a false positive. The way this microbenchmark was
> invoked did not get enough samples, so it was mostly dependent on
> whether caches were hot or cold, and the timing on this specific system
> with the specific sequence of benchmarks in the suite happens to favour
> my baseline kernel.
>
> After increasing the batch count, I'm getting pretty much the same
> performance for 6.11 vanilla and patched kernels:
>
>                     prc thr  usecs/call  samples  errors  cnt/samp
> getenv (baseline)     1   1     0.14975       99       0    100000
> getenv (patched)      1   1     0.14981       92       0    100000

Oh that's good news! Does this account for all 3 of the above tests (getenv,
getenvT2 and system())?

>
>> You probably also saw the conversation with Catalin about the cost vs
>> benefit of this series. Performance regressions will all need to be
>> considered in the cost column, of course. So understanding the root cause
>> and trying to reduce the regression as much as possible will increase
>> chances of getting it accepted upstream.
>
> Yes. Now that the biggest number is off the table, I'm going to:
>
> - look into the dup() slowdown
> - verify whether VMA split/merge operations are indeed slower
>
> Petr T
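
FWIW, to make the batching point concrete, here is a minimal sketch of a
batched getenv() sample loop in the libMicro style. Illustrative only: the
real libMicro harness is more elaborate, and the BATCH/SAMPLES values below
are assumptions, not the ones used in the run above. With too few samples a
single cold-cache sample dominates the result; with a large batch and many
samples the per-call figure converges, as in the table.

    /*
     * Minimal sketch of a batched getenv() timing loop, in the spirit
     * of libMicro's getenv case. BATCH and SAMPLES are assumed values.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define BATCH   100000  /* calls per sample (cf. the cnt/samp column) */
    #define SAMPLES 100     /* samples to collect (cf. the samples column) */

    static double now_usecs(void)
    {
            struct timespec ts;

            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void)
    {
            double best = 1e18;

            setenv("LM_DUMMY", "1", 1);

            for (int s = 0; s < SAMPLES; s++) {
                    double t0 = now_usecs();

                    for (int i = 0; i < BATCH; i++)
                            if (!getenv("LM_DUMMY"))
                                    return 1;

                    /* One sample = usecs/call averaged over the batch. */
                    double per_call = (now_usecs() - t0) / BATCH;

                    if (per_call < best)
                            best = per_call;
            }

            printf("getenv: best %.5f usecs/call over %d samples\n",
                   best, SAMPLES);
            return 0;
    }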
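
For the dup() follow-up, a similarly minimal sketch that times dup()+close()
pairs; again this is an assumed methodology, not necessarily how the libMicro
dup case is structured, and the reported figure includes close() overhead:

    /* Minimal dup() timing sketch; times a dup()+close() pair. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
            const int iters = 1000000;
            struct timespec t0, t1;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < iters; i++) {
                    int fd = dup(0);        /* duplicate stdin */

                    if (fd < 0)
                            return 1;
                    close(fd);
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
                           (t1.tv_nsec - t0.tv_nsec) / 1e3;

            printf("dup+close: %.5f usecs/pair\n", usecs / iters);
            return 0;
    }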
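
And for the VMA split/merge check, one way to force splits and merges from
userspace is to toggle protections on alternating pages with mprotect(): the
first pass splits the VMA at every page boundary, and restoring uniform
protections lets the kernel merge the VMAs back. A sketch under those
assumptions (page and iteration counts are arbitrary):

    /*
     * Forces VMA splits and merges from userspace: mprotect() on every
     * other page of an anonymous mapping splits the VMA at each page
     * boundary; restoring uniform protections lets the VMAs merge again.
     */
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
            const long psz = sysconf(_SC_PAGESIZE);
            const int npages = 1024, iters = 100;
            const size_t len = (size_t)npages * psz;
            struct timespec t0, t1;

            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return 1;

            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int iter = 0; iter < iters; iter++) {
                    /* Split: every other page drops to PROT_READ. */
                    for (int i = 0; i < npages; i += 2)
                            mprotect(p + (size_t)i * psz, psz, PROT_READ);

                    /* Merge: restore uniform PROT_READ | PROT_WRITE. */
                    mprotect(p, len, PROT_READ | PROT_WRITE);
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
                           (t1.tv_nsec - t0.tv_nsec) / 1e3;

            printf("split/merge: %.1f usecs/iter\n", usecs / iters);
            munmap(p, len);
            return 0;
    }

Counting lines in /proc/self/maps before and after the split pass is a quick
way to confirm the splits actually happened.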