On Mon, 11 Nov 2024 12:25:35 +0000
Ryan Roberts <ryan.roberts@xxxxxxx> wrote:

> Hi Petr,
>
> On 11/11/2024 12:14, Petr Tesarik wrote:
> > Hi Ryan,
> >
> > On Thu, 17 Oct 2024 13:32:43 +0100
> > Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>[...]
> > Third, a few micro-benchmarks saw a significant regression.
> >
> > Most notably, getenv and getenvT2 tests from libMicro were 18% and 20%
> > slower with variable page size. I don't know why, but I'm looking into
> > it. The system() library call was also about 18% slower, but that might
> > be related.
>
> OK, ouch. I think there are some things we can try to optimize the
> implementation further. But I'll wait for your analysis before digging myself.

This turned out to be a false positive. The way this microbenchmark was
invoked did not collect enough samples, so the result was mostly
determined by whether caches happened to be hot or cold, and the timing
on this specific system, with this specific sequence of benchmarks in
the suite, happened to favour my baseline kernel.

After increasing the batch count, I'm getting pretty much the same
performance for 6.11 vanilla and patched kernels:

                   prc thr   usecs/call   samples  errors  cnt/samp
getenv (baseline)    1   1      0.14975        99       0    100000
getenv (patched)     1   1      0.14981        92       0    100000

> You probably also saw the conversation with Catalin about the cost vs benefit of
> this series. Performance regressions will all need to be considered in the cost
> column, of course. So understanding the root cause and trying to reduce the
> regression as much as possible will increase chances of getting it accepted
> upstream.

Yes. Now that the biggest number is off the table, I'm going to:

- look into the dup() slowdown
- verify whether VMA split/merge operations are indeed slower

Petr T
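
P.S. For anyone who wants to sanity-check the effect without pulling in
the whole suite: it can be approximated with a trivial batched timing
loop like the one below. This is only a sketch of the methodology, not
libMicro's actual harness (there, batch size and minimum sample count
are controlled by the -B and -C options, IIRC); all names and defaults
here are made up for illustration.

/* Minimal sketch (not libMicro): time getenv() in batches to show why
 * too small a batch/sample count makes the result cache-sensitive.
 * Build: cc -O2 getenv_bench.c -o getenv_bench
 * Usage: ./getenv_bench [batch] [samples]
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
	/* batch = calls per sample; samples = number of measurements */
	long batch = argc > 1 ? atol(argv[1]) : 100000;
	int samples = argc > 2 ? atoi(argv[2]) : 100;

	for (int s = 0; s < samples; s++) {
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (long i = 0; i < batch; i++) {
			/* volatile sink so the call is not optimized away */
			volatile char *p = getenv("PATH");
			(void)p;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		double ns = (t1.tv_sec - t0.tv_sec) * 1e9 +
			    (t1.tv_nsec - t0.tv_nsec);
		printf("sample %3d: %.5f usecs/call\n",
		       s, ns / batch / 1e3);
	}
	return 0;
}

With a small batch the per-call numbers jump around depending on cache
state; with 100000 calls per sample they settle close to the ~0.15
usecs/call shown in the table above.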