On 7/26/23 01:25, David Rientjes wrote: > On Tue, 18 Jul 2023, Julian Pidancet wrote: > >> Hi David, >> >> Many thanks for running all these tests. The amount of attention you've >> given this change is simply amazing. I wish I could have been able to >> assist you by doing more tests, but I've been lacking the necessary >> resources to do so. >> >> I'm as surprised as you are regarding the skylake regression. 20% is >> quite a large number, but perhaps it's less worrying than it looks given >> that benchmarks are usually very different from real-world workloads? >> > > I'm not an expert on context_switch1_per_thread_ops so I can't infere > which workloads would be most affected by such a regression other than to > point out that -18% is quite substantial. It might turn out that this regression is accidental in that merging happens to result in a better caching that benefits the particular skylake cache hierarchy (but not others), because the workload happens to use two different classes of objects that are compatible for merging, and uses them with identical lifetime. But that would be arguably still a corner case and not something to result in a hard go/no-go for the change, as similar corner cases would likely exist that would benefit from not merging. But it's possible the reason for the regression is something less expectable than the above hypotehsis, so indeed we should investigate first. > I'm still hoping to run some benchmarks with 64KB page sizes as Christoph > suggested, I should be able to do this with arm64. > > It's ceratinly good news that the overall memory footprint doesn't change > much with this change. > >> As Kees Cook was suggesting in his own reply, have you given a thought >> about including this change in -next and see if there are regressions >> showing up in CI performance tests results? >> > > I assume that anything we can run with CI performance tests can also be > run without merging into -next? > > The performance degradation is substantial for a microbenchmark, I'd like > to complete the picture on other benchmarks and do a complete analysis > with 64KB page sizes since I think the concern Christoph mentions could be > quite real. We just don't have the data yet to make an informed > assessment of it. Certainly would welcome any help that others would like > to provide for running benchmarks with this change as well :P > > Once we have a complete picture, we might also want to discuss what we are > hoping to achieve with such a change. I was very supportive of it prior > to the -18% benchmark result. But if most users are simply using whatever > their distro defaults to and other users may already be opting into this > either by the kernel command line or .config, it's hard to determine > exactly the set of users that would be affected by this change. Suddenly > causing a -18% regression overnight for this would be surprising for them. What I'd hope to achieve is that if we find out that the differences of merging/not-merging are negligible (modulo corner cases) for both performance and memory, we'd not only change the default, but even make merging more exceptional. It should still be done under SLUB_TINY, and maybe we can keep the slab_merge boot option, but that's it? Because in case they are comparable, not merging has indeed benefits - /proc/slabinfo accounting is not misleading, so in case a bug is reported, it's not neccessary to reboot with nomerge to get the real picture, then there are the security benefits mentioned etc.