On Sun, Jan 23, 2022 at 06:43:06PM +1300, Barry Song wrote:
> On Wed, Jan 5, 2022 at 7:17 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

<snipped>

> > Large-scale deployments
> > -----------------------
> > We've rolled out MGLRU to tens of millions of Chrome OS users and
> > about a million Android users. Google's fleetwide profiling [13] shows
> > an overall 40% decrease in kswapd CPU usage, in addition to
>
> Hi Yu,
>
> Was the overall 40% decrease of kswapd CPU usage seen on x86 or arm64?
> And I am curious how much we are taking advantage of NONLEAF_PMD_YOUNG.
> Does it help a lot in decreasing the CPU usage?

Hi Barry,

The fleet-wide profiling data I shared was from x86. For arm64, I only
have data from synthetic benchmarks at the moment, and it also shows
similar improvements.

For Chrome OS (individual users), walk_pte_range(), the function that
would benefit from ARCH_HAS_NONLEAF_PMD_YOUNG, accounts for only a
small portion (<4%) of kswapd CPU time. So ARCH_HAS_NONLEAF_PMD_YOUNG
isn't that helpful there.

> If so, this might be
> a good proof that arm64 also needs this hardware feature?
> In short, I am curious how much the improvement in this patchset depends
> on the hardware ability of NONLEAF_PMD_YOUNG.

For data centers, I do think ARCH_HAS_NONLEAF_PMD_YOUNG has some value.
In addition to cold/hot memory scanning, there are other use cases,
like dirty tracking, that can benefit from the accessed bit on non-leaf
entries. I know some proprietary software uses this capability on x86
for different purposes than this patchset does. And AFAIK, x86 is the
only arch that supports this capability, e.g., RISC-V and ppc can only
set the accessed bit in PTEs.

In fact, I've discussed this with one of the arm maintainers, Will. So
please check with him too if you are interested in moving forward with
the idea. I might be able to provide additional data if you need it to
make a decision.

Thanks.
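
For reference, the "<4% of kswapd CPU time" figure for walk_pte_range() puts a ceiling on what ARCH_HAS_NONLEAF_PMD_YOUNG could save on Chrome OS. Below is a rough Amdahl-style sketch of that bound. The helper name max_cpu_saving() and the speedup factors are hypothetical, chosen only for illustration, and the model generously assumes that all of walk_pte_range()'s time could be accelerated by the feature.

```python
import math


def max_cpu_saving(fraction: float, speedup: float) -> float:
    """Fraction of total CPU time saved when a component that takes
    `fraction` of the total is made `speedup` times faster (Amdahl-style).

    Hypothetical helper for illustration; not part of the patchset.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1")
    # Time removed = original share minus the share left after the speedup.
    return fraction * (1.0 - 1.0 / speedup)


# Upper bound taken from the email: walk_pte_range() uses <4% of kswapd time.
WALK_PTE_RANGE_SHARE = 0.04

if __name__ == "__main__":
    # Assumed speedups; math.inf models removing the PTE walk entirely.
    for speedup in (2.0, 10.0, math.inf):
        saving = max_cpu_saving(WALK_PTE_RANGE_SHARE, speedup)
        print(f"speedup {speedup}: at most {saving:.1%} of kswapd CPU time")
```

Even in the limiting case where the PTE walk costs nothing, the saving cannot exceed the 4% share itself, which is small next to the overall 40% decrease reported by the fleet-wide profiling.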