On Mon, Feb 27, 2023 at 09:31:51PM +0800, Qi Zheng wrote:
>
>
> On 2023/2/27 03:51, Andrew Morton wrote:
> > On Sun, 26 Feb 2023 22:46:47 +0800 Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
> >
> Save the above script, then run test and touch commands.
>
> Then we can use the following perf command to view hotspots:
>
> perf top -U -F 999
>
> 1) Before applying this patchset:
>
>   32.31%  [kernel]  [k] down_read_trylock
>   19.40%  [kernel]  [k] pv_native_safe_halt
>   16.24%  [kernel]  [k] up_read
>   15.70%  [kernel]  [k] shrink_slab
>    4.69%  [kernel]  [k] _find_next_bit
>    2.62%  [kernel]  [k] shrink_node
>    1.78%  [kernel]  [k] shrink_lruvec
>    0.76%  [kernel]  [k] do_shrink_slab
>
> 2) After applying this patchset:
>
>   27.83%  [kernel]  [k] _find_next_bit
>   16.97%  [kernel]  [k] shrink_slab
>   15.82%  [kernel]  [k] pv_native_safe_halt
>    9.58%  [kernel]  [k] shrink_node
>    8.31%  [kernel]  [k] shrink_lruvec
>    5.64%  [kernel]  [k] do_shrink_slab
>    3.88%  [kernel]  [k] mem_cgroup_iter

Not opposing the intention of the patchset in any way (I actually think
it's a good idea to make the shrinkers list lockless), but looking at
both outputs above, I think the main problem is not the contention on
the semaphore but the reason for this contention.

It seems that there is often a long list of shrinkers which can barely
reclaim any memory, yet we call them again and again. To achieve real
wins with real-life workloads, I guess that is what we should optimize.

Thanks!