On Wed 26-04-23 13:10:54, Marcelo Tosatti wrote:
[...]
> "To test the performance difference, a page allocator microbenchmark:
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench01.c
> with loops=1000000 was used, on Intel Core i7-11850H @ 2.50GHz.
>
> For the single_page_alloc_free test, which does
>
>         /** Loop to measure **/
>         for (i = 0; i < rec->loops; i++) {
>                 my_page = alloc_page(gfp_mask);
>                 if (unlikely(my_page == NULL))
>                         return 0;
>                 __free_page(my_page);
>         }
>
> Unit is cycles.
>
> Vanilla         Patched         Diff
> 115.25          117             1.4%"
>
> To be honest, that 1.4% difference was not stable but fluctuated between
> positive and negative percentages (so the performance difference was in
> the noise).
>
> So performance is not a decisive factor in this case.

It is not negligible considering that the majority of workloads will not
benefit from this change. You are clearly ignoring that the vmstat code
has been highly optimized for local per-cpu access exactly to avoid
locked operations and cache line bouncing.

-- 
Michal Hocko
SUSE Labs
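
For illustration only, a minimal kernel-style sketch of the pattern being
defended above (not the actual vmstat code; the demo_* names are made up):
bumping a CPU-local counter is a plain, unlocked update of a per-cpu
variable, while a shared counter needs a locked read-modify-write on a
cache line that bounces between all CPUs doing the update.

#include <linux/percpu.h>
#include <linux/atomic.h>

/* Hypothetical counters, not the real vmstat ones. */
static DEFINE_PER_CPU(long, demo_nr_events);
static atomic_long_t demo_nr_events_shared;

static inline void demo_count_percpu(void)
{
	/*
	 * Per-cpu fast path: increments this CPU's private copy,
	 * no lock prefix, no cache line shared with other CPUs.
	 */
	this_cpu_inc(demo_nr_events);
}

static inline void demo_count_shared(void)
{
	/*
	 * Shared counter: a locked RMW on a single cache line that
	 * every updating CPU has to pull in exclusive state.
	 */
	atomic_long_inc(&demo_nr_events_shared);
}

The price of the per-cpu variant is paid on the read side, where a
consumer has to sum the per-cpu copies, which is the trade-off the
vmstat design deliberately makes to keep the update path cheap.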