On Thu, Apr 27, 2023 at 10:39:29AM +0200, Michal Hocko wrote:
> On Wed 26-04-23 13:10:54, Marcelo Tosatti wrote:
> [...]
> > "To test the performance difference, a page allocator microbenchmark:
> > https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench01.c
> > with loops=1000000 was used, on Intel Core i7-11850H @ 2.50GHz.
> >
> > For the single_page_alloc_free test, which does
> >
> >     /** Loop to measure **/
> >     for (i = 0; i < rec->loops; i++) {
> >         my_page = alloc_page(gfp_mask);
> >         if (unlikely(my_page == NULL))
> >             return 0;
> >         __free_page(my_page);
> >     }
> >
> > Unit is cycles.
> >
> >     Vanilla    Patched    Diff
> >     115.25     117        1.4%"
> >
> > To be honest, that 1.4% difference was not stable but fluctuated between
> > positive and negative percentages (so the performance difference was in
> > the noise).
> >
> > So performance is not a decisive factor in this case.
>
> It is not negligible considering that the majority of workloads will not
> benefit from this change. You are clearly ignoring that vmstat code has
> been highly optimized for local per-cpu access exactly to avoid locked
> operations and cache line bouncing.
> --
> Michal Hocko
> SUSE Labs

Again, the values fluctuate between positive and negative performance
differences (I happened to copy a positive one). The difference is not
stable at 1.4%; it is in the noise, close to 0%. So the data shows no
measurable negative performance impact.
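The "in the noise" argument can be made checkable with a small helper. The following is a minimal C sketch, not part of page_bench01.c: the function names (mean, sample_variance, pct_diff, within_noise) are hypothetical, and the criterion (the mean per-run difference lies within two standard errors of zero) is an assumed rule of thumb rather than a rigorous significance test.

```c
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of n samples; requires n > 0. */
double mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Hypothetical helper: sample variance (n - 1 denominator); requires n > 1. */
double sample_variance(const double *x, size_t n)
{
    double m = mean(x, n), ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Percentage difference of one patched run relative to one vanilla run. */
double pct_diff(double vanilla, double patched)
{
    return (patched - vanilla) / vanilla * 100.0;
}

/*
 * Assumed noise criterion: |mean| <= 2 * s / sqrt(n), where s is the
 * sample standard deviation. It is evaluated in squared form,
 * mean^2 * n <= 4 * s^2, so no libm call is needed.
 * Returns 1 if the mean difference is within the noise, 0 otherwise.
 */
int within_noise(const double *diffs, size_t n)
{
    double m = mean(diffs, n);
    return m * m * (double)n <= 4.0 * sample_variance(diffs, n);
}
```

Feeding within_noise() the per-run differences from several repeated benchmark runs, rather than a single vanilla/patched pair, would show whether a given percentage is stable or fluctuates around zero.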