On Tue, Dec 06, 2022 at 10:43:05AM +0100, Jesper Dangaard Brouer wrote:
>
>
> On 05/12/2022 17.31, Matthew Wilcox wrote:
> > On Mon, Dec 05, 2022 at 04:34:10PM +0100, Jesper Dangaard Brouer wrote:
> > > I have a micro-benchmark [1][2], that I want to run on this patchset.
> > > Reducing the asm code 'text' size is less likely to improve a
> > > microbenchmark. The 100Gbit mlx5 driver uses page_pool, so perhaps I can
> > > run a packet benchmark that can show the (expected) performance improvement.
> > >
> > > [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> > > [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c
> >
> > Appreciate it! I'm not expecting any performance change outside noise,
> > but things do surprise me. I'd appreciate it if you'd test with a
> > "distro" config, ie enabling CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP so
> > we show the most expensive case.
> >
>
> I have CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y BUT it isn't default
> runtime enabled.

That's fine. I think the vast majority of machines won't actually have
it enabled. It's mostly useful for hosting setups where allocating 1GB
pages for VMs is common.

The mlx5 driver was straightforward, but showed some gaps in the API.
You'd already got the majority of the wins by using page_ref_inc()
instead of get_page(), but I did find one put_page() ;-)