On Tue, Mar 23, 2021 at 05:00:08PM +0100, Jesper Dangaard Brouer wrote:
> > +	/*
> > +	 * If there are no allowed local zones that meets the watermarks then
> > +	 * try to allocate a single page and reclaim if necessary.
> > +	 */
> > +	if (!zone)
> > +		goto failed;
> > +
> > +	/* Attempt the batch allocation */
> > +	local_irq_save(flags);
> > +	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> > +	pcp_list = &pcp->lists[ac.migratetype];
> > +
> > +	while (allocated < nr_pages) {
> > +		page = __rmqueue_pcplist(zone, ac.migratetype, alloc_flags,
> > +					 pcp, pcp_list);
>
> The function __rmqueue_pcplist() is now used two places, this cause the
> compiler to uninline the static function.
>

This was expected. It was not something I was particularly happy with
but avoiding it was problematic without major refactoring.

> My tests show you should inline __rmqueue_pcplist(). See patch I'm
> using below signature, which also have some benchmark notes. (Please
> squash it into your patch and drop these notes).
>

The cycle savings per element is very marginal at just 4 cycles. I
expect just the silly stat updates are way more costly but the series
that addresses that is likely to be controversial. As I know the cycle
budget for processing a packet is tight, I've applied the patch but am
keeping it separate to preserve the data in case someone points out that
is a big function to inline and "fixes" it.

-- 
Mel Gorman
SUSE Labs