On Wed, Mar 10, 2021 at 03:47:04PM -0800, Andrew Morton wrote:
> On Wed, 10 Mar 2021 10:46:13 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > This series introduces a bulk order-0 page allocator with sunrpc and
> > the network page pool being the first users.
>
> <scratches head>
>
> Right now, the [0/n] doesn't even tell us that it's a performance
> patchset!
>

I'll add a note about this improving performance for users that operate
on batches of pages and want to avoid multiple round-trips to the page
allocator.

> The whole point of this patchset appears to appear in the final paragraph
> of the final patch's changelog.
>

I'll copy&paste that note to the introduction. It's likely that
high-speed networking is the most relevant user in the short-term.

> : For XDP-redirect workload with 100G mlx5 driver (that use page_pool)
> : redirecting xdp_frame packets into a veth, that does XDP_PASS to create
> : an SKB from the xdp_frame, which then cannot return the page to the
> : page_pool. In this case, we saw[1] an improvement of 18.8% from using
> : the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).
>
> Much more detail on the overall objective and the observed results,
> please?
>

I cannot generate that data right now so I need Jesper to comment on
exactly why this is beneficial. For example, while I get that more data
can be processed in a microbenchmark, I do not have a good handle on how
much difference that makes to a practical application. About all I know
is that this problem has been knocking around for 3-4 years at least.

> Also, that workload looks awfully corner-casey. How beneficial is this
> work for more general and widely-used operations?
>

At this point, probably nothing for most users because batch page
allocation is not common. It's primarily why I avoided reworking the
whole allocator just to make this a bit tidier.
> > The implementation is not
> > particularly efficient and the intention is to iron out what the semantics
> > of the API should have for users. Once the semantics are ironed out, it can
> > be made more efficient.
>
> And some guesstimates about how much benefit remains to be realized
> would be helpful.
>

I don't have that information unfortunately. It's a chicken and egg
problem because without the API, there is no point creating new users.
For example, fault around or readahead could potentially batch pages but
whether it is actually noticeable when page zeroing has to happen is a
completely different story. It's a similar story for SLUB: we know lower
order allocations hurt some microbenchmarks like hackbench-sockets but
have not quantified what happens if SLUB batch allocates pages when
high-order allocations fail.

-- 
Mel Gorman
SUSE Labs