Re: [PATCH 0/5] Introduce a bulk order-0 page allocator with two in-tree users

Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> · Thu, 11 Mar 2021 08:48:27 +0000

On Wed, Mar 10, 2021 at 03:47:04PM -0800, Andrew Morton wrote:
> On Wed, 10 Mar 2021 10:46:13 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > This series introduces a bulk order-0 page allocator with sunrpc and
> > the network page pool being the first users.
> 
> <scratches head>
> 
> Right now, the [0/n] doesn't even tell us that it's a performance
> patchset!
> 

I'll add a note about this improving performance for users that operate
on batches of patches and want to avoid multiple round-trips to the
page allocator.

> The whole point of this patchset appears to appear in the final paragraph
> of the final patch's changelog.
> 

I'll copy&paste that note to the introduction. It's likely that high-speed
networking is the most relevant user in the short-term.

> : For XDP-redirect workload with 100G mlx5 driver (that use page_pool)
> : redirecting xdp_frame packets into a veth, that does XDP_PASS to create
> : an SKB from the xdp_frame, which then cannot return the page to the
> : page_pool.  In this case, we saw[1] an improvement of 18.8% from using
> : the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).
> 
> Much more detail on the overall objective and the observed results,
> please?
> 

I cannot generate that data right now so I need Jesper to comment on
exactly why this is beneficial. For example, while I get that more data
can be processed in a microbenchmark, I do not have a good handle on how
much difference that makes to a practical application. About all I know
is that this problem has been knocking around for 3-4 years at least.

> Also, that workload looks awfully corner-casey.  How beneficial is this
> work for more general and widely-used operations?
> 

At this point, probably nothing for most users because batch page
allocation is not common. It's primarily why I avoided reworking the
whole allocator just to make this a bit tidier.

> > The implementation is not
> > particularly efficient and the intention is to iron out what the semantics
> > of the API should have for users. Once the semantics are ironed out, it can
> > be made more efficient.
> 
> And some guesstimates about how much benefit remains to be realized
> would be helpful.
> 

I don't have that information unfortunately. It's a chicken and egg
problem because without the API, there is no point creating new users.
For example, fault around or readahead could potentially batch pages
but whether it is actually noticable when page zeroing has to happen
is a completely different story. It's a similar story for SLUB, we know
lower order allocations hurt some microbenchmarks like hackbench-sockets
but have not quantified what happens if SLUB batch allocates pages when
high-order allocations fail.

-- 
Mel Gorman
SUSE Labs