Re: [PATCH 2/3] mm/vmalloc: Switch to bulk allocator in __vmalloc_area_node()

Uladzislau Rezki <urezki@xxxxxxxxx> · Wed, 19 May 2021 21:52:14 +0200

> On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote:
> > > > +	/*
> > > > +	 * If not enough pages were obtained to accomplish an
> > > > +	 * allocation request, free them via __vfree() if any.
> > > > +	 */
> > > > +	if (area->nr_pages != nr_small_pages) {
> > > > +		warn_alloc(gfp_mask, NULL,
> > > > +			"vmalloc size %lu allocation failure: "
> > > > +			"page order %u allocation failed",
> > > > +			area->nr_pages * PAGE_SIZE, page_order);
> > > > +		goto fail;
> > > > +	}
> > > 
> > > From reading __alloc_pages_bulk not allocating all pages is something
> > > that cn happen fairly easily.  Shouldn't we try to allocate the missing
> > > pages manually and/ore retry here?
> > > 
> >
> > It is a good point. The bulk-allocator, as i see, only tries to access
> > to pcp-list and falls-back to a single allocator once it fails, so the
> > array may not be fully populated.
> > 
> 
> Partially correct. It does allocate via the pcp-list but the pcp-list will
> be refilled if it's empty so if the bulk allocator returns fewer pages
> than requested, it may be due to hitting watermarks or the local zone is
> depleted. It does not take any special action to correct the situation
> or stall e.g.  wake kswapd, enter direct reclaim, allocate from a remote
> node etc.
> 
> If no pages were allocated, it'll try allocate at least one page via a
> single allocation request in case the bulk failure would push the zone
> over the watermark but 1 page does not. That path as a side-effect would
> also wake kswapd.
> 
OK. A single page allocator can enter a slow path i mean direct reclaim,
etc to adjust watermarks.

> > In that case probably it makes sense to manually populate it using
> > single page allocator.
> > 
> > Mel, could you please also comment on it?
> > 
> 
> It is by design because it's unknown if callers can recover or if so,
> how they want to recover and the primary intent behind the bulk allocator
> was speed. In the case of network, it only wants some pages quickly so as
> long as it gets 1, it makes progress. For the sunrpc user, it's willing
> to wait and retry. For vmalloc, I'm unsure what a suitable recovery path
> should be as I do not have a good handle on workloads that are sensitive
> to vmalloc performance. The obvious option would be to loop and allocate
> single pages with alloc_pages_node understanding that the additional
> pages may take longer to allocate.
> 
I got it. At least we should fall-back for a single allocator, that is how
we used to allocate before(now it is for high-order pages). If it also fails
to obtain a page we are done.

Basically a single-page allocator is more permissive so it is a higher
chance to success. Therefore a fallback to it makes sense.

Thanks.

--
Vlad Rezki