> On Wed, May 19, 2021 at 04:39:00PM +0200, Uladzislau Rezki wrote: > > > > + /* > > > > + * If not enough pages were obtained to accomplish an > > > > + * allocation request, free them via __vfree() if any. > > > > + */ > > > > + if (area->nr_pages != nr_small_pages) { > > > > + warn_alloc(gfp_mask, NULL, > > > > + "vmalloc size %lu allocation failure: " > > > > + "page order %u allocation failed", > > > > + area->nr_pages * PAGE_SIZE, page_order); > > > > + goto fail; > > > > + } > > > > > > From reading __alloc_pages_bulk not allocating all pages is something > > > that cn happen fairly easily. Shouldn't we try to allocate the missing > > > pages manually and/ore retry here? > > > > > > > It is a good point. The bulk-allocator, as i see, only tries to access > > to pcp-list and falls-back to a single allocator once it fails, so the > > array may not be fully populated. > > > > Partially correct. It does allocate via the pcp-list but the pcp-list will > be refilled if it's empty so if the bulk allocator returns fewer pages > than requested, it may be due to hitting watermarks or the local zone is > depleted. It does not take any special action to correct the situation > or stall e.g. wake kswapd, enter direct reclaim, allocate from a remote > node etc. > > If no pages were allocated, it'll try allocate at least one page via a > single allocation request in case the bulk failure would push the zone > over the watermark but 1 page does not. That path as a side-effect would > also wake kswapd. > OK. A single page allocator can enter a slow path i mean direct reclaim, etc to adjust watermarks. > > In that case probably it makes sense to manually populate it using > > single page allocator. > > > > Mel, could you please also comment on it? > > > > It is by design because it's unknown if callers can recover or if so, > how they want to recover and the primary intent behind the bulk allocator > was speed. In the case of network, it only wants some pages quickly so as > long as it gets 1, it makes progress. For the sunrpc user, it's willing > to wait and retry. For vmalloc, I'm unsure what a suitable recovery path > should be as I do not have a good handle on workloads that are sensitive > to vmalloc performance. The obvious option would be to loop and allocate > single pages with alloc_pages_node understanding that the additional > pages may take longer to allocate. > I got it. At least we should fall-back for a single allocator, that is how we used to allocate before(now it is for high-order pages). If it also fails to obtain a page we are done. Basically a single-page allocator is more permissive so it is a higher chance to success. Therefore a fallback to it makes sense. Thanks. -- Vlad Rezki