> On Mar 22, 2021, at 3:49 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 22, 2021 at 06:25:03PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Mar 22, 2021, at 5:18 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> This series is based on top of Matthew Wilcox's series "Rationalise
>>> __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
>>> test and are not using Andrew's tree as a baseline, I suggest using the
>>> following git tree
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
>>>
>>> The users of the API have been dropped in this version as the callers
>>> need to check whether they prefer an array or list interface (whether
>>> preference is based on convenience or performance).
>>
>> I now have a consumer implementation that uses the array
>> API. If I understand the contract correctly, the return
>> value is the last array index that __alloc_pages_bulk()
>> visits. My consumer uses the return value to determine
>> if it needs to call the allocator again.
>>
>
> For either arrays or lists, the return value is the number of valid
> pages. For arrays, the pattern is expected to be
>
> nr_pages = alloc_pages_bulk(gfp, nr_requested, page_array);
> for (i = 0; i < nr_pages; i++) {
> 	do something with page_array[i]
> }
>
> There *could* be populated valid elements on and after nr_pages but the
> implementation did not visit those elements. The implementation can abort
> early if the array looks like this
>
> PPP....PPP
>
> Where P is a page and . is NULL. The implementation would skip the
> first three pages, allocate four pages and then abort when a new page
> was encountered. This is an implementation detail around how I handled
> prep_new_page. It could be addressed if many callers expect to pass in
> an array that has holes in the middle.
>
>> It is returning some confusing (to me) results. I'd like
>> to get these resolved before posting any benchmark
>> results.
>>
>> 1.
>> When it has visited every array element, it returns the
>> same value as was passed in @nr_pages. That's the N + 1th
>> array element, which shouldn't be touched. Should the
>> allocator return nr_pages - 1 in the fully successful case?
>> Or should the documentation describe the return value as
>> "the number of elements visited" ?
>>
>
> I phrased it as "the known number of populated elements in the
> page_array".

The comment you added states:

+ * For lists, nr_pages is the number of pages that should be allocated.
+ *
+ * For arrays, only NULL elements are populated with pages and nr_pages
+ * is the maximum number of pages that will be stored in the array.
+ *
+ * Returns the number of pages added to the page_list or the index of the
+ * last known populated element of page_array.


> I did not want to write it as "the number of valid elements
> in the array" because that is not necessarily the case if an array is
> passed in with holes in the middle. I'm open to any suggestions on how
> the __alloc_pages_bulk description can be improved.

The comment states that, for the array case, a /count/ of
pages is passed in, and an /index/ is returned. If you want
to return the same type for lists and arrays, it should be
documented as a count in both cases, to match @nr_pages.
Consumers will want to compare @nr_pages with the return
value to see if they need to call again. Comparing a count
to an index is a notorious source of off-by-one errors.


> The definition of the return value as-is makes sense for either a list
> or an array. Returning "nr_pages - 1" suits an array because it's the
> last valid index but it makes less sense when returning a list.
>
>> 2. Frequently the allocator returns a number smaller than
>> the total number of elements. As you may recall, sunrpc
>> will delay a bit (via a call to schedule_timeout) then call
>> again. This is supposed to be a rare event, and the delay
>> is substantial.
>> But with the array-based API, a not-fully-successful
>> allocator call seems to happen more than half
>> the time. Is that expected? I'm calling with GFP_KERNEL,
>> seems like the allocator should be trying harder.
>>
>
> It's not expected that the array implementation would be worse *unless*
> you are passing in arrays with holes in the middle. Otherwise, the success
> rate should be similar.

Essentially, sunrpc will always pass an array with a hole.
Each RPC consumes the first N elements in the rq_pages array.
Sometimes N == ARRAY_SIZE(rq_pages). AFAIK sunrpc will not
pass in an array with more than one hole. Typically:

.....PPPP

My results show that, because svc_alloc_arg() ends up calling
__alloc_pages_bulk() twice in this case, it ends up being
twice as expensive as the list case, on average, for the same
workload.


>> 3. Is the current design intended so that if the consumer
>> does call again, is it supposed to pass in the array address
>> + the returned index (and @nr_pages reduced by the returned
>> index) ?
>>
>
> The caller does not have to pass in array address + returned index but
> it's more efficient if it does.
>
> If you are passing in arrays with holes in the middle then the following
> might work (not tested)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c83d38dfe936..4dc38516a5bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5002,6 +5002,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	gfp_t alloc_gfp;
>  	unsigned int alloc_flags;
>  	int nr_populated = 0, prep_index = 0;
> +	bool hole = false;
>
>  	if (WARN_ON_ONCE(nr_pages <= 0))
>  		return 0;
> @@ -5057,6 +5058,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	if (!zone)
>  		goto failed;
>
> +retry_hole:
>  	/* Attempt the batch allocation */
>  	local_irq_save(flags);
>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> @@ -5069,6 +5071,7 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  		 * IRQs are enabled.
>  		 */
>  		if (page_array && page_array[nr_populated]) {
> +			hole = true;
>  			nr_populated++;
>  			break;
>  		}
> @@ -5109,6 +5112,9 @@ int __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  			prep_new_page(page_array[prep_index++], 0, gfp, 0);
>  	}
>
> +	if (hole && nr_populated < nr_pages && hole)
> +		goto retry_hole;
> +
>  	return nr_populated;
>
>  failed_irq:
>
> --
> Mel Gorman
> SUSE Labs

If a local_irq_save() is done more than once in this case, I don't
expect that the result will be much better.

To make the array API as performant as the list API, the sunrpc
consumer will have to check if the N + 1th element is populated,
upon return, rather than checking the return value against
@nr_pages.

--
Chuck Lever
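
For readers following the count-versus-index discussion, here is a minimal userspace sketch of the array contract as it is described in this thread. It is not kernel code. The names model_page, model_alloc_pages_bulk() and model_fill_array() are hypothetical, and the stop-at-first-populated-element behaviour is an assumption taken from the "PPP....PPP" description above. The return value is treated as a count, so it can be compared directly against nr_pages.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder object that stands in for a struct page in this model. */
static char model_page;

/*
 * Hypothetical userspace model of the array form of the bulk allocator,
 * written from the behaviour described in this thread. Not kernel code.
 * Returns a count: the number of leading elements known to be populated.
 */
static int model_alloc_pages_bulk(int nr_pages, void **page_array)
{
	int nr_populated = 0;

	assert(nr_pages > 0);

	/* Skip elements at the front that already hold a page. */
	while (nr_populated < nr_pages && page_array[nr_populated])
		nr_populated++;

	/* Fill NULL elements; stop at the first populated one after a hole. */
	while (nr_populated < nr_pages) {
		if (page_array[nr_populated]) {
			nr_populated++;
			break;
		}
		page_array[nr_populated++] = &model_page;
	}

	return nr_populated;
}

/*
 * Hypothetical consumer loop: compare the returned count against nr_pages
 * and call again until they match. Returns the number of calls made.
 */
static int model_fill_array(int nr_pages, void **page_array)
{
	int calls = 0;
	int ret;

	do {
		ret = model_alloc_pages_bulk(nr_pages, page_array);
		calls++;
	} while (ret < nr_pages);

	return calls;
}
```

In this model, a nine-element array laid out as ".....PPPP" gets 6 back from the first call and 9 from the second, so model_fill_array() makes two calls. That matches the doubled cost reported above for the single-hole case. An array with no populated elements is filled in one call.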