> On Feb 11, 2021, at 4:12 AM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Wed, Feb 10, 2021 at 10:58:37PM +0000, Chuck Lever wrote:
>>> Not in the short term due to bug load and other obligations.
>>> 
>>> The original series had "mm, page_allocator: Only use per-cpu allocator
>>> for irq-safe requests" but that was ultimately rejected because softirqs
>>> were affected, so it would have to be done without that patch.
>>> 
>>> The last patch can be rebased easily enough, but it only batch allocates
>>> order-0 pages. It's also only build tested and could be completely
>>> miserable in practice; as I didn't even try to boot it, let alone
>>> actually test it, it could be a giant pile of crap. To make high orders
>>> work, it would need significant reworking, but if the API showed even
>>> partial benefit, it might motivate someone to reimplement the bulk
>>> interfaces to perform better.
>>> 
>>> Rebased diff, build tested only, might not even work
>> 
>> Thanks, Mel, for kicking off a forward port.
>> 
>> It compiles. I've added a patch to replace the page allocation loop
>> in svc_alloc_arg() with a call to alloc_pages_bulk().
>> 
>> The server system deadlocks pretty quickly with any NFS traffic. Based
>> on some initial debugging, it appears that a pcplist is getting corrupted
>> and this causes the list_del() in __rmqueue_pcplist() to fail during
>> a call to alloc_pages_bulk().
>> 
> 
> The parameters to __rmqueue_pcplist() are garbage because the parameter
> order changed. I'm surprised it didn't blow up in a more spectacular
> fashion. Again, this hasn't been near any testing, and passing a list
> with high-order pages to free_pages_bulk() will corrupt lists too.
> Mostly it's a curiosity to see if there is justification for reworking
> the allocator to fundamentally deal in batches and then feed batches to
> pcp lists and the bulk allocator while leaving the normal GFP API as
> single-page "batches". While that would be ideal, it's relatively high
> risk for regressions. There is still some scope for adding a basic bulk
> allocator before considering a major refactoring effort.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f8353ea7b977..8f3fe7de2cf7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5892,7 +5892,7 @@ __alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
>  		pcp_list = &pcp->lists[migratetype];
>  
>  		while (nr_pages) {
> -			page = __rmqueue_pcplist(zone, gfp_mask, migratetype,
> +			page = __rmqueue_pcplist(zone, migratetype, alloc_flags,
>  					pcp, pcp_list);
>  			if (!page)
>  				break;

The NFS server is considerably more stable now. Thank you!

I confirmed that my patch is requesting and getting multiple pages. The
new NFSD code and the API seem to be working as expected.

The results are stunning. Each svc_alloc_arg() call here allocates 65
pages to satisfy a 256KB NFS READ request (64 order-0 pages for the
payload plus one page for the request header).
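For reference, my svc_alloc_arg() change has roughly the following
shape. This is only a sketch, not the patch I actually tested: the
list-based alloc_pages_bulk(gfp, order, nr_pages, list) signature is
assumed from the rebased prototype above, and svc_alloc_arg_bulk() is
a hypothetical helper name used here for illustration.

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/sunrpc/svc.h>

/*
 * Hypothetical sketch: fill rqstp->rq_pages[] with one bulk call
 * instead of one alloc_page(GFP_KERNEL) per page.  Assumes the
 * prototype's list-based API, which places up to @pages order-0
 * pages on @list and is free to return fewer.
 */
static unsigned int svc_alloc_arg_bulk(struct svc_rqst *rqstp,
				       unsigned int pages)
{
	LIST_HEAD(list);
	struct page *page, *tmp;
	unsigned int i = 0;

	alloc_pages_bulk(GFP_KERNEL, 0, pages, &list);

	/* Move each allocated page off the list into the array. */
	list_for_each_entry_safe(page, tmp, &list, lru) {
		list_del(&page->lru);
		rqstp->rq_pages[i++] = page;
	}

	/* A short allocation is not fatal; the caller sleeps and retries. */
	return i;
}

Whether a list or a caller-supplied page array is the better hand-off
is an open question this kind of testing should help answer. The
function-graph timings below compare the original per-page loop
("Before") with the bulk path ("After") while serving 256KB reads: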
Before:
            nfsd-972   [000]   584.513817: funcgraph_entry:      + 35.385 us  |  svc_alloc_arg();
            nfsd-979   [002]   584.513870: funcgraph_entry:      + 29.051 us  |  svc_alloc_arg();
            nfsd-980   [001]   584.513951: funcgraph_entry:      + 29.178 us  |  svc_alloc_arg();
            nfsd-983   [000]   584.514014: funcgraph_entry:      + 29.211 us  |  svc_alloc_arg();
            nfsd-976   [002]   584.514059: funcgraph_entry:      + 29.315 us  |  svc_alloc_arg();
            nfsd-974   [001]   584.514127: funcgraph_entry:      + 29.237 us  |  svc_alloc_arg();

After:
            nfsd-977   [002]    87.049425: funcgraph_entry:        4.293 us   |  svc_alloc_arg();
            nfsd-981   [000]    87.049478: funcgraph_entry:        4.059 us   |  svc_alloc_arg();
            nfsd-988   [001]    87.049549: funcgraph_entry:        4.474 us   |  svc_alloc_arg();
            nfsd-983   [003]    87.049612: funcgraph_entry:        3.819 us   |  svc_alloc_arg();
            nfsd-976   [000]    87.049619: funcgraph_entry:        3.869 us   |  svc_alloc_arg();
            nfsd-980   [002]    87.049738: funcgraph_entry:        4.124 us   |  svc_alloc_arg();
            nfsd-975   [000]    87.049769: funcgraph_entry:        3.734 us   |  svc_alloc_arg();

There appears to be little cost change for single-page allocations
using the bulk allocator (nr_pages=1):

Before:
            nfsd-985   [003]   572.324517: funcgraph_entry:        0.332 us   |  svc_alloc_arg();
            nfsd-986   [001]   572.324531: funcgraph_entry:        0.311 us   |  svc_alloc_arg();
            nfsd-985   [003]   572.324701: funcgraph_entry:        0.311 us   |  svc_alloc_arg();
            nfsd-986   [001]   572.324727: funcgraph_entry:        0.424 us   |  svc_alloc_arg();
            nfsd-985   [003]   572.324760: funcgraph_entry:        0.332 us   |  svc_alloc_arg();
            nfsd-986   [001]   572.324786: funcgraph_entry:        0.390 us   |  svc_alloc_arg();

After:
            nfsd-989   [002]    75.043226: funcgraph_entry:        0.322 us   |  svc_alloc_arg();
            nfsd-988   [001]    75.043436: funcgraph_entry:        0.368 us   |  svc_alloc_arg();
            nfsd-989   [002]    75.043464: funcgraph_entry:        0.424 us   |  svc_alloc_arg();
            nfsd-988   [001]    75.043490: funcgraph_entry:        0.317 us   |  svc_alloc_arg();
            nfsd-989   [002]    75.043517: funcgraph_entry:        0.425 us   |  svc_alloc_arg();
            nfsd-988   [001]    75.050025: funcgraph_entry:        0.407 us   |  svc_alloc_arg();

--
Chuck Lever