On Wed, 24 Feb 2021 10:26:00 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> This is a prototype series that introduces a bulk order-0 page allocator
> with sunrpc being the first user. The implementation is not particularly
> efficient and the intention is to iron out what the semantics of the API
> should be. That said, sunrpc was reported to have reduced allocation
> latency when refilling a pool.

I also have a use-case in page_pool, and I've been testing with the
earlier patches; results are here[1]

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org

Awesome to see this newer patchset! Thanks a lot for working on this!
I'll run some new tests based on this.

> As a side-note, while the implementation could be more efficient, it
> would require fairly deep surgery in numerous places. The lock scope would
> need to be significantly reduced, particularly as vmstat, per-cpu and the
> buddy allocator have different locking protocols that overlap -- e.g. all
> partially depend on irqs being disabled at various points. Secondly,
> the core of the allocator deals with single pages whereas both the bulk
> allocator and per-cpu allocator operate in batches. All of that has to
> be reconciled with all the existing users and their constraints (memory
> offline, CMA and cpusets being the trickiest).

As you can see in [1], I'm getting a significant speedup from this. I
guess the cost of finding the "zone" is higher than I expected, as that
is basically what the bulk API amortizes.

> In terms of semantics required by new users, my preference is that a pair
> of patches be applied -- the first which adds the required semantic to
> the bulk allocator and the second which adds the new user.
>
> Patch 1 of this series is a cleanup to sunrpc, it could be merged
> separately but is included here for convenience.
>
> Patch 2 is the prototype bulk allocator
>
> Patch 3 is the sunrpc user. Chuck also has a patch which further caches
> pages but is not included in this series. It's not directly
> related to the bulk allocator and as it caches pages, it might
> have other concerns (e.g. does it need a shrinker?)
>
> This has only been lightly tested on a low-end NFS server. It did not break
> but would benefit from an evaluation to see how much, if any, the headline
> performance changes. The biggest concern is that a light test case showed
> that there are a *lot* of bulk requests for 1 page which gets delegated to
> the normal allocator. The same criteria should apply to any other users.

If you change local_irq_save(flags) to local_irq_disable(), then you can
likely get better performance for 1-page requests via this API. This
limits the API to callers where IRQs are enabled (which is most cases).
A rough sketch of what I mean follows at the end of this mail. (For my
use-case I will not do 1-page requests.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
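
To make the IRQ suggestion above concrete, here is a rough sketch -- it
is not code from Mel's series; the function name and the __bulk_refill()
helper are made-up placeholders -- showing the local_irq_save() form
used by the prototype next to the plain local_irq_disable() variant:

#include <linux/gfp.h>
#include <linux/irqflags.h>
#include <linux/list.h>

/* Placeholder for the pcp/buddy refill logic in the real allocator. */
static int __bulk_refill(gfp_t gfp, int nr_pages, struct list_head *list)
{
	/* In the real allocator this would fill @list from the per-cpu
	 * or buddy free lists; stubbed out here. */
	return 0;
}

/* Sketch only: illustrates the two IRQ-disabling strategies. */
static int bulk_alloc_sketch(gfp_t gfp, int nr_pages, struct list_head *list)
{
	int allocated;

	/*
	 * Prototype behaviour: local_irq_save() allows callers that
	 * already run with IRQs off, but pays for reading and saving
	 * the flags word on every call:
	 *
	 *	unsigned long flags;
	 *
	 *	local_irq_save(flags);
	 *	allocated = __bulk_refill(gfp, nr_pages, list);
	 *	local_irq_restore(flags);
	 */

	/*
	 * Suggested variant: plain disable/enable is cheaper, which
	 * mostly matters for the nr_pages == 1 case, but it restricts
	 * the API to callers that have IRQs enabled on entry.
	 */
	local_irq_disable();
	allocated = __bulk_refill(gfp, nr_pages, list);
	local_irq_enable();

	return allocated;
}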