Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > On Tue, Apr 11, 2023 at 05:08:48PM +0100, David Howells wrote: > > Make the page_frag_cache allocator have a separate allocation bucket for > > each cpu to avoid racing. This means that no lock is required, other than > > preempt disablement, to allocate from it, though if a softirq wants to > > access it, then softirq disablement will need to be added. > ... > Let me ask a third time as I've not got an answer the last two times: Sorry about that. I think the problem is that the copy of the message from you directly to me arrives after the first copy that comes via a mailing list and google then deletes the direct one - as obviously no one could possibly want duplicates, right? :-/ - and so you usually get consigned to the linux-kernel or linux-fsdevel mailing list folder. > > Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_cache() > > and use the default allocation buckets rather than defining their own. > > why are these callers treated different from the others? There are only four users of struct page_frag_cache, the one these patches modify:: (1) GVE. (2) Mediatek. (3) NVMe. (4) skbuff. Note that things are slightly confused by there being three very similarly named frag allocators (page_frag and page_frag_1k in addition to page_frag_cache) and the __page_frag_cache_drain() function gets used for things other than just page_frag_cache. I've replaced the single allocation buckets with per-cpu allocation buckets for (1), (2) and (3) so that no locking[*] is required other than pinning it to the cpu temporarily - but I can't test them as I don't have hardware. [*] Note that what's upstream doesn't have locking, and I'm not sure all the users of it are SMP-safe. That leaves (4). Upstream, skbuff.c creates two separate per-cpu frag caches and I've elected to retain that, except that the per-cpu bits are now inside the frag allocator as I'm not entirely sure of the reason that there's a separate napi frag cache to the netdev_alloc_cache. The general page_frag_cache allocator is used by skb_splice_from_iter() if it encounters a page it can't take a ref on, so it has been tested through that using sunrpc, sunrpc+siw and cifs+siw. > Can you show any performance numbers? As far as I can tell, it doesn't make any obvious difference to directly pumping data through TCP or TLS over TCP or transferring data over a network filesystem such as sunrpc or cifs using siw/TCP. I've tested this between two machines over a 1G and a 10G link. I can generate some actual numbers tomorrow. Actually, I probably can drop these patches 2-4 from this patchset and just use the netdev_alloc_cache in skb_splice_from_iter() for now. Since that copies unspliceable data, I no longer need to allocate frags in the next layer up. David