On Thu, 22 Jun 2023 23:54:31 +0100 David Howells wrote:
> > Maybe it's just me but I'd prefer to keep the clear rule that splice
> > operates on pages not slab objects.
>
> sendpage isn't only being used for splice(). Or were you referring to
> splicing pages into socket buffers more generally?

Yes, sorry, any sort of "zero-copy attachment of data onto a socket
queue".

> > SIW is the software / fake implementation of RDMA, right? You couldn't have
> > picked a less important user :(
>
> ISCSI and sunrpc could both make use of this, as could ceph and others. I
> have patches for sunrpc to make it condense into a single bio_vec[] and
> sendmsg() in the server code (ie. nfsd) but for the moment, Chuck wanted me to
> just do the xdr payload.

But to be clear (and I'm not implying that it's not a strong enough
reason) - the only benefit from letting someone pass headers in a slab
object is that the code already uses kmalloc(), right? IOW it could be
changed to use frags without much of a LoC bloat?

> > Maybe we can get Eric to comment. The ability to identify "frag type"
> > seems cool indeed, but I haven't thought about using it to attach
> > slab objects.
>
> Unfortunately, you can't attach slab objects. Their lifetime isn't controlled
> by put_page() or folio_put(). kmalloc()/kfree() doesn't refcount them -
> they're recycled immediately. Hence why I was copying them. (Well, you
> could attach, but then you need a callback mechanism).

Right, right, I thought you were saying that _in the future_ we may try
to attach the slab objects as frags (and presumably copy when someone
tries to ref them). Maybe I over-interpreted.

> What I'm trying to do is make it so that the process of calling sock_sendmsg()
> with MSG_SPLICE_PAGES looks exactly the same as without: You fill in a
> bio_vec[] pointing to your protocol header, the payload and the trailer,
> pointing as appropriate to bits of slab, static, stack data or ref'able pages,
> and call sendmsg and then the data will get copied or spliced as appropriate
> to the page type, whether the MSG_SPLICE_PAGES flag is supplied and whether
> the flag is supported.
>
> There are a couple of things I'd like to avoid: (1) having to call
> sock_sendmsg() more than once per message and (2) having sendmsg allocate more
> space and make a copy of data that you had to copy into a frag before calling
> sendmsg.

If we're not planning to attach the slab objects as frags, then surely
doing kmalloc() + kfree() in the caller, and then allocating a frag and
copying the data over in the skb / socket code is also inefficient.
Fixing the caller gives all the benefits you want, and then some.

Granted some form of alloc_skb_frag() needs to be added so that callers
don't curse us, I'd start with something based on sk_page_frag().

Or we could pull the copying out into an intermediate helper which
first replaces all slab objects in the iovec with page frags and then
calls sock_sendmsg()? Maybe that's stupid...

Let's hear what others think. If we can't reach instant agreement --
can you strategically separate out the minimal set of changes required
to just kill MSG_SENDPAGE_NOTLAST? IMHO it's worth getting that into 6.5.
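
To make the intermediate-helper idea a bit more concrete, here is a rough
userspace model of the "replace slab objects with frags" step. It is not
kernel code and not a proposed API: struct seg, struct frag_pool,
frag_alloc() and replace_slab_segs() are all made-up names, the fixed
buffer only stands in for something like sk_page_frag(), and the real
thing would operate on a bio_vec[] and then call sock_sendmsg().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum seg_kind { SEG_PAGE, SEG_SLAB };

struct seg {
	enum seg_kind kind;	/* what kind of memory backs this segment */
	const void *data;
	size_t len;
};

/* Stand-in for a per-socket page-frag pool (an assumption, cf. sk_page_frag()). */
struct frag_pool {
	unsigned char buf[4096];
	size_t used;
};

/* Carve len bytes out of the pool, or return NULL if it does not fit. */
static void *frag_alloc(struct frag_pool *pool, size_t len)
{
	void *p;

	assert(pool != NULL);
	if (len > sizeof(pool->buf) - pool->used)
		return NULL;
	p = pool->buf + pool->used;
	pool->used += len;
	return p;
}

/*
 * Copy every slab-backed segment into frag memory and repoint the
 * segment at the copy, so that the vector only references memory that
 * could be attached without a further copy.
 *
 * Returns the number of segments copied, or -1 if the pool ran out.
 * On failure, segments converted before the failing one stay converted.
 */
static int replace_slab_segs(struct seg *v, size_t n, struct frag_pool *pool)
{
	int copied = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		void *dst;

		if (v[i].kind != SEG_SLAB)
			continue;
		dst = frag_alloc(pool, v[i].len);
		if (!dst)
			return -1;
		memcpy(dst, v[i].data, v[i].len);
		v[i].data = dst;
		v[i].kind = SEG_PAGE;
		copied++;
	}
	return copied;
}
```

A caller would build the vector as it does today (header in slab memory,
payload in pages), run it through the helper once, and then make a single
send call. Compared to fixing the caller to allocate frags directly, this
still costs one copy per slab segment.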