On Thu, 22 Jun 2023 20:40:43 +0100 David Howells wrote: > > How did that happen? I thought MSG_SPLICE_PAGES comes from former > > sendpage users and sendpage can't operate on slab pages. > > Some of my patches, take the siw one for example, now aggregate all the bits > that make up a message into a single sendmsg() call, including any protocol > header and trailer in the same bio_vec[] as the payload where before it would > have to do, say, sendmsg+sendpage+sendpage+...+sendpage+sendmsg. Maybe it's just me but I'd prefer to keep the clear rule that splice operates on pages not slab objects. SIW is the software / fake implementation of RDMA, right? You couldn't have picked a less important user :( Paolo indicated that he'll take a look tomorrow, we'll see what he thinks. > I'm trying to make it so that I make the minimum number of sendmsg calls > (ie. 1 where possible) and the loop that processes the data is inside of that. The in-kernel users can be fixed to not use slab, and user space can't feed us slab objects. > This offers the opportunity, at least in the future, to append slab data to an > already-existing private fragment in the skbuff. Maybe we can get Eric to comment. The ability to identify "frag type" seems cool indeed, but I haven't thought about using it to attach slab objects. > > The locking is to local_bh_disable(). Does the milliont^w new frag > > allocator have any additional benefits? > > It is shareable because it does locking. Multiple sockets of multiple > protocols can share the pages it has reserved. It drops the lock around calls > to the page allocator so that GFP_KERNEL/GFP_NOFS can be used with it. > > Without this, the page fragment allocator would need to be per-socket, I > think, or be done further up the stack where the higher level drivers would > have to have a fragment bucket per whatever unit they use to deal with the > lack of locking. There's also the per task frag which can be used under normal conditions (sk_use_task_frag). > Doing it here makes cleanup simpler since I just transfer my ref on the > fragment to the skbuff frag list and it will automatically be cleaned up with > the skbuff. > > Willy suggested that I just allocate a page for each thing I want to copy, but > I would rather not do that for, say, an 8-byte bit of protocol data. TBH my intuition would also be get a full page and let the callers who care about performance fix themselves. Assuming we want to let slab objects in in the first place.