On Wed, Feb 12, 2025 at 7:52 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 2/10/25 21:09, Mina Almasry wrote:
> > On Wed, Feb 5, 2025 at 4:20 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> >>
> >> On 2/3/25 22:39, Mina Almasry wrote:
> >> ...
> >>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >>> index bb2b751d274a..3ff8f568c382 100644
> >>> --- a/include/linux/skbuff.h
> >>> +++ b/include/linux/skbuff.h
> >>> @@ -1711,9 +1711,12 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
> >> ...
> >>>  int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
> >>>                                  struct iov_iter *from, size_t length);
> >>> @@ -1721,12 +1724,14 @@ int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
> >>>  static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
> >>>                                            struct msghdr *msg, int len)
> >>>  {
> >>> -	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len);
> >>> +	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len,
> >>> +				       NULL);
> >>
> >> Instead of propagating it all the way down and carving a new path, why
> >> not reuse the existing infra? You already hook into where ubuf is
> >> allocated, you can stash the binding in there. And
> > It looks like it's not possible to increase the size of ubuf_info at
> > all, otherwise the BUILD_BUG_ON in msg_zerocopy_alloc() fires.
> >
> > It's asserting that sizeof(ubuf_info_msgzc) <= sizeof(skb->cb), and
> > I'm guessing increasing skb->cb size is not really the way to go.
> >
> > What I may be able to do here is stash the binding somewhere in
> > ubuf_info_msgzc via union with fields we don't need for devmem, and/or
> It doesn't need to account the memory against the user, and you
> actually don't want that because dmabuf should take care of that.
> So, it should be fine to reuse ->mmp.
>
> It's also not a real sk_buff, so maybe maintainers wouldn't mind
> reusing some more space out of it, if that would even be needed.
>

netmem skbs are real sk_buffs, with the one modification that their
frags are not readable when the netmem backing them is unreadable. I
would not approve of considering netmem/devmem skbs "not real skbs",
messing with the semantics of skb fields for devmem skbs, and having to
add skb_is_devmem() checks throughout the skb handling code that
touches the fields being overwritten in the devmem case.

No, I don't think we can re-use random fields in the skb for devmem.

> > stashing the binding in ubuf_info_ops (very hacky). Neither approach
> > seems ideal, but the former may work and may be cleaner.
> >
> > I'll take a deeper look here. I had looked before and concluded that
> > we're piggybacking devmem TX on MSG_ZEROCOPY path, because we need
> > almost all of the functionality there (no copying, send complete
> > notifications, etc), with one minor change in the skb filling. I had
> > concluded that if MSG_ZEROCOPY was never updated to use the existing
> > infra, then it's appropriate for devmem TX piggybacking on top of it
> MSG_ZEROCOPY does use the common infra, i.e. passing ubuf_info,
> but doesn't need ->sg_from_iter as zerocopy_fill_skb_from_iter()
> and it's what was there first.
>

But MSG_ZEROCOPY doesn't set msg->msg_ubuf, and without msg->msg_ubuf
set, msg->sg_from_iter is never invoked. Also, sg_from_iter currently
isn't set up to take in a ubuf_info, which we'd need if we stash the
binding in the ubuf_info.
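To make that concrete, the variant I'd be prototyping is roughly the
sketch below. This is untested and purely illustrative: the union
member and the devmem-aware callback don't exist today, and a callback
that can actually get at the ubuf_info is exactly the plumbing change
that would be needed.

/* Rough sketch, not part of this patch. Reuse the space of ->mmp to
 * carry the binding (devmem doesn't charge optmem, the dmabuf already
 * accounts the memory), then read it back from a devmem-aware
 * ->sg_from_iter. The callback signature here is hypothetical.
 */
struct ubuf_info_msgzc {
	struct ubuf_info ubuf;

	/* other existing fields elided */

	union {
		struct mmpin mmp;
		struct net_devmem_dmabuf_binding *binding;
	};
};

static int devmem_sg_from_iter(struct sk_buff *skb, struct iov_iter *from,
			       size_t length, struct ubuf_info *uarg)
{
	struct net_devmem_dmabuf_binding *binding;

	binding = uarg_to_msgzc(uarg)->binding;
	return zerocopy_fill_skb_from_devmem(skb, from, length, binding);
}

That keeps sizeof(ubuf_info_msgzc) unchanged, so the BUILD_BUG_ON in
msg_zerocopy_alloc() stays happy, but it only helps if the fill path can
actually reach the uarg.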
All in all, I think I want to prototype an msg->sg_from_iter approach
and make a judgement call on whether it's cleaner than just passing the
binding through a couple of helpers as I'm doing here. My feeling is
that the implementation in this patch may be cleaner than refactoring
the entire msg_ubuf/sg_from_iter flow so we can sort of use it for
MSG_ZEROCOPY with devmem, when MSG_ZEROCOPY currently doesn't use it.

> > to follow that. I would not want to get into a refactor of
> > MSG_ZEROCOPY for no real reason.
> >
> > But I'll take a deeper look here and see if I can make something
> > slightly cleaner work.
> >
> >> zerocopy_fill_skb_from_devmem can implement ->sg_from_iter,
> >> see __zerocopy_sg_from_iter().
> >>
> >> ...
> >>> diff --git a/net/core/datagram.c b/net/core/datagram.c
> >>> index f0693707aece..c989606ff58d 100644
> >>> --- a/net/core/datagram.c
> >>> +++ b/net/core/datagram.c
> >>> @@ -63,6 +63,8 @@
> >>> +static int
> >>> +zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from,
> >>> +			       int length,
> >>> +			       struct net_devmem_dmabuf_binding *binding)
> >>> +{
> >>> +	int i = skb_shinfo(skb)->nr_frags;
> >>> +	size_t virt_addr, size, off;
> >>> +	struct net_iov *niov;
> >>> +
> >>> +	while (length && iov_iter_count(from)) {
> >>> +		if (i == MAX_SKB_FRAGS)
> >>> +			return -EMSGSIZE;
> >>> +
> >>> +		virt_addr = (size_t)iter_iov_addr(from);
> >>
> >> Unless I missed it somewhere it needs to check that the iter
> >> is iovec based.
> >>
> >
> > How do we end up here with an iterator that is not iovec based? Is the
> > user able to trigger that somehow and I missed it?
>
> Hopefully not, but for example io_uring passes bvecs for a number of
> requests that can end up in tcp_sendmsg_locked(). Those probably
> would work with the current patch, but check the order of some of the
> checks it will break. And once io_uring starts passing bvecs for
> normal send[msg] requests, it'd definitely be possible. And there
> are other in kernel users apart from send(2) path, so who knows.
>
> The api allows it and therefore should be checked, it's better to
> avoid quite possible latent bugs.
>

Sounds good, I'll add a check that the iter is iovec-based in the next
version.

-- 
Thanks,
Mina