Mina Almasry wrote:
> Augment dmabuf binding to be able to handle TX. Additional to all the RX
> binding, we also create tx_vec needed for the TX path.
>
> Provide API for sendmsg to be able to send dmabufs bound to this device:
>
> - Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
> - MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf.
>
> Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY
> implementation, while disabling instances where MSG_ZEROCOPY falls back
> to copying.
>
> We additionally pipe the binding down to the new
> zerocopy_fill_skb_from_devmem which fills a TX skb with net_iov netmems
> instead of the traditional page netmems.
>
> We also special case skb_frag_dma_map to return the dma-address of these
> dmabuf net_iovs instead of attempting to map pages.
>
> Based on work by Stanislav Fomichev <sdf@xxxxxxxxxxx>. A lot of the meat
> of the implementation came from devmem TCP RFC v1[1], which included the
> TX path, but Stan did all the rebasing on top of netmem/net_iov.
>
> Cc: Stanislav Fomichev <sdf@xxxxxxxxxxx>
> Signed-off-by: Kaiyuan Zhang <kaiyuanz@xxxxxxxxxx>
> Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
>
>
> ---
>
> v3:
> - Use kvmalloc_array instead of kcalloc (Stan).
> - Fix unreachable code warning (Simon).
>
> v2:
> - Remove dmabuf_offset from the dmabuf cmsg.
> - Update zerocopy_fill_skb_from_devmem to interpret the
>   iov_base/iter_iov_addr as the offset into the dmabuf to send from
>   (Stan).
> - Remove the confusing binding->tx_iter which is not needed if we
>   interpret the iov_base/iter_iov_addr as offset into the dmabuf (Stan).
> - Remove check for binding->sgt and binding->sgt->nents in dmabuf
>   binding.
> - Simplify the calculation of binding->tx_vec.
> - Check in net_devmem_get_binding that the binding we're returning
>   has ifindex matching the sending socket (Willem).
> ---
>  include/linux/skbuff.h                  | 15 +++-
>  include/net/sock.h                      |  1 +
>  include/uapi/linux/uio.h                |  6 +-
>  net/core/datagram.c                     | 41 ++++++++++-
>  net/core/devmem.c                       | 97 +++++++++++++++++++++++--
>  net/core/devmem.h                       | 42 ++++++++++-
>  net/core/netdev-genl.c                  | 64 +++++++++++++++-
>  net/core/skbuff.c                       |  6 +-
>  net/core/sock.c                         |  8 ++
>  net/ipv4/tcp.c                          | 36 ++++++---
>  net/vmw_vsock/virtio_transport_common.c |  3 +-
>  11 files changed, 285 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index bb2b751d274a..3ff8f568c382 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1711,9 +1711,12 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
>
>  void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
>
> +struct net_devmem_dmabuf_binding;
> +
>  int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
>  			    struct sk_buff *skb, struct iov_iter *from,
> -			    size_t length);
> +			    size_t length,
> +			    struct net_devmem_dmabuf_binding *binding);
>
>  int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>  				struct iov_iter *from, size_t length);
> @@ -1721,12 +1724,14 @@ int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>  static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
>  					  struct msghdr *msg, int len)
>  {
> -	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len);
> +	return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len,
> +				       NULL);
>  }
>
>  int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
>  			     struct msghdr *msg, int len,
> -			     struct ubuf_info *uarg);
> +			     struct ubuf_info *uarg,
> +			     struct net_devmem_dmabuf_binding *binding);
>
>  /* Internal */
>  #define skb_shinfo(SKB)	((struct skb_shared_info *)(skb_end_pointer(SKB)))
> @@ -3697,6 +3702,10 @@ static inline dma_addr_t __skb_frag_dma_map(struct device *dev,
>  					    size_t offset, size_t size,
>  					    enum dma_data_direction dir)
>  {
> +	if (skb_frag_is_net_iov(frag)) {
> +		return netmem_to_net_iov(frag->netmem)->dma_addr + offset +
> +		       frag->offset;
> +	}
>  	return dma_map_page(dev, skb_frag_page(frag),
>  			    skb_frag_off(frag) + offset, size, dir);
>  }
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 8036b3b79cd8..09eb918525b6 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1822,6 +1822,7 @@ struct sockcm_cookie {
>  	u32 tsflags;
>  	u32 ts_opt_id;
>  	u32 priority;
> +	u32 dmabuf_id;
>  };
>
>  static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
> index 649739e0c404..866bd5dfe39f 100644
> --- a/include/uapi/linux/uio.h
> +++ b/include/uapi/linux/uio.h
> @@ -38,10 +38,14 @@ struct dmabuf_token {
>  	__u32 token_count;
>  };
>
> +struct dmabuf_tx_cmsg {
> +	__u32 dmabuf_id;
> +};
> +

Why a wrapper struct instead of just __u32?
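
For context, here is a rough userspace sketch of how a sender might build the control message the commit message describes. It is an illustration only, not code from the patch. The names `dmabuf_tx_cmsg_sketch` and `put_dmabuf_cmsg` are hypothetical, and the fallback value 79 for SCM_DEVMEM_DMABUF is an assumption taken from the generic SO_DEVMEM_DMABUF definition (some architectures differ, so check your kernel headers). From userspace the wrapper and a bare __u32 carry the same 4-byte payload, which is what the static assertion checks.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: generic value of SO_DEVMEM_DMABUF; arch headers may differ. */
#ifndef SCM_DEVMEM_DMABUF
#define SCM_DEVMEM_DMABUF 79
#endif

/* Hypothetical local mirror of the struct dmabuf_tx_cmsg proposed above. */
struct dmabuf_tx_cmsg_sketch {
	uint32_t dmabuf_id;
};

/* The wrapper adds no bytes on top of a plain 32-bit id. */
_Static_assert(sizeof(struct dmabuf_tx_cmsg_sketch) == sizeof(uint32_t),
	       "wrapper struct and bare u32 have the same size");

/*
 * Attach a single SCM_DEVMEM_DMABUF control message to msg, using ctrl as
 * the control buffer. Returns 0 on success, -1 if ctrl is too small.
 */
static int put_dmabuf_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			   uint32_t dmabuf_id)
{
	struct dmabuf_tx_cmsg_sketch payload = { .dmabuf_id = dmabuf_id };
	struct cmsghdr *cm;

	if (ctrl_len < CMSG_SPACE(sizeof(payload)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(payload));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_DEVMEM_DMABUF;
	cm->cmsg_len = CMSG_LEN(sizeof(payload));
	memcpy(CMSG_DATA(cm), &payload, sizeof(payload));
	return 0;
}
```

Going by the v2 notes, the caller would then point msg_iov at an iovec whose iov_base holds the offset into the dmabuf rather than a virtual address, and pass MSG_ZEROCOPY to sendmsg().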