On Mon, Nov 11, 2024 at 11:01 AM David Wei <dw@xxxxxxxxxxx> wrote: > > On 2024-11-04 05:20, Mina Almasry wrote: > > On Fri, Nov 1, 2024 at 10:41 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote: > >> ... > >>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > >>>> index e928efc22f80..31e01da61c12 100644 > >>>> --- a/net/ipv4/tcp.c > >>>> +++ b/net/ipv4/tcp.c > >>>> @@ -277,6 +277,7 @@ > >>>> #include <net/ip.h> > >>>> #include <net/sock.h> > >>>> #include <net/rstreason.h> > >>>> +#include <net/page_pool/types.h> > >>>> > >>>> #include <linux/uaccess.h> > >>>> #include <asm/ioctls.h> > >>>> @@ -2476,6 +2477,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, > >>>> } > >>>> > >>>> niov = skb_frag_net_iov(frag); > >>>> + if (net_is_devmem_page_pool_ops(niov->pp->mp_ops)) { > >>>> + err = -ENODEV; > >>>> + goto out; > >>>> + } > >>>> + > >>> > >>> I think this check needs to go in the caller. Currently the caller > >>> assumes that if !skb_frags_readable(), then the frag is dma-buf, and > >> > >> io_uring originated netmem that are marked unreadable as well > >> and so will end up in tcp_recvmsg_dmabuf(), then we reject and > >> fail since they should not be fed to devmem TCP. It should be > >> fine from correctness perspective. > >> > >> We need to check frags, and that's the place where we iterate > >> frags. Another option is to add a loop in tcp_recvmsg_locked > >> walking over all frags of an skb and doing the checks, but > >> that's an unnecessary performance burden to devmem. > >> > > > > Checking each frag in tcp_recvmsg_dmabuf (and the equivalent io_uring > > function) is not ideal really. Especially when you're dereferencing > > nio->pp to do the check which IIUC will pull a cache line not normally > > needed in this code path and may have a performance impact. > > This check is needed currently because the curent assumption in core > netdev code is that !skb_frags_readable() means devmem TCP. Longer term, > we need to figure out how to distinguish skb frag providers in both code > and Netlink introspection. > Right. In my mind the skb_frags_readable() check can be extended to tell us whether the entire skb is io_uring or devmem or readable. So that we don't have to: 1. Do a per-frag check, and 2. pull and keep an entire new cacheline hot to do the check. > Since your concerns here are primarily around performance rather than > correctness, I suggest we defer this as a follow up series. > OK. > > > > We currently have a check in __skb_fill_netmem_desc() that makes sure > > all frags added to an skb are pages or dmabuf. I think we need to > > improve it to make sure all frags added to an skb are of the same type > > (pages, dmabuf, iouring). sending it to skb_copy_datagram_msg or > > tcp_recvmsg_dmabuf or error. > > It should not be possible for drivers to fill in an skb with frags from > different providers. A provider can only change upon a queue reset. > Right, drivers shouldn't fill in an skb with different providers. We should probably protect the core net from weird driver behavior. We could assert/force at skb_add_rx_frag_netmem() time that the entire skb has frags of the same type. Then we only need to check skb->frags[0] once to determine the type of the entire skb. But as you mention this is a performance optimization. Probably OK to punt this to a later iteration. But to me the changes are straightforward enough that it may have been good to carry them in the first iteration anyway. But OK to leave this out. -- Thanks, Mina