On Mon, Nov 6, 2023 at 3:37 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>
> On 11/6/23 3:18 PM, Mina Almasry wrote:
> >>>>>> @@ -991,7 +993,7 @@ struct sk_buff {
> >>>>>>  #if IS_ENABLED(CONFIG_IP_SCTP)
> >>>>>>         __u8                    csum_not_inet:1;
> >>>>>>  #endif
> >>>>>> -
> >>>>>> +       __u8                    devmem:1;
> >>>>>>  #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> >>>>>>         __u16                   tc_index;       /* traffic control index */
> >>>>>>  #endif
> >>>>>> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
> >>>>>>                 __skb_zcopy_downgrade_managed(skb);
> >>>>>>  }
> >>>>>>
> >>>>>> +/* Return true if frags in this skb are not readable by the host. */
> >>>>>> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> >>>>>> +{
> >>>>>> +       return skb->devmem;
> >>>>>
> >>>>> bikeshedding: should we also rename 'devmem' sk_buff flag to 'not_readable'?
> >>>>> It better communicates the fact that the stack shouldn't dereference the
> >>>>> frags (because it has 'devmem' fragments or for some other potential
> >>>>> future reason).
> >>>>
> >>>> +1.
> >>>>
> >>>> Also, the flag on the skb is an optimization - a high level signal that
> >>>> one or more frags is in unreadable memory. There is no requirement that
> >>>> all of the frags are in the same memory type.
> >>
> >> David: maybe there should be such a requirement (that they all are
> >> unreadable)? Might be easier to support initially; we can relax later
> >> on.
> >>
> >
> > Currently devmem == not_readable, and the restriction is that all the
> > frags in the same skb must be either all readable or all unreadable
> > (all devmem or all non-devmem).
>
> What requires that restriction? In all of the uses of skb->devmem and
> skb_frags_not_readable() what matters is if any frag is not readable,
> then frag list walk or collapse is avoided.
>

Currently only tcp_recvmsg_devmem(), I think.
tcp_recvmsg_locked() delegates to tcp_recvmsg_devmem() if skb->devmem is set, and tcp_recvmsg_devmem() net_err's (logs an error) if it finds a non-iov frag in the skb. This is done for simplicity: iovs are given to the user via cmsg, whereas pages are copied into the linear buffer. I think it would be confusing for the user if, in the same recvmsg() call, we copied some data into the linear buffer and also gave them devmem cmsgs.

So, my simplifying restrictions are:

1. In a single skb, all frags must be devmem or all must be non-devmem; no mixing.
2. In a single recvmsg() call, we process only devmem skbs or only non-devmem skbs; no mixing.

--
Thanks,
Mina
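The two restrictions, and David's alternative reading of the skb flag as "at least one frag is unreadable", can be contrasted in a small userspace model. This is only a sketch under stated assumptions: toy_skb, toy_frag and every toy_* helper are hypothetical stand-ins, not the kernel's struct sk_buff, skb_frag_t, or the devmem patch code. It models just the behaviour described in this thread: the devmem receive path rejects a mixed skb, and one recvmsg() call does not mix devmem and non-devmem skbs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct sk_buff and its
 * frags; these are NOT the kernel's real types or field names. */
enum toy_mem { TOY_MEM_PAGE, TOY_MEM_DEVMEM };

struct toy_frag {
	enum toy_mem mem;
	size_t len;
};

struct toy_skb {
	bool devmem;                  /* skb-level summary flag */
	size_t nr_frags;
	const struct toy_frag *frags;
};

/* David's reading: the flag only has to say "at least one frag is
 * unreadable", which is enough to skip frag walks and collapse. */
static bool toy_any_frag_devmem(const struct toy_skb *skb)
{
	for (size_t i = 0; i < skb->nr_frags; i++)
		if (skb->frags[i].mem == TOY_MEM_DEVMEM)
			return true;
	return false;
}

/* Restriction 1: all frags in one skb are devmem, or none are. */
static bool toy_frags_uniform(const struct toy_skb *skb)
{
	for (size_t i = 1; i < skb->nr_frags; i++)
		if (skb->frags[i].mem != skb->frags[0].mem)
			return false;
	return true;
}

/* Devmem receive path: sum the bytes that would be reported via cmsg,
 * or return -1 if a non-devmem frag is found (the mixed-skb case). */
static long toy_recv_devmem_skb(const struct toy_skb *skb)
{
	long total = 0;

	for (size_t i = 0; i < skb->nr_frags; i++) {
		if (skb->frags[i].mem != TOY_MEM_DEVMEM)
			return -1;  /* per the thread, the real path logs an error */
		total += (long)skb->frags[i].len;
	}
	return total;
}

/* Restriction 2: one recvmsg() call consumes only devmem skbs or only
 * non-devmem skbs. Returns how many skbs from the head of the queue a
 * single call would take before hitting an skb of the other kind. */
static size_t toy_recvmsg_batch(const struct toy_skb *queue, size_t n)
{
	size_t i = 1;

	assert(queue != NULL || n == 0);
	if (n == 0)
		return 0;
	while (i < n && queue[i].devmem == queue[0].devmem)
		i++;
	return i;
}
```

With restriction 1 in place, the skb-level flag can simply mirror the memory type of any one frag. Under David's "any frag" reading it would instead be set whenever toy_any_frag_devmem() is true, and a mixed skb would presumably need the receive path to split its data between devmem cmsgs and the copy into the user's buffer, which is the case Mina's restrictions avoid.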