On 11/05, Mina Almasry wrote:
> For device memory TCP, we expect the skb headers to be available in host
> memory for access, and we expect the skb frags to be in device memory
> and unaccessible to the host. We expect there to be no mixing and
> matching of device memory frags (unaccessible) with host memory frags
> (accessible) in the same skb.
>
> Add a skb->devmem flag which indicates whether the frags in this skb
> are device memory frags or not.
>
> __skb_fill_page_desc() now checks frags added to skbs for page_pool_iovs,
> and marks the skb as skb->devmem accordingly.
>
> Add checks through the network stack to avoid accessing the frags of
> devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
>
> Signed-off-by: Willem de Bruijn <willemb@xxxxxxxxxx>
> Signed-off-by: Kaiyuan Zhang <kaiyuanz@xxxxxxxxxx>
> Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
>
> ---
>  include/linux/skbuff.h | 14 +++++++-
>  include/net/tcp.h      |  5 +--
>  net/core/datagram.c    |  6 ++++
>  net/core/gro.c         |  5 ++-
>  net/core/skbuff.c      | 77 ++++++++++++++++++++++++++++++++++++------
>  net/ipv4/tcp.c         |  6 ++++
>  net/ipv4/tcp_input.c   | 13 +++++--
>  net/ipv4/tcp_output.c  |  5 ++-
>  net/packet/af_packet.c |  4 +--
>  9 files changed, 115 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 1fae276c1353..8fb468ff8115 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -805,6 +805,8 @@ typedef unsigned char *sk_buff_data_t;
>   * @csum_level: indicates the number of consecutive checksums found in
>   *              the packet minus one that have been verified as
>   *              CHECKSUM_UNNECESSARY (max 3)
> + * @devmem: indicates that all the fragments in this skb are backed by
> + *          device memory.
>   * @dst_pending_confirm: need to confirm neighbour
>   * @decrypted: Decrypted SKB
>   * @slow_gro: state present at GRO time, slower prepare step required
> @@ -991,7 +993,7 @@ struct sk_buff {
>  #if IS_ENABLED(CONFIG_IP_SCTP)
>          __u8                    csum_not_inet:1;
>  #endif
> -
> +        __u8                    devmem:1;
>  #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
>          __u16                   tc_index;       /* traffic control index */
>  #endif
> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
>          __skb_zcopy_downgrade_managed(skb);
>  }
>
> +/* Return true if frags in this skb are not readable by the host. */
> +static inline bool skb_frags_not_readable(const struct sk_buff *skb)
> +{
> +        return skb->devmem;

bikeshedding: should we also rename the 'devmem' sk_buff flag to
'not_readable'? It better communicates the fact that the stack shouldn't
dereference the frags (because the skb has 'devmem' fragments or for some
other potential future reason).
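
To make the rename concrete, here is a rough user-space sketch of how the helper and two callers might read with a 'not_readable' bit. Everything prefixed with mock_ is hypothetical and only for illustration: struct mock_skb is not struct sk_buff, and the -EFAULT return and the coalescing rule are assumptions, not what the patch actually does in net/core.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for illustration only, NOT the kernel's struct sk_buff. */
struct mock_skb {
	unsigned char not_readable:1;	/* proposed name; the patch calls this bit devmem */
	size_t nr_frags;
};

/* Same shape as the helper in the patch, reading the renamed bit. */
static inline bool mock_skb_frags_not_readable(const struct mock_skb *skb)
{
	return skb->not_readable;
}

/* Hypothetical caller: refuse to touch frag payload the host cannot read. */
static int mock_skb_copy_frag(const struct mock_skb *skb, size_t frag_idx)
{
	assert(frag_idx < skb->nr_frags);
	if (mock_skb_frags_not_readable(skb))
		return -EFAULT;	/* assumed error code, chosen for the sketch */
	return 0;		/* a real implementation would copy the payload here */
}

/* Hypothetical coalescing rule: never mix readable and unreadable skbs. */
static bool mock_skb_can_coalesce(const struct mock_skb *a,
				  const struct mock_skb *b)
{
	return mock_skb_frags_not_readable(a) == mock_skb_frags_not_readable(b);
}
```

With this naming the call sites only ask whether the frags can be dereferenced, so a future non-devmem reason for unreadable frags would not need another rename.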