Pavel Begunkov wrote: > On 6/28/24 18:06, Willem de Bruijn wrote: > > Pavel Begunkov wrote: > >> Instead of accounting every page range against the socket separately, do > >> it in batch based on the change in skb->truesize. It's also moved into > >> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is > >> simpler and responsible for setting frags but not the accounting. > >> > >> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx> > > > > Reviewed-by: Willem de Bruijn <willemb@xxxxxxxxxx> > > Thanks for reviews! > > >> --- > >> net/core/datagram.c | 31 ++++++++++++++++++------------- > >> 1 file changed, 18 insertions(+), 13 deletions(-) > >> > >> diff --git a/net/core/datagram.c b/net/core/datagram.c > >> index 7f7d5da2e406..2b24d69b1e94 100644 > >> --- a/net/core/datagram.c > >> +++ b/net/core/datagram.c > >> @@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset, > >> } > >> EXPORT_SYMBOL(skb_copy_datagram_from_iter); > >> > >> -static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb, > >> +static int zerocopy_fill_skb_from_iter(struct sk_buff *skb, > >> struct iov_iter *from, size_t length) > >> { > >> int frag = skb_shinfo(skb)->nr_frags; > >> @@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb, > >> int refs, order, n = 0; > >> size_t start; > >> ssize_t copied; > >> - unsigned long truesize; > >> > >> if (frag == MAX_SKB_FRAGS) > >> return -EMSGSIZE; > > > > Does the existing code then incorrectly not unwind sk_wmem_queued_add > > and sk_mem_charge if returning with error from the second or later > > loop.. > > As long as ->truesize matches what's accounted to the socket, > kfree_skb() -> sock_wfree()/->destructor() should take care of it. > With sk_mem_charge() I assume __zerocopy_sg_from_iter -> ___pskb_trim() > should do it, need to look it up, but if not, it sounds like a temporary > over estimation until the skb is put down. I don't see anything > concerning. Is that the scenario you're worried about? Oh indeed. Thanks. I don't see ___pskb_trim adjusting except for the cases where it calls skb_condese, but neither does it adjust truesize. So indeed a temporary over estimation until e.g., tcp_wmem_free_skb. Sounds fine.