Re: [External] Re: [PATCH bpf 2/2] tcp_bpf: add sk_rmem_alloc related logic for ingress redirection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/7/24 8:04 PM, Cong Wang wrote:
On Thu, Oct 17, 2024 at 12:57:42AM +0000, zijianzhang@xxxxxxxxxxxxx wrote:
From: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>

Although we sk_rmem_schedule and add sk_msg to the ingress_msg of sk_redir
in bpf_tcp_ingress, we do not update sk_rmem_alloc. As a result, except
for the global memory limit, the rmem of sk_redir is nearly unlimited.

Thus, add sk_rmem_alloc related logic to limit the recv buffer.

Signed-off-by: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>
---
  include/linux/skmsg.h | 11 ++++++++---
  net/core/skmsg.c      |  6 +++++-
  net/ipv4/tcp_bpf.c    |  4 +++-
  3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index d9b03e0746e7..2cbe0c22a32f 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -317,17 +317,22 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
  	kfree_skb(skb);
  }
-static inline void sk_psock_queue_msg(struct sk_psock *psock,
+static inline bool sk_psock_queue_msg(struct sk_psock *psock,
  				      struct sk_msg *msg)
  {
+	bool ret;
+
  	spin_lock_bh(&psock->ingress_lock);
-	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
+	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
  		list_add_tail(&msg->list, &psock->ingress_msg);
-	else {
+		ret = true;
+	} else {
  		sk_msg_free(psock->sk, msg);
  		kfree(msg);
+		ret = false;
  	}
  	spin_unlock_bh(&psock->ingress_lock);
+	return ret;
  }
static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b1dcbd3be89e..110ee0abcfe0 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -445,8 +445,10 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
  			if (likely(!peek)) {
  				sge->offset += copy;
  				sge->length -= copy;
-				if (!msg_rx->skb)
+				if (!msg_rx->skb) {
  					sk_mem_uncharge(sk, copy);
+					atomic_sub(copy, &sk->sk_rmem_alloc);
+				}
  				msg_rx->sg.size -= copy;
if (!sge->length) {
@@ -772,6 +774,8 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock)
list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) {
  		list_del(&msg->list);
+		if (!msg->skb)
+			atomic_sub(msg->sg.size, &psock->sk->sk_rmem_alloc);
  		sk_msg_free(psock->sk, msg);

Why not calling this atomic_sub() in sk_msg_free_elem()?

Thanks.

sk_msg_free_elem called by sk_msg_free or sk_msg_free_no_charge will
be invoked in multiple locations including TX/RX/Error and etc.

We should call atomic_sub(&sk->sk_rmem_alloc) for sk_msgs that have
been atomic_add before. In other words, we need to call atomic_sub
only for sk_msgs in ingress_msg.

As for "!msg->skb" check here, I want to make sure the sk_msg is not
from function sk_psock_skb_ingress_enqueue, because these sk_msgs'
rmem accounting has already handled by skb_set_owner_r in function
sk_psock_skb_ingress.






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux