Hi Martin,

Thank you for your review!

Sorry for the delay here: Geliang started to work on a new version, but
it might take a bit of time as he is currently off for a few days.

On 24/01/2025 01:47, Martin KaFai Lau wrote:
> On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
>> From: Geliang Tang <tanggeliang@xxxxxxxxxx>
>>
>> It's necessary to traverse all subflows on the conn_list of an MPTCP
>> socket and then call kfunc to modify the fields of each subflow. In
>> kernel space, mptcp_for_each_subflow() helper is used for this:
>>
>>         mptcp_for_each_subflow(msk, subflow)
>>                 kfunc(subflow);
>>
>> But in the MPTCP BPF program, this has not yet been implemented. As
>> Martin suggested recently, this conn_list walking + modify-by-kfunc
>> usage fits the bpf_iter use case.
>>
>> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
>> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
>> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
>> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
>> in BPF program like this:
>>
>>         bpf_for_each(mptcp_subflow, subflow, msk)
>>                 kfunc(subflow);

(...)

>> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
>> index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644
>> --- a/net/mptcp/bpf.c
>> +++ b/net/mptcp/bpf.c

(...)

>> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
>>          return NULL;
>>  }
>> +__bpf_kfunc static int
>> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
>> +                           struct mptcp_sock *msk)
>> +{
>> +        struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
>> +        struct sock *sk = (struct sock *)msk;
>> +
>> +        BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
>> +                     sizeof(struct bpf_iter_mptcp_subflow));
>> +        BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
>> +                     __alignof__(struct bpf_iter_mptcp_subflow));
>> +
>> +        kit->msk = msk;
>> +        if (!msk)
>
> NULL check is not needed. verifier should have rejected it for
> KF_TRUSTED_ARGS.
>
>> +                return -EINVAL;
>> +
>> +        if (!sock_owned_by_user_nocheck(sk) &&
>> +            !spin_is_locked(&sk->sk_lock.slock))
>
> I could have missed something. If it is to catch bug, should it be
> sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt
> hook here, the lock should have already been held earlier in the kernel.

Good point. Because the kfunc is currently restricted to the CG
[gs]etsockopt hooks in this series, we should use msk_owned_by_me(msk)
here, see the first sketch below.

> This set is only showing the cg sockopt bpf prog but missing the major
> struct_ops piece. It is hard to comment. I assumed the lock situation is
> the same for the struct_ops where the lock will be held before calling
> the struct_ops prog?

I understand it is hard to comment on that point. In the struct_ops we
are designing, the lock will indeed be held before calling the
struct_ops program. So we will just need to make sure this assumption is
correct for all callbacks of our struct_ops.

Also, if I understood correctly, it is possible to restrict a kfunc to
some specific struct_ops, e.g. to prevent this kfunc from being called
from the TCP CA struct_ops. So these checks should indeed not be needed,
but I will double-check that with Geliang. I put a second sketch below
to show what I have in mind.
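
For the first point, the beginning of bpf_iter_mptcp_subflow_new() could
then look like this -- an untested sketch, to be confirmed in the new
version:

        kit->msk = msk;

        /* The msk socket lock is expected to be held by the caller
         * (CG [gs]etsockopt hooks): assert it instead of returning
         * -EINVAL at runtime.
         */
        msk_owned_by_me(msk);

msk_owned_by_me() is just a wrapper around sock_owned_by_me(), so a
misuse would trigger the lockdep splat you mentioned instead of failing
silently, and the sock_owned_by_user_nocheck()/spin_is_locked() pair can
go away.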
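For the second point, if I understood the kfunc id set code correctly,
a .filter callback can be attached to the registered btf_kfunc_id_set,
similar to what tracing_iter_filter() does in net/core/filter.c for the
bpf_sock_destroy kfunc. Roughly -- the names and the exact condition are
hypothetical, still to be validated with Geliang:

        static int bpf_mptcp_kfunc_filter(const struct bpf_prog *prog,
                                          u32 kfunc_id)
        {
                /* Hypothetical check: only accept program types where
                 * we know the msk socket lock is already held, e.g. the
                 * CG sockopt hooks; reject the others, like the TCP CA
                 * struct_ops. Non-zero means "kfunc not allowed".
                 */
                if (prog->type != BPF_PROG_TYPE_CGROUP_SOCKOPT)
                        return -EACCES;
                return 0;
        }

        static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
                .owner  = THIS_MODULE,
                .set    = &bpf_mptcp_common_kfunc_ids,
                .filter = bpf_mptcp_kfunc_filter,
        };

With something like that in place, the runtime lock checks in the
iterator itself should indeed no longer be needed.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.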