From: Geliang Tang <tanggeliang@xxxxxxxxxx>

It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call a kfunc to modify the fields of each subflow. In
kernel space, the mptcp_for_each_subflow() helper is used for this:

	mptcp_for_each_subflow(msk, subflow)
		kfunc(subflow);

But in MPTCP BPF programs, this has not yet been implemented. As Martin
suggested recently, this conn_list walking + modify-by-kfunc usage fits
the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow",
implements its helpers bpf_iter_mptcp_subflow_new()/_next()/_destroy(),
and registers them into the MPTCP common kfunc set. bpf_for_each() can
then be used to walk the subflows of an msk in a BPF program like this:

	bpf_for_each(mptcp_subflow, subflow, msk)
		kfunc(subflow);

Suggested-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
Signed-off-by: Geliang Tang <tanggeliang@xxxxxxxxxx>
Reviewed-by: Mat Martineau <martineau@xxxxxxxxxx>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
---
Notes:
 - v2:
   - Add BUILD_BUG_ON() checks, similar to the ones done with other
     bpf_iter_(...) helpers.
   - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
     !spin_is_locked() (Martin).
 - v3:
   - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
   - Remove unneeded !msk check (Martin)
   - Remove locks checks, add msk_owned_by_me() for lockdep (Martin)
   - The following note and 2 questions have been added below.

This new bpf_iter will be used by our future BPF packet schedulers and
path managers. To see how we are going to use them, please check our
export branch [1], especially these two commits:

 - "bpf: Add mptcp packet scheduler struct_ops": introduce a new
   struct_ops.
 - "selftests/bpf: Add bpf_burst scheduler & test": new test showing
   how the new struct_ops and bpf_iter are being used.

[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export

@BPF maintainers: we would like to allow this new mptcp_subflow
bpf_iter to be used with struct_ops, but only with the two new
MPTCP-specific ones we are going to introduce, and not with other
struct_ops (TCP CC, sched_ext, etc.). We are not sure how to do that.
By chance, do you have examples or documentation you could point us to
in order to put this restriction in place, please?

Also, for one of the two future MPTCP struct_ops, not all callbacks
should be allowed to use this new bpf_iter, because they are called
from different contexts. How can we ensure such callbacks from a
struct_ops cannot call the mptcp_subflow bpf_iter, without adding new
dedicated checks verifying that certain locks are held in all
callbacks? We understood that sched_ext wanted something similar, but
we are not sure if that code is ready, nor whether it is going to be
accepted.

---

A few versions of this single patch have been previously posted to the
BPF mailing list by Geliang, before continuing on the MPTCP mailing
list only, with the other patches of this series.

The version of the whole series has been reset to 1, but here is the
ChangeLog for the previous ones:

 - v2:
   - remove msk->pm.lock in _new() and _destroy() (Martin)
   - drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
 - v3:
   - drop bpf_iter__mptcp_subflow
 - v4:
   - if msk is NULL, initialize kit->msk to NULL in _new() and check it
     in _next() (Andrii)
 - v5:
   - use list_is_last() instead of list_entry_is_head()
   - add KF_ITER_NEW/NEXT/DESTROY flags
   - add msk_owned_by_me() in _new()
 - v6:
   - add KF_TRUSTED_ARGS flag (Andrii, Martin)
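For reference, here is a rough sketch of how we expect a BPF program to
use this iterator. Only the bpf_for_each() usage and the kfunc
declarations match this patch; the struct_ops section, the callback
name and its signature are placeholders for the future MPTCP
struct_ops mentioned above, which this patch does not define:

	/* SPDX-License-Identifier: GPL-2.0 */
	#include "vmlinux.h"
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>
	#include "bpf_experimental.h"	/* for the bpf_for_each() macro */

	char _license[] SEC("license") = "GPL";

	/* kfuncs added by this patch */
	extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
					      struct sock *sk) __ksym;
	extern struct mptcp_subflow_context *
	bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __ksym;
	extern void
	bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __ksym;

	SEC("struct_ops")		/* placeholder attach point */
	int BPF_PROG(count_subflows, struct sock *sk)
	{
		struct mptcp_subflow_context *subflow;
		int n = 0;

		/* the macro open-codes _new(&it, sk), repeated _next(&it)
		 * calls, and _destroy(&it) when the loop scope is left
		 */
		bpf_for_each(mptcp_subflow, subflow, sk)
			n++;

		bpf_printk("msk has %d subflows", n);
		return 0;
	}

In other words, the loop above is roughly equivalent to open-coding
the three kfuncs:

	struct bpf_iter_mptcp_subflow it;

	bpf_iter_mptcp_subflow_new(&it, sk);
	while ((subflow = bpf_iter_mptcp_subflow_next(&it)))
		n++;
	bpf_iter_mptcp_subflow_destroy(&it);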
---
 net/mptcp/bpf.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 2e4b8ddf81ab0bb9dc547ea8783b73767d553a18..d3b5597eddb915a19eca87d87c31a27dfbdda619 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
 	.set = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+	struct mptcp_sock *msk;
+	struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_subflow_context *
@@ -41,10 +50,57 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
 	return NULL;
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+			   struct sock *sk)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+	struct mptcp_sock *msk;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
+		     sizeof(struct bpf_iter_mptcp_subflow));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
+		     __alignof__(struct bpf_iter_mptcp_subflow));
+
+	if (unlikely(!sk || !sk_fullsock(sk)))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_MPTCP)
+		return -EINVAL;
+
+	msk = mptcp_sk(sk);
+
+	msk_owned_by_me(msk);
+
+	kit->msk = msk;
+	kit->pos = &msk->conn_list;
+	return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+		return NULL;
+
+	kit->pos = kit->pos->next;
+	return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.48.1