Hi Martin, Andrii, Matt, On Fri, 2024-09-13 at 17:41 -0700, Martin KaFai Lau wrote: > On 9/13/24 1:57 PM, Andrii Nakryiko wrote: > > > > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct > > > > > bpf_iter_mptcp_subflow *it, > > > > > + struct mptcp_sock > > > > > *msk) > > > > > +{ > > > > > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; > > > > > + > > > > > + kit->msk = msk; > > > > > + if (!msk) > > > > > + return -EINVAL; > > > > > + > > > > > + kit->pos = &msk->conn_list; > > > > > + return 0; > > > > > +} > > [ ... ] > > > > > > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids) > > > > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new) > > > > > > > > I'm not 100% sure, but I suspect you might need to specify > > > > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is > > > > a > > +1 So we must add KF_TRUSTED_ARGS flag, right? > > > > > > @@ -241,6 +286,8 @@ static int __init > > > > > bpf_mptcp_kfunc_init(void) > > > > > int ret; > > > > > > > > > > ret = > > > > > register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); > > > > > + ret = ret ?: > > > > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, > > > > > + > > > > > &bpf_mptcp_sched_kfunc_set); > > This cannot be used in tracing. Actually, we don’t need to use mptcp_subflow bpf_iter in tracing. We plan to use it in MPTCP BPF packet schedulers, which are not tracing, but "struct_ops" types. And they work well with KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new: BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS); An example of the scheduler is: SEC("struct_ops") int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk, struct mptcp_sched_data *data) { struct mptcp_subflow_context *subflow; bpf_rcu_read_lock(); bpf_for_each(mptcp_subflow, subflow, msk) { mptcp_subflow_set_scheduled(subflow, true); break; } bpf_rcu_read_unlock(); return 0; } SEC(".struct_ops") struct mptcp_sched_ops first = { .init = (void *)mptcp_sched_first_init, .release = (void *)mptcp_sched_first_release, .get_subflow = (void *)bpf_first_get_subflow, .name = "bpf_first", }; But BPF mptcp_sched_ops code has not been merged into bpf-next yet, so I simply test this bpf_for_each(mptcp_subflow) in tracing since I noticed other bpf_iter selftests are using tracing too: progs/iters_task.c SEC("fentry.s/" SYS_PREFIX "sys_getpgid") progs/iters_css.c SEC("fentry.s/" SYS_PREFIX "sys_getpgid") If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I will try to move the selftest into a struct_ops. > > Going back to my earlier question in v1. How is the msk->conn_list > protected? > msk->conn_list is protected by msk socket lock. (@Matt, am I right?) We use this in kernel code: struct sock *sk = (struct sock *)msk; lock_sock(sk); kfunc(&msk->conn_list); release_sock(sk); If so, should we also use lock_sock/release_sock in bpf_iter_mptcp_subflow_next()? Thanks, -Geliang