On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@xxxxxxxxxx> wrote: > > From: Geliang Tang <tanggeliang@xxxxxxxxxx> > > It's necessary to traverse all subflows on the conn_list of an MPTCP > socket and then call kfunc to modify the fields of each subflow. In > kernel space, mptcp_for_each_subflow() helper is used for this: > > mptcp_for_each_subflow(msk, subflow) > kfunc(subflow); > > But in the MPTCP BPF program, this has not yet been implemented. As > Martin suggested recently, this conn_list walking + modify-by-kfunc > usage fits the bpf_iter use case. So this patch adds a new bpf_iter > type named "mptcp_subflow" to do this and implements its helpers > bpf_iter_mptcp_subflow_new()/_next()/_destroy(). > > Since these bpf_iter mptcp_subflow helpers are invoked in its selftest > in a ftrace hook for mptcp_sched_get_send(), it's necessary to register > them into a BPF_PROG_TYPE_TRACING type kfunc set together with other > two used kfuncs mptcp_subflow_active() and mptcp_subflow_set_scheduled(). > > Then bpf_for_each() for mptcp_subflow can be used in BPF program like > this: > > i = 0; > bpf_rcu_read_lock(); > bpf_for_each(mptcp_subflow, subflow, msk) { > if (i++ >= MPTCP_SUBFLOWS_MAX) > break; > kfunc(subflow); > } > bpf_rcu_read_unlock(); > > v2: remove msk->pm.lock in _new() and _destroy() (Martin) > drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii) > v3: drop bpf_iter__mptcp_subflow > v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in > _next() (Andrii) > > Suggested-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx> > Signed-off-by: Geliang Tang <tanggeliang@xxxxxxxxxx> > --- > net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 52 insertions(+), 5 deletions(-) > Looks ok from setting up open-coded iterator things, but I can't speak for other aspects I mentioned below. > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c > index 6414824402e6..fec18e7e5e4a 100644 > --- a/net/mptcp/bpf.c > +++ b/net/mptcp/bpf.c > @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { > .set = &bpf_mptcp_fmodret_ids, > }; > > -__diag_push(); > -__diag_ignore_all("-Wmissing-prototypes", > - "kfuncs which will be used in BPF programs"); > +struct bpf_iter_mptcp_subflow { > + __u64 __opaque[2]; > +} __attribute__((aligned(8))); > + > +struct bpf_iter_mptcp_subflow_kern { > + struct mptcp_sock *msk; > + struct list_head *pos; > +} __attribute__((aligned(8))); > + > +__bpf_kfunc_start_defs(); > + > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, > + struct mptcp_sock *msk) > +{ > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; > + > + kit->msk = msk; > + if (!msk) > + return -EINVAL; > + > + kit->pos = &msk->conn_list; > + return 0; > +} > + > +__bpf_kfunc struct mptcp_subflow_context * > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) > +{ > + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; > + struct mptcp_subflow_context *subflow; > + struct mptcp_sock *msk = kit->msk; > + > + if (!msk) > + return NULL; > + > + subflow = list_entry(kit->pos->next, struct mptcp_subflow_context, node); > + if (!subflow || list_entry_is_head(subflow, &msk->conn_list, node)) it's a bit weird that you need both !subflow and list_entry_is_head(). Can you have NULL subflow at all? But this is the question to msk->conn_list semantics. > + return NULL; > + > + kit->pos = &subflow->node; > + return subflow; > +} > + > +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) > +{ > +} > > __bpf_kfunc struct mptcp_subflow_context * > bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos) > @@ -218,12 +260,15 @@ __bpf_kfunc bool bpf_mptcp_subflow_queues_empty(struct sock *sk) > return tcp_rtx_queue_empty(sk); > } > > -__diag_pop(); > +__bpf_kfunc_end_defs(); > > BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids) > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new) I'm not 100% sure, but I suspect you might need to specify KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a valid owned kernel object. Other BPF folks might help to clarify this. > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next) > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy) > +BTF_ID_FLAGS(func, mptcp_subflow_active) > BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled) > BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos) > -BTF_ID_FLAGS(func, mptcp_subflow_active) > BTF_ID_FLAGS(func, mptcp_set_timeout) > BTF_ID_FLAGS(func, mptcp_wnd_end) > BTF_ID_FLAGS(func, tcp_stream_memory_free) > @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void) > int ret; > > ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); > + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, > + &bpf_mptcp_sched_kfunc_set); > ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, > &bpf_mptcp_sched_kfunc_set); > #ifdef CONFIG_BPF_JIT > -- > 2.43.0 > >