Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter

Geliang Tang <geliang@xxxxxxxxxx> · Sat, 14 Sep 2024 16:40:35 +0800

Hi Martin, Andrii, Matt,

On Fri, 2024-09-13 at 17:41 -0700, Martin KaFai Lau wrote:
> On 9/13/24 1:57 PM, Andrii Nakryiko wrote:
> > > > > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > > > > bpf_iter_mptcp_subflow *it,
> > > > > +                                          struct mptcp_sock
> > > > > *msk)
> > > > > +{
> > > > > +       struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > > > > +
> > > > > +       kit->msk = msk;
> > > > > +       if (!msk)
> > > > > +               return -EINVAL;
> > > > > +
> > > > > +       kit->pos = &msk->conn_list;
> > > > > +       return 0;
> > > > > +}
> 
> [ ... ]
> 
> > > > >   BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > > > > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> > > > 
> > > > I'm not 100% sure, but I suspect you might need to specify
> > > > KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is
> > > > a
> 
> +1

So we must add KF_TRUSTED_ARGS flag, right?

> 
> > > > > @@ -241,6 +286,8 @@ static int __init
> > > > > bpf_mptcp_kfunc_init(void)
> > > > >          int ret;
> > > > > 
> > > > >          ret =
> > > > > register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > > > > +       ret = ret ?:
> > > > > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > > > > +
> > > > > &bpf_mptcp_sched_kfunc_set);
> 
> This cannot be used in tracing.

Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.

We plan to use it in MPTCP BPF packet schedulers, which are not
tracing, but "struct_ops" types. And they work well with
KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:

BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
KF_TRUSTED_ARGS);

An example of the scheduler is:

SEC("struct_ops")
int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
             struct mptcp_sched_data *data)
{
        struct mptcp_subflow_context *subflow;

        bpf_rcu_read_lock();
        bpf_for_each(mptcp_subflow, subflow, msk) {
                mptcp_subflow_set_scheduled(subflow, true);
                break;
        }
        bpf_rcu_read_unlock();

        return 0;
}

SEC(".struct_ops")
struct mptcp_sched_ops first = { 
        .init           = (void *)mptcp_sched_first_init,
        .release        = (void *)mptcp_sched_first_release,
        .get_subflow    = (void *)bpf_first_get_subflow,
        .name           = "bpf_first",
};

But BPF mptcp_sched_ops code has not been merged into bpf-next yet, so
I simply test this bpf_for_each(mptcp_subflow) in tracing since I
noticed other bpf_iter selftests are using tracing too:

progs/iters_task.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")

progs/iters_css.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")

If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
will try to move the selftest into a struct_ops.

> 
> Going back to my earlier question in v1. How is the msk->conn_list
> protected?
> 

msk->conn_list is protected by msk socket lock. (@Matt, am I right?) We
use this in kernel code:

	struct sock *sk = (struct sock *)msk;

	lock_sock(sk);
	kfunc(&msk->conn_list);
	release_sock(sk);

If so, should we also use lock_sock/release_sock in
bpf_iter_mptcp_subflow_next()?

Thanks,
-Geliang