Re: [PATCH mptcp-next v5 1/5] bpf: Add mptcp_subflow bpf_iter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrii,

On Thu, 2024-09-12 at 11:24 -0700, Andrii Nakryiko wrote:
> On Thu, Sep 12, 2024 at 2:26 AM Geliang Tang <geliang@xxxxxxxxxx>
> wrote:
> > 
> > From: Geliang Tang <tanggeliang@xxxxxxxxxx>
> > 
> > It's necessary to traverse all subflows on the conn_list of an
> > MPTCP
> > socket and then call kfunc to modify the fields of each subflow. In
> > kernel space, mptcp_for_each_subflow() helper is used for this:
> > 
> >         mptcp_for_each_subflow(msk, subflow)
> >                 kfunc(subflow);
> > 
> > But in the MPTCP BPF program, this has not yet been implemented. As
> > Martin suggested recently, this conn_list walking + modify-by-kfunc
> > usage fits the bpf_iter use case. So this patch adds a new bpf_iter
> > type named "mptcp_subflow" to do this and implements its helpers
> > bpf_iter_mptcp_subflow_new()/_next()/_destroy().
> > 
> > Since these bpf_iter mptcp_subflow helpers are invoked in its
> > selftest
> > in a ftrace hook for mptcp_sched_get_send(), it's necessary to
> > register
> > them into a BPF_PROG_TYPE_TRACING type kfunc set together with
> > other
> > two used kfuncs mptcp_subflow_active() and
> > mptcp_subflow_set_scheduled().
> > 
> > Then bpf_for_each() for mptcp_subflow can be used in BPF program
> > like
> > this:
> > 
> >         i = 0;
> >         bpf_rcu_read_lock();
> >         bpf_for_each(mptcp_subflow, subflow, msk) {
> >                 if (i++ >= MPTCP_SUBFLOWS_MAX)
> >                         break;
> >                 kfunc(subflow);
> >         }
> >         bpf_rcu_read_unlock();
> > 
> > v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> >     drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2]
> > (Andrii)
> > v3: drop bpf_iter__mptcp_subflow
> > v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> > it in
> >     _next() (Andrii)
> > 
> > Suggested-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
> > Signed-off-by: Geliang Tang <tanggeliang@xxxxxxxxxx>
> > ---
> >  net/mptcp/bpf.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-
> > ----
> >  1 file changed, 52 insertions(+), 5 deletions(-)
> > 
> 
> Looks ok from setting up open-coded iterator things, but I can't
> speak
> for other aspects I mentioned below.
> 
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 6414824402e6..fec18e7e5e4a 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -201,9 +201,51 @@ static const struct btf_kfunc_id_set
> > bpf_mptcp_fmodret_set = {
> >         .set   = &bpf_mptcp_fmodret_ids,
> >  };
> > 
> > -__diag_push();
> > -__diag_ignore_all("-Wmissing-prototypes",
> > -                 "kfuncs which will be used in BPF programs");
> > +struct bpf_iter_mptcp_subflow {
> > +       __u64 __opaque[2];
> > +} __attribute__((aligned(8)));
> > +
> > +struct bpf_iter_mptcp_subflow_kern {
> > +       struct mptcp_sock *msk;
> > +       struct list_head *pos;
> > +} __attribute__((aligned(8)));
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc int bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,
> > +                                          struct mptcp_sock *msk)
> > +{
> > +       struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +
> > +       kit->msk = msk;
> > +       if (!msk)
> > +               return -EINVAL;
> > +
> > +       kit->pos = &msk->conn_list;
> > +       return 0;
> > +}
> > +
> > +__bpf_kfunc struct mptcp_subflow_context *
> > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > +{
> > +       struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +       struct mptcp_subflow_context *subflow;
> > +       struct mptcp_sock *msk = kit->msk;
> > +
> > +       if (!msk)
> > +               return NULL;
> > +
> > +       subflow = list_entry(kit->pos->next, struct
> > mptcp_subflow_context, node);
> > +       if (!subflow || list_entry_is_head(subflow, &msk-
> > >conn_list, node))
> 
> it's a bit weird that you need both !subflow and
> list_entry_is_head().
> Can you have NULL subflow at all? But this is the question to
> msk->conn_list semantics.

Do you mean to return subflow regardless of whether subflow is NULL? If
so, we need to use list_is_last() instead of list_entry_is_head():

{
    struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;

    if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
            return NULL;

    kit->pos = kit->pos->next;
    return list_entry(kit->pos, struct mptcp_subflow_context, node);
}

Hope this is a better version.

> 
> > +               return NULL;
> > +
> > +       kit->pos = &subflow->node;
> > +       return subflow;
> > +}
> > +
> > +__bpf_kfunc void bpf_iter_mptcp_subflow_destroy(struct
> > bpf_iter_mptcp_subflow *it)
> > +{
> > +}
> > 
> >  __bpf_kfunc struct mptcp_subflow_context *
> >  bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data,
> > unsigned int pos)
> > @@ -218,12 +260,15 @@ __bpf_kfunc bool
> > bpf_mptcp_subflow_queues_empty(struct sock *sk)
> >         return tcp_rtx_queue_empty(sk);
> >  }
> > 
> > -__diag_pop();
> > +__bpf_kfunc_end_defs();
> > 
> >  BTF_KFUNCS_START(bpf_mptcp_sched_kfunc_ids)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new)
> 
> I'm not 100% sure, but I suspect you might need to specify
> KF_TRUSTED_ARGS here to ensure that `struct mptcp_sock *msk` is a

KF_TRUSTED_ARGS breaks the selftest in patch 2 [1]:

'''
libbpf: prog 'trace_mptcp_sched_get_send': BPF program load failed:
Invalid argument
libbpf: prog 'trace_mptcp_sched_get_send': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk) @
mptcp_bpf_iter.c:13
0: (79) r6 = *(u64 *)(r1 +0)
func 'mptcp_sched_get_send' arg0 has btf_id 28019 type STRUCT
'mptcp_sock'
1: R1=ctx() R6_w=ptr_mptcp_sock()
; if (bpf_get_current_pid_tgid() >> 32 != pid) @ mptcp_bpf_iter.c:17
1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
2: (77) r0 >>= 32                     ;
R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
3: (18) r1 = 0xffffb766c1684004       ;
R1_w=map_value(map=mptcp_bp.bss,ks=4,vs=8,off=4)
5: (61) r1 = *(u32 *)(r1 +0)          ;
R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
6: (67) r1 <<= 32                     ;
R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,sm
ax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
7: (c7) r1 s>>= 32                    ;
R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
8: (5d) if r0 != r1 goto pc+39        ;
R0_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
R1_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0;
0x7fffffff))
; iter = 0; @ mptcp_bpf_iter.c:20
9: (18) r8 = 0xffffb766c1684000       ;
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
11: (b4) w1 = 0                       ; R1_w=0
12: (63) *(u32 *)(r8 +0) = r1         ; R1_w=0
R8_w=map_value(map=mptcp_bp.bss,ks=4,vs=8)
; bpf_rcu_read_lock(); @ mptcp_bpf_iter.c:22
13: (85) call bpf_rcu_read_lock#84967         ;
14: (bf) r7 = r10                     ; R7_w=fp0 R10=fp0
; iter = 0; @ mptcp_bpf_iter.c:20
15: (07) r7 += -16                    ; R7_w=fp-16
; bpf_for_each(mptcp_subflow, subflow, msk) { @ mptcp_bpf_iter.c:23
16: (bf) r1 = r7                      ; R1_w=fp-16 R7_w=fp-16
17: (bf) r2 = r6                      ; R2_w=ptr_mptcp_sock()
R6=ptr_mptcp_sock()
18: (85) call bpf_iter_mptcp_subflow_new#62694
R2 must be referenced or trusted
processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 1
peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'trace_mptcp_sched_get_send': failed to load: -22
libbpf: failed to load object 'mptcp_bpf_iter'
libbpf: failed to load BPF skeleton 'mptcp_bpf_iter': -22
test_bpf_iter:FAIL:skel_open_load: mptcp_iter unexpected error: -22
#169/4   mptcp/bpf_iter:FAIL
'''

I don't know how to fix it yet.

> valid owned kernel object. Other BPF folks might help to clarify
> this.
> 
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy)
> > +BTF_ID_FLAGS(func, mptcp_subflow_active)

But we do need to add KF_ITER_NEW/NEXT/DESTROY flags here:

BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT |
KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)

With these flags, we can drop this additional test in loop:

	if (i++ >= MPTCP_SUBFLOWS_MAX)
		break;


This problem has been bothering me for a while. I got "infinite loop
detected" errors when walking the list with
bpf_for_each(mptcp_subflow). So I had to use this additional test to
break it.

With these KF_ITER_NEW/NEXT/DESTROY flags, no need to use this
additional test anymore.

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/patch/4467ab3af0f1a6c378778ca6be72753b16a1532c.1726132802.git.tanggeliang@xxxxxxxxxx/

> >  BTF_ID_FLAGS(func, mptcp_subflow_set_scheduled)
> >  BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx_by_pos)
> > -BTF_ID_FLAGS(func, mptcp_subflow_active)
> >  BTF_ID_FLAGS(func, mptcp_set_timeout)
> >  BTF_ID_FLAGS(func, mptcp_wnd_end)
> >  BTF_ID_FLAGS(func, tcp_stream_memory_free)
> > @@ -241,6 +286,8 @@ static int __init bpf_mptcp_kfunc_init(void)
> >         int ret;
> > 
> >         ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
> > +       ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
> > +                                             
> > &bpf_mptcp_sched_kfunc_set);
> >         ret = ret ?:
> > register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
> >                                               
> > &bpf_mptcp_sched_kfunc_set);
> >  #ifdef CONFIG_BPF_JIT
> > --
> > 2.43.0
> > 
> > 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux