Hi Martin,

Thank you for your review!

Sorry for the delay here: Geliang started to work on a new version, but
it might take a bit of time as he is currently off for a few days.

On 24/01/2025 01:47, Martin KaFai Lau wrote:
> On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
>> From: Geliang Tang <tanggeliang@xxxxxxxxxx>
>>
>> It's necessary to traverse all subflows on the conn_list of an MPTCP
>> socket and then call kfunc to modify the fields of each subflow. In
>> kernel space, mptcp_for_each_subflow() helper is used for this:
>>
>>         mptcp_for_each_subflow(msk, subflow)
>>                 kfunc(subflow);
>>
>> But in the MPTCP BPF program, this has not yet been implemented. As
>> Martin suggested recently, this conn_list walking + modify-by-kfunc
>> usage fits the bpf_iter use case.
>>
>> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
>> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
>> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
>> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
>> in BPF program like this:
>>
>>         bpf_for_each(mptcp_subflow, subflow, msk)
>>                 kfunc(subflow);

(...)

>> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
>> index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644
>> --- a/net/mptcp/bpf.c
>> +++ b/net/mptcp/bpf.c

(...)

>> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
>>          return NULL;
>>  }
>> +__bpf_kfunc static int
>> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
>> +                           struct mptcp_sock *msk)
>> +{
>> +        struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
>> +        struct sock *sk = (struct sock *)msk;
>> +
>> +        BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
>> +                     sizeof(struct bpf_iter_mptcp_subflow));
>> +        BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
>> +                     __alignof__(struct bpf_iter_mptcp_subflow));
>> +
>> +        kit->msk = msk;
>> +        if (!msk)
>
> NULL check is not needed. verifier should have rejected it for
> KF_TRUSTED_ARGS.
>
>> +                return -EINVAL;
>> +
>> +        if (!sock_owned_by_user_nocheck(sk) &&
>> +            !spin_is_locked(&sk->sk_lock.slock))
>
> I could have missed something. If it is to catch bug, should it be
> sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt
> hook here, the lock should have already been held earlier in the kernel.

Good point. Because the kfunc is currently restricted to the CG
[gs]etsockopt hooks in this series, we should use msk_owned_by_me(msk)
here, see the first sketch below.

> This set is only showing the cg sockopt bpf prog but missing the major
> struct_ops piece. It is hard to comment. I assumed the lock situation is
> the same for the struct_ops where the lock will be held before calling
> the struct_ops prog?

I understand it is hard to comment on that point. In the struct_ops we
are designing, the lock will indeed be held before calling the
struct_ops program. So we will just need to make sure this assumption is
correct for all callbacks of our struct_ops.

Also, if I understood correctly, it is possible to restrict a kfunc to
some specific struct_ops, e.g. to prevent this kfunc from being called
from the TCP CA struct_ops. So these checks should indeed not be needed,
but I will double-check that with Geliang. I put a second sketch below
to show what I have in mind.
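
For the first point, the beginning of bpf_iter_mptcp_subflow_new() could
then look like this -- an untested sketch, to be confirmed in the new
version:

        kit->msk = msk;

        /* The msk socket lock is expected to be held by the caller
         * (CG [gs]etsockopt hooks): assert it instead of returning
         * -EINVAL at runtime.
         */
        msk_owned_by_me(msk);

msk_owned_by_me() is just a wrapper around sock_owned_by_me(), so a
misuse would trigger the lockdep splat you mentioned instead of failing
silently, and the sock_owned_by_user_nocheck()/spin_is_locked() pair can
go away.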
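For the second point, if I understood the kfunc id set code correctly,
a .filter callback can be attached to the registered btf_kfunc_id_set,
similar to what tracing_iter_filter() does in net/core/filter.c for the
bpf_sock_destroy kfunc. Roughly -- the names and the exact condition are
hypothetical, still to be validated with Geliang:

        static int bpf_mptcp_kfunc_filter(const struct bpf_prog *prog,
                                          u32 kfunc_id)
        {
                /* Hypothetical check: only accept program types where
                 * we know the msk socket lock is already held, e.g. the
                 * CG sockopt hooks; reject the others, like the TCP CA
                 * struct_ops. Non-zero means "kfunc not allowed".
                 */
                if (prog->type != BPF_PROG_TYPE_CGROUP_SOCKOPT)
                        return -EACCES;
                return 0;
        }

        static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
                .owner  = THIS_MODULE,
                .set    = &bpf_mptcp_common_kfunc_ids,
                .filter = bpf_mptcp_kfunc_filter,
        };

With something like that in place, the runtime lock checks in the
iterator itself should indeed no longer be needed.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.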