On 9/14/24 12:12 PM, Geliang Tang wrote:
@@ -241,6 +286,8 @@ static int __init
bpf_mptcp_kfunc_init(void)
int ret;
ret =
register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+ ret = ret ?:
register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+
&bpf_mptcp_sched_kfunc_set);
This cannot be used in tracing.
Actually, we don’t need to use mptcp_subflow bpf_iter in tracing.
We plan to use it in MPTCP BPF packet schedulers, which are not
tracing, but "struct_ops" types. And they work well with
KF_TRUSTED_ARGS flag in bpf_iter_mptcp_subflow_new:
BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW |
KF_TRUSTED_ARGS);
An example of the scheduler is:
SEC("struct_ops")
int BPF_PROG(bpf_first_get_subflow, struct mptcp_sock *msk,
struct mptcp_sched_data *data)
{
struct mptcp_subflow_context *subflow;
bpf_rcu_read_lock();
bpf_for_each(mptcp_subflow, subflow, msk) {
mptcp_subflow_set_scheduled(subflow, true);
break;
}
bpf_rcu_read_unlock();
return 0;
}
SEC(".struct_ops")
struct mptcp_sched_ops first = {
.init = (void *)mptcp_sched_first_init,
.release = (void *)mptcp_sched_first_release,
.get_subflow = (void *)bpf_first_get_subflow,
.name = "bpf_first",
};
But BPF mptcp_sched_ops code has not been merged into bpf-next yet,
so
I simply test this bpf_for_each(mptcp_subflow) in tracing since I
noticed other bpf_iter selftests are using tracing too:
progs/iters_task.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
progs/iters_css.c
SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
If this bpf_for_each(mptcp_subflow) can only be used in struct_ops, I
will try to move the selftest into a struct_ops.
Going back to my earlier question in v1. How is the msk->conn_list
protected?
msk->conn_list is protected by msk socket lock. (@Matt, am I right?)
We use this in kernel code:
A KF_TRUSTED_ARGS msk does not mean its sock locked. That said, only the tp_btf
should have trusted msk args but still it is hard to audit all existing and
future tracepoints which may or may not have msk lock held.
If the subflow list is rcu-ify, it will be easier to reason for tracing use
case. Regardless, the use case is not for tracing, so this discussion is
probably moot now.
struct sock *sk = (struct sock *)msk;
lock_sock(sk);
kfunc(&msk->conn_list);
release_sock(sk);
If so, should we also use lock_sock/release_sock in
bpf_iter_mptcp_subflow_next()?
bpf_for_each(mptcp_subflow) is expected to be used in the get_subflow()
interface of mptcp_sched_ops to traverse all subflows and then pick one
subflow to send data. This interface is invoked in
mptcp_sched_get_send(), here the msk socket lock is hold already:
int mptcp_sched_get_send(struct mptcp_sock *msk)
{
struct mptcp_subflow_context *subflow;
struct mptcp_sched_data data;
msk_owned_by_me(msk);
... ...
mptcp_for_each_subflow(msk, subflow) {
if (READ_ONCE(subflow->scheduled))
return 0;
}
data.reinject = false;
if (msk->sched == &mptcp_sched_default || !msk->sched)
return mptcp_sched_default_get_subflow(msk, &data);
return msk->sched->get_subflow(msk, &data);
}
So no need to hold msk socket lock again in BPF schedulers. This means
we can walk the conn_list without any lock. bpf_rcu_read_lock() and
bpf_rcu_read_unlock() can be dropped in BPF schedulers too. Am I right?
hmm... don't know how bpf_rcu_read_lock() comes into picture considering the msk
lock has already been held in the struct_ops that you are planning to add.
Something you may need to consider on the subflow locking situation. Not sure
exactly what the bpf prog needs to do on a subflow. e.g. If it needs to change
some fields in the subflow, does it need to lock the subflow or the msk lock is
good enough.