On 11/9/2024 3:39 AM, Martin KaFai Lau wrote:
> On 11/8/24 12:26 AM, Xu Kuohai wrote:
>> -static void bpf_testmod_test_2(int a, int b)
>> +static void bpf_dummy_unreg(void *kdata, struct bpf_link *link)
>>  {
>> +        WRITE_ONCE(__bpf_dummy_ops, &__bpf_testmod_ops);
>>  }
>
> [ ... ]
>
>> +static int run_struct_ops(const char *val, const struct kernel_param *kp)
>> +{
>> +        int ret;
>> +        unsigned int repeat;
>> +        struct bpf_testmod_ops *ops;
>> +
>> +        ret = kstrtouint(val, 10, &repeat);
>> +        if (ret)
>> +                return ret;
>> +
>> +        if (repeat > 10000)
>> +                return -ERANGE;
>> +
>> +        while (repeat-- > 0) {
>> +                ops = READ_ONCE(__bpf_dummy_ops);
>
> I don't think the usual bpf_struct_ops implementations protect the
> registered ops with only READ_ONCE and WRITE_ONCE like this. tcp-cc
> uses a refcnt+rcu. It seems hid uses synchronize_srcu(). sched_ext
> seems to also use kthread_flush_work() to wait for all ops calls to
> finish. Meaning, I don't think the current bpf_struct_ops unreg
> implementations will run into this issue for sleepable ops.
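
(For context, a minimal sketch of the refcnt+rcu pattern mentioned
above; the names my_ops/active_ops/ops_get/ops_put are hypothetical,
not the actual tcp-cc code. Readers pin the ops with a refcount taken
under rcu_read_lock(), and the last put defers the free past all
readers:)

#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct my_ops {
        refcount_t refcnt;
        struct rcu_head rcu;    /* for kfree_rcu() */
        void (*test_1)(void);
};

static struct my_ops __rcu *active_ops;

static struct my_ops *ops_get(void)
{
        struct my_ops *ops;

        rcu_read_lock();
        ops = rcu_dereference(active_ops);
        /* skip an ops whose last reference is already gone */
        if (ops && !refcount_inc_not_zero(&ops->refcnt))
                ops = NULL;
        rcu_read_unlock();
        return ops;     /* caller now owns a reference */
}

static void ops_put(struct my_ops *ops)
{
        if (ops && refcount_dec_and_test(&ops->refcnt))
                kfree_rcu(ops, rcu);    /* free after a grace period */
}

(With this scheme, unreg only drops the registration's reference; an
in-flight caller that did ops_get() keeps the ops alive until its
matching ops_put(), so the map cannot be released under it.)
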
Thanks for the explanation.

Are you saying that it is not the struct_ops framework's
responsibility to ensure the struct_ops map is not released
while it may still be in use, and that the "bug" in this
series should instead be "fixed" in the test, namely this
patch?

> The current synchronize_rcu_mult(call_rcu, call_rcu_tasks) is only
> needed for tcp-cc because a tcp-cc's ops (which uses refcnt+rcu) can
> decrement its own refcnt. Looking back, this was a mistake (mine); a
> new tcp-cc op should have been introduced instead that returns the
> new tcp-cc-ops to be used.
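
(Roughly, that wait does the following in the map free path; a sketch
with a hypothetical function name, the real call site is in
kernel/bpf/bpf_struct_ops.c:)

#include <linux/rcupdate_wait.h>

static void struct_ops_map_free_sketch(void)
{
        /*
         * Wait for one grace period of each flavor in parallel:
         * call_rcu covers readers under rcu_read_lock(), and
         * call_rcu_tasks covers trampoline-style readers that have
         * no explicit read-side critical section.
         */
        synchronize_rcu_mult(call_rcu, call_rcu_tasks);
        /* ... actual teardown of the ops/map ... */
}
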
That is not quite clear to me, but from the description it
sounds like the synchronize_rcu_mult(call_rcu, call_rcu_tasks)
could simply be removed in some way; there is no need for a
cleanup that switches it to call_rcu.

>> +                if (ops->test_1)
>> +                        ops->test_1();
>> +                if (ops->test_2)
>> +                        ops->test_2(0, 0);
>> +        }
>> +
>> +        return 0;
>> +}
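
(For completeness, a handler with this signature is typically wired
up as a writable module parameter roughly as below; the parameter
name and the 0200 mode are assumptions, not necessarily what the
patch uses:)

#include <linux/moduleparam.h>

static const struct kernel_param_ops run_struct_ops_param_ops = {
        .set    = run_struct_ops,       /* called on each write to the param */
};
module_param_cb(run_struct_ops, &run_struct_ops_param_ops, NULL, 0200);

(Then something like "echo 1000 >
/sys/module/bpf_testmod/parameters/run_struct_ops" drives the loop
above while the struct_ops map is attached and detached concurrently.)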