On 5/8/24 17:04, Martin KaFai Lau wrote:
On 5/6/24 10:56 PM, Kui-Feng Lee wrote:
Subsystems that manage struct_ops objects may attempt to detach a link
when the link has been released or is about to be released. The test in
this patch demonstrates to developers the correct way to handle this
situation using a locking mechanism and atomic64_inc_not_zero().
A subsystem must ensure that a link is valid when detaching the link. In
order to achieve that, the subsystem may need to obtain a lock to
safeguard a table that holds the pointer to the link being detached.
However, the subsystem cannot invoke link->ops->detach() while holding
the lock because other tasks may be in the process of unregistering,
which could lead to a deadlock. This is why atomic64_inc_not_zero() is
used to maintain the
Other tasks un-registering in parallel is not the reason for the deadlock.
The deadlock is because the link detach will call unreg(), which usually
will acquire the same lock (the detach_mutex here), and there is also a
lock ordering with the update_mutex. Hence, the link detach must be done
after releasing the detach_mutex. After releasing the detach_mutex, the
link is protected by its refcnt.
That is what I meant in the commit log. I will rephrase it to emphasize
acquiring the same lock.
I think the above should be put as comments in bpf_dummy_do_link_detach
for the subsystem to reference later.
ok!
link's validity. (Refer to bpf_dummy_do_link_detach() in the previous
patch for more details.)
This test makes sure the pattern mentioned above works correctly.
Signed-off-by: Kui-Feng Lee <thinker.li@xxxxxxxxx>
---
.../bpf/prog_tests/test_struct_ops_module.c | 44 +++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c
index 9f6657b53a93..1e37037cfd8a 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c
@@ -292,6 +292,48 @@ static void test_subsystem_detach(void)
 	struct_ops_detach__destroy(skel);
 }
+/* A subsystem detaches a link while the link is being freed. */
+static void test_subsystem_detach_free(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4));
+	struct struct_ops_detach *skel;
+	struct bpf_link *link = NULL;
+	int prog_fd;
+	int err;
+
+	skel = struct_ops_detach__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_detach_open_and_load"))
+		return;
+
+	link = bpf_map__attach_struct_ops(skel->maps.testmod_do_detach);
+	if (!ASSERT_OK_PTR(link, "attach_struct_ops"))
+		goto cleanup;
+
+	bpf_link__destroy(link);
+
+	prog_fd = bpf_program__fd(skel->progs.start_detach);
+	if (!ASSERT_GE(prog_fd, 0, "start_detach_fd"))
+		goto cleanup;
+
+	/* Do detachment from the registered subsystem */
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "start_detach_run"))
+		goto cleanup;
+
+	/* The link may have zero refcount value and may have been
+	 * unregistered, so the detachment from the subsystem should fail.
+	 */
+	ASSERT_EQ(topts.retval, (u32)-ENOENT, "start_detach_run retval");
+
+	/* Sync RCU to make sure the link is freed without any crash */
+	ASSERT_OK(kern_sync_rcu(), "sync rcu");
+
+cleanup:
+	struct_ops_detach__destroy(skel);
+}
+
 void serial_test_struct_ops_module(void)
 {
 	if (test__start_subtest("test_struct_ops_load"))
@@ -304,5 +346,7 @@ void serial_test_struct_ops_module(void)
 		test_detach_link();
 	if (test__start_subtest("test_subsystem_detach"))
 		test_subsystem_detach();
+	if (test__start_subtest("test_subsystem_detach_free"))
+		test_subsystem_detach_free();
 }