On 12/5/23 1:39 AM, Hou Tao wrote:
Hi,
On 12/5/2023 2:04 PM, Yonghong Song wrote:
With the previous patch, one of the subtests in test_btf_id becomes
flaky and may fail. The following is a failing example:
Error: #26 btf
Error: #26/174 btf/BTF ID
Error: #26/174 btf/BTF ID
btf_raw_create:PASS:check 0 nsec
btf_raw_create:PASS:check 0 nsec
test_btf_id:PASS:check 0 nsec
...
test_btf_id:PASS:check 0 nsec
test_btf_id:FAIL:check BTF lingersdo_test_get_info:FAIL:check failed: -1
The test tries to prove that a btf_id is not available after the map is
closed. But the btf_id is freed only after a workqueue and an rcu grace
period, compared to the previous case of just after an rcu grace period.
It is not accurate. Before applying the patch, the btf_id will be
released in btf_put() and there is no RCU grace period involved. After
applying the patch, the btf_id will be released after the running of
the bpf_map_free_deferred kworker.
I missed it (because I didn't double check the code).
Yes, btf_id is freed before going to rcu gp. So the previously
reliable test now becomes unreliable due to the workqueue.
To fix the flaky test, I added a kern_sync_rcu() after closing map and
before querying btf id availability, essentially ensuring a rcu grace
period in the kernel, which seems making the test happy.
kern_sync_rcu() doesn't guarantee that the bpf_map_free_deferred kworker
will complete, so why not remove the test case instead?
Yes, I understand this. My hope is that kern_sync_rcu() can
make the test stable enough (that is why I am using 'seems making')
but no guarantees.
For this particular case, if I do refcounting for btf as mentioned
in the comments on the previous patch, we should be okay.
Will craft another version tomorrow with the btf refcount approach.
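For illustration only, one way to make a check like this tolerant of a deferred free is to poll with a bounded retry budget instead of relying on a single synchronization point. The sketch below is not part of the patch and is not what the selftest does. The names wait_until_gone(), still_visible_fn and fake_btf_visible() are hypothetical; in the real test the predicate would presumably wrap bpf_btf_get_fd_by_id(map_info.btf_id) and close any fd it gets back. Here a counter stands in for the kworker so the sketch stays self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate: returns true while the object is still visible. */
typedef bool (*still_visible_fn)(void *ctx);

/*
 * Poll until still_visible() reports false or the retry budget runs out.
 * Returns 0 if the object disappeared, -1 if it still lingers.
 */
static int wait_until_gone(still_visible_fn still_visible, void *ctx,
			   int max_retries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	assert(still_visible != NULL);

	for (int i = 0; i < max_retries; i++) {
		if (!still_visible(ctx))
			return 0;
		nanosleep(&ts, NULL);
	}
	/* One last look after the final sleep. */
	return still_visible(ctx) ? -1 : 0;
}

/*
 * Stand-in for the deferred free (assumption, not kernel behavior):
 * the id stays visible for *ctx more polls, then vanishes.
 */
static bool fake_btf_visible(void *ctx)
{
	int *polls_left = ctx;

	if (*polls_left > 0) {
		(*polls_left)--;
		return true;
	}
	return false;
}
```

A bounded poll still gives no hard guarantee that the kworker has run. It only trades a fixed wait for a tunable timeout, so it reduces flakiness rather than removing the underlying race.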
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
tools/testing/selftests/bpf/prog_tests/btf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index 8fb4a04fbbc0..7feb4223bbac 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -4629,6 +4629,7 @@ static int test_btf_id(unsigned int test_num)
 	/* The map holds the last ref to BTF and its btf_id */
 	close(map_fd);
+	kern_sync_rcu();
 	map_fd = -1;
 	btf_fd[0] = bpf_btf_get_fd_by_id(map_info.btf_id);
 	if (CHECK(btf_fd[0] >= 0, "BTF lingers")) {