Re: [PATCH v3 bpf-next] Improve BPF test stability (related to perf events and scheduling)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 3/3/22 9:29 AM, Mykola Lysenko wrote:


On Mar 3, 2022, at 7:36 AM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:

On 3/2/22 10:27 PM, Mykola Lysenko wrote:
In send_signal, replace sleep with dummy cpu intensive computation
to increase probability of child process being scheduled. Add few
more asserts.
In find_vma, reduce sample_freq as higher values may be rejected in
some qemu setups, remove usleep and increase length of cpu intensive
computation.
In bpf_cookie, perf_link and perf_branches, reduce sample_freq as
higher values may be rejected in some qemu setups
Signed-off-by: Mykola Lysenko <mykolal@xxxxxx>
Acked-by: Yonghong Song <yhs@xxxxxx>
---
  .../selftests/bpf/prog_tests/bpf_cookie.c       |  2 +-
  .../testing/selftests/bpf/prog_tests/find_vma.c | 13 ++++++++++---
  .../selftests/bpf/prog_tests/perf_branches.c    |  4 ++--
  .../selftests/bpf/prog_tests/perf_link.c        |  2 +-
  .../selftests/bpf/prog_tests/send_signal.c      | 17 ++++++++++-------
  .../selftests/bpf/progs/test_send_signal_kern.c |  2 +-
  6 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
index cd10df6cd0fc..0612e79a9281 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
@@ -199,7 +199,7 @@ static void pe_subtest(struct test_bpf_cookie *skel)
  	attr.type = PERF_TYPE_SOFTWARE;
  	attr.config = PERF_COUNT_SW_CPU_CLOCK;
  	attr.freq = 1;
-	attr.sample_freq = 4000;
+	attr.sample_freq = 1000;
  	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
  	if (!ASSERT_GE(pfd, 0, "perf_fd"))
  		goto cleanup;
diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c
index b74b3c0c555a..7cf4feb6464c 100644
--- a/tools/testing/selftests/bpf/prog_tests/find_vma.c
+++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c
@@ -30,12 +30,20 @@ static int open_pe(void)
  	attr.type = PERF_TYPE_HARDWARE;
  	attr.config = PERF_COUNT_HW_CPU_CYCLES;
  	attr.freq = 1;
-	attr.sample_freq = 4000;
+	attr.sample_freq = 1000;
  	pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
    	return pfd >= 0 ? pfd : -errno;
  }
  +static bool find_vma_pe_condition(struct find_vma *skel)
+{
+	return skel->bss->found_vm_exec == 0 ||
+		skel->data->find_addr_ret != 0 ||

Should this not test for `skel->data->find_addr_ret == -1` ?

It seems that find_addr_ret changes value few times until it gets to 0. I added print statements when value is changed:

find_addr_ret -1 => initial value
find_addr_ret -16 => -EBUSY
find_addr_ret 0 => final value

Hence, in this case I think it is better to wait for the final value. We do have time out in the loop anyways (when “i" reaches 1bn), so test would not get stuck.

Thanks for the above information. I read the code again. I think it is more complicated than above. Let us look at the bpf program:

SEC("perf_event")
int handle_pe(void)
{
        struct task_struct *task = bpf_get_current_task_btf();
        struct callback_ctx data = {};

        if (task->pid != target_pid)
                return 0;

        find_addr_ret = bpf_find_vma(task, addr, check_vma, &data, 0);

        /* In NMI, this should return -EBUSY, as the previous call is using
         * the irq_work.
         */
        find_zero_ret = bpf_find_vma(task, 0, check_vma, &data, 0);
        return 0;
}

Assuming task->pid == target_pid,
the first bpf program call should have
    find_addr_ret = 0     /* lock irq_work */
    find_zero_ret = -EBUSY

For the second bpf program call, there are two possibilities:
. irq_work is unlocked, so the result will find_addr_ret = 0, find_zero_ret = -EBUSY . or irq_work is still locked, the result will be find_addr_ret = -EBUSY, find_zero_ret = -EBUSY

the third bpf program call will be similar to the second bpf program run.

So final validation check probably should check both 0 and -EBUSY
for find_addr_ret.

Leaving some time to potentially unlock the irq_work as in the original
code is still needed to prevent potential problem for the subsequent tests.

I think this patch can be broke into three separate commits:
  - find_vma fix
  - send_signal fix
  - other
to make changes a little bit focused.


TL:DR change in the test that prints these values

-       for (i = 0; i < 1000000000 && find_vma_pe_condition(skel); ++i)
+       int find_addr_ret = -1;
+       printf("find_addr_ret %d\n", skel->data->find_addr_ret);
+
+       for (i = 0; i < 1000000000 && find_vma_pe_condition(skel); ++i) {
+               if (find_addr_ret != skel->data->find_addr_ret) {
+                       find_addr_ret = skel->data->find_addr_ret;
+                       printf("find_addr_ret %d\n", skel->data->find_addr_ret);
+               }
                 ++j;
+       }
+
+       printf("find_addr_ret %d\n", skel->data->find_addr_ret);


+		skel->data->find_zero_ret == -1 ||
+		strcmp(skel->bss->d_iname, "test_progs") != 0;
+}
+
Thanks,
Daniel




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux