Re: [PATCH bpf-next v2] bpf: Do not try bpf_msg_push_data with len 0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/11/22 9:52 AM, Felix Maurer wrote:
On 10.02.22 19:04, Yonghong Song wrote:
On 2/10/22 7:45 AM, Felix Maurer wrote:
On 09.02.22 18:06, Yonghong Song wrote:
On 2/9/22 7:55 AM, Felix Maurer wrote:
If bpf_msg_push_data is called with len 0 (as it happens during
selftests/bpf/test_sockmap), we do not need to do anything and can
return early.

Calling bpf_msg_push_data with len 0 previously lead to a wrong ENOMEM
error: we later called get_order(copy + len); if len was 0, copy + len
was also often 0 and get_order returned some undefined value (at the
moment 52). alloc_pages caught that and failed, but then
bpf_msg_push_data returned ENOMEM. This was wrong because we are most
probably not out of memory and actually do not need any additional
memory.

v2: Add bug description and Fixes tag

Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: Felix Maurer <fmaurer@xxxxxxxxxx>

LGTM. I am wondering why bpf CI didn't catch this problem. Did you
modified the test with length 0 in order to trigger that? If this
is the case, it would be great you can add such a test to the
test_sockmap.

I did not modify the tests to trigger that. The state of the selftests
around that is unfortunately not very good. There is no explicit test
with length 0 but bpf_msg_push_data is still called with length 0,
because of what I consider to be bugs in the test. On the other hand,
explicit tests with other lengths are sometimes not called as well. I'll
elaborate on that in a bit.

Something easy to fix is that the tests do not check the return value of
bpf_msg_push_data which they probably should. That may have helped find
the problem earlier.

Now to the issue mentioned in the beginning: Only some of the BPF
programs used in test_sockmap actually call bpf_msg_push_data. However,
they are not always attached, just for particular scenarios:
txmsg_pass==1, txmsg_redir==1, or txmsg_drop==1. If none of those apply,
bpf_msg_push_data is never called. This happens for example in
test_txmsg_push. Out of the four defined tests only one actually calls
the helper.

But after a test, the parameters in the map are reset to 0 (instead of
being removed). Therefore, when the maps are reused in a subsequent test
which is one of the scenarios above, the values are present and
bpf_msg_push_data is called, albeit with the parameters set to 0. This
is also what triggered the wrong behavior fixed in the patch.

Unfortunately, I do not have the time to fix these issues in the test at
the moment.

Thanks for detailed explanation. Maybe for the immediate case, can you
just fix this in the selftest,

   > Something easy to fix is that the tests do not check the return
value of
   > bpf_msg_push_data which they probably should. That may have helped find
   > the problem earlier.

This will be enough to verify your kernel change as without it the
test will fail.

I just send a patch checking the return values of the bpf_msg_push_data
usages in the test [1]. Passing the errors to userspace by dropping
packets is not very nice, but a straightforward way in the current test
program.

I did try the same checks of the return values of bpf_msg_pull_data, but
then the tests fail. So there might be something hidden here as well.

Thanks for the patch! Maybe this can be the first step to fix
test_sockmap.

John, could you help take a look at the patch?


[1]:https://lore.kernel.org/bpf/89f767bb44005d6b4dd1f42038c438f76b3ebfad.1644601294.git.fmaurer@xxxxxxxxxx/

The rest of test improvements can come later.


Acked-by: Yonghong Song <yhs@xxxxxx>

Thanks!






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux