On 2/10/22 7:45 AM, Felix Maurer wrote:
On 09.02.22 18:06, Yonghong Song wrote:
On 2/9/22 7:55 AM, Felix Maurer wrote:
If bpf_msg_push_data is called with len 0 (as it happens during
selftests/bpf/test_sockmap), we do not need to do anything and can
return early.
Calling bpf_msg_push_data with len 0 previously lead to a wrong ENOMEM
error: we later called get_order(copy + len); if len was 0, copy + len
was also often 0 and get_order returned some undefined value (at the
moment 52). alloc_pages caught that and failed, but then
bpf_msg_push_data returned ENOMEM. This was wrong because we are most
probably not out of memory and actually do not need any additional
memory.
v2: Add bug description and Fixes tag
Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: Felix Maurer <fmaurer@xxxxxxxxxx>
LGTM. I am wondering why bpf CI didn't catch this problem. Did you
modified the test with length 0 in order to trigger that? If this
is the case, it would be great you can add such a test to the
test_sockmap.
I did not modify the tests to trigger that. The state of the selftests
around that is unfortunately not very good. There is no explicit test
with length 0 but bpf_msg_push_data is still called with length 0,
because of what I consider to be bugs in the test. On the other hand,
explicit tests with other lengths are sometimes not called as well. I'll
elaborate on that in a bit.
Something easy to fix is that the tests do not check the return value of
bpf_msg_push_data which they probably should. That may have helped find
the problem earlier.
Now to the issue mentioned in the beginning: Only some of the BPF
programs used in test_sockmap actually call bpf_msg_push_data. However,
they are not always attached, just for particular scenarios:
txmsg_pass==1, txmsg_redir==1, or txmsg_drop==1. If none of those apply,
bpf_msg_push_data is never called. This happens for example in
test_txmsg_push. Out of the four defined tests only one actually calls
the helper.
But after a test, the parameters in the map are reset to 0 (instead of
being removed). Therefore, when the maps are reused in a subsequent test
which is one of the scenarios above, the values are present and
bpf_msg_push_data is called, albeit with the parameters set to 0. This
is also what triggered the wrong behavior fixed in the patch.
Unfortunately, I do not have the time to fix these issues in the test at
the moment.
Thanks for detailed explanation. Maybe for the immediate case, can you
just fix this in the selftest,
> Something easy to fix is that the tests do not check the return
value of
> bpf_msg_push_data which they probably should. That may have helped find
> the problem earlier.
This will be enough to verify your kernel change as without it the
test will fail.
The rest of test improvements can come later.
Acked-by: Yonghong Song <yhs@xxxxxx>
Thanks!