Hi Martin, On Fri, 2024-03-29 at 10:27 -0700, Martin KaFai Lau wrote: > > > > On 3/28/24 10:57 PM, Geliang Tang wrote: > > > > > > > > Hi Martin, > > > > > > > > > > > > > > > > On Thu, 2024-03-28 at 09:55 -0700, Martin KaFai Lau > > > > > > > > wrote: > > > > > > > > > > > > On 3/28/24 3:23 AM, Geliang Tang wrote: > > > > > > > > > > > > > > > > From: Geliang Tang > > > > > > > > > > > > > > > > <tanggeliang@xxxxxxxxxx> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > bpf_tcp_ca tests may emit EAGAIN > > > > > > > > > > > > > > > > sometimes. In that > > > > > > > > > > > > > > > > case, tests > > > > > > > > > > > > > > > > fail with > > > > > > > > > > > > > > > > "bytes != total_bytes" errors. Sending > > > > > > > > > > > > > > > > should continue, > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > break > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > errno is EAGAIN. This patch can make > > > > > > > > > > > > > > > > bpf_tcp_ca tests > > > > > > > > > > > > > > > > stable. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Geliang Tang > > > > > > > > > > > > > > > > <tanggeliang@xxxxxxxxxx> > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > tools/testing/selftests/bpf/prog_tests/ > > > > > > > > > > > > > > > > bpf_tcp_ca.c > > > > > > > > > > > > > > > > | 4 ++-- > > > > > > > > > > > > > > > > 1 file changed, 2 insertions(+), 2 > > > > > > > > > > > > > > > > deletions(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > > > a/tools/testing/selftests/bpf/prog_test > > > > > > > > > > > > > > > > s/bpf_tcp_ca.c > > > > > > > > > > > > > > > > b/tools/testing/selftests/bpf/prog_test > > > > > > > > > > > > > > > > s/bpf_tcp_ca.c > > > > > > > > > > > > > > > > index 077b107130f6..fbc219c2d53b 100644 > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > a/tools/testing/selftests/bpf/prog_test > > > > > > > > > > > > > > > > s/bpf_tcp_ca.c > > > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > > > b/tools/testing/selftests/bpf/prog_test > > > > > > > > > > > > > > > > s/bpf_tcp_ca.c > > > > > > > > > > > > > > > > @@ -56,7 +56,7 @@ static void > > > > > > > > > > > > > > > > *server(void *arg) > > > > > > > > > > > > > > > > while (bytes < total_bytes && > > > > > > > > > > > > > > > > !READ_ONCE(stop)) { > > > > > > > > > > > > > > > > nr_sent = send(fd, &batch, > > > > > > > > > > > > > > > > MIN(total_bytes - bytes, > > > > > > > > > > > > > > > > sizeof(batch)), 0); > > > > > > > > > > > > > > > > - if (nr_sent == -1 && errno == EINTR) > > > > > > > > > > > > > > > > + if (nr_sent == -1 && (errno == EINTR > > > > > > > > > > > > > > > > || errno == > > > > > > > > > > > > > > > > EAGAIN)) > > > > > > > > > > > > > > > > > > > > > > > > This is a non blocking socket. EAGAIN is > > > > > > > > > > > > hitting the > > > > > > > > > > > > timeout > > > > > > > > > > > > situation? > > > > > > > > > > > > > > > > > > > > > > > > The default timeout is 3s and it has not been > > > > > > > > > > > > changed after > > > > > > > > > > > > the > > > > > > > > > > > > recent > > > > > > > > > > > > connect_fd_to_fd and start_server > > > > > > > > > > > > simplifications. I don't > > > > > > > > > > > > find > > > > > > > > > > > > bpf > > > > > > > > > > > > CI failing > > > > > > > > > > > > in this test in the last month also. > > > > > > > > > > > > > > > > > > > > > > > > I would prefer to fail after timeout instead of > > > > > > > > > > > > keep > > > > > > > > > > > > retrying. Do > > > > > > > > > > > > you > > > > > > > > > > > > really hit > > > > > > > > > > > > that in your environment for this specific > > > > > > > > > > > > bpf_tcp_ca test? > > > > > > > > > > > > There > > > > > > > > > > > > are > > > > > > > > > > > > many tests > > > > > > > > > > > > using this timeout value also. > > > > > > > > > > > > > > > > This is the 2nd patch of "refactor mptcp bpf tests" > > > > > > > > series: > > > > > > > > > > > > > > > > https://patchwork.kernel.org/project/mptcp/cover/cover.1711688054.git.tanggeliang@xxxxxxxxxx/ > > > > > > > > > > > > > > > > > I didn't get the mentioned EAGAIN errors in bpf_tcp_ca > > > > > > > > tests, > > > > > > > > but > > > > > > > > got > > > > > > > > them in MPTCP BPF sched tests (see patch 1). MPTCP BPF > > > > > > > > sched > > > > > > > > tests > > > > > > > > (not > > > > > > > > upstream yet) use the same sending and receiving > > > > > > > > functions as > > > > > > > > bpf_tcp_ca tests (patch 15). So it makes sense to add > > > > > > > > this fix > > > > > > > > for > > > > > > > > bpf_tcp_ca tests too. > > > > > > > > It sounds like the EAGAIN is specific to mptcp sched test and > > > > is > > > > not > > > > due to a > > > > timeout? Did you try to increase the timeout and see if it > > > > resolves > > > > the issue? > > > > > > > > afaik, there is no fix needed for the bpf_tcp_ca test. bpf CI > > > > expects > > > > the test > > > > to finish. If the default 3s turns out to be too flaky for most > > > > tests, it could > > > > be increased instead of loop testing EAGAIN because the bpf CI > > > > expects the test > > > > to finish. However, so far I don't see this (i.e. 3s is too > > > > short) > > > > to > > > > be the > > > > case from looking at one month old data in bpf CI. Thanks for your suggestions. I finally fixed this issue by explicitly setting the server sockets with nonblock flags in MPTCP BPF tests. Then EAGAIN will not appear in the tests anymore. The fix is here: https://patchwork.kernel.org/project/mptcp/patch/f0b66813ae8274e5653988d80d16171508f05796.1712042049.git.tanggeliang@xxxxxxxxxx/ That means this patch "selftests/bpf: Handle EAGAIN in bpf_tcp_ca" can be dropped now. > > > > > > > > And here's another reason. I want to move these > > > > > > > > functions from > > > > > > > > bpf_tcp_ca into network_helpers as public ones (patch > > > > > > > > 4), which > > > > > > > > can > > > > > > > > be > > > > > > > > used by both bpf_tcp_ca and MPTCP BPF sched tests. So > > > > > > > > we must > > > > > > > > add > > > > > > > > this > > > > > > > > fix to the public ones too. > > > > > > > > Lets understand why mptcp sched test hits EAGAIN first. The patches "export send_byte and send_recv_data into network_helpers" have been sent to bpf-next, please review them when you have time: https://patchwork.kernel.org/project/netdevbpf/cover/cover.1712039441.git.tanggeliang@xxxxxxxxxx/ > > > > > > > > Maybe the commit log of this patch needs to be updated. > > > > > > > > Or I > > > > > > > > should > > > > > > > > send patches 2, 3 and 4 together to bpf-next? > > > > > > > > All selftests/bpf changes have to pass bpf CI. It is always a > > > > good > > > > idea to > > > > target bpf-next to kick off the bpf CI to bar any surprise > > > > later > > > > when > > > > it got > > > > merged into bpf-next. > > > > > > > > Unrelated, since we are in the mptcp bpf test, one thing needs > > > > to > > > > fix > > > > in > > > > test_mptcpify(). The mptcpify test is upgrading a IPPROTO_TCP > > > > socket > > > > to > > > > IPPROTO_MPTCP socket. This could break other tests when the > > > > test_progs is run in > > > > parallel (test_progs "-j" which fork() to run prog_tests/* in > > > > parallel). It > > > > could unexpectedly upgrade the tcp socket of another test. One > > > > option > > > > is to > > > > limit the upgrade by checking the pid in the mptcpify.c bpf > > > > prog. I just sent a patch named "selftests/bpf: Add pid limit for mptcpify prog" to fix this and added your "suggested-by" tag in it. Thanks, -Geliang > > > > > > > >