[PATCH bpf-next 0/4] monitor network traffic for flaky test cases

Kui-Feng Lee <thinker.li@xxxxxxxxx> · Fri, 12 Jul 2024 22:55:48 -0700

Run tcpdump in the background for flaky test cases related to network
features.

We have some flaky test cases that are difficult to debug without
knowing what the traffic looks like. With the tcpdump output included
in the CI log, developers have more to go on when fixing them.
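As a rough illustration of running a capture tool in the background and
keeping its output for later, here is a minimal sketch. It is not the
code in this patch set: start_capture(), stop_capture() and the log
path argument are hypothetical names chosen for the example, and the
command to run is supplied by the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch set): run argv[0] in the
 * background with stdout and stderr redirected to log_path.
 * Returns the child's pid, or -1 if fork() failed.
 */
pid_t start_capture(char *const argv[], const char *log_path)
{
	pid_t pid;
	int fd;

	assert(argv && argv[0] && log_path);

	pid = fork();
	if (pid != 0)
		return pid;	/* parent: child's pid, or -1 on failure */

	/* Child: send all output to the log file, then exec the tool. */
	fd = open(log_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		_exit(127);
	dup2(fd, STDOUT_FILENO);
	dup2(fd, STDERR_FILENO);
	close(fd);
	execvp(argv[0], argv);
	_exit(127);		/* only reached if execvp() failed */
}

/*
 * Hypothetical helper: ask the background process to exit and reap it.
 * Returns 0 once the child has been reaped, -1 on error.
 */
int stop_capture(pid_t pid)
{
	int status;

	if (kill(pid, SIGTERM) < 0)
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return 0;
}
```

A caller could pass an argv such as {"tcpdump", "-i", "any", "-n", NULL}
(an illustrative command line, not necessarily the one used by the
patches) and print the log file only when the test has failed.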

This patch set monitors a few test cases that have recently been
flaky. If one of these test cases fails, it reports a traffic log.

At the beginning and the end of a traffic log, there are additional
packets used for synchronization between the test case and the
tcpdump process. These consist of UDP packets sent to
127.0.0.241:4321 and the ICMP unreachable messages returned for this
destination. In the following log, for instance, the first two and
the last two packets are synchronization packets.

    15:04:08.586368 lo    In  IP 127.0.0.1.58904 > 127.0.0.241.4321: UDP, length 5
    15:04:08.586435 lo    In  IP 127.0.0.241 > 127.0.0.1: ICMP 127.0.0.241 udp port 4321 unreachable, length 41
    15:04:08.704526 lo    In  IP6 ::1.52053 > ::1.45070: UDP, length 8
    15:04:08.722785 lo    In  IP 127.0.0.1.51863 > 127.0.0.241.4321: UDP, length 15
    15:04:08.722856 lo    In  IP 127.0.0.241 > 127.0.0.1: ICMP 127.0.0.241 udp port 4321 unreachable, length 51

The IP address 127.0.0.241 is used for synchronization, so the
loopback interface "lo" should be up in the network namespace where
the test is being conducted. While not ideal, this should suffice for
testing purposes.
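To make the synchronization packets concrete, the following is a
minimal sketch of sending one UDP datagram to 127.0.0.241:4321. It is
not taken from the patches: sync_dest() and send_sync_marker() are
hypothetical names, and the payload is left to the caller since this
letter does not describe it. On Linux, a UDP datagram sent to a local
port with no listener is answered with an ICMP port unreachable
message, which matches the pairs seen in the log above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define SYNC_ADDR "127.0.0.241"	/* address named in this cover letter */
#define SYNC_PORT 4321		/* port named in this cover letter */

/* Hypothetical helper: fill addr with the synchronization destination.
 * Returns 0 on success, -1 if the address could not be parsed.
 */
int sync_dest(struct sockaddr_in *addr)
{
	assert(addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_port = htons(SYNC_PORT);
	return inet_pton(AF_INET, SYNC_ADDR, &addr->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: send one UDP datagram to the sync destination.
 * Returns the number of bytes sent, or -1 on error. Requires the
 * loopback interface to be up in the current network namespace.
 */
ssize_t send_sync_marker(const void *buf, size_t len)
{
	struct sockaddr_in addr;
	ssize_t n;
	int fd;

	if (sync_dest(&addr) < 0)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	n = sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return n;
}
```

Because the marker is an ordinary datagram on "lo", it shows up in the
capture like any other packet, which is what lets the start and end of
a test's traffic be located in the log.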

The following block is an example that monitors the network traffic of
a test case. This test is running in the network namespace
"testns". You can pass NULL to traffic_monitor_start() if the entire
test, from traffic_monitor_start() to traffic_monitor_stop(), is
running in the same namespace.

    struct tmonitor_ctx *tmon;

    ...
    tmon = traffic_monitor_start("testns");
    ASSERT_TRUE(tmon, "traffic_monitor_start");

    ... test ...

    /* Report the traffic log only if there is one or more errors. */
    if (env.subtest_state->error_cnt)
        traffic_monitor_report(tmon);
    traffic_monitor_stop(tmon);

traffic_monitor_start() may fail, but we just ignore the failure since
it doesn't affect the test that follows. Monitoring adds another 60ms
to each test when running under qemu in my test environment.

Kui-Feng Lee (4):
  selftests/bpf: Add traffic monitor functions.
  selftests/bpf: Monitor traffic for tc_redirect/tc_redirect_dtime.
  selftests/bpf: Monitor traffic for sockmap_listen.
  selftests/bpf: Monitor traffic for select_reuseport.

 tools/testing/selftests/bpf/network_helpers.c | 244 ++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |   5 +
 .../bpf/prog_tests/select_reuseport.c         |   9 +
 .../selftests/bpf/prog_tests/sockmap_listen.c |  10 +
 .../selftests/bpf/prog_tests/tc_redirect.c    |   7 +
 5 files changed, 275 insertions(+)

-- 
2.34.1