On Tue, 18 Feb 2025 08:45:55 -0800 Kevin Krakauer wrote: > GRO tests are timing dependent and can easily flake. This is partially > mitigated in gro.sh by giving each subtest 3 chances to pass. However, > this still flakes on some machines. To be clear - are you running this over veth or a real device? > Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely > to immediately flush. This already happened in setup_loopback.sh, but > wasn't added to setup_veth.sh. This accounts for most of the reduction > in flakiness. That doesn't make intuitive sense to me. If we already defer flushes why do we need to also defer IRQs? > We also increase the number of chances for success from 3 to 6. > > `gro.sh -t <test>` now returns a passing/failing exit code as expected. > > gro.c:main no longer erroneously claims a test passes when running as a > server. > > Tested: Ran `gro.sh -t large` 100 times with and without this change. > Passed 100/100 with and 64/100 without. Ran inside strace to increase > flakiness. > > Signed-off-by: Kevin Krakauer <krakauer@xxxxxxxxxx> > --- > tools/testing/selftests/net/gro.c | 8 +++++--- > tools/testing/selftests/net/gro.sh | 5 +++-- > tools/testing/selftests/net/setup_veth.sh | 1 + > 3 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c > index b2184847e388..d5824eadea10 100644 > --- a/tools/testing/selftests/net/gro.c > +++ b/tools/testing/selftests/net/gro.c > @@ -1318,11 +1318,13 @@ int main(int argc, char **argv) > read_MAC(src_mac, smac); > read_MAC(dst_mac, dmac); > > - if (tx_socket) > + if (tx_socket) { > gro_sender(); > - else > + } else { > + /* Only the receiver exit status determines test success. */ > gro_receiver(); > + fprintf(stderr, "Gro::%s test passed.\n", testname); > + } > > - fprintf(stderr, "Gro::%s test passed.\n", testname); That seems quite separate to the stability fix? > return 0; > } > diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh > index 02c21ff4ca81..703173f8c8a9 100755 > --- a/tools/testing/selftests/net/gro.sh > +++ b/tools/testing/selftests/net/gro.sh > @@ -21,7 +21,7 @@ run_test() { > # Each test is run 3 times to deflake, because given the receive timing, > # not all packets that should coalesce will be considered in the same flow > # on every try. > - for tries in {1..3}; do > + for tries in {1..6}; do > # Actual test starts here > ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \ > 1>>log.txt & > @@ -100,5 +100,6 @@ trap cleanup EXIT > if [[ "${test}" == "all" ]]; then > run_all_tests > else > - run_test "${proto}" "${test}" > + exit_code=$(run_test "${proto}" "${test}") > + exit $exit_code Also separate from stability? Let's split the patch up into logically separate changes. > fi; > diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh > index 1f78a87f6f37..9882ad730c24 100644 > --- a/tools/testing/selftests/net/setup_veth.sh > +++ b/tools/testing/selftests/net/setup_veth.sh > @@ -12,6 +12,7 @@ setup_veth_ns() { > > [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}" > echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout" > + echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs" > ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535 > ip -netns "${ns_name}" link set dev "${ns_dev}" up > -- pw-bot: cr