On 3/12/25 10:10 AM, Dmitry Safonov wrote:
> Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
> TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The longer one,
> TEST_TIMEOUT_SEC, is used for operations that are expected to succeed
> in order for a test to pass. It is usually not consumed and exists
> only to avoid an indefinite test run if the operation didn't complete.
> The shorter one, TEST_RETRANSMIT_SEC, exists for the tests that check
> operations which are expected to fail/timeout. It is fully consumed,
> with the expectation that if the operation didn't succeed during that
> period, it will time out, and the related test that expects the
> timeout passes. The actual operation failure is then cross-verified by
> other means, such as counter checks.
>
> The issue with the TEST_RETRANSMIT_SEC timeout is that 1 second is
> exactly the initial TCP retransmission timeout. So, if the initial
> segment gets lost (quite unlikely on a local veth interface between
> two net namespaces, yet it happens in slow VMs), the retransmission
> never happens before the timeout expires and, as a result, the test is
> not actually exercising the functionality, which in the end fails the
> counter checks.
>
> As I want the tcp_ao selftests to be fast and to finish in a
> reasonable amount of time on a manual run, I didn't consider
> increasing TEST_RETRANSMIT_SEC.
>
> Initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to
> make the initial TCP timeout shorter. But as it is not attached to a
> socket but is a sock_ops program (which attaches to cgroups), the
> selftests would have to use libbpf, which I wanted to avoid if not
> absolutely required.
>
> Instead, use a mixed select()-and-counters-polling mode with the
> longer TEST_TIMEOUT_SEC timeout to detect runaway failed tests. This
> not only tolerates lost segments, letting a test succeed after the
> previous TEST_RETRANSMIT_SEC timeout would have been consumed, but
> also makes the tests that expect a timeout/failure pass faster.
>
> The only test case that now takes longer (TEST_TIMEOUT_SEC) is
> connect-deny "wrong snd id", which checks for the absence of a key on
> SYN-ACK, for which there is no counter in the kernel (see
> tcp_make_synack()). Yet it can be sped up by poking skpair from the
> trace event (see trace_tcp_ao_synack_no_key).
>
> Reported-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@xxxxxxxxxx/
> Signed-off-by: Dmitry Safonov <0x7f454c46@xxxxxxxxx>

Could you please provide a suitable Fixes tag here?

Also, given that a good slice of the patches here are refactors, I think
the whole series could land in net-next - so that we avoid putting a bit
of extra stuff into the last 6.14-net PR - WDYT?

Thanks,

Paolo
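
[For illustration only: a minimal sketch of the mixed select()+counters-
polling idea described in the quoted message. The test_ao_counter_moved()
helper and constants below are hypothetical and do not come from the
series; the actual patches use the tcp_ao aolib helpers.]

#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>
#include <time.h>

#define TEST_TIMEOUT_SEC	5
#define POLL_INTERVAL_MSEC	100

/* Hypothetical: returns true once the expected TCP-AO counter moved. */
extern bool test_ao_counter_moved(int sk);

/*
 * Wait for @sk to become readable, but give up early once the expected
 * TCP-AO counter has moved, so expected-failure tests don't have to burn
 * the full timeout. Returns 0 when readable, -ETIMEDOUT otherwise.
 */
static int test_wait_fd_mixed(int sk)
{
	time_t deadline = time(NULL) + TEST_TIMEOUT_SEC;

	while (time(NULL) < deadline) {
		struct timeval tv = {
			.tv_usec = POLL_INTERVAL_MSEC * 1000,
		};
		fd_set rfds;
		int ret;

		FD_ZERO(&rfds);
		FD_SET(sk, &rfds);

		ret = select(sk + 1, &rfds, NULL, NULL, &tv);
		if (ret < 0)
			return -errno;
		if (ret > 0)
			return 0;		/* data arrived */
		if (test_ao_counter_moved(sk))
			return -ETIMEDOUT;	/* expected failure observed */
	}
	return -ETIMEDOUT;
}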