On 3/12/25 10:10 AM, Dmitry Safonov wrote:
> Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and
> TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The longer one,
> TEST_TIMEOUT_SEC, is used for operations that are expected to succeed
> in order for a test to pass. It is usually not consumed and exists
> only to avoid an indefinite test run if the operation didn't complete.
> The shorter one, TEST_RETRANSMIT_SEC, exists for the tests that check
> operations which are expected to fail/timeout. It is fully consumed,
> with the expectation that if the operation didn't succeed during that
> period, it will time out, and the related test that expects the
> timeout passes. The actual operation failure is then cross-verified by
> other means, such as counter checks.
>
> The issue with the TEST_RETRANSMIT_SEC timeout is that 1 second is
> exactly the initial TCP retransmission timeout. So, if the initial
> segment gets lost (quite unlikely on a local veth interface between
> two net namespaces, yet it happens in slow VMs), the retransmission
> never happens before the timeout expires and, as a result, the test is
> not actually exercising the functionality, which in the end fails the
> counter checks.
>
> As I want the tcp_ao selftests to be fast and to finish in a
> reasonable amount of time on a manual run, I didn't consider
> increasing TEST_RETRANSMIT_SEC.
>
> Initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to
> make the initial TCP timeout shorter. But as it is not attached to a
> socket but is a sock_ops program (which attaches to cgroups), the
> selftests would have to use libbpf, which I wanted to avoid if not
> absolutely required.
>
> Instead, use a mixed select()-and-counters-polling mode with the
> longer TEST_TIMEOUT_SEC timeout to detect runaway failed tests. This
> not only tolerates lost segments, letting a test succeed after the
> previous TEST_RETRANSMIT_SEC timeout would have been consumed, but
> also makes the tests that expect a timeout/failure pass faster.
>
> The only test case that now takes longer (TEST_TIMEOUT_SEC) is
> connect-deny "wrong snd id", which checks for the absence of a key on
> SYN-ACK, for which there is no counter in the kernel (see
> tcp_make_synack()). Yet it can be sped up by poking skpair from the
> trace event (see trace_tcp_ao_synack_no_key).
>
> Reported-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@xxxxxxxxxx/
> Signed-off-by: Dmitry Safonov <0x7f454c46@xxxxxxxxx>

Could you please provide a suitable Fixes tag here?

Also, given that a good slice of the patches here are refactors, I think
the whole series could land in net-next - so that we avoid putting a bit
of extra stuff into the last 6.14-net PR - WDYT?

Thanks,

Paolo
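
[For illustration only: a minimal sketch of the mixed select()+counters-
polling idea described in the quoted message. The test_ao_counter_moved()
helper and constants below are hypothetical and do not come from the
series; the actual patches use the tcp_ao aolib helpers.]

#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>
#include <time.h>

#define TEST_TIMEOUT_SEC	5
#define POLL_INTERVAL_MSEC	100

/* Hypothetical: returns true once the expected TCP-AO counter moved. */
extern bool test_ao_counter_moved(int sk);

/*
 * Wait for @sk to become readable, but give up early once the expected
 * TCP-AO counter has moved, so expected-failure tests don't have to burn
 * the full timeout. Returns 0 when readable, -ETIMEDOUT otherwise.
 */
static int test_wait_fd_mixed(int sk)
{
	time_t deadline = time(NULL) + TEST_TIMEOUT_SEC;

	while (time(NULL) < deadline) {
		struct timeval tv = {
			.tv_usec = POLL_INTERVAL_MSEC * 1000,
		};
		fd_set rfds;
		int ret;

		FD_ZERO(&rfds);
		FD_SET(sk, &rfds);

		ret = select(sk + 1, &rfds, NULL, NULL, &tv);
		if (ret < 0)
			return -errno;
		if (ret > 0)
			return 0;		/* data arrived */
		if (test_ao_counter_moved(sk))
			return -ETIMEDOUT;	/* expected failure observed */
	}
	return -ETIMEDOUT;
}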