On 2/1/24 22:25, Dmitry Safonov wrote:
> Hi Jakub,
>
> On 2/1/24 21:21, Jakub Kicinski wrote:
>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>
>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>
>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>> with debug options enabled :) (kernel/configs/debug.config and
>> kernel/configs/x86_debug.config included),
>
> Haha :)
>
>> that slows things down
>> and causes a bit of flakiness in unsigned-md5-* tests:
>>
>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>
>> This has links to outputs:
>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>
>> If it's a timing thing - FWIW we started exporting
>> KSFT_MACHINE_SLOW=yes on the slow runners.
>
> I think, I know what happens here:
>
> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4 => 6
> # ok 9 AO server (AO_REQUIRED): unsigned client
> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired increased 1 => 2
> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter netns_ao_good was not expected to increase 7 => 8
>
> for each of tests the server listens at a new port, but re-uses the same
> namespaces+veth. If the node/machine is quite slow, I guess a segment
> might have been retransmitted and the test that initiated it had already
> finished.
> And as result, the per-namespace counters are incremented, which makes
> the test fail (IOW, the test expects all segments in ns being dropped).
>
> So, I should do one of the options:
>
> 1. relax per-namespace checks (the per-socket and per-key counters are
>    checked)
> 2. unshare(net) + veth setup for each test
> 3. split the selftest on smaller ones (as they create new net-ns in
>    initialization)

Actually, I think there may be an easier fix:
4. Make sure that client close()s TCP-AO first, making it twsk.
   And also make sure that net-ns counters read post server's close().

Will do this, let's see if this fixes the flakiness on the netdev bot :)

> I'd probably prefer (2), albeit it slows down that slow machine even
> more, but I don't think creating 2 net-ns + veth pair per each test
> would add a lot more overhead even on some rpi board. But let's see,
> maybe I'll just go with (1) as that's really easy.
>
> I'll cook a patch this week.

Thanks,
          Dmitry
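
For illustration only, the ordering in option (4) could be sketched roughly as below. This is not the selftest code: it uses plain TCP over loopback (no TCP-AO keys), and `run_ordered_close` and `read_counters` are hypothetical names standing in for the test body and for whatever reads the per-namespace counters. The point is only the sequence: the client close()s first (so it is the active closer and the side that ends up in TIME_WAIT), the server close()s after seeing the client's FIN, and the counters are sampled only after that.

```python
import socket
import threading


def run_ordered_close(read_counters):
    """Client close()s first, server close()s second, counters are read last.

    `read_counters` is a hypothetical callable standing in for the code
    that samples the per-namespace counters.
    """
    events = []

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        with conn:
            # recv() returns b"" only once the peer's FIN has arrived, so
            # reaching the end of this loop means the client closed first.
            while conn.recv(4096):
                pass
            events.append("server saw client FIN")
        # Leaving the `with` block has close()d the server-side socket.
        events.append("server closed")

    t = threading.Thread(target=server)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))
    cli.sendall(b"ping")
    events.append("client close")
    cli.close()  # active close: this side goes through TIME_WAIT

    t.join()     # wait until the server has close()d as well
    srv.close()

    counters = read_counters()  # sampled only after both sides closed
    events.append("counters read")
    return events, counters


if __name__ == "__main__":
    # Placeholder counter value, not real test output.
    print(run_ordered_close(lambda: {"netns_ao_good": 0}))
```

Whether this ordering alone is enough to stop a late retransmit from bumping the per-namespace counters on a slow runner is exactly what the bot run is meant to show; the sketch only pins down the close/read sequence.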