Re: [PATCH v2 0/3] selftests/net: A couple of typos fixes in key-management/rst tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/1/24 23:37, Dmitry Safonov wrote:
> On 2/1/24 22:25, Dmitry Safonov wrote:
>> Hi Jakub,
>>
>> On 2/1/24 21:21, Jakub Kicinski wrote:
>>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>>
>>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>>
>>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>>> with debug options enabled :) (kernel/configs/debug.config and
>>> kernel/configs/x86_debug.config included),
>>
>> Haha :)
>>
>>> that slows things down 
>>> and causes a bit of flakiness in unsigned-md5-* tests:
>>>
>>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>>
>>> This has links to outputs:
>>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>>
>>> If it's a timing thing - FWIW we started exporting
>>> KSFT_MACHINE_SLOW=yes on the slow runners.
>>
>> I think, I know what happens here:
>>
>> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
>> => 6
>> # ok 9 AO server (AO_REQUIRED): unsigned client
>> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
>> increased 1 => 2
>> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
>> netns_ao_good was not expected to increase 7 => 8
>>
>> for each of tests the server listens at a new port, but re-uses the same
>> namespaces+veth. If the node/machine is quite slow, I guess a segment
>> might have been retransmitted and the test that initiated it had already
>> finished.
>> And as result, the per-namespace counters are incremented, which makes
>> the test fail (IOW, the test expects all segments in ns being dropped).
>>
>> So, I should do one of the options:
>>
>> 1. relax per-namespace checks (the per-socket and per-key counters are
>>    checked)
>> 2. unshare(net) + veth setup for each test
>> 3. split the selftest on smaller ones (as they create new net-ns in
>>    initialization)
> 
> Actually, I think there may be an easier fix:
> 
> 4. Make sure that client close()s TCP-AO first, making it twsk.
>    And also make sure that net-ns counters read post server's close().
> 
> Will do this, let's see if this fixes the flakiness on the netdev bot :)

FWIW, I ended up with this:
https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@xxxxxxxxxx/

I reproduced the issue once, running unsigned-md5* in a loop, while in
another terminal building linux-next with all cores.
With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
so far. So, there is a chance it fixes the issue :)

Thanks,
           Dmitry





[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux