Re: [PATCH net-next] selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW

On Fri, 2024-02-02 at 19:31 -0500, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Thu,  1 Feb 2024 11:21:19 -0500 Willem de Bruijn wrote:
> > > This test is time sensitive. It may fail on virtual machines and for
> > > debug builds.
> > > 
> > > Continue to run in these environments to get code coverage. But
> > > optionally suppress failure for timing errors (only). This is
> > > controlled with environment variable KSFT_MACHINE_SLOW.
> > > 
> > > The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL
> > > as previously discussed, because making so_txtime.c return that and
> > > then making so_txtime.sh tell such runs apart from KSFT_FAIL
> > > and pass the result on added a bunch of (fragile bash) boilerplate,
> > > while the result is interpreted the same as KSFT_PASS anyway.
> > 
> > FWIW another idea that came up when talking to Matthieu -
> > isolate the VMs which run time-sensitive tests to dedicated
> > CPUs. Right now we kick off around 70 4-CPU VMs and let them
> > battle for 72 cores. The machines don't look overloaded but
> > there can be some latency spikes (CPU use diagram attached).
> > 
> > So the idea would be to have a handful of special VMs running
> > on dedicated CPUs without any CPU time competition. That could help 
> > with latency spikes. But we'd probably need to annotate the tests
> > which need some special treatment.
> > 
> > Probably too much work both to annotate tests and set up env,
> > but I thought I'd bring it up here in case you had an opinion.
> 
> I'm not sure whether the issue with timing in VMs is CPU affinity.
> Variance may just come from expensive hypercalls, even with a
> dedicated CPU. Though tests can tell.

FTR, I think the CPU affinity setup is a bit too complex and hard to
reproduce for third parties willing to investigate possible future CI
failures. I think the current env-variable-based approach would help
with reproducibility.

> There's still the debug builds, as well.

I understand/hope you are investigating it? 

Cheers,

Paolo
