On Fri, 2024-02-02 at 19:31 -0500, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Thu, 1 Feb 2024 11:21:19 -0500 Willem de Bruijn wrote:
> > > This test is time sensitive. It may fail on virtual machines and for
> > > debug builds.
> > >
> > > Continue to run in these environments to get code coverage. But
> > > optionally suppress failure for timing errors (only). This is
> > > controlled with environment variable KSFT_MACHINE_SLOW.
> > >
> > > The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL
> > > as previously discussed. Because making so_txtime.c return that and
> > > then making so_txtime.sh capture runs that pass that vs KSFT_FAIL
> > > and pass it on added a bunch of (fragile bash) boilerplate, while the
> > > result is interpreted the same as KSFT_PASS anyway.
> >
> > FWIW another idea that came up when talking to Matthieu -
> > isolate the VMs which run time-sensitive tests to dedicated
> > CPUs. Right now we kick off around 70 4 CPU VMs and let them
> > battle for 72 cores. The machines don't look overloaded but
> > there can be some latency spikes (CPU use diagram attached).
> >
> > So the idea would be to have a handful of special VMs running
> > on dedicated CPUs without any CPU time competition. That could help
> > with latency spikes. But we'd probably need to annotate the tests
> > which need some special treatment.
> >
> > Probably too much work both to annotate tests and set up env,
> > but I thought I'd bring it up here in case you had an opinion.
>
> I'm not sure whether the issue with timing in VMs is CPU affinity.
> Variance may just come from expensive hypercalls, even with a
> dedicated CPU. Though tests can tell.

FTR, I think the CPU affinity setup would be a bit too complex and hard
to reproduce for 3rd parties willing to investigate possible future CI
failures; the current env-variable-based approach would help with
reproducibility.

> There's still the debug builds, as well.

I understand/hope you are investigating it?

Cheers,

Paolo
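
For reference, a minimal sketch of the env-variable-based approach being
discussed, assuming a hypothetical wrapper function and an illustrative
exit-code convention (exit code 2 for "timing error"); this is not the
actual so_txtime.sh logic, just an illustration of how KSFT_MACHINE_SLOW
could gate timing-only failures while the test still runs for coverage:

    #!/bin/bash
    # Sketch only: suppress timing-only failures when KSFT_MACHINE_SLOW
    # is set, but keep running the binary so slow environments still get
    # code coverage. The exit code 2 convention and the run_case helper
    # are hypothetical, chosen for this example.

    run_case() {
        ./so_txtime "$@"
        rc=$?

        if [[ $rc -eq 2 && -n "${KSFT_MACHINE_SLOW}" ]]; then
            echo "warning: ignoring timing error on slow machine" >&2
            return 0    # reported as KSFT_PASS, not KSFT_XFAIL
        fi
        return $rc
    }

This keeps the pass/fail decision in one place in the shell wrapper, so a
3rd party reproducing a CI failure only needs to set or unset one
environment variable rather than replicate the CI's CPU pinning setup.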