Otherwise who is going to be alarmed when a regression happens and things
actually get slower?

> > The mere fact that we don't flush TLB already necessitated hacks like the 90%
> > check, which for example doesn't work nested so another hack was needed, to
> > skip the check completely when hypervisor is detected, etc, etc.
>
> 100% agreed here.
>
> > And now as of 6.13, we don't propagate accessed bit when KVM zaps the SPTE at
> > all, which can happen at least in theory due to other reasons than NUMA balancing.
> >
> > Tomorrow there will be something else that will cause KVM to zap the SPTEs,
> > and the test will fail again, and again...
> >
> > What do you think?
>
> What if we make the assertion user controllable? I.e. let the user opt-out (or
> off-by-default and opt-in) via command line? We did something similar for the
> rseq test, because the test would run far fewer iterations than expected if the
> vCPU task was migrated to CPU(s) in deep sleep states.
>
> 	TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
> 		    "Only performed %d KVM_RUNs, task stalled too much?\n\n"
> 		    "  Try disabling deep sleep states to reduce CPU wakeup latency,\n"
> 		    "  e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',\n"
> 		    "  or run with -u to disable this sanity check.", i);
>
> This is quite similar, because as you say, it's impractical for the test to account
> for every possible environmental quirk.

No objections in principle, especially if the sanity check is skipped by
default, although this does sort of defeat the purpose of the check. I guess
the check might still be useful for developers.

> > > Aha! I wonder if in the failing case, the vCPU gets migrated to a pCPU on a
> > > different node, and that causes NUMA balancing to go crazy and zap pretty much
> > > all of guest memory. If that's what's happening, then a better solution for the
> > > NUMA balancing issue would be to affine the vCPU to a single NUMA node (or hard
> > > pin it to a single pCPU?).
> >
> > Nope. I pinned main thread to CPU 0 and VM thread to CPU 1 and the problem
> > persists. On 6.13, the only way to make the test consistently work is to
> > disable NUMA balancing.
>
> Well that's odd. While I'm quite curious as to what's happening, my stance is
> that enabling NUMA balancing with KVM is a terrible idea, so my vote is to sweep
> it under the rug and let the user disable the sanity check.

One thing is for sure: with NUMA balancing off, the test passes reliably (it
shows on average around 100-200 idle pages), and I have run it for a long time.

Best regards,
	Maxim Levitsky
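
P.S. To make the opt-in idea concrete, below is a minimal sketch of how the
idle-page assertion could be made user controllable, along the lines of the
rseq test's -u flag quoted above. Everything in it (the -s option letter, the
variable and function names, the stand-in numbers) is an illustrative
placeholder, not actual selftest code:

	/*
	 * Hypothetical sketch: warn unconditionally, but only fail the
	 * test when the user explicitly opted in to the sanity check.
	 */
	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	static bool enable_sanity_check;

	static void check_idle_pages(unsigned long still_idle, unsigned long pages)
	{
		/* Placeholder for the 90% heuristic: tolerate up to 10% misses. */
		if (still_idle < pages / 10)
			return;

		/* Always warn, so a regression is at least visible in the log. */
		fprintf(stderr,
			"%lu of %lu pages still idle; accessed bits were likely lost\n"
			"to KVM zapping SPTEs (e.g. NUMA balancing).\n",
			still_idle, pages);

		/* Hard failure only with the (hypothetical) -s flag. */
		if (enable_sanity_check)
			exit(EXIT_FAILURE);
	}

	int main(int argc, char *argv[])
	{
		int opt;

		while ((opt = getopt(argc, argv, "s")) != -1) {
			if (opt == 's')
				enable_sanity_check = true;
		}

		/* Stand-in values for the real test's page counters. */
		check_idle_pages(150, 1ul << 18);
		return 0;
	}

With the flag off, the check degrades to a warning, so runs on hosts with NUMA
balancing enabled keep passing, while developers can still run with -s to get
the hard assertion back.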
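
P.P.S. For reference, the pinning mentioned above was plain thread affinity;
a self-contained sketch of the idea is below (pthread_setaffinity_np and the
dummy vCPU thread are stand-ins for the test's own thread setup, not the
actual code):

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <stdio.h>
	#include <string.h>

	static void pin_to_cpu(pthread_t thread, int cpu)
	{
		cpu_set_t set;
		int ret;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		ret = pthread_setaffinity_np(thread, sizeof(set), &set);
		if (ret)
			fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(ret));
	}

	static void *vcpu_thread(void *arg)
	{
		(void)arg;
		return NULL;	/* placeholder for the vCPU run loop */
	}

	int main(void)
	{
		pthread_t vcpu;

		pin_to_cpu(pthread_self(), 0);		/* main thread -> CPU 0 */
		pthread_create(&vcpu, NULL, vcpu_thread, NULL);
		pin_to_cpu(vcpu, 1);			/* VM thread   -> CPU 1 */
		pthread_join(vcpu, NULL);
		return 0;
	}

NUMA balancing itself can be toggled at runtime via
/proc/sys/kernel/numa_balancing.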