Re: [PATCH 2/3] KVM: arm64: selftests: add arch_timer_edge_cases

Marc Zyngier <maz@xxxxxxxxxx> · Fri, 04 Mar 2022 07:52:00 +0000

On Wed, 02 Mar 2022 21:25:28 +0000,
Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
> 
> Hi Oliver,
> 
> On Wed, Mar 02, 2022 at 08:45:53PM +0000, Oliver Upton wrote:
> > Hi Ricardo,
> > 
> > On Wed, Mar 02, 2022 at 09:21:43AM -0800, Ricardo Koller wrote:
> > > Add an arch_timer edge-cases selftest. For now, just add some basic
> > > sanity checks, and some stress conditions (like waiting for the timers
> > > while re-scheduling the vcpu). The next commit will add the actual edge
> > > case tests.
> > > 
> > > This test fails without a867e9d0cc1 "KVM: arm64: Don't miss pending
> > > interrupts for suspended vCPU".
> > > 
> > 
> > Testing timer correctness is extremely challenging to do without
> > inherent flakiness. I have some concerns about the expectations that a
> > timer IRQ should fire in a given amount of time, as it is possible to
> > flake for any number of benign reasons (such as high CPU load in the
> > host).
> > 
> > The architecture may suggest that the timer should fire as soon as
> > CVAL is met:
> > 
> >   TimerConditionMet = (((Counter[63:0] - Offset[63:0])[63:0] - CompareValue[63:0]) >= 0)
> > 
> > However, the architecture is extremely imprecise as to when an interrupt
> > should be taken:
> > 
> >   In the absence of a specific requirement to take an interrupt, the
> >   architecture only requires that unmasked pending interrupts are taken
> >   in finite time. [DDI0487G.b D1.13.4 "Prioritization and recognition of
> >   interrupts"]
> > 
> > It seems to me that the only thing we can positively assert is that a
> > timer interrupt should never be taken early. Now -- I agree that there
> > is value in testing that the interrupt be taken in bounded time, but it's
> > hard to pick a good value for it.
> 
> Yes, a timer that never fires passes the test, but it's not very useful.
> 
> I saw delay issues as soon as I started testing with QEMU. I've played
> with the values and found that 1ms is enough for all of my runs (QEMU
> included) to pass (10000 iterations concurrently on all 64 of my CPUs). I
> also checked on the Fast Model, and 1ms seems to be enough there as well
> (although I didn't run it for as long).
> 
> 	/* 1ms sounds a bit excessive, but QEMU-TCG is slow. */
> 	#define TEST_MARGIN_US			1000ULL

I'm not sure that's even realistic. I can arbitrarily delay those by
oversubscribing the system.

> 
> > 
> > Perhaps documenting the possibility of flakes in the test is warranted,
> > along with some knobs to adjust these values for any particularly bad
> > implementation.
> 
> What about having a cmdline arg to enable those tests?

How is that handled in kvm-unit-tests? I'd rather avoid special
arguments, as they will never be set. All tests should run by default.

	M.

-- 
Without deviation from the norm, progress is not possible.