Re: [PATCH 2/3] KVM: arm64: selftests: add arch_timer_edge_cases

Ricardo Koller <ricarkol@xxxxxxxxxx> · Fri, 4 Mar 2022 11:01:40 -0800

On Fri, Mar 04, 2022 at 07:52:00AM +0000, Marc Zyngier wrote:
> On Wed, 02 Mar 2022 21:25:28 +0000,
> Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
> > 
> > Hi Oliver,
> > 
> > On Wed, Mar 02, 2022 at 08:45:53PM +0000, Oliver Upton wrote:
> > > Hi Ricardo,
> > > 
> > > On Wed, Mar 02, 2022 at 09:21:43AM -0800, Ricardo Koller wrote:
> > > > Add an arch_timer edge-cases selftest. For now, just add some basic
> > > > sanity checks, and some stress conditions (like waiting for the timers
> > > > while re-scheduling the vcpu). The next commit will add the actual edge
> > > > case tests.
> > > > 
> > > > This test fails without a867e9d0cc1 "KVM: arm64: Don't miss pending
> > > > interrupts for suspended vCPU".
> > > > 
> > > 
> > > Testing timer correctness is extremely challenging to do without
> > > inherent flakiness. I have some concerns about the expectations that a
> > > timer IRQ should fire in a given amount of time, as it is possible to
> > > flake for any number of benign reasons (such as high CPU load in the
> > > host).
> > > 
> > > The architecture may suggest that the timer should fire as soon as
> > > CVAL is met:
> > > 
> > >   TimerConditionMet = (((Counter[63:0] - Offset[63:0])[63:0] - CompareValue[63:0]) >= 0)
> > > 
> > > However, the architecture is extremely imprecise as to when an interrupt
> > > should be taken:
> > > 
> > >   In the absence of a specific requirement to take an interrupt, the
> > >   architecture only requires that unmasked pending interrupts are taken
> > >   in finite time. [DDI0487G.b D1.13.4 "Prioritization and recognition of
> > >   interrupts"]
> > > 
> > > It seems to me that the only thing we can positively assert is that a
> > > timer interrupt should never be taken early. Now -- I agree that there
> > > is value in testing that the interrupt be taken in bounded time, but it's
> > > hard to pick a good value for it.
> > 
> > Yes, a timer that never fires passes the test, but it's not very useful.
> > 
> > I saw delay issues immediately after testing with QEMU. I've been playing
> > with values and found that 1ms is enough for all of my runs (QEMU
> > included) to pass (10000 iterations concurrently on all my 64 cpus). I
> > just checked in the fast model and 1ms seems to be enough as well
> > (although I didn't run it for as long).
> > 
> > 	/* 1ms sounds a bit excessive, but QEMU-TCG is slow. */
> > 	#define TEST_MARGIN_US			1000ULL
> 
> I'm not sure that's even realistic. I can arbitrarily delay those by
> oversubscribing the system.
> 
> > 
> > > 
> > > Perhaps documenting the possibility of flakes in the test is warranted,
> > > along with some knobs to adjust these values for any particularly bad
> > > implementation.
> > 
> > What about having a cmdline arg to enable those tests?
> 
> How is that handled in kvm-unit-tests? I'd rather avoid special
> arguments, as they will never be set. All tests should run by default.

kvm-unit-tests has a latency test that checks that a 10ms timer fires no
more than 10ms late (that is, within 10ms after the first 10ms):

	report(test_cval_10msec(info), "latency within 10 ms");

Just to be safe, I will remove the checks that timers fire within some
margin (not even keeping them behind a special argument).
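
What would remain is the "never early" assertion Oliver mentioned. A
minimal sketch of that check, assuming an unsigned reading of the
TimerConditionMet pseudocode quoted above; the helper names
timer_condition_met() and assert_irq_not_early() are hypothetical and
are not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): the condition is met once
 * the offset-adjusted counter has reached the compare value. This
 * assumes both sides are compared as unsigned 64-bit values.
 */
static bool timer_condition_met(uint64_t counter, uint64_t offset,
				uint64_t cval)
{
	return (counter - offset) >= cval;
}

/*
 * Hypothetical check: cnt_at_irq is the counter value sampled in the
 * IRQ handler. Only "the IRQ was not taken early" is asserted; there
 * is deliberately no upper bound on how late it may arrive.
 */
static void assert_irq_not_early(uint64_t cnt_at_irq, uint64_t offset,
				 uint64_t cval)
{
	assert(timer_condition_met(cnt_at_irq, offset, cval));
}
```

With a check of this shape the test cannot flake on a loaded host,
because lateness is never measured; the trade-off is the one noted
earlier in the thread, that a timer which never fires has to be caught
some other way.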

Thanks,
Ricardo

> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.