On Fri, Mar 04, 2022 at 07:52:00AM +0000, Marc Zyngier wrote:
> On Wed, 02 Mar 2022 21:25:28 +0000,
> Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
> >
> > Hi Oliver,
> >
> > On Wed, Mar 02, 2022 at 08:45:53PM +0000, Oliver Upton wrote:
> > > Hi Ricardo,
> > >
> > > On Wed, Mar 02, 2022 at 09:21:43AM -0800, Ricardo Koller wrote:
> > > > Add an arch_timer edge-cases selftest. For now, just add some basic
> > > > sanity checks, and some stress conditions (like waiting for the timers
> > > > while re-scheduling the vcpu). The next commit will add the actual edge
> > > > case tests.
> > > >
> > > > This test fails without a867e9d0cc1 "KVM: arm64: Don't miss pending
> > > > interrupts for suspended vCPU".
> > > >
> > >
> > > Testing timer correctness is extremely challenging to do without
> > > inherent flakiness. I have some concerns about the expectations that a
> > > timer IRQ should fire in a given amount of time, as it is possible to
> > > flake for any number of benign reasons (such as high CPU load in the
> > > host).
> > >
> > > While the architecture may suggest that the timer should fire as soon as
> > > CVAL is met:
> > >
> > >   TimerConditionMet = (((Counter[63:0] - Offset[63:0])[63:0] - CompareValue[63:0]) >= 0)
> > >
> > > However, the architecture is extremely imprecise as to when an interrupt
> > > should be taken:
> > >
> > >   In the absence of a specific requirement to take an interrupt, the
> > >   architecture only requires that unmasked pending interrupts are taken
> > >   in finite time. [DDI0487G.b D1.13.4 "Prioritization and recognition of
> > >   interrupts"]
> > >
> > > It seems to me that the only thing we can positively assert is that a
> > > timer interrupt should never be taken early. Now -- I agree that there
> > > is value in testing that the interrupt be taken in bounded time, but it's
> > > hard to pick a good value for it.
> >
> > Yes, a timer that never fires passes the test, but it's not very useful.
> >
> > I saw delay issues immediately after testing with QEMU. I've been playing
> > with values and found that 1ms is enough for all of my runs (QEMU
> > included) to pass (10000 iterations concurrently on all my 64 cpus). I
> > just checked in the fast model and 1ms seems to be enough as well
> > (although I didn't check for as long).
> >
> >     /* 1ms sounds a bit excessive, but QEMU-TCG is slow. */
> >     #define TEST_MARGIN_US 1000ULL
>
> I'm not sure that's even realistic. I can arbitrarily delay those by
> oversubscribing the system.
>
> >
> > >
> > > Perhaps documenting the possibility of flakes in the test is warranted,
> > > along with some knobs to adjust these values for any particularly bad
> > > implementation.
> >
> > What about having a cmdline arg to enable those tests?
>
> How is that handled in kvm-unit-tests? I'd rather avoid special
> arguments, as they will never be set. All tests should run by default.

There's this latency test that checks that a 10ms timer is not delayed
by more than 10ms (after the first 10ms):

    report(test_cval_10msec(info), "latency within 10 ms");

To be safe, I will just remove the checks that timers fire within some
margin (not even keeping them behind a special argument).

Thanks,
Ricardo

>
>     M.
>
> --
> Without deviation from the norm, progress is not possible.
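
For reference, the one property the thread agrees can be asserted
without flakiness is that a timer interrupt is never taken early. A
minimal C sketch of such a check follows. The helper names
(timer_condition_met, irq_latency_ticks, assert_irq_not_early) are
hypothetical and not taken from the selftest, and the unsigned
comparison is an assumed reading of the TimerConditionMet expression
quoted above (CompareValue treated as an unsigned 64-bit value). The
latency is computed for reporting only and is deliberately not compared
against any margin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the selftest): evaluate
 * ((Counter - Offset) - CompareValue) >= 0, assuming CompareValue is
 * compared as an unsigned 64-bit value.
 */
static inline bool timer_condition_met(uint64_t counter, uint64_t offset,
				       uint64_t cval)
{
	/* (Counter - Offset) is truncated to 64 bits, so it may wrap. */
	uint64_t virtual_count = counter - offset;

	return virtual_count >= cval;
}

/*
 * Ticks between the condition becoming true and the counter value
 * sampled in the IRQ handler. Only meaningful when
 * timer_condition_met() is true. Intended for reporting, not asserting.
 */
static inline uint64_t irq_latency_ticks(uint64_t count_at_irq,
					 uint64_t offset, uint64_t cval)
{
	return (count_at_irq - offset) - cval;
}

/* The only hard assertion: the interrupt must not be taken early. */
static inline void assert_irq_not_early(uint64_t count_at_irq,
					uint64_t offset, uint64_t cval)
{
	assert(timer_condition_met(count_at_irq, offset, cval));
}
```

In a guest IRQ handler, something like assert_irq_not_early() could be
called with the counter value read on entry, while the result of
irq_latency_ticks() is only logged, so host load or a slow emulator
cannot turn a late interrupt into a test failure.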