On Wed, 02 Mar 2022 21:25:28 +0000,
Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
>
> Hi Oliver,
>
> On Wed, Mar 02, 2022 at 08:45:53PM +0000, Oliver Upton wrote:
> > Hi Ricardo,
> >
> > On Wed, Mar 02, 2022 at 09:21:43AM -0800, Ricardo Koller wrote:
> > > Add an arch_timer edge-cases selftest. For now, just add some basic
> > > sanity checks, and some stress conditions (like waiting for the timers
> > > while re-scheduling the vcpu). The next commit will add the actual edge
> > > case tests.
> > >
> > > This test fails without a867e9d0cc1 "KVM: arm64: Don't miss pending
> > > interrupts for suspended vCPU".
> > >
> >
> > Testing timer correctness is extremely challenging to do without
> > inherent flakiness. I have some concerns about the expectations that a
> > timer IRQ should fire in a given amount of time, as it is possible to
> > flake for any number of benign reasons (such as high CPU load in the
> > host).
> >
> > While the architecture may suggest that the timer should fire as soon as
> > CVAL is met:
> >
> > 	TimerConditionMet = (((Counter[63:0] - Offset[63:0])[63:0] - CompareValue[63:0]) >= 0)
> >
> > However, the architecture is extremely imprecise as to when an interrupt
> > should be taken:
> >
> > 	In the absence of a specific requirement to take an interrupt, the
> > 	architecture only requires that unmasked pending interrupts are taken
> > 	in finite time. [DDI0487G.b D1.13.4 "Prioritization and recognition of
> > 	interrupts"]
> >
> > It seems to me that the only thing we can positively assert is that a
> > timer interrupt should never be taken early. Now -- I agree that there
> > is value in testing that the interrupt be taken in bounded time, but it's
> > hard to pick a good value for it.
>
> Yes, a timer that never fires passes the test, but it's not very useful.
>
> I saw delay issues immediately after testing with QEMU. I've been playing
> with values and found that 1ms is enough for all of my runs (QEMU
> included) to pass (10000 iterations concurrently on all my 64 cpus). I
> just checked in the fast model and 1ms seems to be enough as well
> (although I didn't check for so long).
>
> 	/* 1ms sounds a bit excessive, but QEMU-TCG is slow. */
> 	#define TEST_MARGIN_US	1000ULL

I'm not sure that's even realistic. I can arbitrarily delay those by
oversubscribing the system.

>
> >
> > Perhaps documenting the possibility of flakes in the test is warranted,
> > along with some knobs to adjust these values for any particularly bad
> > implementation.
>
> What about having a cmdline arg to enable those tests?

How is that handled in kvm-unit-tests? I'd rather avoid special
arguments, as they will never be set. All tests should run by default.

	M.

-- 
Without deviation from the norm, progress is not possible.
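
For illustration only, the distinction drawn in the thread (an early
interrupt is always a bug, a late one may just be host noise) could be
sketched roughly as below. This is not code from the patch: the names
check_timer_irq(), usec_to_cycles() and enum timer_check are hypothetical,
all values are assumed to be raw counter ticks, and counter wrap-around is
ignored.

```c
#include <stdint.h>

/* Hypothetical names; none of these come from the selftest under review. */
enum timer_check {
	TIMER_OK,	/* fired at or after CVAL, within the margin */
	TIMER_EARLY,	/* fired before CVAL: a hard failure */
	TIMER_LATE,	/* fired after CVAL + margin: possibly a benign flake */
};

/*
 * Classify one observed timer interrupt.
 * cval:    programmed compare value, in counter ticks
 * irq_cnt: counter value sampled in the IRQ handler
 * margin:  tolerated lateness, in counter ticks
 * Assumes the counter does not wrap during the test.
 */
static enum timer_check check_timer_irq(uint64_t cval, uint64_t irq_cnt,
					uint64_t margin)
{
	if (irq_cnt < cval)
		return TIMER_EARLY;
	if (irq_cnt - cval > margin)
		return TIMER_LATE;
	return TIMER_OK;
}

/*
 * Convert a margin in microseconds (such as TEST_MARGIN_US) to ticks.
 * Assumes usec * cntfrq_hz fits in 64 bits.
 */
static uint64_t usec_to_cycles(uint64_t usec, uint64_t cntfrq_hz)
{
	return usec * cntfrq_hz / 1000000ULL;
}
```

With a split like this, a test could fail unconditionally on TIMER_EARLY
and only report TIMER_LATE, which keeps it running by default without
making the pass/fail result depend on host load.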