On Mon, Dec 4, 2023 at 7:32 PM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > On Mon, Dec 04, 2023 at 10:42:24AM +0800, Haibo Xu wrote: > > On Fri, Sep 15, 2023 at 2:21 PM Haibo Xu <xiaobo55x@xxxxxxxxx> wrote: > > > > > > On Thu, Sep 14, 2023 at 5:52 PM Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote: > > > > > > > > On Thu, Sep 14, 2023 at 09:37:03AM +0800, Haibo Xu wrote: > > > > > Add a KVM selftests to validate the Sstc timer functionality. > > > > > The test was ported from arm64 arch timer test. > > > > > > > > I just tried this test out. Running it over and over again on QEMU I see > > > > it works sometimes, but it frequently fails with the > > > > GUEST_ASSERT_EQ(config_iter + 1, irq_iter) assert and at least once I > > > > also saw the __GUEST_ASSERT(xcnt >= cmp) assert. > > > > > > > > > > Good catch! > > > > > > I can also reproduce this issue and it is a common problem for both > > > arm64 and riscv because it also happens in a arm64 Qemu VM. > > > > > > It seems like a synchronization issue between host and guest shared > > > variables. Will double check the test code. > > > > > > > Thanks, > > > > drew > > > > Hi Andrew, > > > > After several rounds of regression testing, some findings: > > 1. The intermittent failure also happened on ARM64 Qemu VM, and even > > in the initial arch_timer commit(4959d8650e9f4). > > 2. it didn't happen on a ARM64 HW(but a different failure occured > > during stress test) > > 3. The failure have a close relationship with > > TIMER_TEST_ERR_MARGIN_US(default 100), and after increasing > > the macro to 300, the failure couldn't reproduced in 1000 loops > > stress test in RISC-V Qemu VM > > > > So my suggestion is we can expose the TIMER_TEST_ERR_MARGIN_US > > parameter as an arch_timer test arg parameter > > and tune it based on a specific test environment. > > > > What's your opinion? > > The concept of "timeout for an interrupt to arrive" is always going to > leave us exposed to random failures. Your suggestion of making the > timeout user configurable is probably the best we can do. I would > suggest also adding more descriptive failure text and a hint about > trying to adjust the timeout. > > Or, one thing we do in kvm-unit-tests, is to reduce typical delays while > allowing expected delays to be longer by looping over a shorter delay and > a non-fatal check, i.e. > > pass = false; > for (i = 0; i < 10; i++) { > udelay(100); > if (check(...)) { > pass = true; > break; > } > } > assert(pass); > > We could try that approach here too. > > Thanks, > drew Thanks for the feedback, I will send out patch set v4 soon!