On Fri, 10 Mar 2023 19:26:47 +0000,
Colton Lewis <coltonlewis@xxxxxxxxxx> wrote:
> 
> Marc Zyngier <maz@xxxxxxxxxx> writes:
> 
> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> >> ./aarch64/arch_timer -O 0xffff
> >> ==== Test Assertion Failure ====
> >> aarch64/arch_timer.c:239: false
> >> pid=48094 tid=48095 errno=4 - Interrupted system call
> >> 1 0x4010fb: test_vcpu_run at arch_timer.c:239
> >> 2 0x42a5bf: start_thread at pthread_create.o:0
> >> 3 0x46845b: thread_start at clone.o:0
> >> Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
> 
> > The fun part is that you can see similar things without the series:
> 
> > ==== Test Assertion Failure ====
> > aarch64/arch_timer.c:239: false
> > pid=647 tid=651 errno=4 - Interrupted system call
> > 1 0x00000000004026db: test_vcpu_run at arch_timer.c:239
> > 2 0x00007fffb13cedd7: ?? ??:0
> > 3 0x00007fffb1437e9b: ?? ??:0
> > Failed guest assert: config_iter + 1 == irq_iter at
> > aarch64/arch_timer.c:188
> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
> 
> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> > without any argument in a loop. After a few iterations, it blows.
> 
> These things are different failures. The first I've only ever found when
> setting the -O option. What command did you use to trigger the second if
> there were any non-default options?

As I already said: "without any argument".

maz@babette:~$ ./arch_timer
==== Test Assertion Failure ====
aarch64/arch_timer.c:239: false
pid=1110 tid=1113 errno=4 - Interrupted system call
1 0x000000000040268b: test_vcpu_run at arch_timer.c:239
2 0x00007fff9c48edd7: ?? ??:0
3 0x00007fff9c4f7e9b: ?? ??:0
Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 3, 4; 0, vcpu 1; stage; 4; iter: 4

As simple as it gets. So either KVM is terminally buggy (quite
possible), or this test is. My money is on the second one.

> Another interesting finding is that I can't reproduce any problems using
> ARM's emulated platform. There is a possibility these errors are
> ultimately down to individual hardware quirks, but that's still worth
> understanding since everyone uses hardware and not emulators.
> 
> > The problem is that I don't understand enough of the test to make a
> > judgement call. I hardly get *what* it is testing. Do you?
> 
> My understanding is the test validates timer interrupts are occurring
> when the ARM manual says they should. It sets a comparison value (cval)
> at some point a few milliseconds into the future and waits for the
> counter (xcnt) to be greater than or equal to the comparison value, at
> which point an interrupt should fire.
> 
> The failure I posted occurs at a line that says
> 
> GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us);
> 
> The counter was less than the comparison value, which implies the
> interrupt fired early. Do we care? I don't know. I think it's weird that
> this occurs when I set a physical offset with -O and no other time.

The thing is, you say nothing about your hardware. What is it? Does it
have ECV? Does it have CNTPOFF? If it has any of those, does it help
if you disable this support?

> I've also noticed that the greater the offset I set, the greater the
> difference between xcnt and cval. I think the physical offset is not
> being accounted for every place it should. At the very least, that
> indicates change is required in the test.
> 
> The failure you posted occurs at a line that says
> 
> GUEST_ASSERT_2(config_iter + 1 == irq_iter,
> config_iter + 1, irq_iter);
> 
> I gather from context that the values were unequal because an expected
> interrupt never fired or was not counted. Do we care? I don't know. I
> think someone should.

What is the point of a test that fails randomly without anyone
understanding what it is supposed to do? If that's the state of the
selftests, maybe I should just go and remove the aarch64 directory.

	M.

-- 
Without deviation from the norm, progress is not possible.
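
The "offset not accounted for everywhere" hypothesis quoted above can be
checked against the numbers in the first failure with a small model. This is
only a sketch under stated assumptions, not code from arch_timer.c or KVM: it
assumes the guest reads the physical count minus the offset, and it compares
a timer condition that honours the offset with a hypothetical buggy one that
ignores it. All function names below are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model: the guest-visible count is the physical count minus the offset. */
static uint64_t guest_view(uint64_t phys_cnt, uint64_t offset)
{
	return phys_cnt - offset;
}

/* Expected behaviour: fire once the offset-adjusted count reaches cval. */
static bool fires_with_offset(uint64_t phys_cnt, uint64_t offset, uint64_t cval)
{
	return guest_view(phys_cnt, offset) >= cval;
}

/* Hypothetical bug: the offset is ignored when deciding whether to fire. */
static bool fires_ignoring_offset(uint64_t phys_cnt, uint64_t offset, uint64_t cval)
{
	(void)offset;
	return phys_cnt >= cval;
}

/* Ticks by which the counter the guest read (xcnt) falls short of cval; 0 if not short. */
static uint64_t early_by(uint64_t xcnt, uint64_t cval)
{
	return xcnt < cval ? cval - xcnt : 0;
}
```

With the reported values (xcnt = 2500645901305, cval = 2500645961845),
early_by() gives 60540 ticks, which is just under the 0xffff (65535) passed
to -O. In the model, fires_ignoring_offset() becomes true exactly 65535
guest-visible ticks before fires_with_offset() does, so a guest that samples
the counter a little after such an early interrupt would see a shortfall
slightly below the offset. That fits the reported numbers and the observation
that the gap grows with the offset, but it is one reading of the data, not a
confirmed diagnosis.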