Hi Sean,
On 6/11/24 3:03 PM, Sean Christopherson wrote:
On Tue, Jun 11, 2024, Reinette Chatre wrote:
Heh, the docs are stale. KVM hasn't returned an error since commit cc578287e322
("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
again predates selftests by many years (6+ in this case). To make our lives
much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
throw in a GUEST_ASSERT(tsc_khz) in udelay()?
I added the GUEST_ASSERT(), but I find that it comes with a caveat (more below).
I plan to add a host-side assert, as below, that tests the same condition that a
GUEST_ASSERT(tsc_khz) would:
r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
TEST_ASSERT(r > 0, "KVM_GET_TSC_KHZ did not provide a valid TSC freq.");
tsc_khz = r;
The caveat is that the additional GUEST_ASSERT() subtly requires every test that
uses udelay() in the guest to implement a ucall (UCALL_ABORT) handler.
A crude grep shows that 47 of the 69 x86_64 tests do indeed have a UCALL_ABORT
handler. If any of the others use udelay(), the GUEST_ASSERT() will of course
still trigger, but the resulting failure will be quite cryptic. For
example, "Unhandled exception '0xe' at guest RIP '0x0'" vs. "tsc_khz".
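For context on why a zero tsc_khz is worth asserting on, here is a rough standalone sketch of the kind of conversion a udelay()-style helper performs. It is not the selftest's udelay(): the function name usec_to_tsc_cycles() is hypothetical, and a plain assert() stands in for GUEST_ASSERT(tsc_khz).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the selftest's udelay(): convert a delay in
 * microseconds into a TSC cycle count, given the TSC frequency in kHz.
 */
static uint64_t usec_to_tsc_cycles(uint64_t tsc_khz, uint64_t usec)
{
	/* Stand-in for GUEST_ASSERT(tsc_khz) in this standalone sketch. */
	assert(tsc_khz != 0);

	/* kHz is cycles per millisecond, so divide by 1000 for cycles per usec. */
	return tsc_khz * usec / 1000;
}
```

In this sketch, a frequency of 0 would produce a cycle count of 0 without the assert, so the delay would silently not wait at all.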
Yeah, we really need to add a bit more infrastructure, there is way, way, waaaay
too much boilerplate needed just to run a guest and handle the basic ucalls.
Reporting guest asserts should Just Work for 99.9% of tests.
Anyways, is it any less cryptic if ucall_assert() forces a failure? I forget if
the problem with an unhandled GUEST_ASSERT() is that the test re-enters the guest,
or if it's something else.
I don't think we need a perfect solution _now_, as tsc_khz really should never
be 0, just something to not make life completely miserable for future developers.
diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
index 42151e571953..1116bce5cdbf 100644
--- a/tools/testing/selftests/kvm/lib/ucall_common.c
+++ b/tools/testing/selftests/kvm/lib/ucall_common.c
@@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
 	ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
+	ucall_arch_do_ucall(GUEST_UCALL_FAILED);
+
 	ucall_free(uc);
 }
Thank you very much.
With your suggestion, an unhandled GUEST_ASSERT() looks as below.
It does not indicate what (beyond vcpu_run()) triggered the assert, but it
does provide a hint that ucall handling may need to be added.
[SNIP]
==== Test Assertion Failure ====
lib/ucall_common.c:154: addr != (void *)GUEST_UCALL_FAILED
pid=16002 tid=16002 errno=4 - Interrupted system call
1 0x000000000040da91: get_ucall at ucall_common.c:154
2 0x0000000000410142: assert_on_unhandled_exception at processor.c:614
3 0x0000000000406590: _vcpu_run at kvm_util.c:1718
4 (inlined by) vcpu_run at kvm_util.c:1729
5 0x00000000004026cf: test_apic_bus_clock at apic_bus_clock_test.c:115
6 (inlined by) run_apic_bus_clock_test at apic_bus_clock_test.c:164
7 (inlined by) main at apic_bus_clock_test.c:201
8 0x00007fb1d8429d8f: ?? ??:0
9 0x00007fb1d8429e3f: ?? ??:0
10 0x00000000004027a4: _start at ??:?
Guest failed to allocate ucall struct
[SNIP]
Is this acceptable? I can add a new preparatory patch with your
suggestion, with the goal of providing a slightly better error message
when there is an unhandled ucall.
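To spell out my understanding of why the second ucall produces the hint above, here is a toy standalone model. It assumes that a test without a UCALL_ABORT handler simply re-enters the guest, which then issues the sentinel ucall. Every toy_* name and TOY_UCALL_FAILED is hypothetical; this is not the selftests library code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel, standing in for GUEST_UCALL_FAILED. */
#define TOY_UCALL_FAILED ((uint64_t)-1)

struct toy_ucall {
	const char *msg;	/* text of the failed guest assertion */
};

struct toy_guest {
	int exits;		/* number of exits taken so far */
	struct toy_ucall uc;
};

/*
 * Models the guest side of the patched ucall_assert(): the first exit
 * carries the address of the ucall struct, any later exit (taken only if
 * the host re-enters the guest) carries the sentinel.
 */
static uint64_t toy_guest_run(struct toy_guest *g)
{
	if (g->exits++ == 0) {
		g->uc.msg = "tsc_khz";
		return (uint64_t)(uintptr_t)&g->uc;
	}
	return TOY_UCALL_FAILED;
}

/*
 * Models the host-side check: returns 0 and fills *uc for a usable ucall,
 * or -1 when the guest reported the sentinel.
 */
static int toy_get_ucall(uint64_t addr, const struct toy_ucall **uc)
{
	if (addr == TOY_UCALL_FAILED)
		return -1;

	*uc = (const struct toy_ucall *)(uintptr_t)addr;
	return 0;
}
```

A host that handles the first exit sees the "tsc_khz" message. A host that ignores it and runs the guest again hits the sentinel, which is the case the new host-side assert catches.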
Reinette