On Mon, Mar 11, 2024, mlevitsk@xxxxxxxxxx wrote:
> Hi,
>
> Recently I debugged a failure of this selftest and this is what is happening:
>
> For each vCPU, this test runs the guest until it does the ucall, then it
> resets all the vCPU registers to their initial values (including RIP) and
> runs the guest again.  I don't know if this is needed.
>
> What happens, however, is that the ucall code allocates the ucall struct
> prior to calling the host, and then expects the host to resume the guest,
> at which point the guest frees the struct.
>
> However, since the host manually resets the guest registers, the code that
> frees the ucall struct is never reached and thus the ucall struct is leaked.
>
> Currently the ucall code has a pool of KVM_MAX_VCPUS (512) objects, thus if
> the test is run with more than 256 vCPUs, the pool is exhausted and the
> test fails.
>
> So we either need to:
>
> - add a way to manually free the ucall struct for such tests from the host
>   side.

Part of me wants to do something along these lines, as every GUEST_DONE() and
failed GUEST_ASSERT() "leaks" a ucall structure.  But practically speaking,
freeing a ucall structure from anywhere except the vCPU context is bound to
cause more problems than it solves.

> - remove the manual reset of the vCPU register state from this test and
>   instead put the guest code in a while(1) {} loop.

Definitely this one.  IIRC, the only reason I stuffed registers in the test
was because I was trying to force MMU reloads.  I can't think of any reason
why a simple infinite loop in the guest wouldn't work.

I'm pretty sure this is all that's needed?

diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 6628dc4dda89..5f9950f41313 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -22,10 +22,12 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 {
 	uint64_t gpa;
 
-	for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
-		*((volatile uint64_t *)gpa) = gpa;
+	for (;;) {
+		for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
+			*((volatile uint64_t *)gpa) = gpa;
 
-	GUEST_DONE();
+		GUEST_DONE();
+	}
 }
 
 struct vcpu_info {
@@ -64,17 +66,12 @@ static void *vcpu_worker(void *data)
 	struct kvm_vcpu *vcpu = info->vcpu;
 	struct kvm_vm *vm = vcpu->vm;
 	struct kvm_sregs sregs;
-	struct kvm_regs regs;
 
 	vcpu_args_set(vcpu, 3, info->start_gpa, info->end_gpa, vm->page_size);
-
-	/* Snapshot regs before the first run. */
-	vcpu_regs_get(vcpu, &regs);
 	rendezvous_with_boss();
 
 	run_vcpu(vcpu);
 	rendezvous_with_boss();
-	vcpu_regs_set(vcpu, &regs);
 	vcpu_sregs_get(vcpu, &sregs);
 #ifdef __x86_64__
 	/* Toggle CR0.WP to trigger a MMU context reset. */

> - refactor the ucall code to not rely on a fixed pool of structs, making it
>   possible to tolerate small memory leaks like that (I don't like this to
>   be honest).

Heh, me neither.
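
P.S. For anyone following along, here's a rough sketch of the guest-side
ucall() flow, paraphrased from tools/testing/selftests/kvm/lib/ucall_common.c
(from memory, so exact names and details may differ from your tree).  Each
guest run allocates a pool slot in ucall() and frees it only if the host
resumes the vCPU afterwards, so with the register stuffing each vCPU burns
two slots, one per run, and the 512-entry pool runs dry past 256 vCPUs:

static struct ucall *ucall_alloc(void)
{
	int i;

	/* Claim a free slot in the fixed pool of KVM_MAX_VCPUS entries. */
	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
		if (!test_and_set_bit(i, ucall_pool->in_use))
			return &ucall_pool->ucalls[i];
	}

	/* Pool exhausted, e.g. due to too many "leaked" ucalls. */
	GUEST_ASSERT(0);
	return NULL;
}

void ucall(uint64_t cmd, int nargs, ...)
{
	struct ucall *uc = ucall_alloc();

	/* ... snapshot cmd and the variadic args into *uc ... */

	/* Exit to the host, passing the address of the ucall struct. */
	ucall_arch_do_ucall((vm_vaddr_t)uc->hva);

	/*
	 * Only reached if the host resumes the vCPU as-is; stuffing RIP
	 * back to the guest's entry point skips the free and leaks the
	 * pool entry.
	 */
	ucall_free(uc);
}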