On Thu, 2022-11-03 at 00:46 +0000, Sean Christopherson wrote:
> On Wed, Nov 02, 2022, Robert Hoo wrote:
> > vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> > generally available.
> > Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls
> > less therefore can have more odds to hit the race condition.
> >
> > Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of
> > sched_getcpu() in rseq_test")
> > Signed-off-by: Robert Hoo <robert.hu@xxxxxxxxxxxxxxx>
> > ---
>
> ...
>
> > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> >  		 * across the seq_cnt reads.
> >  		 */
> >  		smp_rmb();
> > -		sys_getcpu(&cpu);
> > +		vdso_getcpu(&cpu, NULL, NULL);
> >  		rseq_cpu = rseq_current_cpu_raw();
> >  		smp_rmb();
> >  	} while (snapshot != atomic_read(&seq_cnt));
>
> Something seems off here. Half of the iterations in the migration
> thread have a delay of 5+us, which should be more than enough time to
> complete a few getcpu() syscalls to stabilize the CPU.
>

The migration thread's delay has to cover the vCPU thread's whole loop,
not just vcpu_run(), I think:

	for (i = 0; !done; i++) {
		vcpu_run(vcpu);
		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
			    "Guest failed?");
		...
		do {
			...
			vdso_getcpu(&cpu, NULL, NULL);
			rseq_cpu = rseq_current_cpu_raw();
			...
		} while (snapshot != atomic_read(&seq_cnt));
		...
	}

> Has anyone tried to figure out why the vCPU thread is apparently
> running slow? E.g. is KVM_RUN itself taking a long time, is the task
> not getting scheduled in, etc... I can see how using vDSO would make
> the vCPU more efficient, but I'm curious as to why that's a problem in
> the first place.

Yes, that is the first-order question. What I can say so far is that the
whole for(){} loop takes more time than before, and that increase is
spread across its key sub-calls, e.g. vcpu_run(), get_ucall(), getcpu()
and rseq_current_cpu_raw(). vcpu_run() deserves the most attention, but
reducing the time spent in the others also helps.

BTW, I find that x86's get_ucall() does one more vCPU ioctl
(vcpu_regs_get()) than aarch64's, which may partly explain why the
for(){} loop is heavier on x86 than on aarch64:

uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc)
@@ -43,12 +95,14 @@
 	if (uc)
 		memset(uc, 0, sizeof(*uc));

-	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
-		struct kvm_regs regs;
-
-		vcpu_regs_get(vcpu, &regs);
-		memcpy(&ucall, addr_gva2hva(vcpu->vm, (vm_vaddr_t)regs.rdi),
-		       sizeof(ucall));
+	if (run->exit_reason == KVM_EXIT_MMIO &&
+	    run->mmio.phys_addr == (uint64_t)ucall_exit_mmio_addr) {
+		vm_vaddr_t gva;
+
+		TEST_ASSERT(run->mmio.is_write && run->mmio.len == 8,
+			    "Unexpected ucall exit mmio address access");
+		memcpy(&gva, run->mmio.data, sizeof(gva));
+		memcpy(&ucall, addr_gva2hva(vcpu->vm, gva), sizeof(ucall));

>
> Anyways, assuming there's no underlying problem that can be solved, the
> easier solution is to just bump the delay in the migration thread. As
> per its gigantic comment, the original bug reproduced with up to 500us
> delays, so bumping the min delay to e.g. 5us is acceptable. If that
> doesn't guarantee the vCPU meets its quota, then something else is
> definitely going on.
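
For reference, a minimal sketch of that suggestion, assuming
migration_worker() computes its per-iteration delay roughly as shown
below; the exact expression in rseq_test.c may differ from what I am
guessing here:

	/*
	 * Sketch only: keep the existing ~10us spread between migrations,
	 * but raise the floor to ~5us so the vCPU thread can finish its
	 * getcpu() + rseq reads within every iteration.
	 */
	usleep((i % 10) + 5);	/* assumed current form: usleep((i % 10) + 1) */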
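
On the earlier question of where the time actually goes in the vCPU
thread: one way to find out is to wrap the key calls with CLOCK_MONOTONIC
timestamps. A rough sketch, not part of the patch; now_ns() and the t_*
accumulators are made-up names for illustration:

	#include <stdint.h>
	#include <time.h>

	static uint64_t now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* In the vCPU thread loop quoted above: */
	uint64_t t0, t_run = 0, t_ucall = 0, t_snap = 0;

	for (i = 0; !done; i++) {
		t0 = now_ns();
		vcpu_run(vcpu);
		t_run += now_ns() - t0;

		t0 = now_ns();
		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
			    "Guest failed?");
		t_ucall += now_ns() - t0;

		t0 = now_ns();
		/* existing getcpu()/rseq snapshot do-while loop here */
		t_snap += now_ns() - t0;
	}
	/* Dividing each accumulator by i gives the average cost per call. */

That would show whether KVM_RUN, get_ucall(), or the getcpu()/rseq reads
dominate the loop.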
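
And for completeness, a sketch of how the test could resolve
__vdso_getcpu at run time via the auxiliary vector, using the
parse_vdso.c helpers that already live in tools/testing/selftests/vDSO.
Whether the actual patch does it exactly this way is not visible in the
hunk quoted above; the names init_vdso_getcpu/vdso_getcpu and the include
path are only illustrative, and TEST_ASSERT() comes from the selftests'
test_util.h:

	#include <stdint.h>
	#include <sys/auxv.h>

	#include "../vDSO/parse_vdso.h"	/* vdso_init_from_sysinfo_ehdr(), vdso_sym() */

	typedef long (*getcpu_t)(unsigned int *cpu, unsigned int *node,
				 void *unused);
	static getcpu_t vdso_getcpu;

	static void init_vdso_getcpu(void)
	{
		unsigned long sysinfo_ehdr = getauxval(AT_SYSINFO_EHDR);

		TEST_ASSERT(sysinfo_ehdr, "vDSO not mapped (no AT_SYSINFO_EHDR)");

		vdso_init_from_sysinfo_ehdr(sysinfo_ehdr);

		/* x86-64 exports getcpu as __vdso_getcpu, version LINUX_2.6. */
		vdso_getcpu = (getcpu_t)vdso_sym("LINUX_2.6", "__vdso_getcpu");
		TEST_ASSERT(vdso_getcpu, "__vdso_getcpu not found in vDSO");
	}

With something like that in place, vdso_getcpu(&cpu, NULL, NULL) in the
hunk above runs entirely in the vDSO page and avoids the syscall
entry/exit cost of getcpu().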