On 19/08/2019 17:40, Marc Zyngier wrote: > Hi Steven, > > On 19/08/2019 15:04, Steven Price wrote: >> Implement the service call for configuring a shared structure between a >> VCPU and the hypervisor in which the hypervisor can write the time >> stolen from the VCPU's execution time by other tasks on the host. >> >> The hypervisor allocates memory which is placed at an IPA chosen by user >> space. The hypervisor then uses WRITE_ONCE() to update the shared >> structure ensuring single copy atomicity of the 64-bit unsigned value >> that reports stolen time in nanoseconds. >> >> Whenever stolen time is enabled by the guest, the stolen time counter is >> reset. >> >> The stolen time itself is retrieved from the sched_info structure >> maintained by the Linux scheduler code. We enable SCHEDSTATS when >> selecting KVM Kconfig to ensure this value is meaningful. >> >> Signed-off-by: Steven Price <steven.price@xxxxxxx> >> --- >> arch/arm/include/asm/kvm_host.h | 15 +++++++ >> arch/arm64/include/asm/kvm_host.h | 16 ++++++- >> arch/arm64/kvm/Kconfig | 1 + >> include/linux/kvm_types.h | 2 + >> virt/kvm/arm/arm.c | 19 +++++++++ >> virt/kvm/arm/hypercalls.c | 3 ++ >> virt/kvm/arm/pvtime.c | 71 +++++++++++++++++++++++++++++++ >> 7 files changed, 126 insertions(+), 1 deletion(-) >> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h >> index 369b5d2d54bf..14d61a84c270 100644 >> --- a/arch/arm/include/asm/kvm_host.h >> +++ b/arch/arm/include/asm/kvm_host.h >> @@ -39,6 +39,7 @@ >> KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) >> #define KVM_REQ_IRQ_PENDING KVM_ARCH_REQ(1) >> #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(2) >> +#define KVM_REQ_RECORD_STEAL KVM_ARCH_REQ(3) >> >> DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); >> >> @@ -77,6 +78,12 @@ struct kvm_arch { >> >> /* Mandated version of PSCI */ >> u32 psci_version; >> + >> + struct kvm_arch_pvtime { >> + struct gfn_to_hva_cache st_ghc; >> + gpa_t st_base; >> + u64 st_size; >> + } pvtime; > > It'd be good if we could avoid having this in the 32bit vcpu structure, > given that it serves no real purpose (other than being able to compile > things). Good point - I think I can fix that with a couple more static inline functions... It's a little tricky due to header file include order, but I think I can make it work. [...] >> +int kvm_update_stolen_time(struct kvm_vcpu *vcpu, bool init) >> +{ >> + struct kvm *kvm = vcpu->kvm; >> + struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime; >> + u64 steal; >> + u64 steal_le; >> + u64 offset; >> + int idx; >> + const int stride = sizeof(struct pvclock_vcpu_stolen_time); >> + >> + if (pvtime->st_base == GPA_INVALID) >> + return -ENOTSUPP; >> + >> + /* Let's do the local bookkeeping */ >> + steal = vcpu->arch.steal.steal; >> + steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal; >> + vcpu->arch.steal.last_steal = current->sched_info.run_delay; >> + vcpu->arch.steal.steal = steal; >> + >> + offset = stride * kvm_vcpu_get_idx(vcpu); >> + >> + if (unlikely(offset + stride > pvtime->st_size)) >> + return -EINVAL; >> + >> + steal_le = cpu_to_le64(steal); >> + pagefault_disable(); > > What's the reason for doing a pagefault_disable()? What I'd expect is > for the userspace page to be faulted in and written to, and doing a > pagefault_disable() seems to be going against this idea. Umm... this is me screwing up the locking... The current code is very confused about which locks should/can be held when kvm_update_stolen_time() is called. vcpu_req_record_steal() explicitly takes the kvm->srcu read lock - which is then taken again here. But kvm_hypercall_stolen_time doesn't hold any lock. And obviously at some point in time I expected this to be called in atomic context... In general the page is likely to be faulted in (as a guest which is using stolen time is surely looking at the numbers there). But there's no need for the pagefault_disable(). It also shouldn't be the callers responsibility to hold kvm->srcu. Steve