KVM's current means of saving/restoring system counters is plagued with
temporal issues. On x86, we migrate the guest's system counter by-value
through the respective guest's IA32_TSC value. Restoring system counters
by-value is brittle as the state is not idempotent: the host system
counter keeps ticking between the attempted save and restore.
Furthermore, VMMs may wish to transparently live migrate guest VMs,
meaning that they include the time elapsed during the live migration
blackout in the guest's view of the system counter. The VMM thread could
be preempted for any number of reasons (scheduler, L0 hypervisor under
nested) between the time that it calculates the desired guest counter
value and the time that KVM actually sets this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of the TSC offset. We can
avoid all of the issues associated with a value-based interface by
abstracting these offset controls in a new device attribute. This series
introduces new vCPU device attributes to provide userspace access to the
vCPU's system counter offset.

Patch 1 addresses a possible race in KVM_GET_CLOCK where
use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 is a cleanup, moving the implementation of KVM_{GET,SET}_CLOCK
into helper methods.

Patch 3 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.

Patches 4-5 are preparatory changes for exposing the TSC offset to
userspace. Patch 6 adds a vCPU attribute that gives userspace access to
the TSC offset.

This series was tested with the new KVM selftests for the KVM clock and
system counter offset controls on Haswell hardware. Note that these
tests are mailed as a separate series due to the dependencies in both
x86 and arm64.

Applies cleanly to kvm/queue.
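
To illustrate why an offset-based interface sidesteps the preemption
problem described above, here is a minimal sketch of the arithmetic. It
is a simplified model, not code from this series: it assumes the guest
counter is just the host counter plus an offset (TSC scaling is
ignored), and the helper names guest_tsc(), offset_for() and
restore_by_value() are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model (assumption): guest counter = host counter + offset.
 * Unsigned wraparound is intentional, so "negative" offsets work too.
 */
static inline uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;
}

/* Offset that makes the guest read guest_val while the host reads host_tsc. */
static inline uint64_t offset_for(uint64_t guest_val, uint64_t host_tsc)
{
	return guest_val - host_tsc;
}

/*
 * Value-based restore: the desired guest value was computed earlier, but
 * the effective offset is derived from the host counter at the moment
 * the write is applied. Any delay in between is lost from the guest's
 * view of time.
 */
static inline uint64_t restore_by_value(uint64_t desired_guest,
					uint64_t host_at_apply)
{
	return offset_for(desired_guest, host_at_apply);
}
```

Suppose the VMM wants the guest to read 5000 when the host reads 1000,
which is an offset of 4000. If the VMM is preempted and the value lands
when the host reads 1300, the value-based path yields an effective
offset of 3700, so the guest is 300 ticks behind. Writing the offset
4000 directly gives the same result no matter when the write lands.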

Parent commit: a3e0b8bd99ab ("KVM: MMU: change tracepoints arguments to kvm_page_fault")

v6: https://lore.kernel.org/r/20210804085819.846610-1-oupton@xxxxxxxxxx

v6 -> v7:
 - Separated x86, arm64, and selftests into different series
 - Rebased on top of kvm/queue

Oliver Upton (6):
  KVM: x86: Fix potential race in KVM_GET_CLOCK
  KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace

 Documentation/virt/kvm/api.rst          |  42 ++-
 Documentation/virt/kvm/devices/vcpu.rst |  57 ++++
 Documentation/virt/kvm/locking.rst      |  11 +
 arch/x86/include/asm/kvm_host.h         |   4 +
 arch/x86/include/uapi/asm/kvm.h         |   4 +
 arch/x86/kvm/x86.c                      | 362 +++++++++++++++++-------
 include/uapi/linux/kvm.h                |   7 +-
 7 files changed, 378 insertions(+), 109 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog