On 04/08/21 10:57, Oliver Upton wrote:
KVM's current means of saving/restoring system counters is plagued with temporal issues. At least on ARM64 and x86, we migrate the guest's system counter by-value through the respective guest system register values (cntvct_el0, ia32_tsc). Restoring system counters by-value is brittle as the state is not idempotent: the host system counter is still oscillating between the attempted save and restore. Furthermore, VMMs may wish to transparently live migrate guest VMs, meaning that they include the elapsed time due to live migration blackout in the guest system counter view. The VMM thread could be preempted for any number of reasons (scheduler, L0 hypervisor under nested) between the time that it calculates the desired guest counter value and when KVM actually sets this counter state.

Despite the value-based interface that we present to userspace, KVM actually has idempotent guest controls by way of system counter offsets. We can avoid all of the issues associated with a value-based interface by abstracting these offset controls in new ioctls. This series introduces new vCPU device attributes to provide userspace access to the vCPU's system counter offset.

Patch 1 addresses a possible race in KVM_GET_CLOCK where use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK ioctls to provide userspace with a (host_tsc, realtime) instant. This is essential for a VMM to perform precise migration of the guest's system counters.

Patches 3-4 are some preparatory changes for exposing the TSC offset to userspace. Patch 5 provides a vCPU attribute to provide userspace access to the TSC offset.

Patches 6-7 implement a test for the new additions to KVM_{GET,SET}_CLOCK.

Patch 8 fixes some assertions in the kvm device attribute helpers.

Patches 9-10 implement a test for the TSC offset attribute introduced in patch 5.
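
To make the value-versus-offset argument above concrete, here is a minimal sketch of the arithmetic in plain C. It does not call any KVM ioctl; every function and parameter name is hypothetical and chosen only for illustration, and it assumes the guest counter is simply the host counter plus an offset, with the counter frequency known and identical on both hosts.

```c
#include <stdint.h>

/* Hypothetical helper: the guest's view is host counter + offset.
 * Unsigned wraparound is intentional, offsets may be "negative". */
static inline uint64_t guest_counter(uint64_t host_counter, uint64_t offset)
{
    return host_counter + offset;
}

/* Value-based restore: the effective offset depends on the host counter
 * at the moment the write is applied, so any delay between computing
 * desired_guest and applying it is silently lost from the guest's view. */
static inline uint64_t offset_from_value(uint64_t desired_guest,
                                         uint64_t host_at_apply)
{
    return desired_guest - host_at_apply;
}

/* Convert nanoseconds to counter ticks, splitting off whole seconds so
 * the intermediate product stays within 64 bits for frequencies below
 * roughly 18 GHz. */
static inline uint64_t ns_to_ticks(uint64_t ns, uint64_t hz)
{
    const uint64_t nsec_per_sec = 1000000000ULL;

    return (ns / nsec_per_sec) * hz + ((ns % nsec_per_sec) * hz) / nsec_per_sec;
}

/* Offset-based restore (assumed flow): given a (host_tsc, realtime)
 * instant sampled on the source and another on the destination, compute
 * an offset that carries the guest counter across the blackout. The
 * result does not depend on when it is eventually applied. */
static inline uint64_t migrate_offset(uint64_t src_offset,
                                      uint64_t src_host_tsc,
                                      uint64_t src_realtime_ns,
                                      uint64_t dst_host_tsc,
                                      uint64_t dst_realtime_ns,
                                      uint64_t tsc_hz)
{
    uint64_t guest_at_src = guest_counter(src_host_tsc, src_offset);
    uint64_t elapsed = ns_to_ticks(dst_realtime_ns - src_realtime_ns, tsc_hz);

    return guest_at_src + elapsed - dst_host_tsc;
}
```

Writing the same offset twice yields the same guest view, whereas writing the same value twice at different host times does not; that is the sense in which the offset interface is idempotent.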
The x86 parts look good, except that patch 3 is a bit redundant with my idea of altogether getting rid of the pvclock_gtod_sync_lock. That said, I agree that patches 1 and 2 (and extracting kvm_vm_ioctl_get_clock and kvm_vm_ioctl_set_clock) should be done before whatever locking changes have to be done.
Time is ticking for 5.15 due to my vacation; I'll see if I have some time to look at it further next week.
I agree that arm64 can be done separately from x86.

Paolo