On Wed, Aug 5, 2020 at 1:46 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 05/08/20 18:06, Oliver Upton wrote:
> > On Tue, Jul 28, 2020 at 11:33 AM Oliver Upton <oupton@xxxxxxxxxx> wrote:
> >>
> >> On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@xxxxxxxxxx> wrote:
> >>>
> >>> To date, VMMs have typically restored the guest's TSCs by value using
> >>> the KVM_SET_MSRS ioctl for each vCPU. However, restoring the TSCs by
> >>> value introduces some challenges with synchronization as the TSCs
> >>> continue to tick throughout the restoration process. As such, KVM has
> >>> some heuristics around TSC writes to infer whether or not the guest or
> >>> host is attempting to synchronize the TSCs.
> >>>
> >>> Instead of guessing at the intentions of a VMM, it'd be better to
> >>> provide an interface that allows for explicit synchronization of the
> >>> guest's TSCs. To that end, this series introduces the
> >>> KVM_{GET,SET}_TSC_OFFSET ioctls, yielding control of the TSC offset to
> >>> userspace.
> >>>
> >>> v2 => v3:
> >>> - Mark kvm_write_tsc_offset() as static (whoops)
> >>>
> >>> v1 => v2:
> >>> - Added clarification to the documentation of KVM_SET_TSC_OFFSET to
> >>>   indicate that it can be used instead of an IA32_TSC MSR restore
> >>>   through KVM_SET_MSRS
> >>> - Fixed KVM_SET_TSC_OFFSET to participate in the existing TSC
> >>>   synchronization heuristics, thereby enabling the KVM masterclock when
> >>>   all vCPUs are in phase.
> >>>
> >>> Oliver Upton (4):
> >>>   kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc
> >>>   kvm: vmx: check tsc offsetting with nested_cpu_has()
> >>>   selftests: kvm: use a helper function for reading cpuid
> >>>   selftests: kvm: introduce tsc_offset_test
> >>>
> >>> Peter Hornyack (1):
> >>>   kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
> >>>
> >>>  Documentation/virt/kvm/api.rst                |  31 ++
> >>>  arch/x86/include/asm/kvm_host.h               |   1 +
> >>>  arch/x86/kvm/vmx/vmx.c                        |   2 +-
> >>>  arch/x86/kvm/x86.c                            | 147 ++++---
> >>>  include/uapi/linux/kvm.h                      |   5 +
> >>>  tools/testing/selftests/kvm/.gitignore        |   1 +
> >>>  tools/testing/selftests/kvm/Makefile          |   1 +
> >>>  .../testing/selftests/kvm/include/test_util.h |   3 +
> >>>  .../selftests/kvm/include/x86_64/processor.h  |  15 +
> >>>  .../selftests/kvm/include/x86_64/svm_util.h   |  10 +-
> >>>  .../selftests/kvm/include/x86_64/vmx.h        |   9 +
> >>>  tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
> >>>  tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  11 +
> >>>  .../selftests/kvm/x86_64/tsc_offset_test.c    | 362 ++++++++++++++++++
> >>>  14 files changed, 550 insertions(+), 49 deletions(-)
> >>>  create mode 100644 tools/testing/selftests/kvm/x86_64/tsc_offset_test.c
> >>>
> >>> --
> >>> 2.28.0.rc0.142.g3c755180ce-goog
> >>>
> >>
> >> Ping :)
> >
> > Ping
>
> Hi Oliver,
>
> I saw these on vacation and decided I would delay them to 5.10. However
> they are definitely on my list.
>

Hope you enjoyed vacation!

> I have one possibly very stupid question just by looking at the cover
> letter: now that you've "fixed KVM_SET_TSC_OFFSET to participate in the
> existing TSC synchronization heuristics" what makes it still not
> "guessing the intentions of a VMM"? (No snark intended, just quoting
> the parts that puzzled me a bit).

Great point. I'd still posit that this series disambiguates userspace
control/synchronization of the TSCs.
If a VMM wants the TSCs to be in sync, it can write identical offsets to
all vCPUs.

That said, participation in TSC synchronization is presently necessary
due to issues migrating a guest that was in the middle of a TSC sync. In
doing so, we still accomplish synchronization on the other end of
migration with a well-timed mix of host and guest writes.

>
> My immediate reaction was that we should just migrate the heuristics
> state somehow

Yeah, I completely agree. I believe this series fixes the
userspace-facing issues and your suggestion would address the
guest-facing issues.

> but perhaps I'm missing something obvious.

Not necessarily obvious, but I can think of a rather contrived example
where the sync heuristics break down. If we're running nested and get
migrated in the middle of a VMM setting up TSCs, it's possible that
enough time will pass that we believe subsequent writes to not be of the
same TSC generation.

> Paolo
>
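
To make the by-value versus by-offset distinction concrete, here is a toy
model. It is not KVM code and does not call any KVM ioctl. It only assumes
the usual relationship guest TSC = host TSC + per-vCPU offset, and it
deliberately ignores KVM's synchronization heuristics, which exist precisely
to paper over the skew shown here. All function names and numbers are made
up for illustration.

```python
# Toy model, NOT KVM code. Assumes guest_tsc = host_tsc + offset per vCPU
# and ignores KVM's TSC synchronization heuristics entirely.
# Every name and number here is a hypothetical chosen for illustration.


def offset_from_value_write(saved_guest_tsc, host_tsc_at_write):
    """Offset implied by writing a guest TSC *value* at a given host TSC."""
    return saved_guest_tsc - host_tsc_at_write


def guest_tsc(host_tsc, offset):
    """Guest-visible TSC for a vCPU with the given offset."""
    return host_tsc + offset


# Host TSC readings at the moment each vCPU is restored. The host TSC keeps
# ticking between the per-vCPU writes, so these differ.
host_tsc_at_restore = [5_000_000_000, 5_000_150_000, 5_000_310_000]
saved_value = 42_000_000_000  # same saved guest TSC value for every vCPU

# Restore by value: each vCPU ends up with a slightly different offset.
by_value_offsets = [
    offset_from_value_write(saved_value, t) for t in host_tsc_at_restore
]

# Restore by offset: the VMM picks one offset and writes it to every vCPU.
by_offset_offsets = [by_value_offsets[0]] * len(host_tsc_at_restore)

# Sample all vCPUs at the same later host TSC and compare.
now = 6_000_000_000
by_value_tscs = [guest_tsc(now, off) for off in by_value_offsets]
by_offset_tscs = [guest_tsc(now, off) for off in by_offset_offsets]

skew_by_value = max(by_value_tscs) - min(by_value_tscs)
skew_by_offset = max(by_offset_tscs) - min(by_offset_tscs)

print("skew when restoring by value: ", skew_by_value)
print("skew when restoring by offset:", skew_by_offset)
```

In this model the by-value skew equals the host cycles that elapsed between
the first and last write, while identical offsets leave the vCPUs in phase
no matter how long the restore takes.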