Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 12, 2020 at 8:41 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 06/08/20 00:01, Jim Mattson wrote:
> >>> but perhaps I'm missing something obvious.
> >> Not necessarily obvious, but I can think of a rather contrived example
> >> where the sync heuristics break down. If we're running nested and get
> >> migrated in the middle of a VMM setting up TSCs, it's possible that
> >> enough time will pass that we believe subsequent writes to not be of
> >> the same TSC generation.
> >
> > An example that has been biting us frequently in self-tests: migrate a
> > VM with less than a second accumulated in its TSC. At the destination,
> > the TSCs are zeroed.
>
> Yeah, good point about selftests.  But this would be about the sync
> heuristics messing up, and I don't understand how these ioctls improve
> things.

The improvement would be that userspace has final say over the TSC.
The ioctl only participates in match tracking and isn't subject to
value overrides.

>
> >> My immediate reaction was that we should just migrate the heuristics
> >> state somehow
> >
> > Yeah, I completely agree. I believe this series fixes the
> > userspace-facing issues and your suggestion would address the
> > guest-facing issues.
>
> I still don't understand how these ioctls are any better for userspace
> than migrating MSR_IA32_TSC.  The host TSC is different between source
> and destination, so the TSC offset will be different.

Indeed. We do not migrate the TSC offsets, but guest TSC values
constructed from them. Otherwise, the values are complete nonsense on
the other end of the migration.

We allow our userspace to decide the host TSC / wall clock pair at
which the vCPUs were paused. From that host TSC value we reconstruct
the guest TSCs using the offsets and migrate that info. On the
destination we grab another host TSC / clock pair and recalculate
guest TSC offsets, which we then pass to KVM_SET_TSC_OFFSET. This is
desirable over a per-vCPU read of MSR_IA32_TSC because we've
effectively recorded all TSCs at the exact same moment in time.
Otherwise, we inadvertently add skew between guest TSCs by reading
each vCPU at different times. It seems that the sync heuristics
address this issue along with any guest TSC coordination.

Not only that, depending on the heuristics to detect a sync from
userspace gets a bit tricky if we (KVM) are running nested. Couldn't
more than a second of time elapse between successive KVM_SET_MSRS when
running in L1 if L0 decides to pause our vCPUs (suspend/resume,
migration)? It seems to me that in this case we will fail to detect a
sync condition and configure the L2 TSCs to be out-of-phase.

Setting the guest TSCs by offset doesn't have these complications.
Even if L0 were to pause L1 for some inordinate amount of time, the
relation of L1 -> L2 TSC is never disturbed.

>
> I am all for improving migration of TSC state, but I think we should do
> it right, so we should migrate a host clock / TSC pair: then the
> destination host can use TSC frequency and host clock to compute the new
> TSC value.  In fact, such a pair is exactly the data that the syncing
> heuristics track for each "generation" of syncing.
>
> To migrate the synchronization state, instead, we only need to migrate
> the "already_matched" (vcpu->arch.this_tsc_generation ==
> kvm->arch.cur_tsc_generation) state.
>
> Putting all of this together, it would be something like this:
>
> - a VM-wide KVM_CLOCK/KVM_SET_CLOCK needs to migrate
> vcpu->arch.cur_tsc_{nsec,write} in addition to the current kvmclock
> value (it's unrelated, but I don't think it's worth creating a new
> ioctl).  A new flag is set if these fields are set in the struct.  If
> the flag is set, KVM_SET_CLOCK copies the fields back, bumps the
> generation and clears kvm->arch.nr_vcpus_matched_tsc.
>
> - a VCPU-wide KVM_GET_TSC_INFO returns a host clock / guest TSC pair
> plus the "already matched" state.  KVM_SET_TSC_INFO will only use the
> host clock / TSC pair if "already matched" is false, to compute the
> destination-side TSC offset but not otherwise doing anything with it; or
> if "already matched" is true, it will ignore the pair, compute the TSC
> offset from the data in kvm->arch, and update
> kvm->arch.nr_vcpus_matched_tsc.
>

It seems to me that a per-vCPU ioctl (like you've suggested above) is
necessary to uphold the guest-facing side of our sync scheme,
regardless of what we do on the userspace-facing side.

> Paolo
>



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux