Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 9, 2021 at 12:05 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 09/06/21 17:11, Oliver Upton wrote:
> > Perhaps this will clarify the motivation for my approach: what if the
> > kernel wasn't the authoritative source for wall time in a system?
> > Furthermore, VMMs may wish to define their own heuristics for counter
> > migration (e.g. we only allow the counter to 'jump' by X seconds
> > during migration blackout). If a VMM tried to assert its whims on the
> > TSC state before handing it down to the kernel, we would inadvertently
> > be sampling the host counter twice again. And, anything can happen
> > between the time we assert elapsed time is within SLO and KVM
> > computing the TSC offset (scheduling, L0 hypervisor preemption).
> >
> > So, Maxim's changes would address my concerns in the general case, but
> > maybe not as much in edge cases where an operator may make decisions
> > about how much time can elapse while the guest hasn't had CPU time.
>
> I think I understand.  We still need a way to get a consistent
> (host_TSC, nanosecond) pair on the source, the TSC offset is not enough.
>   This is arguably not a KVM issue, but we're still the one having to
> provide a solution, so we would need a slightly more complicated interface.

Ah, right, good luck doing that without some help from the kernel. +1
to this _not_ being a KVM issue, but anyways...

> 1) In the kernel:
>
> * KVM_GET_CLOCK should also return kvmclock_ns - realtime_ns and
> host_TSC.  It should set two flags in struct kvm_clock_data saying that
> the respective fields are valid.
>
> * KVM_SET_CLOCK checks the flag for kvmclock_ns - realtime_ns.  If set,
> it looks at the kvmclock_ns - realtime_ns field and disregards the
> kvmclock_ns field.

Yeah, these additions all make sense to me.

>
> 2) On the source, userspace will:
>
> * per-VM: invoke KVM_GET_CLOCK.  Migrate kvmclock_ns - realtime_ns and
> kvmclock_ns.  Stash host_TSC for subsequent use.
>
> * per-vCPU: retrieve guest_TSC - host_TSC with your new ioctl.  Sum it
> to the stashed host_TSC value; migrate the resulting value (a guest TSC).
>
> 3) On the destination:
>
> * per-VM: Pass the migrated kvmclock_ns - realtime_ns to KVM_SET_CLOCK.
>   Use KVM_GET_CLOCK to get a consistent pair of kvmclock_ns ("newNS"
> below) and host TSC ("newHostTSC").  Stash them for subsequent use,
> together with the migrated kvmclock_ns value ("sourceNS") that you
> haven't used yet.
>
> * per-vCPU: using the data of the previous step, and the sourceGuestTSC
> in the migration stream, compute sourceGuestTSC + (newNS - sourceNS) *
> freq - newHostTSC.  This is the TSC offset to be passed to your new ioctl.
>
> Your approach still needs to use the "quirky" approach to host-initiated
> MSR_IA32_TSC_ADJUST writes, which write the MSR without affecting the
> VMCS offset.  This is just a documentation issue.

I think I follow what you're saying. To confirm:

My suggested ioctl for the vCPU will still exist, and it will still
affect the VMCS tsc offset, right? However, we need to do one of the
following:

- Stash the guest's MSR_IA32_TSC_ADJUST value in the
kvm_system_counter_state structure. During
KVM_SET_SYSTEM_COUNTER_STATE, check to see if the field is valid. If
so, treat it as a dumb value (i.e. show it to the guest but don't fold
it into the offset).
- Inform userspace that it must still migrate MSR_IA32_TSC_ADJUST, and
continue to our quirky behavior around host-initiated writes of the
MSR.

This is why Maxim's spin migrated a value for IA32_TSC_ADJUST, right?
Doing so ensures we don't have any guest-observable consequences due
to our migration of TSC state. To me, adding the guest IA32_TSC_ADJUST
MSR into the new counter state structure is probably best. No strong
opinions in either direction on this point, though :)

--
Thanks,
Oliver

>
> Does this make sense?
>
> Paolo
>



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux