On Wed, Mar 25, 2015 at 04:22:03PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 25, 2015 at 4:13 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Wed, Mar 25, 2015 at 03:48:02PM -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 25, 2015 at 3:41 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> > On Wed, Mar 25, 2015 at 03:33:10PM -0700, Andy Lutomirski wrote:
> >> >> On Mar 25, 2015 2:29 PM, "Marcelo Tosatti" <mtosatti@xxxxxxxxxx> wrote:
> >> >> >
> >> >> > On Wed, Mar 25, 2015 at 01:52:15PM +0100, Radim Krčmář wrote:
> >> >> > > 2015-03-25 12:08+0100, Radim Krčmář:
> >> >> > > > Reverting the patch protects us from any migration, but I don't
> >> >> > > > think we need to care about changing VCPUs as long as we read
> >> >> > > > consistent data from kvmclock. (The VCPU can change outside of
> >> >> > > > this loop too, so it doesn't matter if we return a value not fit
> >> >> > > > for this VCPU.)
> >> >> > > >
> >> >> > > > I think we could drop the second __getcpu if our kvmclock was
> >> >> > > > being handled better; maybe with a patch like the one below:
> >> >> > >
> >> >> > > The second __getcpu is not necessary, but I forgot about rdtsc.
> >> >> > > We need to either use rdtscp, know the host has a synchronized tsc,
> >> >> > > or monitor VCPU migrations. Only the last one works everywhere.
> >> >> >
> >> >> > The vdso code is only used if the host has a synchronized tsc.
> >> >> >
> >> >> > But you have to handle the case where the host goes from synchronized
> >> >> > tsc to unsynchronized tsc (see the clocksource notifier on the host
> >> >> > side).
> >> >> >
> >> >>
> >> >> Can't we change the host to freeze all vcpus and clear the stable bit
> >> >> on all of them if this happens? This would simplify and speed up
> >> >> vclock_gettime.
> >> >>
> >> >> --Andy
> >> >
> >> > Seems interesting to do on 512 vcpus, but sure, it could be done.
> >> >
> >>
> >> If you have a 512-vcpu system that switches between stable and
> >> unstable more than once per migration, then I expect that you have
> >> serious problems and this is the least of your worries.
> >>
> >> Personally, I'd *much* rather we just made vcpu 0's pvti authoritative
> >> if we're stable. If nothing else, I'm not even remotely convinced
> >> that the current scheme gives monotonic timing due to skew between
> >> when the updates happen on different vcpus.
> >
> > Can you write down the problem?
> >
> 
> I can try.
> 
> Suppose we start out with all vcpus agreeing on their pvti and perfect
> invariant TSCs. Now the host updates its frequency (due to NTP or
> whatever). KVM updates vcpu 0's pvti. Before KVM updates vcpu 1's
> pvti, guest code on vcpus 0 and 1 sees synced TSCs but different pvti.
> They'll disagree on the time, and one of them will be ahead until vcpu
> 1's pvti gets updated.

The masterclock scheme enforces the same system_timestamp/tsc_timestamp
pairs to be visible at one time, for all vcpus:

 * That is, when timespec0 != timespec1, M < N. Unfortunately that is
 * not always the case (the difference between two distinct xtime
 * instances might be smaller than the difference between corresponding
 * TSC reads, when updating guest vcpus pvclock areas).
 *
 * To avoid that problem, do not allow visibility of distinct
 * system_timestamp/tsc_timestamp values simultaneously: use a master
 * copy of host monotonic time values. Update that master copy
 * in lockstep.
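
To put numbers on Andy's scenario and on the "M < N" condition quoted
above, here is a minimal standalone sketch (plain userspace C, not kernel
code; the values and the 1 TSC tick = 1 ns scaling are made up purely for
illustration) of two vcpus computing time from a synced TSC but from pvti
copies that were written at different moments:

    /*
     * Two pvti copies: vcpu0's was written when host time was timespec0
     * and the TSC read tsc0; vcpu1's was written N ns of host time and
     * M TSC ticks later.  Both vcpus then read the same (synced) TSC.
     */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* pvti for vcpu0, written first (illustrative numbers) */
        uint64_t timespec0 = 1000;   /* host monotonic ns at update      */
        uint64_t tsc0      = 5000;   /* TSC at that same instant         */

        /* pvti for vcpu1, written N ns / M ticks later */
        uint64_t N = 10, M = 30;     /* M > N breaks the "M < N" rule    */
        uint64_t timespec1 = timespec0 + N;
        uint64_t tsc1      = tsc0 + M;

        /* both vcpus now read the same TSC value */
        uint64_t rdtsc = 5100;

        /* guest time as each vcpu computes it (1 tick == 1 ns assumed) */
        uint64_t ret0 = timespec0 + (rdtsc - tsc0);
        uint64_t ret1 = timespec1 + (rdtsc - tsc1);

        printf("vcpu0 sees %llu ns, vcpu1 sees %llu ns -> %s\n",
               (unsigned long long)ret0, (unsigned long long)ret1,
               ret1 >= ret0 ? "monotonic" : "time went backwards");
        return 0;
    }

With M > N, vcpu 1's result lands behind vcpu 0's even though the TSC
never moved backwards, which is the non-monotonicity the lockstep
masterclock update is meant to rule out.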
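Separately, Radim's remark that the second __getcpu is unnecessary "as
long as we read consistent data from kvmclock" refers to the pvti
version-counter protocol: the reader retries while the counter is odd
(update in progress) or has changed across the read. A rough userspace
model of that loop (a simplification with a stand-in struct and the same
1 tick = 1 ns assumption, not the actual vDSO code) might look like:

    #include <stdint.h>
    #include <stdio.h>

    struct pvti {                    /* stand-in for the real            */
        volatile uint32_t version;   /* pvclock_vcpu_time_info; odd      */
        uint64_t tsc_timestamp;      /* while the host is mid-update     */
        uint64_t system_time;        /* guest time (ns) at that instant  */
    };

    static uint64_t fake_rdtsc(void)
    {
        return 5100;                 /* stand-in for the rdtsc instruction */
    }

    static uint64_t read_kvmclock(struct pvti *p)
    {
        uint32_t version;
        uint64_t time;

        do {
            version = p->version;
            __sync_synchronize();    /* keep reads inside the version check */
            time = p->system_time + (fake_rdtsc() - p->tsc_timestamp);
            __sync_synchronize();
            /* retry if an update was in progress or finished meanwhile */
        } while ((version & 1) || version != p->version);

        return time;
    }

    int main(void)
    {
        struct pvti pvti = { .version = 2, .tsc_timestamp = 5000,
                             .system_time = 1000 };

        printf("kvmclock reads %llu ns\n",
               (unsigned long long)read_kvmclock(&pvti));
        return 0;
    }

The loop guarantees a consistent snapshot of one pvti; the separate
question in the thread is which pvti (this vcpu's, vcpu 0's, or a
masterclock-synchronized copy) it is safe to read.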