On Wed, Mar 25, 2015 at 04:22:03PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 25, 2015 at 4:13 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Wed, Mar 25, 2015 at 03:48:02PM -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 25, 2015 at 3:41 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> > On Wed, Mar 25, 2015 at 03:33:10PM -0700, Andy Lutomirski wrote:
> >> >> On Mar 25, 2015 2:29 PM, "Marcelo Tosatti" <mtosatti@xxxxxxxxxx> wrote:
> >> >> >
> >> >> > On Wed, Mar 25, 2015 at 01:52:15PM +0100, Radim Krčmář wrote:
> >> >> > > 2015-03-25 12:08+0100, Radim Krčmář:
> >> >> > > > Reverting the patch protects us from any migration, but I don't
> >> >> > > > think we need to care about changing VCPUs as long as we read
> >> >> > > > consistent data from kvmclock. (The VCPU can change outside of
> >> >> > > > this loop too, so it doesn't matter if we return a value not fit
> >> >> > > > for this VCPU.)
> >> >> > > >
> >> >> > > > I think we could drop the second __getcpu if our kvmclock was
> >> >> > > > being handled better; maybe with a patch like the one below:
> >> >> > >
> >> >> > > The second __getcpu is not necessary, but I forgot about rdtsc.
> >> >> > > We need to either use rdtscp, know the host has a synchronized tsc,
> >> >> > > or monitor VCPU migrations. Only the last one works everywhere.
> >> >> >
> >> >> > The vdso code is only used if the host has a synchronized tsc.
> >> >> >
> >> >> > But you have to handle the case where the host goes from synchronized
> >> >> > tsc to unsynchronized tsc (see the clocksource notifier on the host
> >> >> > side).
> >> >> >
> >> >>
> >> >> Can't we change the host to freeze all vcpus and clear the stable bit
> >> >> on all of them if this happens? This would simplify and speed up
> >> >> vclock_gettime.
> >> >>
> >> >> --Andy
> >> >
> >> > Seems interesting to do on 512 vcpus, but sure, it could be done.
> >> >
> >>
> >> If you have a 512-vcpu system that switches between stable and
> >> unstable more than once per migration, then I expect that you have
> >> serious problems and this is the least of your worries.
> >>
> >> Personally, I'd *much* rather we just made vcpu 0's pvti authoritative
> >> if we're stable. If nothing else, I'm not even remotely convinced
> >> that the current scheme gives monotonic timing due to skew between
> >> when the updates happen on different vcpus.
> >
> > Can you write down the problem?
> >
> 
> I can try.
> 
> Suppose we start out with all vcpus agreeing on their pvti and perfect
> invariant TSCs. Now the host updates its frequency (due to NTP or
> whatever). KVM updates vcpu 0's pvti. Before KVM updates vcpu 1's
> pvti, guest code on vcpus 0 and 1 sees synced TSCs but different pvti.
> They'll disagree on the time, and one of them will be ahead until vcpu
> 1's pvti gets updated.

The masterclock scheme enforces the same system_timestamp/tsc_timestamp
pairs to be visible at one time, for all vcpus:

 * That is, when timespec0 != timespec1, M < N. Unfortunately that is
 * not always the case (the difference between two distinct xtime
 * instances might be smaller than the difference between corresponding
 * TSC reads, when updating guest vcpus pvclock areas).
 *
 * To avoid that problem, do not allow visibility of distinct
 * system_timestamp/tsc_timestamp values simultaneously: use a master
 * copy of host monotonic time values. Update that master copy
 * in lockstep.
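
To put numbers on Andy's scenario and on the "M < N" condition quoted
above, here is a minimal standalone sketch (plain userspace C, not kernel
code; the values and the 1 TSC tick = 1 ns scaling are made up purely for
illustration) of two vcpus computing time from a synced TSC but from pvti
copies that were written at different moments:

    /*
     * Two pvti copies: vcpu0's was written when host time was timespec0
     * and the TSC read tsc0; vcpu1's was written N ns of host time and
     * M TSC ticks later.  Both vcpus then read the same (synced) TSC.
     */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* pvti for vcpu0, written first (illustrative numbers) */
        uint64_t timespec0 = 1000;   /* host monotonic ns at update      */
        uint64_t tsc0      = 5000;   /* TSC at that same instant         */

        /* pvti for vcpu1, written N ns / M ticks later */
        uint64_t N = 10, M = 30;     /* M > N breaks the "M < N" rule    */
        uint64_t timespec1 = timespec0 + N;
        uint64_t tsc1      = tsc0 + M;

        /* both vcpus now read the same TSC value */
        uint64_t rdtsc = 5100;

        /* guest time as each vcpu computes it (1 tick == 1 ns assumed) */
        uint64_t ret0 = timespec0 + (rdtsc - tsc0);
        uint64_t ret1 = timespec1 + (rdtsc - tsc1);

        printf("vcpu0 sees %llu ns, vcpu1 sees %llu ns -> %s\n",
               (unsigned long long)ret0, (unsigned long long)ret1,
               ret1 >= ret0 ? "monotonic" : "time went backwards");
        return 0;
    }

With M > N, vcpu 1's result lands behind vcpu 0's even though the TSC
never moved backwards, which is the non-monotonicity the lockstep
masterclock update is meant to rule out.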
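Separately, Radim's remark that the second __getcpu is unnecessary "as
long as we read consistent data from kvmclock" refers to the pvti
version-counter protocol: the reader retries while the counter is odd
(update in progress) or has changed across the read. A rough userspace
model of that loop (a simplification with a stand-in struct and the same
1 tick = 1 ns assumption, not the actual vDSO code) might look like:

    #include <stdint.h>
    #include <stdio.h>

    struct pvti {                    /* stand-in for the real            */
        volatile uint32_t version;   /* pvclock_vcpu_time_info; odd      */
        uint64_t tsc_timestamp;      /* while the host is mid-update     */
        uint64_t system_time;        /* guest time (ns) at that instant  */
    };

    static uint64_t fake_rdtsc(void)
    {
        return 5100;                 /* stand-in for the rdtsc instruction */
    }

    static uint64_t read_kvmclock(struct pvti *p)
    {
        uint32_t version;
        uint64_t time;

        do {
            version = p->version;
            __sync_synchronize();    /* keep reads inside the version check */
            time = p->system_time + (fake_rdtsc() - p->tsc_timestamp);
            __sync_synchronize();
            /* retry if an update was in progress or finished meanwhile */
        } while ((version & 1) || version != p->version);

        return time;
    }

    int main(void)
    {
        struct pvti pvti = { .version = 2, .tsc_timestamp = 5000,
                             .system_time = 1000 };

        printf("kvmclock reads %llu ns\n",
               (unsigned long long)read_kvmclock(&pvti));
        return 0;
    }

The loop guarantees a consistent snapshot of one pvti; the separate
question in the thread is which pvti (this vcpu's, vcpu 0's, or a
masterclock-synchronized copy) it is safe to read.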