Re: [PATCH] x86/kvm: fix condition to update kvm master clocks

On Thu, Jun 09, 2016 at 03:25:02PM -0300, Marcelo Tosatti wrote:
> On Thu, Jun 09, 2016 at 03:09:46PM +0300, Roman Kagan wrote:
> > On Thu, Jun 09, 2016 at 12:27:10AM -0300, Marcelo Tosatti wrote:
> > > Ok, it's not feasible to keep both REF_CLOCK (MSR) and shared memory
> > > (REF_PAGE) clocks in sync. Even if the frequency correction is the same
> > > for both, the kernel updates the monotonic clock differently than at the
> > > stated frequency, that is:
> > > 
> > >     monotonic clock (advertised via vsyscall notifier to use mult/freq pair) != tsc*freq 
> > > 
> > > So the best solution IMO is to: 
> > > 
> > >     reads of guest clock = max(shared memory clock, get_kernel_ns() +
> > >                                kvmclock_offset)
> > > 
> > > Where reads of guest clock include: 1) kvm_get_time_and_clockread
> > > (updates to kvmclock areas), 2) rdmsr(REF_CLOCK).
> > > 
> > > Unless someone has a better idea, Roman, can you update your patch to
> > > include such a solution?  For kvm_get_time_and_clockread, we can fast
> > > forward kvmclock_offset so that
> > > 
> > > kvmclock_offset + get_kernel_ns() = shared memory clock
> > 
> > I'm not sure I understand what you mean.
> > 
> > ->system_time in pvclock *is* assigned kernel_ns + kvmclock_offset, i.e.
> > at the time kvm_get_time_and_clockread() runs they are in sync by
> > definition.  They can diverge later due to different whole number math
> > applied.
> 
> Sync kvmclock_offset + get_kernel_ns() = shared memory clock (what you
> read from the guest). 
> 
> 
> Add a wrapper around get_kernel_ns() + kvmclock_offset reads:
> 
> Record the last returned value somewhere (last_returned_guestclock).
> 
> u64 read_guest_clock(struct kvm *kvm)
> {
>       u64 ret;
> 
>       mutex_lock(&guest_clock_mutex);
>       ret = get_kernel_ns() + kvmclock_offset;
>       kvm->arch.last_returned_guestclock = ret;
>       mutex_unlock(&guest_clock_mutex);
>       return ret;
> }
> 
> 
> Sync code (to be executed at masterclock updates and rdmsr(REF_CLOCK)):
> 
> 1. read guest shared memory = smval.
> 2. read guest clock = get_kernel_ns() + kvmclock_offset = kclock
> 3. if (kclock < smval) 
>         kvmclock_offset += smval - kclock
> 
> 
> Two possibilities for the clocks' state:
> 
> P1. shared memory clock > get_kernel_ns() + kvmclock_offset
> P2. shared memory clock < get_kernel_ns() + kvmclock_offset
> 
> Two possibilities for guest behaviour:
> G1. a = shared memory clock read;
>     b = get_kernel_ns() + kvmclock_offset
> 
> G1/P1:
> 
>     a = shared memory clock read (previous read, stored in memory)
>     b = get_kernel_ns() + kvmclock_offset
> 
> After the sync code above: note smval > a, so b = smval > a
> 
> G1/P2:
> 
>     a = shared memory clock read (previous read, stored in memory)
>     b = get_kernel_ns() + kvmclock_offset
> 
> a < b, fine.
> 
> G2 (second possibility for guest behaviour)
>     a = get_kernel_ns() + kvmclock_offset
>     b = shared memory clock read
> 
> G2/P1: fine, b > a.
> 
> G2/P2: 
>     a = get_kernel_ns() + kvmclock_offset >
>     b = shared memory clock read
> 
> So we have to either reduce get_kernel_ns() + kvmclock_offset so that 
> b is larger or 'stop get_kernel_ns() + kvmclock_offset'.
> 
> Can make get_kernel_ns() + kvmclock_offset be as small as
> last_returned_guestclock (otherwise users of get_kernel_ns() +
> kvmclock_offset can see time backwards).
> 
>     1. mutex_lock(&guest_clock_mutex);
>     2. getkernelns = get_kernel_ns();
>     3. read guest shared memory = smval.
>     4. kclock = getkernelns + kvmclock_offset
>     5. if (kclock > smval)
>         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
>                               kclock - smval)
>     6. kclock = getkernelns + kvmclock_offset
>     7. if (kclock > smval) {
>         schedule_timeout(kclock - smval);
>         kvmclock_offset -= min(kvmclock_offset - last_returned_guestclock,
>                               kclock - smval)
>        }
>     8. mutex_unlock(&guest_clock_mutex);
> 
> That works, right?
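
[As an editorial sketch, the clamp step of the proposal above can be
exercised in plain userspace C.  Everything here is illustrative rather
than actual KVM code: sync_guest_clock and struct gclock are invented
names, the clocks are bare u64 nanosecond counts, and the bound on how
far the offset may shrink is read as "not below the last value already
returned to a guest reader":]

```c
#include <stdint.h>

/* Illustrative userspace model of the clamp, not KVM code: clock
 * values are bare u64 nanosecond counts and all names are invented. */
struct gclock {
    uint64_t kvmclock_offset;          /* guest clock = kernel_ns + offset */
    uint64_t last_returned_guestclock; /* last value handed to a reader */
};

/* If kernel_ns + kvmclock_offset has run ahead of the shared-memory
 * clock (smval), pull kvmclock_offset back, but never so far that the
 * kernel-derived clock drops below the last value already returned,
 * which would let a reader observe time going backwards. */
static void sync_guest_clock(struct gclock *c, uint64_t kernel_ns,
                             uint64_t smval)
{
    uint64_t kclock = kernel_ns + c->kvmclock_offset;

    if (kclock > smval) {
        uint64_t ahead = kclock - smval;
        uint64_t room  = kclock - c->last_returned_guestclock;

        c->kvmclock_offset -= ahead < room ? ahead : room;
    }
}
```

[A clock that is 30 ns ahead of the shared-memory value gets pulled
exactly back onto it when there is room; when last_returned_guestclock
is closer than that, the clamp stops early and the residual has to be
slept off, as in the proposal.]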

I wouldn't say so.

First, I don't think changing kvmclock_offset is allowed other than
through ioctl(KVM_SET_CLOCK); it violates everybody's expectation that
this is the difference between the host and the guest clocks.

Second, since masterclock updates essentially do that synchronization, I
think that instead of all that complication we can just return the
host-side-calculated value of the REFERENCE_TSC_PAGE clock from
rdmsr(REF_CLOCK) when the masterclock is enabled.
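
[For concreteness, the arithmetic behind that suggestion: per the Hyper-V
spec the reference TSC page yields ref_time = ((tsc * tsc_scale) >> 64) +
tsc_offset in 100 ns units, so the host can answer rdmsr(REF_CLOCK) by
evaluating the same expression with the scale and offset it last
published to the page.  A minimal sketch (tsc_page_clock_read is an
invented name; the __int128 multiply assumes GCC or Clang on a 64-bit
host):]

```c
#include <stdint.h>

/* Compute the Hyper-V reference time (100 ns units) exactly as a
 * guest reading the reference TSC page would:
 *   ref_time = ((tsc * tsc_scale) >> 64) + tsc_offset
 * Answering rdmsr(REF_CLOCK) with this value keeps the MSR and the
 * shared-memory clock from diverging. */
static uint64_t tsc_page_clock_read(uint64_t tsc, uint64_t tsc_scale,
                                    int64_t tsc_offset)
{
    uint64_t high = (uint64_t)(((unsigned __int128)tsc * tsc_scale) >> 64);

    return high + (uint64_t)tsc_offset;
}
```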

> > There's also a problem that there can be arbitrary amount of time
> > between assigning the return value for guest's rdmsr and actually
> > entering the guest to deliver that value.
> 
> Don't think that matters, see the 4 cases above.

It does in the sense that between the point where we calculate the value
we're about to return from rdmsr(REF_CLOCK), and the time the guest will
actually see it, another vCPU can read and even update the reference tsc
page.  However, as I wrote in another message in this thread, there's
no way to guarantee clock reads to be monotonic across several vCPUs;
OTOH that doesn't violate the monotonicity on a specific vCPU.

So I'll probably just add another patch adjusting rdmsr(REF_CLOCK) to
return shared memory clock if it's enabled.
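
[A guest-side companion sketch of what such a patch implies: readers of
the reference TSC page guard the scale/offset pair with a sequence
number and retry if it changes mid-read, falling back to the MSR when
the page is marked invalid.  The layout loosely follows the Hyper-V
TLFS; read_tsc_page is an invented name and the tsc argument stands in
for rdtsc so the function is testable:]

```c
#include <stdint.h>

struct ms_hyperv_tsc_page {
    volatile uint32_t tsc_sequence;  /* 0 => page invalid, use the MSR */
    uint32_t reserved1;
    volatile uint64_t tsc_scale;
    volatile int64_t tsc_offset;
};

/* Returns 1 and stores the reference time (100 ns units) on success,
 * 0 if the page is invalid and the caller must fall back to
 * rdmsr(REF_CLOCK).  Retries if the host updates the page mid-read. */
static int read_tsc_page(const struct ms_hyperv_tsc_page *p,
                         uint64_t tsc, uint64_t *time)
{
    uint32_t seq;

    do {
        seq = p->tsc_sequence;
        if (seq == 0)
            return 0;
        *time = (uint64_t)(((unsigned __int128)tsc * p->tsc_scale) >> 64)
                + (uint64_t)p->tsc_offset;
    } while (p->tsc_sequence != seq);  /* page changed under us: retry */

    return 1;
}
```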

Roman.